User:IssaRice/Beta distribution
Basically, if we start out with a uniform prior over the bias of a coin and see n heads and m tails, what distribution should we have for the probability of heads? The answer is beta(n+1, m+1). This is basically the distribution version of Laplace's rule of succession (that rule only gives the expected value, (n+1)/(n+m+2)).
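A quick numerical sanity check, as a minimal sketch using only the Python standard library (the function names, the grid size, and the example counts n = 7, m = 3 are my own illustrative choices). It computes the posterior on a grid from a uniform prior, compares it against the beta(n+1, m+1) density, and checks that the posterior mean matches Laplace's (n+1)/(n+m+2):

```python
import math

def beta_pdf(x, a, b):
    """Density of beta(a, b) at x, using log-gamma for numerical stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def grid_posterior(n, m, steps=10_000):
    """Posterior over the coin's bias p on a midpoint grid, from a uniform prior.

    The likelihood of n heads and m tails is p**n * (1 - p)**m; the binomial
    coefficient is dropped because it cancels when normalizing."""
    xs = [(i + 0.5) / steps for i in range(steps)]
    unnorm = [x**n * (1 - x)**m for x in xs]  # uniform prior contributes a factor of 1
    total = sum(unnorm) / steps               # midpoint-rule integral over [0, 1]
    return xs, [u / total for u in unnorm]

n, m = 7, 3  # hypothetical counts: 7 heads, 3 tails
xs, post = grid_posterior(n, m)

# Largest pointwise gap between the grid posterior and the beta(n+1, m+1) density.
max_err = max(abs(p - beta_pdf(x, n + 1, m + 1)) for x, p in zip(xs, post))
# Posterior mean, again by the midpoint rule.
mean = sum(x * p for x, p in zip(xs, post)) / len(xs)

print(f"max |grid - beta(n+1, m+1)| = {max_err:.2e}")
print(f"posterior mean = {mean:.4f}, Laplace (n+1)/(n+m+2) = {(n + 1) / (n + m + 2):.4f}")
```

The uniform prior here is the same thing as beta(1, 1), which is why the answer comes out as a beta distribution at all.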
The derivation given here is simple enough to understand: https://web.stanford.edu/class/archive/cs/cs109/cs109.1176/lectureHandouts/15%20Beta.pdf
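A condensed version of that derivation in my own notation (written from the standard Bayes-rule argument, not copied from the handout). With a uniform prior f(p) = 1 on [0, 1], Bayes' rule gives:

```latex
% Uniform prior on the bias p: f(p) = 1 for 0 <= p <= 1.
% Likelihood of n heads and m tails: \binom{n+m}{n} p^n (1-p)^m.
\begin{aligned}
f(p \mid n, m)
  &= \frac{\binom{n+m}{n}\, p^{n} (1-p)^{m} \cdot 1}
          {\int_0^1 \binom{n+m}{n}\, q^{n} (1-q)^{m} \, dq}
   = \frac{p^{n} (1-p)^{m}}{B(n+1,\, m+1)}, \\
B(a, b) &= \int_0^1 q^{a-1} (1-q)^{b-1} \, dq = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a+b)}.
\end{aligned}
```

The binomial coefficient cancels, and what is left is exactly the beta(n+1, m+1) density.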
The one thing I don't really get is why we set a = n+1 and b = m+1, i.e. why the parameters are shifted by 1, so that when we look at beta(a,b), a and b no longer track the number of successes and failures. Looking at https://stats.stackexchange.com/questions/262956/why-is-there-1-in-beta-distribution-density-function there are apparently some deep reasons why this parametrization is chosen, but I don't really understand the explanations given there.
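One way to see the bookkeeping, just restating the standard beta formulas with a = n+1 and b = m+1 (this is not taken from the stackexchange thread). It is the exponents a-1 and b-1, not a and b, that equal the observed counts:

```latex
% beta(a, b) density, mean, and mode; the mode formula assumes a, b > 1 (i.e. n, m >= 1).
\begin{aligned}
f(p; a, b) &= \frac{p^{a-1} (1-p)^{b-1}}{B(a, b)}
  \;\propto\; p^{n} (1-p)^{m}, \\
\mathbb{E}[p] &= \frac{a}{a+b} = \frac{n+1}{n+m+2}, \\
\operatorname{mode}(p) &= \frac{a-1}{a+b-2} = \frac{n}{n+m}.
\end{aligned}
```

So the mean is Laplace's rule and the mode is the plain frequency n/(n+m). A reading this suggests, offered as a guess rather than as the "deep reason": a and b act like pseudo-counts that include one imaginary head and one imaginary tail from the uniform beta(1, 1) prior.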
See also https://stats.stackexchange.com/questions/47771/what-is-the-intuition-behind-beta-distribution for a good example (using batting averages in baseball).
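A minimal sketch of the batting-average style of update (the class name and all numbers below are hypothetical, chosen only for illustration, not taken from the linked answer). It relies on the standard conjugacy fact that a beta(a, b) prior plus s successes and f failures gives a beta(a+s, b+f) posterior; the uniform-prior case above is just a = b = 1:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BetaBelief:
    """Belief about a success probability, represented as beta(a, b)."""
    a: float
    b: float

    def update(self, successes: int, failures: int) -> "BetaBelief":
        # Conjugate update: each success adds 1 to a, each failure adds 1 to b.
        return BetaBelief(self.a + successes, self.b + failures)

    def mean(self) -> float:
        return self.a / (self.a + self.b)

# Hypothetical prior for a batter, centred at .270 (illustrative numbers only).
prior = BetaBelief(81, 219)
after_one_hit = prior.update(1, 0)
after_season = prior.update(100, 200)  # hypothetical: 100 hits in 300 at-bats

print(f"prior mean       {prior.mean():.3f}")
print(f"after 1 hit      {after_one_hit.mean():.3f}")
print(f"after 100/300    {after_season.mean():.3f}")
```

The posterior mean after 100 hits in 300 at-bats lands between the prior's .270 and the raw .333, which is the "shrinkage toward the prior" intuition. Starting from `BetaBelief(1, 1)` instead and updating with n heads and m tails gives beta(n+1, m+1), matching the coin case.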