User:IssaRice/Beta distribution


Basically, if we start out with a uniform prior over the bias of a coin and then see n heads and m tails, what distribution should we have for the probability of heads? The answer is beta(n+1, m+1) (thus, beta(1,1) is equivalent to uniform(0,1)). This is the distribution version of Laplace's rule of succession (that rule only gives the expected value of the bias of the coin).
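As a quick numerical check (a minimal sketch assuming scipy is available; the counts n = 7, m = 3 are made up for illustration): the mean of beta(n+1, m+1) matches Laplace's rule of succession, (n+1)/(n+m+2), and beta(1,1) has a flat density on [0,1].

```python
from scipy.stats import beta

n, m = 7, 3  # hypothetical counts: 7 heads, 3 tails

# Posterior under a uniform prior, and the Laplace rule-of-succession estimate.
posterior = beta(n + 1, m + 1)
laplace_mean = (n + 1) / (n + m + 2)
print(posterior.mean(), laplace_mean)  # both 0.666...

# beta(1, 1) is uniform(0, 1): its density is 1 everywhere on [0, 1].
print(beta(1, 1).pdf(0.25), beta(1, 1).pdf(0.9))  # 1.0 1.0
```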

The derivation given here is simple enough to understand: https://web.stanford.edu/class/archive/cs/cs109/cs109.1176/lectureHandouts/15%20Beta.pdf
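In brief (a sketch of that kind of derivation, not copied from the handout, with θ the bias of the coin and D the data "n heads and m tails"):

```latex
% Uniform prior p(\theta) = 1 on [0,1]; Bayes' theorem gives
p(\theta \mid D)
  \propto p(D \mid \theta)\, p(\theta)
  = \binom{n+m}{n}\, \theta^{n} (1-\theta)^{m} \cdot 1
  \propto \theta^{n} (1-\theta)^{m},
% which is the kernel of a Beta(n+1, m+1) density, so normalizing gives
p(\theta \mid D) = \frac{\theta^{n} (1-\theta)^{m}}{B(n+1,\, m+1)}.
```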

The one thing I don't really get is why we set a = n + 1 and b = m + 1, i.e. if we shift the parameters by 1, then when we look at beta(a, b), the a and b no longer track the number of successes and failures. Looking at https://stats.stackexchange.com/questions/262956/why-is-there-1-in-beta-distribution-density-function there are apparently some deep reasons why this parametrization is chosen, but I don't really understand the explanations given there.
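For reference, the standard parametrization of the density (this is just the usual formula, with the "−1" exponents that the question above is about):

```latex
% Standard Beta(a, b) density on [0, 1]:
f(x;\, a, b) = \frac{x^{a-1} (1-x)^{b-1}}{B(a, b)},
\qquad
B(a, b) = \int_0^1 t^{a-1} (1-t)^{b-1}\, dt = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a+b)}.
% With a = n + 1 and b = m + 1 the exponents become exactly n and m,
% i.e. Beta(n+1, m+1) has density proportional to x^{n} (1-x)^{m}.
```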

See also https://stats.stackexchange.com/questions/47771/what-is-the-intuition-behind-beta-distribution for a good example (using batting averages in baseball).

An interesting property is that if we start out with a prior beta(a,b) and then see n additional successes and m additional failures, our posterior becomes beta(a+n, b+m), as we would hope.
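A minimal sketch of that conjugate update (assuming scipy; the prior beta(2, 5) and the counts are purely illustrative), comparing the closed-form posterior beta(a+n, b+m) against a brute-force Bayes update on a grid:

```python
import numpy as np
from scipy.stats import beta

a, b = 2.0, 5.0  # hypothetical prior parameters
n, m = 10, 4     # additional successes and failures observed

# Closed-form conjugate posterior.
closed_form = beta(a + n, b + m)

# Brute-force posterior: prior density times likelihood, renormalized on a grid.
theta = np.linspace(0.001, 0.999, 2000)
unnormalized = beta(a, b).pdf(theta) * theta**n * (1 - theta)**m
grid_posterior = unnormalized / (unnormalized.sum() * (theta[1] - theta[0]))

# The two agree up to grid error (prints a small number).
print(np.max(np.abs(grid_posterior - closed_form.pdf(theta))))
```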