User:IssaRice/Beta distribution


Basically, if we start out with a uniform prior over the bias of a coin and then see n heads and m tails, what distribution should we have for the probability of heads? The answer is beta(n+1, m+1) (thus, beta(1,1) is equivalent to uniform(0,1)). This is the distribution version of Laplace's rule of succession, which only gives the expected value, (n+1)/(n+m+2).
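As a quick sanity check (a minimal sketch using scipy; the counts n and m are made-up illustrative values), the posterior mean of beta(n+1, m+1) does match Laplace's rule of succession:

```python
from scipy.stats import beta

n, m = 7, 3  # illustrative counts of heads and tails

# A uniform prior beta(1, 1) plus n heads and m tails gives the
# posterior beta(n+1, m+1) over the coin's bias.
posterior = beta(n + 1, m + 1)

# The posterior mean is Laplace's rule of succession, (n+1)/(n+m+2).
print(posterior.mean())        # 0.6666...
print((n + 1) / (n + m + 2))   # 8/12 = 0.6666...
```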

The derivation given here is simple enough to understand: https://web.stanford.edu/class/archive/cs/cs109/cs109.1176/lectureHandouts/15%20Beta.pdf

The one thing I don't really get is why we set a = α = n+1 and b = β = m+1; i.e., if we shift the parameters by 1, then when we look at beta(a, b), the a and b no longer track the number of successes and failures. Looking at https://stats.stackexchange.com/questions/262956/why-is-there-1-in-beta-distribution-density-function, there are apparently some deep reasons why this parametrization is chosen, but I don't really understand the explanations given there.
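One purely algebraic (not deep) way to see the shift, just tracing the bookkeeping in a derivation like the one linked above: the −1 lives in the beta density itself, so it is the exponents, not the parameters, that track the counts. With a uniform prior, the likelihood of n heads and m tails is proportional to θ^n(1−θ)^m, and matching this against the beta kernel forces the +1:

\[
f(\theta; \alpha, \beta) = \frac{\theta^{\alpha - 1}(1-\theta)^{\beta - 1}}{B(\alpha, \beta)},
\qquad
\theta^{n}(1-\theta)^{m} = \theta^{\alpha - 1}(1-\theta)^{\beta - 1}
\ \Rightarrow\ \alpha = n+1,\ \beta = m+1.
\]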

See also https://stats.stackexchange.com/questions/47771/what-is-the-intuition-behind-beta-distribution for a good example (using batting averages in baseball).

An interesting property is that if we start out with a prior beta(a, b) and then see n additional successes and m additional failures, our posterior becomes beta(a+n, b+m), as we would hope.
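This is a one-line Bayes' rule computation (a sketch; normalizing constants are dropped throughout): multiply the beta(a, b) kernel by the likelihood of n successes and m failures and read off the new kernel:

\[
\underbrace{\theta^{a-1}(1-\theta)^{b-1}}_{\text{beta}(a,b)\ \text{kernel}}
\cdot
\underbrace{\theta^{n}(1-\theta)^{m}}_{\text{likelihood}}
= \theta^{(a+n)-1}(1-\theta)^{(b+m)-1},
\]

which is the kernel of beta(a+n, b+m). The uniform-prior result above is just the special case a = b = 1.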