User:IssaRice/Beta distribution: Difference between revisions

From Machinelearning
Revision as of 23:46, 4 February 2020

If we start out with a uniform prior over the bias of a coin, and then see n heads and m tails, what distribution should we have for the probability of heads? The answer is Beta(n+1, m+1). This is the distribution version of Laplace's rule of succession (that rule only gives the expected value, (n+1)/(n+m+2)).
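A quick numerical check of this claim, as a minimal sketch using only the standard library. The counts n = 7 and m = 3 and the helper names `beta_pdf` and `grid_posterior` are made up for illustration. It multiplies a uniform prior by the likelihood on a grid, normalizes, and compares the result against the Beta(n+1, m+1) density and the rule-of-succession mean.

```python
import math


def beta_pdf(p, a, b):
    """Density of Beta(a, b) at p, for 0 < p < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))


def grid_posterior(n, m, steps=10_000):
    """Posterior density of the coin bias on a midpoint grid.

    Uniform prior (density 1) times likelihood p^n (1-p)^m,
    normalized numerically so it integrates to 1.
    """
    width = 1.0 / steps
    grid = [(i + 0.5) * width for i in range(steps)]
    unnorm = [p**n * (1 - p) ** m for p in grid]
    total = sum(unnorm) * width
    return grid, [u / total for u in unnorm]


n, m = 7, 3  # hypothetical counts of heads and tails
grid, post = grid_posterior(n, m)

# Largest pointwise gap between the grid posterior and Beta(n+1, m+1).
max_gap = max(abs(d - beta_pdf(p, n + 1, m + 1)) for p, d in zip(grid, post))

# Posterior mean, to compare with the rule of succession (n+1)/(n+m+2).
post_mean = sum(p * d for p, d in zip(grid, post)) / len(grid)

print("max gap vs Beta(n+1, m+1):", max_gap)
print("posterior mean:", post_mean, "rule of succession:", (n + 1) / (n + m + 2))
```

The gap should be tiny (numerical error only), and the two means should agree closely.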

the derivation given here is simple enough to understand: https://web.stanford.edu/class/archive/cs/cs109/cs109.1176/lectureHandouts/15%20Beta.pdf
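For reference, the core of this kind of derivation fits in a few lines. This is a paraphrase using Bayes' rule with my own notation (p for the bias, f for densities), not a copy of the handout's steps:

```latex
\begin{align*}
f(p \mid n \text{ heads}, m \text{ tails})
  &\propto \Pr(n \text{ heads}, m \text{ tails} \mid p)\, f(p) \\
  &\propto p^{n} (1-p)^{m} \cdot 1, \qquad 0 \le p \le 1, \\
\int_0^1 p^{n} (1-p)^{m}\, dp
  &= \frac{n!\, m!}{(n+m+1)!}, \\
f(p \mid n \text{ heads}, m \text{ tails})
  &= \frac{(n+m+1)!}{n!\, m!}\, p^{n} (1-p)^{m}.
\end{align*}
```

The last line is the density of Beta(n+1, m+1).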

the one thing i don't really get is why we set a = n+1 and b = m+1, i.e. if we shift the parameters by 1, then when we look at beta(a,b), the a and b no longer track the number of successes and failures. Looking at https://stats.stackexchange.com/questions/262956/why-is-there-1-in-beta-distribution-density-function there are apparently some deep reasons why this parametrization is chosen, but i don't really understand the explanations given there.
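The bookkeeping behind the shift can at least be written down. This is a sketch of the standard density and the conjugate update, and it does not answer the deeper question of why the convention was chosen:

```latex
\[
f(p; a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, p^{a-1} (1-p)^{b-1},
\qquad
\mathrm{Beta}(1, 1):\; f(p) = 1 \text{ on } [0, 1].
\]
\[
\mathrm{Beta}(a, b) \text{ prior} \;+\; (n \text{ heads},\, m \text{ tails})
\;\Longrightarrow\;
\mathrm{Beta}(a + n,\; b + m) \text{ posterior}.
\]
```

So the uniform prior is Beta(1, 1), the exponents a-1 and b-1 are what track the observed counts, and each new observation simply adds 1 to a or to b.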

see also https://stats.stackexchange.com/questions/47771/what-is-the-intuition-behind-beta-distribution for a good example (using batting averages in baseball).