User:IssaRice/Beta distribution

From Machinelearning
Revision as of 23:46, 4 February 2020

Basically: if we start out with a uniform prior over the bias of a coin and then see n heads and m tails, what distribution should we have over the probability of heads? This is basically the distribution version of [https://en.wikipedia.org/wiki/Rule_of_succession Laplace's rule of succession] (that rule only gives the expected value).
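A minimal sketch of the question above, using scipy (the counts n=7, m=3 are hypothetical; the posterior Beta(n+1, m+1) and Laplace's rule (n+1)/(n+m+2) are from the source):

```python
from scipy.stats import beta

# Uniform prior Beta(1, 1) on the coin's bias; after observing
# n heads and m tails the posterior is Beta(n + 1, m + 1).
n, m = 7, 3  # hypothetical example counts
posterior = beta(n + 1, m + 1)

# The posterior mean recovers Laplace's rule of succession,
# (n + 1) / (n + m + 2) -- the rule gives only this expected value,
# while the Beta posterior gives the whole distribution.
print(posterior.mean())  # 8/12, about 0.667
```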

The derivation given here is simple enough to understand: https://web.stanford.edu/class/archive/cs/cs109/cs109.1176/lectureHandouts/15%20Beta.pdf
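A sketch of the key step of that derivation (paraphrased, not verbatim from the handout) — it is just Bayes' theorem with a uniform prior, where everything not depending on p gets absorbed into the normalizing constant:

```latex
f(p \mid n \text{ heads}, m \text{ tails})
  \propto P(\text{data} \mid p)\, f(p)
  = \binom{n+m}{n} p^n (1-p)^m \cdot 1
  \propto p^n (1-p)^m
```

Normalizing over [0, 1] then gives

```latex
f(p \mid \text{data}) = \frac{p^n (1-p)^m}{B(n+1,\, m+1)},
```

which is exactly the Beta(n+1, m+1) density.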

The one thing I don't really get is why we set a = α = n+1 and b = β = m+1; i.e., if we shift the parameters by 1, then when we look at Beta(a, b), the a and b no longer track the number of successes and failures. Looking at https://stats.stackexchange.com/questions/262956/why-is-there-1-in-beta-distribution-density-function there are apparently some deep reasons why this parametrization is chosen, but I don't really understand the explanations given there.

See also https://stats.stackexchange.com/questions/47771/what-is-the-intuition-behind-beta-distribution for a good worked example (using batting averages in baseball).
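The spirit of that example can be sketched as follows (the specific numbers here are illustrative, not taken verbatim from the answer): encode the prior belief about batting averages as a Beta distribution, then update by simply adding hits and misses to the two parameters.

```python
from scipy.stats import beta

# Hypothetical prior: Beta(81, 219), centered on a batting average of
# 81 / (81 + 219) = 0.270, reflecting how averages typically cluster.
prior = beta(81, 219)

# After 100 at-bats with 30 hits: add successes and failures directly
# to the parameters (conjugacy makes the update this simple).
hits, misses = 30, 70
posterior = beta(81 + hits, 219 + misses)

# The posterior mean moves from the prior toward the observed 0.300,
# but only partway, since the prior carries weight too.
print(prior.mean(), posterior.mean())
```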