User:IssaRice/Scoring rule

how can we formalize the idea of a rule for scoring predictions?

first pass: statements and probabilities

we can start with a list $s_1, \ldots, s_n$ of statements. each statement makes a yes/no prediction about the future, like "the die will show 3 when rolled". then we have a list of probabilities $p_1, \ldots, p_n$, where $p_j$ is the probability someone assigns to $s_j$ being true. now, reality evaluates each statement, giving us a yes/no answer $r(s_j) \in \{0,1\}$. our probabilities are scored against this response from reality. so a scoring rule $S$ can be some function of $p_1, \ldots, p_n, r(s_1), \ldots, r(s_n)$, and its type can be $S \colon [0,1]^n \times \{0,1\}^n \to \mathbf R$.

if we are an ordinary statistician [1], we might pick a rule like $\sum_{j=1}^n (p_j - r(s_j))^2$. (this is actually almost the brier score: the brier score divides by $n$ to take the mean.)
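
to see this concretely, here is a minimal sketch of the rule above in python (the name `quadratic_score` is just illustrative):

```python
def quadratic_score(probs, outcomes):
    """score probabilities p_1, ..., p_n against reality's answers.

    probs[j] is p_j in [0,1]; outcomes[j] is r(s_j) in {0,1}.
    lower is better: a perfectly confident, correct forecaster scores 0.
    """
    assert len(probs) == len(outcomes)
    return sum((p - r) ** 2 for p, r in zip(probs, outcomes))

# six yes/no statements "the die will show i" for i = 1, ..., 6,
# each assigned probability 1/6; reality: the die showed 3
probs = [1/6] * 6
outcomes = [0, 0, 1, 0, 0, 0]
print(quadratic_score(probs, outcomes))  # 5*(1/6)^2 + (5/6)^2 = 5/6
```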

second pass: events

in probability theory, we are used to dealing with events and random variables. in the previous section, we naively stated scoring rules in terms of statements and probabilities. but we might now try to phrase things in terms of events.

instead of statements $s_1, \ldots, s_n$, we could have a list of events $A_1, \ldots, A_n$. here, $A_j$ is an event expressing the fact that $s_j$ is true. then $p_j = P(A_j)$, where $P$ is the probability measure which encodes our knowledge of what events are likely. $r(s_j)$ is the outcome in some possible world, so $r(s_j) = \mathbf 1_{A_j}(\omega)$. the idea here is that we have some implicit sample space $\Omega$ of all "possible worlds", and each $\omega \in \Omega$ is a possible world. but this is exactly the idea expressed by our reality function $r$ -- we could have had some other reality $r'$ in which our same probabilities would perform differently.

so our second pass is that we can define a scoring rule as something that takes a list of events $A_1, \ldots, A_n$, a probability measure (which encodes the numbers $p_1, \ldots, p_n$, assuming we have access to the events), and a world $\omega$. so the type can be $S \colon (2^\Omega)^n \times [0,1]^{2^\Omega} \times \Omega \to \mathbf R$.
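
as a rough sketch, we can model this second pass in python with a finite sample space, events as subsets of it, and the measure given by the weight of each world (all names here are illustrative):

```python
from fractions import Fraction

# sample space: six "possible worlds", one per die face
worlds = {1, 2, 3, 4, 5, 6}

# a probability measure on a finite space is determined by the
# probabilities of the singletons
weight = {w: Fraction(1, 6) for w in worlds}

def prob(event):
    """P(A): total weight of the worlds in A."""
    return sum(weight[w] for w in event)

def indicator(event, omega):
    """1_A(omega): whether event A obtains in world omega."""
    return 1 if omega in event else 0

def score(events, prob, omega):
    """takes events A_1, ..., A_n, a measure (via prob), and a world."""
    return sum((prob(A) - indicator(A, omega)) ** 2 for A in events)

# events A_i = "the die shows i", scored in the world where it showed 3
events = [frozenset({i}) for i in range(1, 7)]
print(score(events, prob, omega=3))  # 5/6, same as before
```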

third pass: non-binary predictions

instead of having all our predictions be about yes/no questions, we could allow more kinds of responses. for instance, instead of six predictions "the next roll of the die will be $i$" (yes: $1/6$, no: $5/6$) for $i = 1, \ldots, 6$, we could instead have a single prediction like "the next roll will be ..." with the distribution $1\colon 1/6,\ 2\colon 1/6,\ \ldots,\ 6\colon 1/6$.
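
one natural way to score such a prediction is the multi-option version of the quadratic rule (this is just one common choice, essentially the multi-category form of the brier score); a sketch:

```python
def quadratic_score_categorical(dist, outcome):
    """score one prediction with k mutually exclusive, exhaustive options.

    dist maps each option to its assigned probability (summing to 1);
    outcome is the option reality picked.
    """
    return sum((p - (1 if option == outcome else 0)) ** 2
               for option, p in dist.items())

# "the next roll will be ..." with the uniform distribution on 1..6
dist = {i: 1/6 for i in range(1, 7)}
print(quadratic_score_categorical(dist, outcome=3))  # 5/6 again
```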

given one prediction with $k$ options (mutually exclusive and collectively exhaustive), and a second prediction with $m$ options (again mutually exclusive and collectively exhaustive), we could roll them into a single prediction with $km$ options, one per pair of options. so in a way, we could work as if there were only one thing to be predicted.
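
a sketch of the combination step: if the forecaster happens to treat the two questions as independent, the $km$ combined probabilities are just products of the marginal ones (in general we would need the forecaster's joint distribution over pairs):

```python
from fractions import Fraction

def combine_predictions(dist_a, dist_b):
    """roll a k-option and an m-option prediction into one with k*m options.

    assumes the two questions are judged independent, so the probability
    of a pair of options is the product of the two marginal probabilities.
    """
    return {(a, b): pa * pb
            for a, pa in dist_a.items()
            for b, pb in dist_b.items()}

die = {i: Fraction(1, 6) for i in range(1, 7)}    # k = 6 options
coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}  # m = 2 options
joint = combine_predictions(die, coin)             # km = 12 options
print(len(joint), sum(joint.values()))             # 12 1
```

the combined prediction can then be scored like any other single prediction with $km$ options.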