Summary table of probability terms: Difference between revisions

From Machinelearning

Revision as of 05:55, 21 July 2018

This page is a summary table of probability terms.

Table

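| Term | Symbol | Type | Definition |
|---|---|---|---|
| Reals | $\mathbf{R}$ | | |
| Borel subsets of the reals | $\mathcal{B}$ | | |
| Sample space | $\Omega$ | | |
| Outcome | $\omega$ | $\Omega$ | |
| Events or measurable sets | $\mathcal{F}$ | | |
| Probability measure | $\mathbf{P}$ or $\Pr$ or $\mathbf{P}_{\mathcal{F}}$ | $\mathcal{F} \to [0,1]$ | |
| Probability triple or probability space | $(\Omega, \mathcal{F}, \mathbf{P})$ | | |
| Distribution | $\mu$ or $\mathcal{D}$ or $D$ or $\mathbf{P}_{\mathcal{B}}$ or $\mathcal{L}(X)$ or $\mathbf{P} \circ X^{-1}$ | $\mathcal{B} \to [0,1]$ | $B \mapsto \mathbf{P}(X \in B)$ |
| Induced probability space | $(\mathbf{R}, \mathcal{B}, \mu)$ | | |
| Cumulative distribution function or CDF | $F_X$ | $\mathbf{R} \to [0,1]$ | |
| Probability density function or PDF | $f_X$ | $\mathbf{R} \to [0,\infty)$ | |
| Random variable | $X$ | $\Omega \to \mathbf{R}$ | |
| Preimage of random variable | $X^{-1}$ | $2^{\mathbf{R}} \to 2^{\Omega}$, but all we need is $\mathcal{B} \to \mathcal{F}$ | |
| Indicator of $A$ | $1_A$ | $\Omega \to \{0,1\}$ | |
| Expectation | $\mathbf{E}$ or $\mathrm{E}$ | $(\Omega \to \mathbf{R}) \to \mathbf{R}$ | |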
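
To make the types in the table concrete, here is a minimal sketch on a finite sample space. The two-dice example and all function names (`prob`, `preimage`, `indicator`, `expectation`, `cdf`) are assumptions chosen for illustration, not part of the original page. On a finite space every subset is an event, so the measurability conditions hold trivially.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two fair six-sided dice (an assumed example).
OMEGA = frozenset(product(range(1, 7), repeat=2))


def prob(event):
    """Probability measure P: F -> [0, 1]; here F is every subset of OMEGA."""
    event = frozenset(event)
    assert event <= OMEGA, "an event must be a subset of the sample space"
    return Fraction(len(event), len(OMEGA))


def X(omega):
    """Random variable X: OMEGA -> R, the sum of the two dice."""
    return omega[0] + omega[1]


def preimage(rv, B):
    """X^{-1}(B): the event of all outcomes that rv maps into B."""
    return frozenset(w for w in OMEGA if rv(w) in B)


def indicator(A):
    """Indicator 1_A: OMEGA -> {0, 1}."""
    A = frozenset(A)
    return lambda w: 1 if w in A else 0


def expectation(rv):
    """Expectation E: (OMEGA -> R) -> R, a probability-weighted sum."""
    return sum(rv(w) * prob({w}) for w in OMEGA)


def cdf(rv, x):
    """F_X(x) = P(X <= x)."""
    return prob(w for w in OMEGA if rv(w) <= x)


# The event "both dice show the same face".
doubles = preimage(lambda w: w[0] - w[1], {0})

print(prob(preimage(X, {7})))           # P(X = 7), prints 1/6
print(expectation(X))                   # E[X], prints 7
print(expectation(indicator(doubles)))  # E[1_A] = P(A), prints 1/6
```

Using `Fraction` keeps every probability exact, so identities such as $\mathbf{E}[1_A] = \mathbf{P}(A)$ can be checked with `==` rather than a tolerance.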

Dependencies

Let (Ω,F,P) be a probability space.

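  • Given a random variable $X$, we can compute its distribution $\mu$. How? Just let $\mu(B) = \mathbf{P}_{\mathcal{F}}(X \in B)$ for each $B \in \mathcal{B}$.
  • Given a random variable $X$, we can compute the probability density function $f_X$, provided one exists (a discrete random variable, for example, has no density). How? Differentiate the cumulative distribution function: $f_X = F_X'$ almost everywhere.
  • Given a random variable $X$, we can compute the cumulative distribution function. How? Just let $F_X(x) = \mathbf{P}_{\mathcal{F}}(X \le x)$.
  • Given a distribution $\mu$, we can retrieve a random variable with that distribution, for example the identity map on the induced probability space $(\mathbf{R}, \mathcal{B}, \mu)$. This random variable is not unique: many random variables, possibly on different sample spaces, share the same distribution. This is why we can say stuff like "let $X \sim \mathcal{D}$".
  • Given a distribution $\mu$, we can compute its density function when one exists. How? Just find the derivative of $x \mapsto \mu((-\infty, x])$.
  • Given a cumulative distribution function, we can construct a random variable with that CDF, but only up to equality in distribution: the CDF determines the distribution, not the sample space or the particular function $X$.
  • Given a probability density function, can we get everything else? Yes, in the same sense: integrating gives the CDF, $F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$, which determines the distribution and, up to equality in distribution, a random variable.
  • Given a cumulative distribution function, how do we get the distribution? We have $F_X(x) = \mathbf{P}_{\mathcal{F}}(X \le x) = \mathbf{P}_{\mathcal{B}}((-\infty, x])$, which gets us some of what the distribution $\mathbf{P}_{\mathcal{B}}$ maps to, but $\mathcal{B}$ is bigger than this. What do we do about the other values we need to map? We can compute half-open intervals like $F_X(b) - F_X(a) = \mathbf{P}_{\mathcal{F}}(a < X \le b) = \mathbf{P}_{\mathcal{B}}((a, b])$. Countable unions and limiting operations extend this to every Borel set, and the extension is unique because the sets $(-\infty, x]$ generate $\mathcal{B}$ and are closed under finite intersections.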
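
Several of the conversions in this list can be checked numerically. The sketch below assumes an exponential distribution with rate `RATE = 2.0` purely as an example; the function names are illustrative, and the derivative and integral are simple numerical approximations rather than exact operations. The random variable is built as the quantile function applied to a uniform draw from $[0, 1)$, which is one standard construction among many.

```python
import math
import random

RATE = 2.0  # assumed rate parameter of an exponential distribution


def cdf(x):
    """F_X(x) = P(X <= x) for an exponential distribution with rate RATE."""
    return 1.0 - math.exp(-RATE * x) if x > 0 else 0.0


def pdf(x):
    """Closed-form density, used only to check the numerical results."""
    return RATE * math.exp(-RATE * x) if x > 0 else 0.0


def interval_prob(a, b):
    """mu((a, b]) = F_X(b) - F_X(a) for a <= b."""
    return cdf(b) - cdf(a)


def density_from_cdf(x, h=1e-6):
    """Central-difference approximation to the derivative of the CDF."""
    return (cdf(x + h) - cdf(x - h)) / (2 * h)


def cdf_from_density(x, steps=10_000):
    """Midpoint-rule approximation to the integral of the density over (0, x]."""
    if x <= 0:
        return 0.0
    width = x / steps
    return sum(pdf((i + 0.5) * width) for i in range(steps)) * width


def quantile(u):
    """Inverse of the CDF for u in [0, 1)."""
    return -math.log(1.0 - u) / RATE


def sample(n, seed=0):
    """Draws of X(omega) = quantile(omega), with omega uniform on [0, 1)."""
    rng = random.Random(seed)
    return [quantile(rng.random()) for _ in range(n)]


xs = sample(100_000)
empirical = sum(1 for x in xs if 0.5 < x <= 1.0) / len(xs)

print(interval_prob(0.5, 1.0), empirical)  # CDF difference vs. sample frequency
print(density_from_cdf(1.0), pdf(1.0))     # derivative of CDF vs. density
print(cdf_from_density(1.0), cdf(1.0))     # integral of density vs. CDF
```

Each printed pair should agree closely: the first up to sampling noise, the other two up to discretization error.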

Philosophical details about the sample space

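Given a random variable $X : \Omega \to \mathbf{R}$ and any reasonable predicate $P$ about $X$, we can replace $P(X)$ with its extension $\{\omega \in \Omega : P(X(\omega))\} = \{\omega \in \Omega : X(\omega) \in B\}$ for some $B \in \mathcal{B}$. And from then on, we can write $\mathbf{P}_{\mathcal{F}}(X \in B)$ as $\mathbf{P}_{\mathcal{F}}(X^{-1}(B)) = \mathbf{P}_{\mathcal{B}}(B) = \mu(B)$. In other words, we can just work with Borel sets of the reals (measuring them with the distribution) rather than the original events (measuring them with the original probability measure). Where did $X$ go? $\mathbf{P}_{\mathcal{F}} \circ X^{-1} = \mathbf{P}_{\mathcal{B}}$, so you can write $\mathbf{P}_{\mathcal{B}}$ using $X$. But once you already have $\mathbf{P}_{\mathcal{B}}$, you don't need to know what $X$ is.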
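
The identity $\mathbf{P}_{\mathcal{F}} \circ X^{-1} = \mathbf{P}_{\mathcal{B}}$ can be illustrated on finite spaces. The following is a sketch under assumed examples (two coin flips and a four-sided die); the names `pushforward`, `measure`, `count_heads`, and `recode` are hypothetical. It also shows two different random variables on two different sample spaces that induce the same distribution, which is why the sample space can be forgotten once the distribution is known.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def pushforward(P, X):
    """mu = P o X^{-1}, stored as a dict from values of X to probabilities."""
    mu = defaultdict(Fraction)
    for omega, p in P.items():
        mu[X(omega)] += p
    return dict(mu)


def measure(weights, predicate):
    """Total weight of the points that satisfy the predicate."""
    return sum((p for point, p in weights.items() if predicate(point)), Fraction(0))


# Space 1: two fair coin flips; the random variable counts heads.
P1 = {w: Fraction(1, 4) for w in product("HT", repeat=2)}


def count_heads(w):
    return w.count("H")


# Space 2: a fair four-sided die; a different function on a different space.
P2 = {w: Fraction(1, 4) for w in (1, 2, 3, 4)}


def recode(w):
    return {1: 0, 2: 1, 3: 1, 4: 2}[w]


def at_least_one(x):
    """The predicate, corresponding to the Borel set B = [1, infinity)."""
    return x >= 1


mu = pushforward(P1, count_heads)

# Measuring the event in Omega and measuring B under mu give the same number.
lhs = measure(P1, lambda w: at_least_one(count_heads(w)))
rhs = measure(mu, at_least_one)

print(lhs, rhs)                       # prints 3/4 3/4
print(mu == pushforward(P2, recode))  # prints True
```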

See also

  • Summary table of multivariable derivatives
  • Comparison of machine learning textbooks

External links