Summary table of probability terms: Difference between revisions

From Machinelearning
 
(22 intermediate revisions by the same user not shown)
Line 4: Line 4:


{| class="sortable wikitable"
{| class="sortable wikitable"
! Term !! Symbol !! Type !! Definition
! Term !! Notation !! Type !! Definition !! Notes
|-
|-
| Reals || <math>\mathbf R</math> || ||
| Reals || <math>\mathbf R</math> || ||
Line 12: Line 12:
| A Borel set || <math>B</math> || <math>\mathcal B</math> ||
| A Borel set || <math>B</math> || <math>\mathcal B</math> ||
|-
|-
| Sample space || <math>\Omega</math> || ||
| [[Sample space]] || <math>\Omega</math> || ||
|-
|-
| Outcome || <math>\omega</math> || <math>\Omega</math> ||
| Outcome || <math>\omega</math> || <math>\Omega</math> ||
Line 30: Line 30:
| Probability density function or PDF || <math>f_X</math> || <math>\mathbf R \to [0,\infty)</math> ||
| Probability density function or PDF || <math>f_X</math> || <math>\mathbf R \to [0,\infty)</math> ||
|-
|-
| Random variable || <math>X</math> || <math>\Omega \to \mathbf R</math> ||
| [[Random variable]] || <math>X</math> || <math>\Omega \to \mathbf R</math> ||
|-
|-
| Preimage of random variable || <math>X^{-1}</math> || <math>2^{\mathbf R} \to 2^{\Omega}</math> but all we need is <math>\mathcal B \to \mathcal F</math> ||
| Preimage of random variable || <math>X^{-1}</math> || <math>2^{\mathbf R} \to 2^{\Omega}</math> but all we need is <math>\mathcal B \to \mathcal F</math> ||
|-
|-
| Indicator of <math>A</math> || <math>1_A</math> || <math>\Omega \to \{0,1\}</math> ||
| Indicator of <math>A</math> || <math>1_A</math> || <math>\Omega \to \{0,1\}</math> || <math>1_A(\omega) = \begin{cases}1 & \omega\in A \\ 0 & \omega \not\in A\end{cases}</math>
|-
|-
| Expectation || <math>\mathbf E</math> or <math>\mathrm E</math> || <math>(\Omega \to \mathbf R) \to \mathbf R</math> ||
| [[Expectation]] || <math>\mathbf E</math> or <math>\mathrm E</math> || <math>(\Omega \to \mathbf R) \to \mathbf R</math> ||
|-
|-
| || <math>X \in B</math> || <math>\mathcal F</math> || <math>\{\omega \in \Omega : X(\omega) \in B\}</math>
| || <math>X \in B</math> || <math>\mathcal F</math> || <math>\{\omega \in \Omega : X(\omega) \in B\}</math>
Line 44: Line 44:
| || <math>X\leq x</math> || <math>\mathcal F</math> || <math>\{\omega \in \Omega : X(\omega) \leq x\}</math>
| || <math>X\leq x</math> || <math>\mathcal F</math> || <math>\{\omega \in \Omega : X(\omega) \leq x\}</math>
|-
|-
| Expected value of <math>X</math> || <math>\mathbf E(X)</math> || <math>\mathbf R</math>
| Function of a random variable, where <math>f\colon \mathbf R \to \mathbf R</math> || <math>f(X)</math> || <math>\Omega \to \mathbf R</math> || <math>f\circ X</math> ||
|-
| [[Expected value]] of <math>X</math> || <math>\mathbf E(X)</math> || <math>\mathbf R</math>
|-
|-
| || <math>\mathbf E(X\mid Y=y)</math> || <math>\mathbf R</math> ||
| || <math>\mathbf E(X\mid Y=y)</math> || <math>\mathbf R</math> ||
|-
|-
| || <math>\mathbf E(X\mid Y)</math> || <math>\Omega \to \mathbf R</math> ||
| || <math>\mathbf E(X\mid Y)</math> || <math>\Omega \to \mathbf R</math> || <math>\omega \mapsto \mathbf E(X\mid Y=Y(\omega))</math>?
|-
| Utility function || <math>u</math> || <math>\mathbf R \to \mathbf R</math> || || I ''think'' this is what the type must be, based on how it's used. But we usually think of the utility function as assigning numbers to outcomes; but if that is so, it must be a random variable! What's up with that? (2022-07-14: I think in probability theory, we usually discuss only real random variables, since that allows us to do a lot more with them like take expected value. But in fields like AI, we consider more general random variables <math>\Omega \to \mathcal O</math> that take values in some space of outcomes <math>\mathcal O</math>. We can't "average over" outcomes so we can't really take expected values anymore, but this allows us to make the utility function more general so we get <math>u : \mathcal O \to \mathbf R</math>.)
|-
| Expected utility of <math>X</math> || <math>\mathbf{EU}(X)</math> || <math>\mathbf R</math> || <math>\mathbf E(u(X))</math> || <math>u\circ X</math> is indeed a random variable, so the type check passes.
|}
|}
All the utility stuff isn't really related to machine learning. It's more related to the decision theory stuff I'm learning. I'm putting it here for now for convenience but might move it later.
TODO add "probability distribution over S" and "probability distribution on A" [https://arxiv.org/pdf/1711.00363.pdf]
Li and Vitanyi (''An Introduction to Kolmogorov Complexity and Its Applications'', p. 19) calls the probability measure on <math>\mathcal F</math> a probability distribution over S (the sample space).
TODO: add probability mass function (defined only for discrete random variables)


==Dependencies==
==Dependencies==
Line 59: Line 73:
* Given a random variable, we can compute the cumulative distribution function. How?
* Given a random variable, we can compute the cumulative distribution function. How?
* Given a distribution, we can retrieve a random variable. But this random variable is not unique? This is why we can say stuff like "let <math>X\sim \mathcal D</math>".
* Given a distribution, we can retrieve a random variable. But this random variable is not unique? This is why we can say stuff like "let <math>X\sim \mathcal D</math>".
* Given a distribution <math>\mu</math>, we can compute its density function. How? Just find the derivative of <math>\mu((-\infty,x])</math>. (?)
* Given a distribution <math>\mu</math>, we can compute its density function. How? Just find the derivative of <math>\mu((-\infty,x])</math>. (?) (2022-07-14: something something Radon–Nikodym theorem...)
* Given a cumulative distribution function, we can compute the random variable. (Right?)
* Given a cumulative distribution function, we can compute the random variable. (Right?) (2022-07-14: but a CDF is like a distribution, so the random variable won't be unique.)
* Given a probability density function, can we get everything else? Don't we just have to integrate to get the cdf, which gets us the random variable and the distribution?
* Given a probability density function, can we get everything else? Don't we just have to integrate to get the cdf, which gets us the random variable and the distribution?
* Given a cumulative distribution function, how do we get the distribution? We have <math>F_X(x) = \mathbf P_{\mathcal F}(X\leq x) = \mathbf P_{\mathcal B}((-\infty,x])</math>, which gets us some of what the distribution <math>\mathbf P_{\mathcal B}</math> maps to, but <math>\mathcal B</math> is bigger than this. What do we do about the other values we need to map? We can compute intervals like <math>F_X(b) - F_X(a) = \mathbf P_{\mathcal F}(a \leq X\leq b) = \mathbf P_{\mathcal B}([a,b])</math>. And we can apparently do the same for unions and limiting operations.
* Given a cumulative distribution function, how do we get the distribution? We have <math>F_X(x) = \mathbf P_{\mathcal F}(X\leq x) = \mathbf P_{\mathcal B}((-\infty,x])</math>, which gets us some of what the distribution <math>\mathbf P_{\mathcal B}</math> maps to, but <math>\mathcal B</math> is bigger than this. What do we do about the other values we need to map? We can compute intervals like <math>F_X(b) - F_X(a) = \mathbf P_{\mathcal F}(a \leq X\leq b) = \mathbf P_{\mathcal B}([a,b])</math>. And we can apparently do the same for unions and limiting operations.
Line 75: Line 89:
==External links==
==External links==


* [https://terrytao.wordpress.com/2010/01/01/254a-notes-0-a-review-of-probability-theory/ 254A, Notes 0: A review of probability theory] by [[wikipedia:Terence Tao|Terence Tao]]
* [https://terrytao.wordpress.com/2010/01/01/254a-notes-0-a-review-of-probability-theory/ 254A, Notes 0: A review of probability theory] and [https://terrytao.wordpress.com/2015/09/29/275a-notes-0-foundations-of-probability-theory/ 275A, Notes 0: Foundations of probability theory] by [[wikipedia:Terence Tao|Terence Tao]]
* [http://dsp.ucsd.edu/~kreutz/PEI-05%20Support%20Files/Basic%20Random%20Variables%20Concepts.pdf Basic Random Variable Concepts] by Kenneth Kreutz-Delgado
* [http://dsp.ucsd.edu/~kreutz/PEI-05%20Support%20Files/Basic%20Random%20Variables%20Concepts.pdf Basic Random Variable Concepts] by Kenneth Kreutz-Delgado
* Various questions on Mathematics Stack Exchange:
* Various questions on Mathematics Stack Exchange:
Line 88: Line 102:
** https://math.stackexchange.com/questions/1073744/distinguishing-probability-measure-function-and-distribution
** https://math.stackexchange.com/questions/1073744/distinguishing-probability-measure-function-and-distribution
** https://math.stackexchange.com/questions/57027/concept-of-probability-distribution
** https://math.stackexchange.com/questions/57027/concept-of-probability-distribution
* Tim Gowers:
** https://gowers.wordpress.com/2010/09/01/icm2010-fourth-day/ (search for "random variable")
** https://mathoverflow.net/questions/12516/a-random-variable-is-it-a-function-or-an-equivalence-class-of-functions
[[Category:Probability]]

Latest revision as of 18:16, 14 July 2022

This page is a summary table of probability terms.

Table

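{| class="sortable wikitable"
! Term !! Notation !! Type !! Definition !! Notes
|-
| Reals || <math>\mathbf R</math> || ||
|-
| Borel subsets of the reals || <math>\mathcal B</math> || ||
|-
| A Borel set || <math>B</math> || <math>\mathcal B</math> ||
|-
| [[Sample space]] || <math>\Omega</math> || ||
|-
| Outcome || <math>\omega</math> || <math>\Omega</math> ||
|-
| Events or measurable sets || <math>\mathcal F</math> || ||
|-
| Probability measure || <math>\mathbf P_{\mathcal F}</math> || ||
|-
| Probability triple or probability space || || ||
|-
| Distribution || <math>\mu</math> or <math>\mathcal D</math> or <math>\mathbf P_{\mathcal B}</math> || ||
|-
| Induced probability space || || ||
|-
| Cumulative distribution function or CDF || <math>F_X</math> || ||
|-
| Probability density function or PDF || <math>f_X</math> || <math>\mathbf R \to [0,\infty)</math> ||
|-
| [[Random variable]] || <math>X</math> || <math>\Omega \to \mathbf R</math> ||
|-
| Preimage of random variable || <math>X^{-1}</math> || <math>2^{\mathbf R} \to 2^{\Omega}</math> but all we need is <math>\mathcal B \to \mathcal F</math> ||
|-
| Indicator of <math>A</math> || <math>1_A</math> || <math>\Omega \to \{0,1\}</math> || <math>1_A(\omega) = \begin{cases}1 & \omega\in A \\ 0 & \omega \not\in A\end{cases}</math>
|-
| [[Expectation]] || <math>\mathbf E</math> or <math>\mathrm E</math> || <math>(\Omega \to \mathbf R) \to \mathbf R</math> ||
|-
| || <math>X \in B</math> || <math>\mathcal F</math> || <math>\{\omega \in \Omega : X(\omega) \in B\}</math>
|-
| || <math>X\leq x</math> || <math>\mathcal F</math> || <math>\{\omega \in \Omega : X(\omega) \leq x\}</math>
|-
| Function of a random variable, where <math>f\colon \mathbf R \to \mathbf R</math> || <math>f(X)</math> || <math>\Omega \to \mathbf R</math> || <math>f\circ X</math> ||
|-
| [[Expected value]] of <math>X</math> || <math>\mathbf E(X)</math> || <math>\mathbf R</math>
|-
| || <math>\mathbf E(X\mid Y=y)</math> || <math>\mathbf R</math> ||
|-
| || <math>\mathbf E(X\mid Y)</math> || <math>\Omega \to \mathbf R</math> || <math>\omega \mapsto \mathbf E(X\mid Y=Y(\omega))</math>?
|-
| Utility function || <math>u</math> || <math>\mathbf R \to \mathbf R</math> || || I ''think'' this is what the type must be, based on how it's used. But we usually think of the utility function as assigning numbers to outcomes; but if that is so, it must be a random variable! What's up with that? (2022-07-14: I think in probability theory, we usually discuss only real random variables, since that allows us to do a lot more with them like take expected value. But in fields like AI, we consider more general random variables <math>\Omega \to \mathcal O</math> that take values in some space of outcomes <math>\mathcal O</math>. We can't "average over" outcomes so we can't really take expected values anymore, but this allows us to make the utility function more general so we get <math>u : \mathcal O \to \mathbf R</math>.)
|-
| Expected utility of <math>X</math> || <math>\mathbf{EU}(X)</math> || <math>\mathbf R</math> || <math>\mathbf E(u(X))</math> || <math>u\circ X</math> is indeed a random variable, so the type check passes.
|}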
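The types in the table can be checked concretely on a finite sample space. The sketch below is only an illustration under assumptions that are not from the page: the sample space is a fair six-sided die, the utility function is <math>u(x) = x^2</math>, and the helper names (`expectation`, `indicator`, `compose`) are hypothetical. It shows that <math>u \circ X</math> is again a map <math>\Omega \to \mathbf R</math>, so taking its expectation is well typed.

```python
from fractions import Fraction

# Assumed finite probability space: a fair six-sided die.
omega = [1, 2, 3, 4, 5, 6]                   # sample space Omega
prob = {w: Fraction(1, 6) for w in omega}    # probability of each outcome


def expectation(rv):
    """E: (Omega -> R) -> R, computed here as a finite weighted sum."""
    return sum(rv(w) * prob[w] for w in omega)


def indicator(event):
    """1_A: Omega -> {0, 1} for an event A given as a set of outcomes."""
    return lambda w: 1 if w in event else 0


def X(w):
    """A random variable X: Omega -> R (the face shown)."""
    return w


def u(x):
    """An illustrative utility function u: R -> R (assumed, not from the page)."""
    return x * x


def compose(f, rv):
    """f(X) is the composite of f with X, again a map Omega -> R."""
    return lambda w: f(rv(w))


expected_value = expectation(X)                 # E(X)
expected_utility = expectation(compose(u, X))   # EU(X) = E(u(X))
prob_even = expectation(indicator({2, 4, 6}))   # E(1_A) equals P(A)
```

Exact fractions are used so the results can be compared without floating-point noise. On a general (infinite) sample space the sum would be replaced by an integral against the probability measure.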

All the utility stuff isn't really related to machine learning. It's more related to the decision theory stuff I'm learning. I'm putting it here for now for convenience but might move it later.

TODO add "probability distribution over S" and "probability distribution on A" [https://arxiv.org/pdf/1711.00363.pdf]

Li and Vitanyi (An Introduction to Kolmogorov Complexity and Its Applications, p. 19) call the probability measure on <math>\mathcal F</math> a probability distribution over S (the sample space).

TODO: add probability mass function (defined only for discrete random variables)

Dependencies

Let <math>(\Omega, \mathcal F, \mathbf P_{\mathcal F})</math> be a probability space.

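  • Given a random variable <math>X</math>, we can compute its distribution <math>\mathbf P_{\mathcal B}</math>. How? Just let <math>\mathbf P_{\mathcal B}(B) = \mathbf P_{\mathcal F}(X \in B)</math>.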
  • Given a random variable, we can compute the probability density function. How?
  • Given a random variable, we can compute the cumulative distribution function. How?
  • Given a distribution, we can retrieve a random variable. But this random variable is not unique? This is why we can say stuff like "let <math>X\sim \mathcal D</math>".
  • Given a distribution <math>\mu</math>, we can compute its density function. How? Just find the derivative of <math>\mu((-\infty,x])</math>. (?) (2022-07-14: something something Radon–Nikodym theorem...)
  • Given a cumulative distribution function, we can compute the random variable. (Right?) (2022-07-14: but a CDF is like a distribution, so the random variable won't be unique.)
  • Given a probability density function, can we get everything else? Don't we just have to integrate to get the cdf, which gets us the random variable and the distribution?
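  • Given a cumulative distribution function, how do we get the distribution? We have <math>F_X(x) = \mathbf P_{\mathcal F}(X\leq x) = \mathbf P_{\mathcal B}((-\infty,x])</math>, which gets us some of what the distribution <math>\mathbf P_{\mathcal B}</math> maps to, but <math>\mathcal B</math> is bigger than this. What do we do about the other values we need to map? We can compute half-open intervals like <math>F_X(b) - F_X(a) = \mathbf P_{\mathcal F}(a < X\leq b) = \mathbf P_{\mathcal B}((a,b])</math>; the endpoint <math>a</math> is excluded, which matters whenever <math>X</math> takes the value <math>a</math> with positive probability. And we can apparently do the same for unions and limiting operations.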
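Several of the steps in the list above can be made concrete on a small finite space. The following is a minimal sketch, assuming a sample space of two fair coin tosses and a random variable that counts heads; neither is from the page, and the function names (`measure`, `preimage`, `distribution`, `cdf`) are hypothetical. Borel sets are represented as predicates on real numbers.

```python
from fractions import Fraction
from itertools import product

# Assumed probability space: two fair coin tosses.
omega = list(product("HT", repeat=2))        # ('H','H'), ('H','T'), ...
prob = {w: Fraction(1, 4) for w in omega}


def X(w):
    """Random variable X: Omega -> R, the number of heads."""
    return w.count("H")


def measure(event):
    """P_F: probability of an event given as a list of outcomes."""
    return sum(prob[w] for w in event)


def preimage(B):
    """X^{-1}(B) = {w in Omega : X(w) in B}, with B given as a predicate."""
    return [w for w in omega if B(X(w))]


def distribution(B):
    """P_B(B) = P_F(X in B): the distribution is the measure of the preimage."""
    return measure(preimage(B))


def cdf(x):
    """F_X(x) = P_B((-inf, x])."""
    return distribution(lambda t: t <= x)


# F_X(b) - F_X(a) recovers the half-open interval (a, b], not [a, b].
half_open = cdf(2) - cdf(0)
closed = distribution(lambda t: 0 <= t <= 2)
```

Here `half_open` is 3/4 while `closed` is 1, because the outcome with zero heads has probability 1/4. For a random variable with a density the two would agree, since single points carry no mass.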

Philosophical details about the sample space

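Given a random variable <math>X</math> and any reasonable predicate <math>\varphi</math> about <math>X</math>, we can replace <math>\varphi(X)</math> with its extension <math>X \in B</math> for some <math>B \in \mathcal B</math>. And from then on, we can write <math>\mathbf P_{\mathcal F}(X \in B)</math> as <math>\mathbf P_{\mathcal B}(B)</math>. In other words, we can just work with Borel sets of the reals (measuring them with the distribution) rather than the original events (measuring them with the original probability measure). Where did <math>\Omega</math> go? <math>\mathbf P_{\mathcal B}(B) = \mathbf P_{\mathcal F}(X^{-1}(B))</math>, so you can write <math>\mathbf P_{\mathcal B}</math> using <math>\Omega</math>. But once you already have <math>\mathbf P_{\mathcal B}</math>, you don't need to know what <math>\Omega</math> is.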
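One way to see that the sample space can be forgotten is to build the same distribution from two unrelated sample spaces. The sketch below assumes two illustrative spaces (a fair coin and a fair die) that are not from the page; `pushforward` and `prob_of` are hypothetical helper names. It also illustrates why a distribution does not determine a unique random variable.

```python
from fractions import Fraction


def pushforward(omega, prob, rv):
    """Distribution of rv as a dict mapping each value to its probability."""
    dist = {}
    for w in omega:
        dist[rv(w)] = dist.get(rv(w), Fraction(0)) + prob[w]
    return dist


def prob_of(dist, B):
    """Probability of a set of reals B (given as a predicate) under dist."""
    return sum(p for x, p in dist.items() if B(x))


# Space 1 (assumed): one fair coin; X1 is 1 on heads and 0 on tails.
omega1 = ["H", "T"]
prob1 = {w: Fraction(1, 2) for w in omega1}


def X1(w):
    return 1 if w == "H" else 0


# Space 2 (assumed): one fair die; X2 is 1 on even faces and 0 on odd faces.
omega2 = [1, 2, 3, 4, 5, 6]
prob2 = {w: Fraction(1, 6) for w in omega2}


def X2(w):
    return 1 if w % 2 == 0 else 0


dist1 = pushforward(omega1, prob1, X1)
dist2 = pushforward(omega2, prob2, X2)

# Different sample spaces, same distribution: any question about the values
# of the random variable can be answered from the distribution alone.
same = dist1 == dist2
p_at_least_one = prob_of(dist1, lambda x: x >= 1)
```

After the pushforward step neither `omega1` nor `omega2` is needed again, which mirrors the remark that the distribution alone is enough once it is in hand.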

See also

External links