Translating informal probability statements to formal counterparts

Revision as of 22:08, 3 March 2020

(inspired by how Aluffi explains universal properties in Algebra: Chapter 0)

I think probability is one of the fields in math where people who have learned the subject often make many "informal" statements, which can be easily formalized given enough experience, but which are difficult to fully understand if one does not have that experience. Unfortunately, the formal details are often left unexplained. On this page, we give some examples of these "informal" statements, with some thoughts on how to go about translating them to precise mathematical statements.

{| class="wikitable"
! Informal statement !! Why the statement is informal !! Formal statement
|-
| Let <math>X</math> be a normally distributed random variable. || A random variable is a function <math>X : \Omega \to \mathbf R</math>, but here we have not specified the sample space <math>\Omega</math>. ||
|-
| Let <math>X_1, X_2, X_3, \ldots</math> be i.i.d. || Again, we have not specified a sample space. Here we must also assume that the sample space is big enough to contain enough randomness to make all of the <math>X_i</math> independent. For example, while a sample space of <math>\Omega = \{1,2,3,4,5,6\}</math> has "enough randomness" to represent a single roll of a die, it does not have enough randomness to represent two independent rolls of a die. ||
|-
| <math>\mathbf E_{z \sim \mathcal D}[f(z)]</math> || Again, here we are given a random variable <math>z</math> without a specified sample space, and then we take its expected value. This makes sense because the expected value does not depend on the sample space, as long as the distribution is the same. ||
|-
| Rolling one die, and then another die || The confusion here is that we are supposed to have a fixed sample space, but it seems like we are modifying the sample space. Again the informality comes from the fact that we have not fully specified the sample space. We can think of starting with one sample space and then "growing" it, or we can just start with the bigger sample space from the start. ||
|-
| || In reinforcement learning this usually isn't defined formally, but it means something like: if we expand out the definition of expected value, then whenever we see the expression , for any t, we can substitute it instead with . ||
|}
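As a sketch of what a formal counterpart might look like for the first two rows (one standard choice of sample space; other choices work equally well):

```latex
% "Let X be a normally distributed random variable": take the canonical space
\Omega = \mathbf R, \qquad \mathcal F = \mathcal B(\mathbf R), \qquad
\mathbf P(A) = \int_A \frac{1}{\sqrt{2\pi}} e^{-\omega^2/2} \, d\omega, \qquad
X(\omega) = \omega.

% "Let X_1, X_2, X_3, \ldots be i.i.d.": take the product space
\Omega = \mathbf R^{\mathbf N}, \qquad \mathbf P = \mu^{\otimes \mathbf N}, \qquad
X_i(\omega) = \omega_i,
% where \mu is the common distribution of the X_i; independence holds by
% construction of the product measure.
```

The product-space construction is exactly the "growing the sample space" move from the table: each additional independent variable needs its own factor in the product.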

There is also the phenomenon where random variables "feel" like they are actually real numbers, e.g. when we treat observed data points as random variables.
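The claim in the expectation row, that <math>\mathbf E[f(z)]</math> depends only on the distribution of <math>z</math> and not on the sample space, can be checked directly in a toy finite setting. This is a hypothetical illustration (the sample spaces and the function <code>f</code> below are my own choices, not from the article): two different sample spaces both induce the distribution <math>P(z = 1) = P(z = 0) = 1/2</math>, and the expectations agree.

```python
from fractions import Fraction

# Sample space 1: a single coin flip; z is the indicator of heads.
omega1 = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
z1 = {"H": 1, "T": 0}

# Sample space 2: a six-sided die; z is the indicator of an even face.
# A different (larger) sample space, but z has the same distribution.
omega2 = {k: Fraction(1, 6) for k in range(1, 7)}
z2 = {k: 1 if k % 2 == 0 else 0 for k in omega2}

def expectation(prob, rv, f):
    """E[f(rv)] computed by summing f(rv(w)) * P({w}) over the sample space."""
    return sum(p * f(rv[w]) for w, p in prob.items())

f = lambda v: 3 * v + 1  # an arbitrary function of the value of z

e1 = expectation(omega1, z1, f)
e2 = expectation(omega2, z2, f)
assert e1 == e2 == Fraction(5, 2)  # the two sample spaces give the same answer
```

Both computations collapse to <math>\tfrac12 f(1) + \tfrac12 f(0)</math>, which is why the notation <math>\mathbf E_{z \sim \mathcal D}[f(z)]</math> is well-defined without naming a sample space.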