Translating informal probability statements to formal counterparts
(inspired by how Aluffi explains universal properties in Algebra: Chapter 0)
I think probability is one of the fields in math where people who have learned the subject often make many "informal" statements, which can be easily formalized given enough experience, but which are difficult to fully understand if one does not have that experience. Unfortunately, the formal details are often left unexplained. On this page, we give some examples of these "informal" statements, with some thoughts on how to go about translating them to precise mathematical statements.
| Informal statement | Why the statement is informal | Formal statement |
|---|---|---|
| Let $X$ be a normally distributed random variable. | A random variable is a function $X\colon \Omega \to \mathbb{R}$, but here we have not specified the sample space $\Omega$. | (see the first sketch below the table) |
| Let $X_1, \ldots, X_n$ be i.i.d. | Again, we have not specified a sample space. Here we also have to assume that the sample space is big enough to contain enough randomness to make all of the $X_i$ independent. For example, a sample space of $\{1, 2, \ldots, 6\}$ has "enough randomness" to represent a single roll of a die, but it does not have enough randomness to represent two independent rolls of a die. | (see the first sketch below the table) |
| Compute $\mathbb{E}[X]$, where $X$ is specified only by its distribution. | Again, here we're giving a random variable without specifying the sample space, and then taking its expected value. The reason this makes sense is that the expected value does not depend on the sample space, as long as the distribution is the same. | (see the second sketch below the table) |
| Rolling one die, and then another die. | The confusion here is that we're supposed to fix a sample space up front, but it seems like we're modifying the sample space. Again, the informality comes from the fact that we have not fully specified the sample space. We can think of starting with one sample space and then "growing" it (e.g. from $\{1, \ldots, 6\}$ to $\{1, \ldots, 6\}^2$), or we could just start with the bigger sample space from the start. | (see the first sketch below the table) |
| $\mathbb{E}_\pi[\,\cdot\,]$ in reinforcement learning | This usually isn't defined formally, but it means something like: if we expand out the definition of the expected value, then whenever we see the expression $\Pr(A_t = a \mid S_t = s)$, for any $t$, we can substitute $\pi(a \mid s)$ in its place. | (see the third sketch below the table) |
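To make the "Formal statement" column concrete, here is a minimal sketch of formal counterparts for the first, second, and fourth rows. The particular sample spaces chosen below are one convenient choice for illustration, not the only possibility; any sufficiently rich sample space works.

```latex
% "Let $X$ be a normally distributed random variable": one concrete choice is to
% take the real line itself as the sample space and let $X$ be the identity map.
\[
  (\Omega, \mathcal{F}, P) = \bigl(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mu\bigr), \qquad
  \mu(A) = \int_A \tfrac{1}{\sqrt{2\pi}} e^{-x^2/2}\, dx, \qquad
  X(\omega) = \omega.
\]
% "Let $X_1, \ldots, X_n$ be i.i.d." (and "roll one die, then another"): take a
% product space, which is "big enough" to make the coordinate maps independent.
\[
  \Omega = \Omega_1 \times \cdots \times \Omega_n, \qquad
  P = P_1 \otimes \cdots \otimes P_n, \qquad
  X_i(\omega_1, \ldots, \omega_n) = \omega_i.
\]
% Two independent die rolls: $\Omega = \{1, \ldots, 6\}^2$ with the uniform measure,
% $X_1(\omega_1, \omega_2) = \omega_1$ and $X_2(\omega_1, \omega_2) = \omega_2$.
```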
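For the expectation row, the missing step is the change-of-variables formula: the expectation can be computed entirely from the distribution (the pushforward measure) of $X$, so the choice of sample space never matters. A sketch:

```latex
% The expectation of $X$ defined on $(\Omega, \mathcal{F}, P)$ depends only on the
% pushforward measure $P_X$, i.e. on the distribution of $X$.
\[
  \mathbb{E}[X] = \int_\Omega X(\omega)\, dP(\omega)
              = \int_{\mathbb{R}} x\, dP_X(x),
  \qquad
  P_X(B) := P\bigl(X^{-1}(B)\bigr).
\]
% Hence any two random variables with the same distribution have the same
% expectation, regardless of which sample space they live on.
```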
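For the $\mathbb{E}_\pi$ row, here is one way the convention is usually unpacked, using Sutton and Barto's $p(s', r \mid s, a)$ notation; the one-step expression below is just an illustration of the general pattern.

```latex
% "Expectation under policy $\pi$": when the expectation is expanded as a sum over
% outcomes, the action probabilities are taken to be $\pi(a \mid s)$.
\[
  \mathbb{E}_\pi\bigl[R_{t+1} \mid S_t = s\bigr]
  = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\, r.
\]
% More generally, $\mathbb{E}_\pi$ is the expectation over trajectories in which
% $\Pr(A_t = a \mid S_t = s) = \pi(a \mid s)$ holds for every time step $t$.
```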
There's also the phenomenon where random variables "feel" like they are actually real numbers, e.g. when we treat observed data points as random variables.
For example, Sutton and Barto (p. 53) say "At each time step, the reward is a simple number, $R_t \in \mathbb{R}$." But in the same book they also write (p. xiii): "We also use a slightly different notation than was used in the first edition. In teaching, we have found that the new notation helps to address some common points of confusion. It emphasizes the difference between random variables, denoted with capital letters, and their instantiations, denoted in lower case. For example, the state, action, and reward at time step $t$ are denoted $S_t$, $A_t$, and $R_t$, while their possible values might be denoted $s$, $a$, and $r$." So which is it?
There is a kind of strong psychological pull, I think, to say things like $R_t \in \mathbb{R}$, despite formally defining $R_t$ as a random variable. It is tempting to say that a random variable is a real number whose value is "stochastic".
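One place where the capital/lowercase convention earns its keep is Sutton and Barto's definition of the MDP dynamics, where the capital letters are random variables and the lowercase letters are the particular values being conditioned on:

```latex
% Sutton and Barto's four-argument dynamics function: capitals are random
% variables, lowercase letters are specific values (instantiations).
\[
  p(s', r \mid s, a)
  \doteq
  \Pr\bigl\{ S_t = s',\, R_t = r \,\big|\, S_{t-1} = s,\, A_{t-1} = a \bigr\}.
\]
% On this reading, "the reward is a simple number" is shorthand for "each
% realization $r$ of $R_t$ is a real number", i.e. $R_t$ is real-valued.
```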