Variance

The variance of a random variable $X$ is defined as $\operatorname{Var}(X) = E[(X - E[X])^2]$, where $E[X]$ is the expectation of $X$.
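
As a quick numerical check of the definition, here is a minimal Python sketch (assuming NumPy is available; the seed, sample size, and normal distribution are arbitrary choices) that estimates $E[(X - E[X])^2]$ from simulated draws and compares it against NumPy's built-in variance:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(loc=3.0, scale=2.0, size=100_000)  # draws of X with Var(X) = 4

    mean = x.mean()                               # estimate of E[X]
    var_by_definition = ((x - mean) ** 2).mean()  # estimate of E[(X - E[X])^2]

    print(var_by_definition)  # close to 4.0
    print(np.var(x))          # NumPy computes the same population formula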

Notation

Since the square root of the variance is the standard deviation, if we have a simple notation for the standard deviation, such as $\sigma$, then we can denote the variance as $\sigma^2$.

Motivation

In several books I have seen the following three-step motivation for the variance. I'm not sure this is all that can be said to motivate the variance, but it seems like a start.

  1. We want to measure the spread of the data. For each data point, we can subtract the mean from it to see how "deviant" it is. So one a priori reasonable approach is to calculate these deviations and find the average deviation. If we do this, however, we get $\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x}) = 0$, which is useless: the positive and negative deviations always cancel exactly.
  2. We then have the idea to take the absolute values of these deviations, to prevent them from adding up to zero. So we get $\frac{1}{n}\sum_{i=1}^{n}|x_i - \bar{x}|$. But the absolute value is not smooth enough for us to be able to do all the things we would like (e.g. differentiation). However, note that this measure of spread is also used; it is called the mean absolute deviation.
  3. Finally, building off the previous idea, we decide to square the deviations instead, and we get $\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$, which is the definition of variance (in random-variable form, $E[(X - E[X])^2]$). The three steps are compared numerically in the sketch after this list.
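
To make the three steps concrete, here is a small Python sketch (the data values are made up for illustration; NumPy is assumed to be available) computing the raw average deviation, the mean absolute deviation, and the variance of the same data:

    import numpy as np

    x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up data with mean 5
    dev = x - x.mean()                                       # deviations from the mean

    print(dev.mean())          # step 1: always 0 (up to rounding), hence useless
    print(np.abs(dev).mean())  # step 2: mean absolute deviation, here 1.5
    print((dev ** 2).mean())   # step 3: variance, here 4.0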

If we can intuitively understand covariance (I'm still working on this understanding), then we can get the variance as $\operatorname{Var}(X) = \operatorname{Cov}(X, X)$.
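
As a quick check of this identity (again assuming NumPy; note that np.cov uses the n - 1 denominator by default, so ddof=1 is passed to np.var to match):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=10_000)

    print(np.cov(x, x)[0, 1])  # Cov(X, X), an off-diagonal entry of the 2x2 covariance matrix
    print(np.var(x, ddof=1))   # Var(X) with the matching n - 1 denominator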

Questions/things to explain

  • vector space interpretation [1]; see also the beginning of [2]
  • is there some list of axioms that can completely specify the variance? (similar to how the Riemann integral, plane geometry, various algebraic structures, etc., can all be defined axiomatically)
  • because of the square in the definition of variance, we have also squared the units. To get back the units we started with, one possibility is to take the square root of the variance. This leads to the standard deviation.
  • why not divide by the mean (the same data measured in a different system of units has a different variance)? See the sketch after this list.
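
The last two bullets are both about units. The following sketch (the factor of 100 for metres to centimetres is just an example) shows that rescaling the data rescales the variance by the square of the factor, that taking the square root recovers the original scaling, and that dividing the standard deviation by the mean gives a unit-free quantity:

    import numpy as np

    x_m = np.array([1.2, 1.5, 1.9, 2.3])  # made-up lengths in metres
    x_cm = 100 * x_m                       # the same lengths in centimetres

    print(np.var(x_cm) / np.var(x_m))  # 10000 = 100^2: variance carries squared units
    print(np.std(x_cm) / np.std(x_m))  # 100: standard deviation has the original units
    print(np.std(x_m) / np.mean(x_m))  # equal to the next line: unit-free
    print(np.std(x_cm) / np.mean(x_cm))

This unit-free ratio of standard deviation to mean is known as the coefficient of variation, which is one standard answer to the last question.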