Variance
Revision as of 00:33, 31 July 2019

The variance of a random variable <math>X</math> is defined as <math>\operatorname{Var}(X) = \mathbf E[(X - \mathbf E X)^2]</math>, where <math>\mathbf E X</math> is the expectation of <math>X</math>.

Notation

Since the square root of the variance is the standard deviation, if we have a simple notation for the standard deviation, such as <math>\sigma</math>, then we can denote the variance as <math>\sigma^2</math>.

Motivation

In several books I have seen the following three-step motivation for the variance. I'm not sure I'm convinced this is all that can be said to motivate the variance, but it seems to be a start.

  1. We want to measure the spread of the data. For each data point, we can subtract the mean from it to see how "deviant" it is. So one a priori reasonable approach is to calculate these deviations and find the average deviation. If we do this, however, we get <math>\mathbf E[X - \mathbf E X] = \mathbf E X - \mathbf E X = 0</math>, which is useless.
  2. We then have the idea to take the absolute values of these deviations, to prevent them from adding up to zero. So we get <math>\mathbf E |X - \mathbf E X|</math>. But the absolute value is not smooth enough for us to be able to do all the things we would like (e.g. differentiation). However, note that this measure of the spread is also used (it is called the mean absolute deviation).
  3. Finally, building off the previous idea, we decide to square things, and we get <math>\mathbf E [(X - \mathbf E X)^2]</math>, which is the definition of variance.
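The three candidate measures above can be checked numerically. A minimal sketch in Python (the sample data here is made up purely for illustration):

```python
# Arbitrary sample data, for illustration only.
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean = sum(xs) / len(xs)  # E[X] = 5.0 for this sample

# 1. Average deviation E[X - E[X]]: always zero, hence useless.
avg_dev = sum(x - mean for x in xs) / len(xs)

# 2. Mean absolute deviation E|X - E[X]|.
mad = sum(abs(x - mean) for x in xs) / len(xs)

# 3. Variance E[(X - E[X])^2].
var = sum((x - mean) ** 2 for x in xs) / len(xs)

print(avg_dev, mad, var)  # → 0.0 1.5 4.0
```

As expected, the raw deviations cancel to zero, while the absolute and squared deviations both give a nonzero measure of spread.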

If we can intuitively understand covariance (I'm still working on this understanding), then we can get the variance as <math>\operatorname{Var}(X) = \operatorname{Cov}(X, X)</math>.
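This identity is easy to verify numerically. A sketch with a hypothetical `cov` helper implementing the population covariance, run on the same kind of made-up sample:

```python
def cov(xs, ys):
    # Population covariance: E[(X - E[X]) * (Y - E[Y])].
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(cov(xs, xs))  # → 4.0, matching the population variance of this sample
```

Setting both arguments to the same variable makes the two deviation factors identical, so the product becomes a square and the formula collapses to the definition of variance.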

Questions/things to explain

  • vector space interpretation [1]; see also the beginning of [2]