Covariance

From Machinelearning


* A lot of explanations of covariance say things like "if the covariance is high, then the two variables vary together, so that when one is higher than average the other is as well". That sounds more like conditional expectation though, specifically <math>\mathrm E(Y - \mathrm E(Y) \mid X - \mathrm E(X))</math>. Can we express covariance in terms of this conditional language?
Here is one attempt, though I don't think it helps much:
<math>\begin{align}E[(X - EX)(Y - EY)] &= \sum_x \sum_y P(X=x,Y=y)(x-EX)(y-EY) \\ &= \sum_x (x-EX) \sum_y P(X=x,Y=y)(y-EY) \\ &= \sum_x P(X=x)(x-EX) \sum_y \frac{P(X=x,Y=y)}{P(X=x)}(y-EY) \\ &= \sum_x P(X=x)(x-EX) \sum_y P(Y=y \mid X=x)(y-EY) \\ &= E_X[(X-EX)\,E[Y-EY \mid X]]\end{align}</math>
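A quick numerical check of the derivation above, written as a Python sketch with exact fractions. The joint pmf is arbitrary and made up for illustration; the helper names are hypothetical. `cov_direct` evaluates the first line of the derivation and `cov_via_conditional` evaluates the last line, computing <math>E[Y - EY \mid X = x]</math> from the conditional pmf <math>P(Y=y \mid X=x) = P(X=x,Y=y)/P(X=x)</math>.

```python
from fractions import Fraction as F

# Hypothetical joint pmf P(X=x, Y=y), chosen only for illustration.
joint = {
    (0, 0): F(1, 4), (0, 1): F(1, 8),
    (1, 0): F(1, 8), (1, 2): F(1, 4),
    (2, 1): F(1, 8), (2, 2): F(1, 8),
}

def mean_x(p):
    return sum(prob * x for (x, _), prob in p.items())

def mean_y(p):
    return sum(prob * y for (_, y), prob in p.items())

def cov_direct(p):
    """First line of the derivation: E[(X - EX)(Y - EY)] over the joint pmf."""
    ex, ey = mean_x(p), mean_y(p)
    return sum(prob * (x - ex) * (y - ey) for (x, y), prob in p.items())

def cov_via_conditional(p):
    """Last line of the derivation: E_X[(X - EX) * E[Y - EY | X]]."""
    ex, ey = mean_x(p), mean_y(p)
    # Marginal pmf of X, obtained by summing the joint pmf over y.
    marginal_x = {}
    for (x, _), prob in p.items():
        marginal_x[x] = marginal_x.get(x, 0) + prob
    total = F(0)
    for x, px in marginal_x.items():
        # E[Y - EY | X = x] = sum_y P(Y=y | X=x) * (y - EY)
        cond = sum(prob / px * (y - ey) for (xx, y), prob in p.items() if xx == x)
        total += px * (x - ex) * cond
    return total

print(cov_direct(joint), cov_via_conditional(joint))  # 3/8 3/8
```

Both functions return 3/8 for this pmf, as the derivation says they must for any finite joint distribution.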
* Explain difference (in units, range of values) with correlation. Can we get positive/negative/large/small covariance and negative/positive/small/large correlation?
** This seems like a good example of where correlation vs covariance matters: "Correlation is covariance divided by variance, so if A is highly predictive of B, there can be a strong “correlation” between them even if A is ranging from 0 to 9 and B is only ranging from 50.0001 and 50.0009. Price’s Equation runs on covariance of characteristics with reproduction—not correlation! If you can compress variance in characteristics into a tiny band, the covariance goes way down, and so does the cumulative change in the characteristic." [https://www.readthesequences.com/NoEvolutionsForCorporationsOrNanodevices]
* Visualization as signed area of rectangles: between all points [https://stats.stackexchange.com/a/18200] [http://www.davidchudzicki.com/posts/covariance-as-signed-area-of-rectangles/] and between the axes drawn by the means [https://mbernste.github.io/files/notes/VisualizingVarianceCovariance.pdf] [https://stats.seandolinar.com/covariance-different-ways-to-explain/]
* comparison with independence of random variables
* clarify how the expectation is taken in each of the three appearances
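The quoted passage turns on units. Covariance carries the product of the two variables' units, so rescaling one variable rescales the covariance, while correlation is unitless and lies in <math>[-1, 1]</math>. Strictly, Pearson correlation divides the covariance by the product of the two standard deviations, <math>\sigma_X \sigma_Y</math>, which is what `corr` below does. Since the standard deviations are positive, covariance and correlation always have the same sign (or are both zero); only their magnitudes can disagree. A minimal Python sketch with hypothetical data chosen to mirror the ranges in the quote (the lists `a` and `b` are made up, not taken from the source):

```python
import math
from fractions import Fraction as F

def cov(xs, ys):
    """Population covariance: the mean of (x - mean_x) * (y - mean_y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def corr(xs, ys):
    """Pearson correlation: covariance over the product of the standard deviations."""
    return float(cov(xs, ys)) / math.sqrt(float(cov(xs, xs)) * float(cov(ys, ys)))

# Hypothetical data in the spirit of the quote: A spans 0..9 and B is an
# exact linear function of A, squeezed into a band only 0.0009 wide.
a = [F(i) for i in range(10)]
b = [50 + F(1, 10000) * x for x in a]
b_wide = [1000 * y for y in b]   # same relationship, B in a unit 1000x smaller
b_flipped = [-y for y in b]      # same relationship, reversed direction

print(cov(a, b), corr(a, b))                  # covariance 33/40000; correlation ~1
print(cov(a, b_wide), corr(a, b_wide))        # covariance 1000x larger; correlation ~1
print(cov(a, b_flipped), corr(a, b_flipped))  # both change sign
```

All three pairs are perfectly (anti)correlated, yet their covariances differ by a factor of 1000 depending only on the units of B.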


[[Category:Probability]]

Latest revision as of 05:21, 31 July 2019

The covariance between two random variables <math>X</math> and <math>Y</math> is defined as <math>\mathrm E[(X - \mathrm E X)(Y - \mathrm E Y)]</math>.
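A minimal sketch of the definition for a finite joint pmf, in Python with exact fractions. It also bears on the last question in the list below: the inner <math>\mathrm E X</math> and <math>\mathrm E Y</math> are expectations under the marginal distributions of <math>X</math> and <math>Y</math> (computed here as weighted sums over the joint pmf, which gives the same result), and the outer <math>\mathrm E</math> is taken over the joint distribution of <math>(X, Y)</math>. The two example pmfs are hypothetical; the second is the standard example showing that zero covariance does not imply independence.

```python
from fractions import Fraction as F
from itertools import product

def covariance(joint):
    """Cov(X, Y) = E[(X - E X)(Y - E Y)] for a finite joint pmf {(x, y): P(X=x, Y=y)}."""
    # Inner expectations: E X and E Y. Summing over the joint pmf is the same
    # as taking the expectation under the marginal of X (resp. Y).
    ex = sum(p * x for (x, _), p in joint.items())
    ey = sum(p * y for (_, y), p in joint.items())
    # Outer expectation: taken over the joint distribution of (X, Y).
    return sum(p * (x - ex) * (y - ey) for (x, y), p in joint.items())

# Independent case: the joint pmf factors as P(X=x) * P(Y=y).
px = {0: F(1, 2), 1: F(1, 2)}
py = {0: F(1, 3), 5: F(2, 3)}
independent = {(x, y): px[x] * py[y] for x, y in product(px, py)}

# Dependent but uncorrelated: X uniform on {-1, 0, 1} and Y = X**2.
uncorrelated = {(x, x * x): F(1, 3) for x in (-1, 0, 1)}

print(covariance(independent))   # 0
print(covariance(uncorrelated))  # 0, even though Y is a function of X
```

Independence forces the covariance to be zero because the double sum factors into <math>\mathrm E[X - \mathrm E X] \cdot \mathrm E[Y - \mathrm E Y] = 0</math>; the second pmf shows the converse fails.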

Questions/things to explain

  • A lot of explanations of covariance say things like "if the covariance is high, then the two variables vary together, so that when one is higher than average the other is as well". That sounds more like conditional expectation though, specifically <math>\mathrm E(Y - \mathrm E(Y) \mid X - \mathrm E(X))</math>. Can we express covariance in terms of this conditional language?

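Here is one attempt, though I don't think it helps much:

<math>\begin{align}E[(X - EX)(Y - EY)] &= \sum_x \sum_y P(X=x,Y=y)(x-EX)(y-EY) \\ &= \sum_x (x-EX) \sum_y P(X=x,Y=y)(y-EY) \\ &= \sum_x P(X=x)(x-EX) \sum_y \frac{P(X=x,Y=y)}{P(X=x)}(y-EY) \\ &= \sum_x P(X=x)(x-EX) \sum_y P(Y=y \mid X=x)(y-EY) \\ &= E_X[(X-EX)\,E[Y-EY \mid X]]\end{align}</math>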

  • Explain difference (in units, range of values) with correlation. Can we get positive/negative/large/small covariance and negative/positive/small/large correlation?
    • This seems like a good example of where correlation vs covariance matters: "Correlation is covariance divided by variance, so if A is highly predictive of B, there can be a strong “correlation” between them even if A is ranging from 0 to 9 and B is only ranging from 50.0001 and 50.0009. Price’s Equation runs on covariance of characteristics with reproduction—not correlation! If you can compress variance in characteristics into a tiny band, the covariance goes way down, and so does the cumulative change in the characteristic." [1]
  • Visualization as signed area of rectangles: between all points [2] [3] and between the axes drawn by the means [4] [5]
  • comparison with independence of random variables
  • clarify how the expectation is taken in each of the three appearances
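Both rectangle pictures can be checked numerically. The Python sketch below (hypothetical points; helper names are made up) computes the population covariance two ways: as the average signed area of the rectangles between each point and the point of means, and as the sum of signed areas of the rectangles spanned by every pair of points, divided by <math>n^2</math>. They agree because <math>\sum_{i<j}(x_i - x_j)(y_i - y_j) = n\sum_i x_i y_i - \left(\sum_i x_i\right)\left(\sum_i y_i\right) = n^2 \operatorname{Cov}(X, Y)</math> for the empirical distribution.

```python
from fractions import Fraction as F
from itertools import combinations

def cov_about_means(xs, ys):
    """Average signed area of the rectangles from each point to the mean point."""
    n = len(xs)
    mx, my = F(sum(xs), n), F(sum(ys), n)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def cov_pairwise_rectangles(xs, ys):
    """Signed areas of the rectangles spanned by every pair of points.

    Each unordered pair {i, j} gives a rectangle with opposite corners at the
    two points; its signed area (x_i - x_j)(y_i - y_j) is positive when the
    pair slopes upward and negative when it slopes downward. Dividing the
    total by n**2 gives the population covariance.
    """
    n = len(xs)
    total = sum((xi - xj) * (yi - yj)
                for (xi, yi), (xj, yj) in combinations(zip(xs, ys), 2))
    return F(total, n * n)

xs = [1, 2, 4, 5]
ys = [2, 1, 5, 6]
print(cov_about_means(xs, ys), cov_pairwise_rectangles(xs, ys))  # 3 3
```

Dividing the pairwise total by <math>n(n-1)</math> instead of <math>n^2</math> gives the sample covariance with the <math>n-1</math> denominator.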