User:IssaRice/Pareto distribution: Difference between revisions

Latest revision as of 04:49, 6 February 2020

derivation

apparently there are a bunch of different mechanisms/models that can derive the pareto distribution, but the only one that makes sense to me so far is the yule process/preferential attachment.

http://www.cs.cornell.edu/courses/cs6241/2019sp/readings/Newman-2005-distributions.pdf#page=18

chapter 7 (Multiplicative processes) of the heavy tails book also has a derivation.

80/20 rule

What does it mean to say that the top 20% own 80% of the wealth? It means that ${\frac {\int _{c}^{\infty }xf(x)\,dx}{\int _{x_{m}}^{\infty }xf(x)\,dx}}=0.80$ where we choose the cutoff c according to $\int _{c}^{\infty }f(x)\,dx=0.20$ .

Why? $\int _{c}^{\infty }f(x)\,dx$ is the fraction of people with wealth above the cutoff c; we're just integrating the pdf of the distribution.

$\int _{c}^{\infty }xf(x)\,dx$ is the amount of wealth owned by the people with wealth above c, measured per person in the whole population. This is because at each wealth level x, such people are a fraction f(x)dx of people, and they each own wealth x, so the wealth they contribute is xf(x)dx. Now just sum over all of them from c to infinity.
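The same two quantities can be computed for a finite population, which may make the definition more concrete. This is only a sketch: the function name `top_share` and the ten wealth values are made up for illustration, and sorting plus summing stands in for the two integrals.

```python
def top_share(wealth, top_fraction=0.20):
    """Share of total wealth held by the richest `top_fraction` of people."""
    ranked = sorted(wealth, reverse=True)
    n_top = round(top_fraction * len(ranked))  # number of people above the cutoff
    # numerator: wealth above the cutoff; denominator: all wealth
    return sum(ranked[:n_top]) / sum(ranked)

# made-up population of 10 people; the richest 2 are the "top 20%"
wealth = [1, 1, 1, 1, 1, 1, 1, 1, 12, 20]
print(top_share(wealth))  # (20 + 12) / 40 = 0.8
```

Here the cutoff is picked by head count (the discrete analogue of $\int _{c}^{\infty }f(x)\,dx=0.20$) and the share is a ratio of sums (the analogue of the ratio of integrals).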

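The above can also be expressed as ${\frac {\mathbb {E} [X\mid X>c]\,\mathbb {P} (X>c)}{\mathbb {E} [X]}}=0.80$ where the cutoff c is chosen by $\mathbb {P} (X>c)=0.20$ . This is because $\int _{c}^{\infty }xf(x)\,dx=\mathbb {E} [X\mid X>c]\,\mathbb {P} (X>c)$ : the conditional mean has to be weighted by the fraction of people above the cutoff. Equivalently, the top 20% have 4 times the average wealth.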

Using $\alpha =(\log 5)/(\log 4)$ (the specific value of alpha needed to make the 80/20 rule hold) and $x_{m}=1$ in the Pareto distribution, we get $c={\frac {1}{0.20^{1/\alpha }}}={\frac {1}{0.20^{(\log 4)/(\log 5)}}}=4$ , since $5^{(\log 4)/(\log 5)}=4$ . The mean is ${\frac {\alpha }{\alpha -1}}\approx 7.213$ [1] and $\int _{c}^{\infty }xf(x)\,dx\approx 5.77$ [2]. Sure enough, 5.77/7.213 is very close to 0.80; without rounding, the ratio is exactly $c^{1-\alpha }=4/5$ .
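The numbers above can be reproduced without Wolfram Alpha. The sketch below assumes the standard Pareto pdf $f(x)=\alpha x_{m}^{\alpha }/x^{\alpha +1}$ for $x\geq x_{m}$ and uses the closed forms of the two integrals; the variable names are mine, not from the page.

```python
import math

alpha = math.log(5) / math.log(4)   # shape that makes the 80/20 rule hold
x_m = 1.0                           # minimum wealth, as in the text

# cutoff c solves P(X > c) = (x_m / c)**alpha = 0.20
c = x_m / 0.20 ** (1 / alpha)

# closed forms for the pdf f(x) = alpha * x_m**alpha / x**(alpha + 1)
mean = alpha * x_m / (alpha - 1)                              # integral of x f(x) from x_m to infinity
tail = alpha * x_m ** alpha * c ** (1 - alpha) / (alpha - 1)  # integral of x f(x) from c to infinity

print(c, mean, tail, tail / mean)
```

Up to floating-point rounding this should print values close to 4, 7.213, 5.770 and 0.8.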

Another way to look at this is to look at the inverse CDF, which takes the percentile of wealth to the actual value of the wealth. We should get ${\frac {\int _{1-0.2}^{1}\mathrm {cdf} ^{-1}(p)\,dp}{\int _{0}^{1}\mathrm {cdf} ^{-1}(p)\,dp}}=0.8$ [3].
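The inverse-CDF ratio can also be worked out by hand. The steps below are a sketch assuming $x_{m}=1$ and $\alpha >1$ (so the integrals converge); $q$ is my own symbol for the top fraction, not notation from the page.

```latex
\begin{align*}
\mathrm{cdf}^{-1}(p) &= (1-p)^{-1/\alpha} \\
\int_{1-q}^{1} (1-p)^{-1/\alpha}\,dp
  &= \int_{0}^{q} u^{-1/\alpha}\,du
   = \frac{q^{\,1-1/\alpha}}{1-1/\alpha} \\
\frac{\int_{1-q}^{1} \mathrm{cdf}^{-1}(p)\,dp}{\int_{0}^{1} \mathrm{cdf}^{-1}(p)\,dp}
  &= q^{\,1-1/\alpha} \\
q = 0.2,\ \alpha = \frac{\log 5}{\log 4}: \qquad
0.2^{\,1-(\log 4)/(\log 5)}
  &= 0.2 \cdot 5^{(\log 4)/(\log 5)} = 0.2 \cdot 4 = 0.8
\end{align*}
```

Setting $q=1$ in the second line gives $\alpha /(\alpha -1)$, which is the mean, so the denominator agrees with the pdf-based calculation.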

deriving the scaled pareto

from the graph of the inverse cdf, it's clear that the y axis isn't what we want (it doesn't correspond to income: with $x_{m}=1$ the lowest value is 1). it seems natural to scale the whole graph in this setting, so given the inverse cdf $x={\frac {1}{(1-F)^{1/\alpha }}}$ , scale this by a constant $\beta >0$ to get $x={\frac {\beta }{(1-F)^{1/\alpha }}}$ . Solving this for F gives $F=1-\beta ^{\alpha }x^{-\alpha }$ for $x\geq \beta $ , and differentiating with respect to x gives the pdf $f(x)={\frac {\beta ^{\alpha }\alpha }{x^{\alpha +1}}}$ . So $\beta $ plays the role of the minimum value $x_{m}$ .
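Spelled out step by step, the algebra might look like this (a sketch; the only thing added to the text is the support condition $x\geq \beta$, which comes from $0\leq F<1$):

```latex
\begin{align*}
x = \frac{\beta}{(1-F)^{1/\alpha}}
  &\iff (1-F)^{1/\alpha} = \frac{\beta}{x} \\
  &\iff 1-F = \left(\frac{\beta}{x}\right)^{\alpha} \\
  &\iff F(x) = 1 - \beta^{\alpha} x^{-\alpha}, \qquad x \ge \beta \\
f(x) = \frac{dF}{dx}
  &= \alpha \beta^{\alpha} x^{-\alpha-1}
   = \frac{\alpha \beta^{\alpha}}{x^{\alpha+1}}
\end{align*}
```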

since the y-axis of the inverse cdf was scaled uniformly (every value multiplied by the same constant $\beta $ , which cancels in the ratio of integrals), the "80/20"-ness of the distribution doesn't change.
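A quick numerical illustration of why multiplying by a constant leaves the top share alone, whereas adding a constant would not. This is a rough sketch: the 1000 evenly spaced percentiles are a crude stand-in for a population (they truncate the heavy tail, so the share is not exactly 0.8), and `beta = 30_000` is an arbitrary made-up value.

```python
import math

alpha = math.log(5) / math.log(4)

def quantile(p, beta=1.0):
    """Inverse cdf of the scaled Pareto: wealth at percentile p (0 <= p < 1)."""
    return beta / (1 - p) ** (1 / alpha)

def top_share(values, top_fraction=0.20):
    """Share of the total held by the largest `top_fraction` of `values`."""
    ranked = sorted(values, reverse=True)
    n_top = round(top_fraction * len(ranked))
    return sum(ranked[:n_top]) / sum(ranked)

# evenly spaced percentiles stand in for a population of 1000 people
percentiles = [(i + 0.5) / 1000 for i in range(1000)]

base = [quantile(p) for p in percentiles]
scaled = [quantile(p, beta=30_000.0) for p in percentiles]  # multiply every value by beta
shifted = [x + 30_000.0 for x in base]                      # add a constant instead

print(top_share(base), top_share(scaled), top_share(shifted))
```

The first two shares should agree up to floating-point rounding, while the shifted one collapses toward 0.2 because the added constant swamps the differences between people.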

three ways to picture the pareto distribution

pdf
inverse cdf; examples: [4], [5]
lorenz curve; examples: [6] -- with the lorenz curve, to get the percentage you can just look at the y-axis and subtract things. with the pdf you can't really read off this info at all, and with the inverse cdf, you need to integrate.
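To make the "look at the y-axis and subtract" remark concrete, here is a small sketch using the standard closed-form Lorenz curve of a Pareto distribution, $L(p)=1-(1-p)^{1-1/\alpha }$ for $\alpha >1$. The function name `lorenz` and the 50th-80th percentile band are my own choices for illustration.

```python
import math

def lorenz(p, alpha):
    """Lorenz curve of a Pareto distribution with shape alpha > 1:
    fraction of total wealth held by the poorest fraction p of people."""
    return 1 - (1 - p) ** (1 - 1 / alpha)

alpha = math.log(5) / math.log(4)

top_20 = 1 - lorenz(0.80, alpha)                    # read off y at p = 0.8, subtract from 1
middle = lorenz(0.80, alpha) - lorenz(0.50, alpha)  # share of the 50th-80th percentile band

print(top_20, middle)
```

With this alpha, `top_20` comes out at about 0.8, matching the 80/20 rule, and any other band is just a difference of two y-values.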

@@ Line 1: / Line 1: @@
+==derivation==
+apparently there are a bunch of different mechanisms/models that can derive the pareto distribution, but the only one that makes sense to me so far is the yule process/preferential attachment.
-I still don't know how this is derived, but here's how to make sense of the 80/20 rule:
+http://www.cs.cornell.edu/courses/cs6241/2019sp/readings/Newman-2005-distributions.pdf#page=18
+chapter 7 (Multiplicative processes) of the heavy tails book also has a derivation.
+==80/20 rule==
 What does it mean to say that the top 20% own 80% of the wealth? It means that <math>\frac{\int_c^\infty xf(x)\, dx}{\int_{x_m}^\infty xf(x)\, dx} = 0.80</math> where we choose the cutoff c according to <math>\int_c^\infty f(x)\, dx = 0.20</math>.
@@ Line 11: / Line 17: @@
 The above can also be expressed as <math>\frac{\mathbb E[X \mid X > c]\,\mathbb P(X > c)}{\mathbb E[X]} = 0.80</math> where the cutoff c is chosen by <math>\mathbb P(X > c) = 0.20</math>.
-Using <math>\alpha = (\log 5)/(\log 4)</math> in the Pareto distribution, we get <math>c = \frac1{0.20^{1/\alpha} = \frac{1}{0.20^{(\log 4)/(\log 5)}}</math>. The mean is <math>\frac{\alpha}{\alpha - 1} \approx 7.213</math> and <math>\int_c^\infty xf(x)\, dx \approx 5.77</math>. Sure enough, 5.77/7.213 is very close to 0.80.
+Using <math>\alpha = (\log 5)/(\log 4)</math> (the specific value of alpha needed to make the 80/20 rule hold) and <math>x_m = 1</math> in the Pareto distribution, we get <math>c = \frac1{0.20^{1/\alpha}} = \frac{1}{0.20^{(\log 4)/(\log 5)}}</math>. The mean is <math>\frac{\alpha}{\alpha - 1} \approx 7.213</math> [https://www.wolframalpha.com/input/?i=%28log+5%2Flog+4%29%2F%28log5%2Flog+4+-+1%29] and <math>\int_c^\infty xf(x)\, dx \approx 5.77</math> [https://www.wolframalpha.com/input/?i=integral+from+%281%2F0.20%5E%28log+4+%2F+log+5%29%29+to+infinity+of+%28log+5+%2F+log+4%29%2Fx%5E%28log+5%2Flog+4%29+dx]. Sure enough, 5.77/7.213 is very close to 0.80.
+Another way to look at this is to look at the inverse CDF, which takes the percentile of wealth to the actual value of the wealth. We should get <math>\frac{\int_{1-0.2}^1 \mathrm{cdf}^{-1}}{\int_0^1 \mathrm{cdf}^{-1}} = 0.8</math> [https://www.wolframalpha.com/input/?i=%28integral+of+1%2F%281+-+x%29%5E%28log+4%2Flog+5%29+from+0.8+to+1%29%2F%28integral+of+1%2F%281+-+x%29%5E%28log+4%2Flog+5%29+from+0+to+1%29].
+==deriving the scaled pareto==
+from the graph of the inverse cdf, it's clear that the y axis isn't what we want (it doesn't correspond to income). it seems natural to scale the whole graph in this setting, so given the inverse cdf <math>x = \frac1{(1-F)^{1/\alpha}}</math>, scale this by a constant <math>\beta > 0</math> to get <math>x = \frac\beta{(1-F)^{1/\alpha}}</math>. Solving this for F gives <math>F = 1 - \beta^\alpha x^{-\alpha}</math> and differentiating wrt x gives the pdf <math>f(x) = \frac{\beta^\alpha\alpha}{x^{\alpha+1}}</math>.
+since the y-axis of the inverse cdf was scaled uniformly (every value multiplied by the same constant), the "80/20"-ness of the distribution doesn't change.
+==three ways to picture the pareto distribution==
+* pdf
+* inverse cdf; examples: [https://cdn.80000hours.org/wp-content/uploads/2016/06/80K_paretograph-01.png], [https://cdn.80000hours.org/wp-content/uploads/2017/04/80K_articles_worldincome_V5-01.jpg]
+* lorenz curve; examples: [http://2kmv472ndwiv3k5806rleif1-wpengine.netdna-ssl.com/wp-content/uploads/2018/04/LT-Pareto-Distribution-Pareto-2.gif] -- with the lorenz curve, to get the percentage you can just look at the y-axis and subtract things. with the pdf you can't really read off this info at all, and with the inverse cdf, you need to integrate.