User:IssaRice/Distribution of X over Y

A distribution of $X$ over $Y$ is any function $f : Y \to X$.

Examples:

  • A probability distribution over a finite sample space $\Omega$ is a distribution of probabilities over the sample space $\Omega$. In other words, a probability distribution is a function $p : \Omega \to [0,1]$. In the case of probability distributions, we also require some extra conditions (namely that $\sum_{\omega \in \Omega} p(\omega) = 1$).
  • A wealth distribution over people is a distribution of wealth ($\mathbf{R}$) over a set of people ($P$), that is, a function $w : P \to \mathbf{R}$, where $P$ is a set of people. Thus, if $p \in P$ then $w(p)$ is $p$'s wealth.
  • A gender distribution over people assigns a gender to each person.
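  • A gender distribution can also be thought of as a distribution of sets of people over genders. If $g : P \to G$ assigns a gender in $G$ to each person in $P$, this is the function $h : G \to 2^P$ where $h(x) = \{p \in P : g(p) = x\}$.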
  • A random variable is a distribution of real values over a sample space.
  • A frequency distribution of test scores is a distribution of frequencies over test scores, i.e. a function $\mathbf{R} \to \mathbf{N}$ (where test scores are real and the frequency is a natural number).
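
The finite examples above can be sketched in Python by treating a distribution of $X$ over $Y$ as a dict from elements of $Y$ to elements of $X$. This is only an illustration: the names (`prob`, `gender`, `invert`, `frequency`) and all of the data are made up for the example.

```python
from collections import Counter, defaultdict

# A "distribution of X over Y" modeled as a dict mapping each y in Y to an x in X.
# All names and data here are hypothetical.

# Probability distribution over a finite sample space (a fair six-sided die).
prob = {face: 1 / 6 for face in range(1, 7)}
assert abs(sum(prob.values()) - 1.0) < 1e-12  # extra condition: total is 1

# Gender distribution over people: each person is assigned a gender.
gender = {"alice": "female", "bob": "male", "carol": "female"}


def invert(dist):
    """Turn a distribution of X over Y into a distribution of subsets of Y over X."""
    inverse = defaultdict(set)
    for y, x in dist.items():
        inverse[x].add(y)
    return dict(inverse)


# The same gender data, viewed as a distribution of sets of people over genders.
people_by_gender = invert(gender)

# Frequency distribution of test scores: score -> number of occurrences.
scores = [90, 85, 90, 70, 85, 90]
frequency = Counter(scores)
```

Here `invert` is the step that moves between the two views of a gender distribution: people to genders, and genders to sets of people.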

The definition above runs into trouble when the set $Y$ is uncountable. In this case, we might not be able to find any function that satisfies the extra conditions we want to place on the distribution (e.g. for a continuous probability distribution such as the uniform distribution on $[0,1]$, we want to assign 0 to every individual outcome, but then the sum is also 0 rather than 1).

It seems like one way to get around this is to change the type of a distribution. Namely, rather than a function $Y \to X$ we have some collection $\mathcal{F}$ of subsets of $Y$, and we define a function $\mathcal{F} \to X$. What constraints should this new function satisfy? In the case of probability, we have a condition that translates between subsets and real numbers (additivity: for disjoint $A$ and $B$, the probability of $A \cup B$ is the sum of the probabilities of $A$ and $B$). In general, we need some notion of a "merge": if $A, B \in \mathcal{F}$ have "stuff" $x_A, x_B \in X$ respectively, then how much "stuff" in $X$ does the "merged" $A \cup B$ have? In the case of probabilities and wealth (and other things that take real numbers) we can add, but it seems like we can't ask what the "combined" gender of two people is.
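
For real-valued "stuff" such as wealth, the subset version and its "merge" can be sketched as follows. This is a minimal illustration with made-up data; `wealth` and `total_wealth` are hypothetical names, and the sketch only covers the finite case, where the set function is determined by the per-person values.

```python
# Wealth per person (hypothetical data).
wealth = {"alice": 120.0, "bob": 45.5, "carol": 300.0}


def total_wealth(subset):
    """Set function: maps a subset of people to their combined wealth."""
    return sum(wealth[person] for person in subset)


a = frozenset({"alice"})
b = frozenset({"bob", "carol"})

# For disjoint subsets, merging is union on the set side
# and addition on the value side.
assert a.isdisjoint(b)
merged = total_wealth(a | b)
```

With these values, `merged` equals `total_wealth(a) + total_wealth(b)`, which is the additivity condition. There is no analogous single gender to assign to `a | b`.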

The other way to generalize to uncountable sets seems to be to look at density (like a probability density function). But it seems like we can get density via the subsets method above.

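Can we restrict $X$ to be finite, then define $f : \mathcal{F} \to \mathbf{N}^{|X|}$? Then we can force the outcome to be a vector. In the case of gender we have $X = \{g_1, \ldots, g_n\}$, and $f(A) = (c_1, \ldots, c_n)$, where $c_i$ is the number of people in $A$ with gender $g_i$, so it's like we're just tracking the counts of each gender separately.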
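
One way to picture the count-vector idea is with `collections.Counter`, where each subset of people maps to a vector of per-gender counts and merging disjoint subsets is vector addition. This is a sketch under the assumption that the vector holds one count per gender; the data and the name `count_vector` are hypothetical.

```python
from collections import Counter

# Hypothetical gender assignment for a small set of people.
gender = {"alice": "female", "bob": "male", "carol": "female"}


def count_vector(subset):
    """Map a subset of people to per-gender counts (a vector indexed by gender)."""
    return Counter(gender[person] for person in subset)


a = {"alice", "bob"}
b = {"carol"}

# Merging disjoint subsets corresponds to adding their count vectors.
merged = count_vector(a | b)
```

Because `Counter` supports `+`, the merge condition can be checked directly as `count_vector(a) + count_vector(b) == merged`.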


One thing I haven't seen discussed elsewhere: for probability distributions, the "pmf", "probability density", "cdf", etc. are often all called "distributions". Then there is the formal definition you can find in places like Rosenthal's measure-theoretic probability book (a probability measure that assigns a number to each Borel set, or whatever). Why all these different uses? I think the trick is to realize that we don't really care about the differences; they are all the same thing. In programmer's terminology, you can think of "distribution" as the object, and each of pmf, cdf, density, etc., as just a method you can call. E.g. define an object like b = Beta(2,5); then you can call b.density, b.cdf, b.borel, etc. I guess the one other thing to mention is that if you start with a cdf, you can get to the same Borel measure thingy, and if you start with a pdf, you can get to the same cdf, etc. (there are probably some measure-theoretic subtleties I am botching here). So all these separate functional "representations" are pointing at/gesturing toward the same underlying object.
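
The "object with methods" picture can be sketched in Python. This is only an illustration, not any library's API: the class `Beta` and its methods are hypothetical, the parameters are restricted to positive integers so that both the density and the cdf have closed forms, and `interval` stands in for the Borel-measure view by handling only intervals.

```python
from math import comb, factorial


class Beta:
    """Beta(a, b) for positive integer a, b (a restriction made for this sketch)."""

    def __init__(self, a, b):
        if not (isinstance(a, int) and isinstance(b, int) and a > 0 and b > 0):
            raise ValueError("this sketch only handles positive integer parameters")
        self.a = a
        self.b = b

    def density(self, x):
        """Probability density at x."""
        if not 0.0 <= x <= 1.0:
            return 0.0
        a, b = self.a, self.b
        # 1 / B(a, b) = (a + b - 1)! / ((a - 1)! (b - 1)!) for integer a, b.
        norm = factorial(a + b - 1) // (factorial(a - 1) * factorial(b - 1))
        return norm * x ** (a - 1) * (1.0 - x) ** (b - 1)

    def cdf(self, x):
        """P(X <= x), via the binomial-sum identity for integer parameters."""
        if x <= 0.0:
            return 0.0
        if x >= 1.0:
            return 1.0
        n = self.a + self.b - 1
        return sum(
            comb(n, j) * x ** j * (1.0 - x) ** (n - j)
            for j in range(self.a, n + 1)
        )

    def interval(self, lo, hi):
        """Measure of the interval (lo, hi], obtained from the cdf."""
        return self.cdf(hi) - self.cdf(lo)


b = Beta(2, 5)
```

All three methods describe the same object: integrating `b.density` from 0 to `x` gives `b.cdf(x)`, and `b.interval` is defined directly in terms of `b.cdf`.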