Variants of Solomonoff induction
This page lists some variants of Solomonoff induction, an idealized method of sequence prediction.
For determinism, I think "deterministic" is the same as "Solomonoff prior" and "stochastic" is the same as "universal mixture". Sterkenburg calls deterministic versions a "bottom-up approach" whereas the universal mixture is a "top-down approach" (p. 30).[1] For deterministic variants, the type of universal machine must be specified. With universal mixtures, one must specify two things: the weighting to use and the class of distributions to mix over.
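To make the contrast concrete, here is a rough schematic of the two forms, in notation of my own choosing ($U$ a universal monotone machine, $\mathcal{M}$ a class of semimeasures with weights $w_\nu$); each source in the table below instantiates one of these:

$$M(x) = \sum_{p\,:\,U(p)=x*} 2^{-\ell(p)} \qquad \text{(deterministic / bottom-up: Solomonoff prior)}$$

$$\xi(x) = \sum_{\nu \in \mathcal{M}} w_\nu\, \nu(x) \qquad \text{(stochastic / top-down: universal mixture)}$$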
For discrete vs continuous, I think this just means whether the prior we define is over finite strings or over infinite sequences (where we want to know the probability of an infinite sequence starting with a given finite string). I'm not sure how to tell whether a given formula is discrete or continuous. One difference seems to be that with discrete semimeasures, we only require that the sum over all strings is at most 1, whereas with continuous semimeasures we instead require that $\nu(\epsilon) \le 1$ and that $\nu(x) \ge \nu(x0) + \nu(x1)$ for every string $x$ (see e.g. p. 5 of [1] and p. 294 of Li and Vitanyi). Apparently one way to think of the discrete version is to think of the sample space as the natural numbers, i.e. one-letter strings from a countably infinite alphabet (see p. 265 of Li and Vitanyi).
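A worked example of the difference (my own, not from the sources): the uniform function $\lambda(x) = 2^{-\ell(x)}$ is a continuous (semi)measure but not a discrete one, since

$$\lambda(\epsilon) = 1, \qquad \lambda(x) = 2^{-\ell(x)} = 2^{-\ell(x)-1} + 2^{-\ell(x)-1} = \lambda(x0) + \lambda(x1),$$

yet the sum over all finite strings, $\sum_x 2^{-\ell(x)} = \sum_{n \ge 0} 2^n \cdot 2^{-n}$, diverges.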
NOTE: As of July 2019, I'm no longer sure that discrete vs continuous is actually a distinction. I think all(?) versions of Solomonoff induction use continuous semimeasures. I'm not sure why the discrete one can't be used, though. E.g. "For applications such as the prediction of growing sequences it is necessary to define a similar distribution on infinite binary sequences".[2] Solomonoff also says similar things in at least one of his papers.
February 2020 update: There is a discrete vs continuous distinction after all. With a discrete semimeasure, the conditional probability of $xy$ given $x$ is zero for all non-empty strings $y$, since we are considering ordinary Turing machines that halt: a run that halts with output $x$ can't also halt with output $xy$, so the outcomes $x$ and $xy$ are mutually exclusive. So the universal discrete semimeasure can't be used for sequence prediction: all the conditional probabilities are zero, which makes it useless. To do sequence prediction, we need to switch to monotone TMs and allow partial outputs.
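A short derivation of the contrast (the standard definition, e.g. in Li and Vitanyi, not specific to any row below): with a monotone machine, the conditional probability of a continuation is

$$M(y \mid x) = \frac{M(xy)}{M(x)},$$

which is generally nonzero, because a single (possibly non-halting) run whose output so far is $x$ can go on to extend it to $xy$; with halting machines the outcomes "output exactly $x$" and "output exactly $xy$" are disjoint, so the corresponding conditional probability is zero.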
| Source | Formula | Determinism | Discrete vs continuous | Notes |
|---|---|---|---|---|
| LessWrong Wiki[2] | $m(x) = \sum_{p\,:\,U(p)=x} 2^{-\ell(p)}$ | Deterministic; prefix Turing machine | Discrete | |
| Scholarpedia discrete universal a priori probability[3] | $m(x) := \sum_{p\,:\,U(p)=x} 2^{-\ell(p)}$ | Deterministic; prefix Turing machine | Discrete | In this notation, $U$ is a universal prefix Turing machine, the sum ranges over all halting programs $p$ with $U(p)=x$, and $\ell(p)$ is the length of $p$ in bits. |
| Scholarpedia continuous universal a priori probability[3] | $M(x) := \sum_{p\,:\,U(p)=x*} 2^{-\ell(p)}$, where $U(p)=x*$ means the output of $U$ on $p$ starts with $x$ | Deterministic; monotone Turing machine | Continuous | |
| Sterkenburg (p. 22)[1] | (formula images missing) | Deterministic; ordinary Turing machine | Discrete (I think: because we only consider halting inputs, the sample space is the set of all finite strings, which is countable/discrete) | This seems similar to Solomonoff's section 3.3. Also, I think this is different from the Scholarpedia $m$. |
| Sterkenburg (p. 24)[1] | (formula images missing) | Deterministic; universal Turing machine, universal prefix machine (to get a probability distribution; see remark on p. 27) | Discrete? | This formula does not define a probability distribution over strings. |
| Sterkenburg (p. 25)[1] | (formula images missing) | Deterministic; universal Turing machine | ? | |
| Sterkenburg (p. 26)[1] | (formula image missing) | Deterministic; universal Turing machine, universal prefix machine (to get a probability distribution; see remark on p. 27) | | The use of the … |
| Sterkenburg (p. 29)[1] | (formula images missing) | Deterministic; universal monotone machine | Continuous? | |
| Sterkenburg (p. 31)[1] | (formula image missing) | Stochastic | | I think this is the same as Solomonoff's section 3.4, eq. 13, except that Solomonoff takes the limit of the … |
| Sterkenburg (p. 33)[1] | (formula image missing) | Stochastic; weighting unspecified, except for the requirement that the weights be positive and sum to at most 1 | | |
| Arbital[4] | $\xi(x) = \sum_{p} 2^{-\ell(p)}\, p(x)$, summing over programs $p$ that define probability measures | Stochastic; the weighting is 1/2 to the power of the length of the program; the model class is all programs that define a probability measure (?), where the programs are given using a prefix-free code | | |
| Solomonoff section 3.1, eq. 1[5] | (formula images missing) | Deterministic; universal Turing machine | | Solomonoff originally gave the conditional probability of seeing a continuation $a$ after having seen a string $T$, i.e. $P(a \mid T)$. |
| Solomonoff section 3.2, eq. 7[5] | (formula images missing) | Deterministic; universal monotone machine | | |
| Solomonoff section 3.3, eq. 9 and 11[5] | (formula images missing) | Deterministic; either a universal Turing machine or a universal monotone machine | | |
| Solomonoff section 3.3, eq. 10[5] | (formula image missing) | Deterministic | | Solomonoff says this follows from equation 10, but I'm not sure how. In more modern notation, I think this can be written as … |
| Solomonoff section 3.4, eq. 13[5] | (formula image missing) | Stochastic | | I think Solomonoff is using all computable measures (rather than enumerable semimeasures). |
| Legg[6] | $\xi(x) = \sum_{\nu} 2^{-K(\nu)}\, \nu(x)$, summing over enumerable semimeasures $\nu$ | Stochastic | | Legg sums over all the semimeasures themselves rather than over programs that compute semimeasures, which is why he needs to define the weights $w_\nu = 2^{-K(\nu)}$ explicitly. |
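To make the stochastic/top-down rows concrete, here is a minimal Python sketch of a mixture predictor, with the (uncomputable) class of all enumerable semimeasures replaced by a hand-picked finite class of Bernoulli measures and uniform weights. The class name `MixturePredictor` and all parameter choices are illustrative assumptions, not taken from any source above.

```python
from fractions import Fraction

class MixturePredictor:
    """Toy 'top-down' predictor: a Bayesian mixture over a finite,
    hand-picked class of Bernoulli(theta) measures, standing in for
    the universal mixture xi(x) = sum_nu w_nu * nu(x)."""

    def __init__(self, thetas):
        self.thetas = list(thetas)
        # Uniform prior weights; any positive weights summing to
        # at most 1 would do for a mixture of this shape.
        self.weights = [Fraction(1, len(self.thetas))] * len(self.thetas)

    def prob_next_one(self):
        # Predictive probability of the next bit being 1:
        # sum_nu w_nu * nu(1), normalized by the total remaining weight.
        total = sum(self.weights)
        return sum(w * t for w, t in zip(self.weights, self.thetas)) / total

    def update(self, bit):
        # Bayes: reweight each measure by the likelihood it assigned
        # to the observed bit.
        self.weights = [w * (t if bit == 1 else 1 - t)
                        for w, t in zip(self.weights, self.thetas)]

# Example: the mixture shifts toward the high-theta measure on mostly-1 data.
pred = MixturePredictor([Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)])
for bit in [1, 1, 1, 0, 1, 1]:
    print(f"P(next bit = 1) = {float(pred.prob_next_one()):.3f}; observed {bit}")
    pred.update(bit)
```

The real universal mixture differs in that the model class is countably infinite and the weights come from program lengths or Kolmogorov complexity, but the predict-then-reweight loop is the same.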
References
- ↑ Tom Florian Sterkenburg. "The Foundations of Solomonoff Prediction". February 2013.
- ↑ https://wiki.lesswrong.com/wiki/Solomonoff_induction
- ↑ Marcus Hutter; Shane Legg; Paul M.B. Vitanyi. "Algorithmic probability". Scholarpedia. 2007.
- ↑ "Solomonoff induction: Intro Dialogue (Math 2)". Arbital.
- ↑ R. J. Solomonoff. "A formal theory of inductive inference. Part I". 1964.
- ↑ Shane Legg. "Solomonoff Induction". 2004.