Solomonoff induction
Solomonoff induction is an idealized method of sequence prediction.
Model selection vs sequence prediction
Variants
There are many variants of Solomonoff induction that all seem to turn out essentially the same, but most resources on Solomonoff induction only discuss one or two of the variants, which makes it pretty confusing for someone trying to learn about it. It's also not obvious that all of these variants are essentially the same, so it would be nice to show this explicitly.
Comparison dimensions:
- Determinism: are your programs probabilistic or deterministic? If deterministic, I think the programs require random coinflips as input.
- Kind of Turing machine used: ordinary, prefix, monotone
- Prediction length: finite vs infinite sequences
- Input (to UTM) length: finite vs infinite
- Solomonoff prior vs universal mixture; for universal mixture, you need to also specify the class over which you're mixing (see the formulas right after this list)
- Discrete vs continuous? (is this the same as finite vs infinite?)
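To pin down the last two dimensions in symbols (this is my own summary in standard notation, roughly following Li and Vitanyi and the Scholarpedia article, not a quote from any single source), write <math>\ell(p)</math> for the length in bits of a program <math>p</math> and <math>U</math> for the reference universal machine. The discrete prior (on a prefix machine) and the continuous Solomonoff prior (on a monotone machine) are

<math>m(x) = \sum_{p\,:\,U(p)=x} 2^{-\ell(p)}, \qquad M(x) = \sum_{p\,:\,U(p)=x*} 2^{-\ell(p)},</math>

where the first sum is over halting programs whose output is exactly the finite string <math>x</math>, and <math>U(p)=x*</math> means the output of <math>U</math> on <math>p</math> begins with <math>x</math> (summing over minimal such programs). A universal mixture instead fixes a countable class of semimeasures <math>\nu_1, \nu_2, \ldots</math> and weights <math>w_i > 0</math> with <math>\sum_i w_i \le 1</math>, and sets

<math>\xi(x) = \sum_i w_i \, \nu_i(x).</math>

The "essentially the same" claim above then amounts, in part, to the standard fact that <math>M</math> and <math>\xi</math> multiplicatively dominate each other when the class is the lower semicomputable semimeasures, so they agree up to a constant factor.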
List of variants considered in various resources:
- Solomonoff's original paper:
  - (section 3.1)
  - (section 3.2)
  - (section 3.3)
  - (section 3.4) this uses a universal mixture, but I think Solomonoff is using all computable measures (rather than enumerable semimeasures)
- Eliezer's Solomonoff induction dialogue:[1]
- Li and Vitanyi:[2]
  - (defined in theorem 4.3.1 and example 4.3.3) seems like a discrete universal mixture. Input is finite. The "programs" (lower semicomputable discrete semimeasures) are probabilistic.
  - universal a priori probability (definition 4.3.5): deterministic prefix machine, random coinflips as input, finite input (program), finite output
  - universal a priori probability (definition 4.5.6, p. 302)
  - <math>\mathbf{M}_{\mathrm{norm}}</math> (a version of <math>\mathbf{M}</math> that is a measure)
- LessWrong Wiki:
- Scholarpedia article:[3]
  - <math>m</math>: discrete, deterministic UTM (with random input), finite output, finite input (however, the "arises in three different ways" part gives other ways to view <math>m</math>)
  - <math>M</math>: continuous, infinite output, finite input (but maybe only because we consider minimal programs), deterministic UTM (with random coinflips)
- Shane Legg's introduction:[4]
  - <math>P^U(\mu)</math>
- Sterkenburg's "The Foundations of Solomonoff Prediction":
  - <math>P_{\mathrm{I}}</math> -- this seems similar to Solomonoff's section 3.3
  - algorithmic probability (p. 29): monotone machine
  - I think this is the same as Solomonoff's section 3.4, eq. 13
  - (p. 33): universal mixture of semicomputable semimeasures
- Marcus Hutter's papers...
- Herrmann's "An Introduction to Solomonoff Induction With an Application to Scientific Confirmation Theory":
- Logical Induction Arbital page:[5]
  - deterministic, regular (rather than prefix-free) Turing machines
Type of hypothesis
What is the type of a hypothesis? The type varies by formalization. For example (a short symbolic summary follows the list):
- In classical statistical inference, a hypothesis is a constant that happens to be unknown (e.g. a real number). This hypothesis then induces a probability distribution (called a "model").
- In Bayesian statistical inference, a hypothesis is an event (i.e. a subset of the sample space). There is a random variable representing the hypothesis (e.g. <math>H</math>), and you can get each individual hypothesis (e.g. <math>H = h_1</math>) by setting it to different values (e.g. <math>h_1, h_2, \ldots</math>).
- In Solomonoff induction a hypothesis is a program (?) whose type depends on the variant of Solomonoff induction being used.
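A schematic way to line these up (my own summary with generic symbols, not drawn from any particular source): classically the hypothesis is an unknown parameter <math>\theta \in \Theta</math> (say <math>\Theta \subseteq \mathbb{R}</math>), which induces a distribution <math>P_\theta</math>; in the Bayesian setting it is an event such as <math>\{H = h_1\}</math> for a random variable <math>H</math>, carrying a prior probability <math>P(H = h_1)</math>; in Solomonoff induction it is (on most formalizations) a program <math>p</math>, or the semimeasure that <math>p</math> computes, carrying prior weight on the order of <math>2^{-\ell(p)}</math>.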
Significance of random coin flips
[7] (p. 4)
"""The prior AIXI is based on is the Solomonoff prior. It is defined as the output of a universal Turing machine (UTM) whose inputs are coin-flips.
In other words, feed a UTM a random program. Normally, you’d think of a UTM as only being able to simulate deterministic machines. Here, however, the initial inputs can instruct the UTM to use the rest of the infinite input tape as a source of randomness to simulate a stochastic Turing machine.
Combining this with the previous idea about viewing Bayesian learning as a way of allocating “trust” to “experts” which meets a bounded loss condition, we can see the Solomonoff prior as a kind of ideal machine learning algorithm which can learn to act like any algorithm you might come up with, no matter how clever."""[8]
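To make the "feed a UTM a random program" picture concrete, here is a minimal sketch in Python of the two equivalent views, using a made-up 4-bit stand-in machine rather than a universal Turing machine; the machine, its opcodes, and all function names are assumptions invented for this illustration, not anything from the sources above. The enumeration view weights each halting program by <math>2^{-\ell(p)}</math>; the coin-flip view feeds the same machine uniformly random input bits and records the distribution of its outputs.

<syntaxhighlight lang="python">
import random
from collections import Counter

# A tiny stand-in machine (NOT a universal Turing machine): every program
# is exactly 4 bits, so the set of halting programs is trivially prefix-free.
def toy_machine(bits):
    """Run a 4-bit program and return its (finite) output string."""
    op, arg = bits[:2], bits[2:]
    n = int(arg, 2)
    if op == "00":
        return "0" * (n + 1)      # a run of zeros
    if op == "01":
        return "1" * (n + 1)      # a run of ones
    if op == "10":
        return arg                # copy the argument bits to the output
    return ""                     # op == "11": halt with empty output

def all_programs(length=4):
    return [format(i, f"0{length}b") for i in range(2 ** length)]

# View 1: the discrete prior as a weighted sum over programs,
#   m_toy(x) = sum over programs p with toy_machine(p) == x of 2^(-len(p)).
def m_by_enumeration():
    m = Counter()
    for p in all_programs():
        m[toy_machine(p)] += 2.0 ** (-len(p))
    return m

# View 2: the same quantity as the output distribution of the machine
# when its input tape is filled with fair coin flips.
def m_by_coin_flips(samples=100_000):
    counts = Counter()
    for _ in range(samples):
        p = "".join(random.choice("01") for _ in range(4))
        counts[toy_machine(p)] += 1
    return {x: c / samples for x, c in counts.items()}

if __name__ == "__main__":
    exact = m_by_enumeration()
    estimate = m_by_coin_flips()
    for x in sorted(exact):
        print(f"{x!r:8} exact={exact[x]:.4f}  coin-flip estimate={estimate.get(x, 0):.4f}")
</syntaxhighlight>

For a genuine universal machine the same correspondence holds, except that some random inputs never halt (or never produce more output), which is exactly why one gets a semimeasure rather than a measure (next section).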
Measures vs semimeasures
Multiple ways to think about semimeasures (the defining conditions are written out after this list):
- "defective" measures
- probability distribution over finite and infinite strings[5]
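Concretely (these are the standard conditions, e.g. as in Li and Vitanyi): a continuous semimeasure over binary strings is a function <math>\mu</math> with

<math>\mu(\epsilon) \le 1, \qquad \mu(x) \ge \mu(x0) + \mu(x1),</math>

and it is a measure when both hold with equality. The missing mass <math>\mu(x) - \mu(x0) - \mu(x1)</math> can be read as the probability that the sequence ends (the machine produces <math>x</math> and then nothing more), which is what licenses the second bullet's reading as a distribution over finite and infinite strings. In the discrete case the condition is simply <math>\sum_x \mu(x) \le 1</math>.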
Turing machines, prefix machines, monotone machines
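For orientation (standard definitions, stated from memory rather than from a particular source in the list above): a prefix machine is one whose set of halting programs is prefix-free, i.e. no halting program is a proper prefix of another; a monotone machine reads its input one-way and writes its output one-way, so its output can only grow as it reads more input:

<math>p \sqsubseteq q \;\Rightarrow\; U(p) \sqsubseteq U(q),</math>

where <math>\sqsubseteq</math> is the prefix relation and <math>U(p)</math> is the output produced after reading <math>p</math>. The discrete variants above typically use prefix machines; the continuous variants use monotone machines.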
Kraft's inequality
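The standard statement (and the reason it shows up here): for any prefix-free set <math>S</math> of finite binary strings,

<math>\sum_{p \in S} 2^{-\ell(p)} \le 1.</math>

Applied to the halting programs of a prefix machine (which form a prefix-free set), this is what guarantees that sums like <math>\sum_p 2^{-\ell(p)}</math> in the definitions above are bounded by 1, so the resulting objects are (semi)measures rather than divergent sums.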
Applications
- there are some standard applications for which you can find standard references
- AIXI
- Carl's analogy to psychophysical laws
- malign prior stuff
- comparison to logical induction
References
- ↑ "Solomonoff induction: Intro Dialogue (Math 2)". Arbital.
- ↑ An Introduction to Kolmogorov Complexity and Its Applications (3rd ed).
- ↑ Marcus Hutter; Shane Legg; Paul M.B. Vitanyi. "Algorithmic probability". 2007.
- ↑ Shane Legg. "Solomonoff Induction". 2004.
- ↑ 5.0 5.1 Alex Appel. "Logical Induction (incomplete)". 2017.
- ↑ Marcus Hutter. "On Universal Prediction and Bayesian Confirmation".
- ↑ Ian Wood; Peter Sunehag; Marcus Hutter. "(Non-)Equivalence of Universal Priors".
- ↑ https://intelligence.org/2018/11/02/embedded-models/