Variants of Solomonoff induction

Revision as of 03:40, 1 April 2019

This page lists some variants of Solomonoff induction.

For the "Determinism" column, I think "deterministic" is the same as "Solomonoff prior" and "stochastic" is the same as "universal mixture". Sterkenburg calls the deterministic versions a "bottom-up approach" and the universal mixture a "top-down approach" (p. 30).^[1] For deterministic variants, the type of universal machine must be specified. For universal mixtures, two things must be specified: the weighting to use and the class of distributions to consider.
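
For the stochastic variants, the two choices can be made concrete with a minimal sketch of $\xi_w(\sigma)=\sum_i w(\mu_i)\mu_i(\sigma)$ over a toy, finite model class. The class (three i.i.d. Bernoulli measures), the uniform weights, and all function names are hypothetical: an actual universal mixture ranges over all semicomputable semimeasures and cannot be computed exactly.

```python
from fractions import Fraction


def bernoulli(theta):
    """mu(sigma): probability that an i.i.d. Bernoulli(theta) source emits a
    sequence starting with the binary string sigma (a continuous measure)."""
    def mu(sigma):
        p = Fraction(1)
        for bit in sigma:
            p *= theta if bit == "1" else 1 - theta
        return p
    return mu


# Hypothetical choices, for illustration only:
# (1) the model class: three Bernoulli measures;
# (2) the weighting: uniform, satisfying w(mu_i) > 0 and sum_i w(mu_i) <= 1.
model_class = [bernoulli(Fraction(k, 10)) for k in (1, 5, 9)]
weights = [Fraction(1, 3)] * len(model_class)


def mixture(sigma):
    """xi_w(sigma) = sum_i w(mu_i) * mu_i(sigma)."""
    return sum(w * mu(sigma) for w, mu in zip(weights, model_class))


def prob_next_is_one(sigma):
    """Predictive probability that the bit after sigma is 1."""
    return mixture(sigma + "1") / mixture(sigma)


print(mixture(""))                       # 1
print(float(prob_next_is_one("1111")))   # about 0.865
```

After four 1s the mixture predicts another 1 with probability about 0.865, because most of the weight has shifted to the $\theta=0.9$ component. A different class or weighting would give different predictions.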

For the "Discrete vs continuous" column, I think this just means whether the prior is defined over finite strings or over infinite sequences (where we want the probability that an infinite sequence starts with a given finite string). I'm not sure how to tell whether a given formula is discrete or continuous. One difference seems to be that for discrete semimeasures we only require that the sum over all strings is at most 1, whereas for continuous semimeasures we require that the empty string gets at most 1 and that $\mu (x)\geq \mu (x0)+\mu (x1)$ for every string $x$ (see e.g. p. 5 of [1] and p. 294 of Li and Vitanyi). One way to think of the discrete version is apparently to take the sample space to be the natural numbers, i.e. one-letter strings from a countably infinite alphabet (see p. 265 of Li and Vitanyi).
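
The two conditions can be checked mechanically on all binary strings up to some length. Below is a minimal sketch; the function names and the two example functions are hypothetical, and since both conditions range over infinitely many strings, a finite check can refute them but not establish them. It shows that the uniform measure $\lambda(x)=2^{-|x|}$ (the probability that a fair-coin sequence starts with $x$) satisfies the continuous condition, while its values summed over finite strings grow without bound, so it is not a discrete semimeasure: the events "starts with $x$" and "starts with $x0$" overlap rather than being disjoint.

```python
from fractions import Fraction
from itertools import product


def strings_up_to(n):
    """All binary strings of length 0..n, shortest first."""
    for length in range(n + 1):
        for bits in product("01", repeat=length):
            yield "".join(bits)


def discrete_mass(f, n):
    """Partial sum of f(x) over all strings with |x| <= n. A discrete
    semimeasure needs the full (infinite) sum to be at most 1, so a
    partial sum above 1 refutes it; a sum <= 1 is only evidence."""
    return sum(f(x) for x in strings_up_to(n))


def continuous_conditions_hold(f, n):
    """Check f(empty) <= 1 and f(x) >= f(x+'0') + f(x+'1') for all |x| < n.
    Like discrete_mass, a finite check can refute but not prove."""
    if f("") > 1:
        return False
    return all(f(x) >= f(x + "0") + f(x + "1") for x in strings_up_to(n - 1))


def uniform(x):
    """Probability that a fair-coin sequence starts with x: 2^-|x|."""
    return Fraction(1, 2 ** len(x))


def geometric(x):
    """A distribution over finite strings: 2^-(2|x|+1)."""
    return Fraction(1, 2 ** (2 * len(x) + 1))


print(continuous_conditions_hold(uniform, 8), discrete_mass(uniform, 8))
# True 9    (each length contributes 1, so the partial sums grow without bound)
print(continuous_conditions_hold(geometric, 8), discrete_mass(geometric, 8))
# True 511/512    (the full sum over all lengths is exactly 1)
```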

Source	Formula	Determinism	Discrete vs continuous	Notes
LessWrong Wiki^[2]	$m(y_{0})=\sum _{p\in {\mathcal {P}}:U(p)=y_{0}}2^{-\ell (p)}$ where ${\mathcal {P}}$ is the set of self-delimiting programs	Deterministic; the page doesn't say which type of machine, but it uses self-delimiting programs and is discrete, so presumably a prefix Turing machine?	Discrete, because the condition is $U(p)=y_{0}$ rather than $U(p)=y_{0}*$?
Scholarpedia discrete universal a priori probability^[3]	$m(x)=\sum _{p:U(p)=x}2^{-\ell (p)}$ where the sum is over halting programs	Deterministic (?); prefix Turing machine	Discrete
Scholarpedia continuous universal a priori probability^[3]	$M(x)=\sum _{p:U(p)=x*}2^{-\ell (p)}$ where the sum is over minimal programs	Deterministic (?); monotone Turing machine	Continuous
Sterkenburg (p. 22)^[1]	$P_{\mathrm {I} }(\sigma )=\lim _{n\to \infty }{\frac {|T_{\sigma ,n}|}{|T_{n}|}}$ where $\sigma$ is a finite string, $T_{n}$ is the set of all halting (valid) inputs of length $n$ to the reference machine $U$, and $T_{\sigma ,n}$ is the set of those inputs whose output starts with $\sigma$	Deterministic; universal Turing machine (no prefix-freeness restriction)	Discrete?	This seems similar to Solomonoff's section 3.3.
Sterkenburg (p. 24)^[1]	$P'_{\mathrm {II} }(\sigma )=2^{-|\tau _{\mathrm {min} }|}$ where $\tau _{\mathrm {min} }$ is the shortest program $\tau$ such that $U(\tau )=\sigma$ (i.e. the shortest program that causes the reference machine to output $\sigma$ and halt)	Deterministic; universal Turing machine, or a universal prefix machine (to get a probability distribution; see remark on p. 27)	Discrete?	With a plain universal Turing machine, this formula does not define a probability distribution over strings $\sigma$, because the sum over all $\sigma$ does not converge.
Sterkenburg (p. 25)^[1]	$P''_{\mathrm {II} }(\sigma )=\lim _{n\to \infty }\sum _{\tau \in T_{\sigma ,n}}2^{-|\tau |}$ where $T_{\sigma ,n}$ is the set of all programs $\tau$ of length $n$ such that $U(\tau )$ begins with $\sigma$	Deterministic; universal Turing machine		$P''_{\mathrm {II} }(\sigma )$ diverges even for a single $\sigma$, so this is not a workable version; it is intended only as a stepping stone.
Sterkenburg (p. 26)^[1]	$P_{\mathrm {II} }(\sigma )=\lim _{\epsilon \to 0}\lim _{n\to \infty }\sum _{\tau \in T_{\sigma ,n}}\left({\frac {1-\epsilon }{2}}\right)^{|\tau |}$, with $T_{\sigma ,n}$ as in the previous row	Deterministic; universal Turing machine, or a universal prefix machine (to get a probability distribution; see remark on p. 27)		The $\epsilon$ is a hack to make the sum converge.
Sterkenburg (p. 29)^[1]	$Q_{U}(\sigma )=\sum _{\tau \in T_{\sigma }}2^{-|\tau |}$ where $T_{\sigma }$ is the set of minimal descriptions of $\sigma$ (i.e. the programs whose output starts with $\sigma$, but which no longer produce such output when their last bit is removed)	Deterministic; universal monotone machine	Continuous?
Sterkenburg (p. 31)^[1]	$P_{\mathrm {IV} }(\sigma )=\lim _{n\to \infty }\sum _{i}f_{i,n}P_{i}(\sigma )$	Stochastic		I think this is the same as Solomonoff's section 3.4, eq. 13.
Sterkenburg (p. 33)^[1]	$\xi _{w}(\sigma )=\sum _{i}w(\mu _{i})\mu _{i}(\sigma )$	Stochastic; weighting unspecified, except for the requirements that $w(\mu _{i})>0$ for all $i$ and $\sum _{i}w(\mu _{i})\leq 1$; the model class is all semicomputable semimeasures
Arbital^[4]	$\mathbb {S} \mathrm {ol} (s_{\preceq n})=\sum _{\mathrm {prog} \in {\mathcal {P}}}2^{-\mathrm {length} (\mathrm {prog} )}\cdot \mathbb {P} _{\mathrm {prog} }(s_{\preceq n})$	Stochastic; the weighting is 1/2 to the power of the length of the program; the model class is all programs that define a probability measure (?), where the programs are given using a prefix-free code
Solomonoff section 3.1, eq. 1^[5]	$\lim _{\epsilon \to 0}\lim _{n\to \infty }\sum _{k=1}^{r^{n}}\sum _{i=1}^{\infty }((1-\epsilon )/2)^{N_{(S_{TC_{n,k}})_{i}}}$ (this might be incorrect). Solomonoff uses the notation $C_{n,k}$ (the $k$th of the $r^{n}$ strings of length $n$) to express that he doesn't care what follows, as long as the desired string $T$ starts the output. In more modern notation we would write $M(p)=T*$ or $T\preceq M(p)$ rather than $M(p)=TC_{n,k}$ (although with Solomonoff's notation the length of the output is also restricted, so in modern notation we would additionally require $|M(p)|=|T|+n$).	Deterministic; universal Turing machine		Solomonoff originally gave the conditional probability of seeing $a$ given that we've already seen $T$, which he writes $P(a,T,M_{1})$ but which in more common notation would be something like $P_{M_{1}}(Ta\mid T)$.
Solomonoff section 3.2, eq. 7^[5]	$\sum _{i=1}^{\infty }2^{-N(T,i)}$ where $N(T,i)$ is the length of the $i$th minimal program for $T$	Deterministic; universal monotone machine
Solomonoff section 3.3, eqs. 9 and 11^[5]	$\lim _{R\to \infty }N_{T}/N_{R}$, where $N_{R}$ is the number of programs of length $R$ that cause the machine to halt eventually, and $N_{T}$ is the number of those programs that cause the machine to output something starting with $T$	Deterministic; either a universal Turing machine or a universal monotone machine
Solomonoff section 3.3, eq. 10^[5]	$\sum _{n=1}^{\infty }\sum _{k=1}^{r^{n}}\sum _{i=1}^{\infty }((1-\epsilon )/2)^{N_{(S_{TC_{n,k}})_{i}}}$	Deterministic		Solomonoff says this follows from equation 10, but I'm not sure how. In more modern notation, I think this can be written as $\sum _{p:U(p)=T*}\left({\frac {1-\epsilon }{2}}\right)^{|p|}$.
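
For the deterministic continuous variants (Scholarpedia's $M(x)$, Sterkenburg's $Q_U$), here is a minimal sketch of summing $2^{-|p|}$ over minimal programs. It assumes a hypothetical two-instruction monotone machine that is not universal, so the numbers only illustrate the mechanics; with a universal monotone machine the sum can only be approximated from below by enumerating programs.

```python
from fractions import Fraction
from itertools import product


def toy_monotone_machine(program, max_out):
    """Hypothetical, NON-universal monotone machine (illustration only).

    Returns the output written after reading all of `program`,
    truncated to `max_out` symbols:
      '0' + bits : copy the remaining bits to the output
      '1' + b    : write bit b forever (later program bits are ignored)
    The empty program and the lone prefix '1' have written nothing yet.
    """
    if program.startswith("0"):
        return program[1:max_out + 1]
    if program.startswith("1") and len(program) >= 2:
        return program[1] * max_out
    return ""


def is_minimal_program(p, x):
    """True if the output on p starts with x, but the output on p with
    its last bit removed does not (so p is a minimal program for x)."""
    if not toy_monotone_machine(p, len(x)).startswith(x):
        return False
    return p == "" or not toy_monotone_machine(p[:-1], len(x)).startswith(x)


def M(x):
    """Sum of 2^-|p| over the minimal programs p for x on the toy machine.

    For this particular machine, minimal programs for x have length at
    most |x| + 1, so enumerating that far finds all of them.
    """
    total = Fraction(0)
    for length in range(len(x) + 2):
        for bits in product("01", repeat=length):
            if is_minimal_program("".join(bits), x):
                total += Fraction(1, 2 ** length)
    return total


print(M("0000"))  # 9/32: copy program 00000 (1/32) + repeat program 10 (1/4)
print(M("01"))    # 1/8: only the copy program 001
```

The short repeat program 10 contributes far more to $M(0000)$ than the copy program 00000, which is the simplicity bias the prior is meant to capture. Restricting the sum to minimal programs matters: every extension of 10 also outputs a sequence starting with 0000, and counting all of them would add $1/4$ for each program length, so the sum would diverge.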

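The convergence role of $\epsilon$ in the rows above can be seen from a worst-case count over program lengths. If all $2^{n}$ inputs of length $n$ are counted (no prefix-free restriction), weighting each by $2^{-n}$ contributes 1 per length, so summing over lengths diverges; weighting each by $((1-\epsilon)/2)^{n}$ contributes at most $(1-\epsilon)^{n}$, and the geometric series is bounded by $(1-\epsilon)/\epsilon$. The sketch below is plain arithmetic under that worst-case assumption (the function name and the value of `eps` are hypothetical), not a computation of $P_{\mathrm{II}}$ itself.

```python
def worst_case_total(base, max_len):
    """Sum over lengths n = 1..max_len of 2^n * base^n: the total weight when
    every one of the 2^n binary programs of length n is counted (no
    prefix-free restriction) and each is weighted by base^|p|."""
    return sum((2 * base) ** n for n in range(1, max_len + 1))


eps = 0.1  # hypothetical value; any 0 < eps < 1 behaves the same way
for max_len in (10, 100, 1000):
    plain = worst_case_total(0.5, max_len)             # weight 2^-|p|
    damped = worst_case_total((1 - eps) / 2, max_len)  # weight ((1-eps)/2)^|p|
    print(max_len, plain, round(damped, 6))
print("bound (1 - eps) / eps =", (1 - eps) / eps)
```

The bound $(1-\epsilon)/\epsilon$ itself grows as $\epsilon$ shrinks, so this count only shows why the inner sum converges for each fixed $\epsilon>0$; it says nothing about the outer limit $\epsilon\to 0$.
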
References

[1] Tom Florian Sterkenburg. "The Foundations of Solomonoff Prediction". February 2013.
[2] LessWrong Wiki. "Solomonoff induction". https://wiki.lesswrong.com/wiki/Solomonoff_induction
[3] Marcus Hutter; Shane Legg; Paul M.B. Vitanyi. "Algorithmic probability". Scholarpedia. 2007.
[4] "Solomonoff induction: Intro Dialogue (Math 2)". Arbital.
[5] R. J. Solomonoff. "A formal theory of inductive inference. Part I". 1964.
