Equivalence of random coinflips view and minimal programs view

Two of the several ways of viewing the deterministic variant of Solomonoff induction are:

The "random coinflips view": Given a monotone Turing machine, we feed the machine with random data (such as from unbiased coinflips). The probability of string $x$ is then the probability that the machine outputs something starting with $x$ . This one is somewhat hard to think about, because one usually thinks of a Turing machine as e.g. a Python interpreter that takes Python programs. If one feeds a Python interpreter with randomness, one is likely to end up running a horrible mess that isn't even syntactically valid. I guess the reply here is something like (1) if a program isn't even syntactically valid, it just results in a program with no output; (2) using an encoding, one can filter out all invalid programs so that each input into the UTM does something; (3) more generally, thinking of "coinflips" is meant to suggest the idea that one has no information about what the program looks like.
The "minimal programs view": Given a monotone Turing machine $T$ , the minimal programs producing $x$ are those programs $p$ such that $T (p) = x *$ (i.e. $T$ when run with $p$ produces an output starting with $x$ ) and such that no proper prefix of $p$ produces $x$ (i.e. if $q ≺ p$ then $T (q) \neq x *$ ). This view then says that to get the probability of $x$ , we add up all the minimal programs weighted by length as follows: for each program $p$ we assign the weight $2^{- | p |}$ , where $| p |$ is the length of $p$ in bits. Thus the measure is $\sum_{p} 2^{- | p |}$ , where the sum is taken over all minimal programs $p$ producing $x$ . The idea with the $2^{- | p |}$ is that there are $2^{n}$ programs of length $n$ , so given just the information that $p$ has length $n$ , the probability that we have a specific program is $1 / 2^{n} = 2^{- n}$ .

These two views turn out to be equivalent in the sense that .

Example

Let $T$ be a monotone Turing machine whose behavior is as follows. Note that $T$ isn't a universal machine, so this example isn't actually an instance of the equivalence, but is still illustrative and gives a feel for how these things work.

Line	Input	Behavior
1	0	x
2	1	$ϵ$ (empty string)
3	10	xy, halts
4	11	$ϵ$ , halts
5	01	x, halts
6	00	x, halts

We can also implement $T$ as an interactive algorithm as follows:

def T:
    b := input()
    if b == 0:
        print("x")
        b := input()
    else:
        b := input()
        if b == 0:
            print("xy")

The above table exactly captures the behavior for $T$ since all four length-2 strings lead to a halting state, $T$ does nothing more when fed a longer input.

One can verify that $T$ is monotone: 0 is a prefix for 01 and 00, and indeed x is a prefix for x and x. Similarly, 1 is a prefix for 10 and 11, and indeed $ϵ$ is a prefix for $ϵ$ and xy.

Now let us see what probability each "view" assigns to the output starting with x.

Random coinflips view: the idea is to generate an infinite sequence of coinflips, e.g. 1010010110100001... and feed this into $T$ . But since our machine does nothing more after two bits are fed in, we can get away with flipping the coin just twice. So our possible coinflip outcomes are 00, 01, 10, 11, each with probability 1/4. Given these cases, in three of them we end up with an output starting with x (lines 3, 5, 6 in the table above). So the probability is 3/4.
Minimal programs view: the minimal programs that produce x are 0 (line 1) and 10 (line 3). Note that programs 01 and 00 are not minimal for x because a shorter program (namely 0) also already produces x. So according to this view, the probability is $2^{- | 0 |} + 2^{- | 10 |} = 1 / 2 + 1 / 4 = 3 / 4$ . (Note that $| 0 | = 1$ is the length of the program 0, not the absolute value of the number 0.)

In each case, we end up with the same probability, namely 3/4. This is despite the fact that we are adding different lines. In the coinflips view, we added lines 3, 5, 6, whereas in the in the minimal programs view we added lines 1 and 3. The equivalence "random coinflips view" = "minimal programs view" is the statement that ending up with the same probability through these different methods is not a coincidence.