Equivalence of random coinflips view and minimal programs view

From Machinelearning

Two of the several ways of viewing the deterministic variant of Solomonoff induction are:

  • The "random coinflips view": Given a monotone Turing machine, we feed the machine random data (such as bits from unbiased coinflips). The probability of a string x is then the probability that the machine outputs something starting with x. This one is somewhat hard to think about, because one usually thinks of a Turing machine as e.g. a Python interpreter that takes Python programs. If one feeds a Python interpreter with randomness, one is likely to end up running a horrible mess that isn't even syntactically valid. I guess the reply here is something like: (1) if a program isn't even syntactically valid, it simply produces no output; (2) using an encoding, one can filter out all invalid programs so that each input into the UTM does something; (3) more generally, "coinflips" is meant to suggest the idea that one has no information about what the program looks like.
  • The "minimal programs view": Given a monotone Turing machine T, the minimal programs producing x are those programs p such that T(p) = x* (i.e. T when run with p produces an output starting with x) and such that no proper prefix of p produces x (i.e. if q ⊏ p then T(q) ≠ x*). This view then says that to get the probability of x, we add up all the minimal programs weighted by length as follows: to each program p we assign the weight 2^{-|p|}, where |p| is the length of p in bits. Thus the measure is ∑_p 2^{-|p|}, where the sum is taken over all minimal programs p producing x. The idea with the 2^{-|p|} is that there are 2^n programs of length n, so given just the information that p has length n, the probability that we have a specific program is 1/2^n = 2^{-n}.

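The length weighting 2^{-|p|} can be illustrated with a small sketch (the function name `weight` is mine, not standard terminology):

```python
from fractions import Fraction
from itertools import product

def weight(p):
    # Length-based weight 2^{-|p|} of a bit-string program p.
    return Fraction(1, 2 ** len(p))

# There are 2^n programs of length n, each weighted 2^{-n},
# so the weights of all length-n programs sum to exactly 1:
n = 3
programs = ["".join(bits) for bits in product("01", repeat=n)]
total = sum(weight(p) for p in programs)
```

Using exact fractions avoids floating-point noise when adding up many small weights.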
These two views turn out to be equivalent in the sense that, for every string x, the probability that the machine outputs something starting with x when fed random coinflips equals ∑_p 2^{-|p|}, where the sum is taken over the minimal programs p producing x.

Is this result obvious?

Having the details spelled out, one might believe the equivalence is obvious. I will quote the following, which I wrote down on 2019-03-25, as evidence that this equivalence is not obvious: "It seems to me like the 'random coinflips' probability of getting a string starting with x is not the same as the sum of the probabilities of the minimal programs that print a string starting with x. And yet, everyone seems to equate the two. My intuition is that the minimal programs one ignores programs that print x but aren't minimal, whereas the random coinflips one takes these into account." (My confusion here is that the coinflips one also ignores certain programs, but it just ignores different programs.)

Example

Let T be a monotone Turing machine whose behavior is as follows. Note that T isn't a universal machine, so this example isn't actually an instance of the equivalence, but is still illustrative and gives a feel for how these things work.

Line | Input | Behavior
-----|-------|--------------------------
1    | 0     | outputs x
2    | 1     | outputs ϵ (empty string)
3    | 10    | outputs xy, halts
4    | 11    | outputs ϵ, halts
5    | 01    | outputs x, halts
6    | 00    | outputs x, halts

We can also implement T as an interactive algorithm as follows:

def T():
    b = input()            # read first bit
    if b == "0":
        print("x")         # lines 1, 5, 6: output x as soon as the first bit is 0
        b = input()        # read second bit; no further output either way
    else:
        b = input()        # read second bit
        if b == "0":
            print("xy")    # line 3: input 10 outputs xy

The above table exactly captures the behavior of T: since all four length-2 strings lead to a halting state, T does nothing more when fed a longer input.

One can verify that T is monotone: 0 is a prefix of 01 and 00, and indeed the corresponding output x is a prefix of x and of x. Similarly, 1 is a prefix of 10 and 11, and indeed ϵ is a prefix of xy and of ϵ.
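The monotonicity check can also be done mechanically. In this sketch (my own encoding, not part of the original construction), every fully-read input string is mapped to the total output T has produced on it:

```python
# My own encoding of T's table: each fully-read input string mapped to
# the output T produces on it.
behavior = {"": "", "0": "x", "1": "", "00": "x", "01": "x", "10": "xy", "11": ""}

def is_monotone(table):
    # Monotone: whenever input p is a prefix of input q,
    # the output on p is a prefix of the output on q.
    return all(table[q].startswith(table[p])
               for p in table for q in table if q.startswith(p))
```

A table that mapped 0 to x but 01 to y would fail this check, since x is not a prefix of y.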

Now let us see what probability each "view" assigns to the output starting with x.

  • Random coinflips view: the idea is to generate an infinite sequence of coinflips, e.g. 1010010110100001... and feed this into T. But since our machine does nothing more after two bits are fed in, we can get away with flipping the coin just twice. So our possible coinflip outcomes are 00, 01, 10, 11, each with probability 1/4. Given these cases, in three of them we end up with an output starting with x (lines 3, 5, 6 in the table above). So the probability is 3/4.
  • Minimal programs view: the minimal programs that produce x are 0 (line 1) and 10 (line 3). Note that programs 01 and 00 are not minimal for x because a shorter program (namely 0) also already produces x. So according to this view, the probability is 2^{-|0|} + 2^{-|10|} = 1/2 + 1/4 = 3/4. (Note that |0| = 1 is the length of the program 0, not the absolute value of the number 0.)

In each case, we end up with the same probability, namely 3/4. This is despite the fact that we are adding different lines: in the coinflips view, we added lines 3, 5, and 6, whereas in the minimal programs view we added lines 1 and 3. The equivalence "random coinflips view" = "minimal programs view" is the statement that ending up with the same probability through these different methods is not a coincidence.
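Both computations can be checked mechanically. The sketch below uses my own encoding of T's table (each fully-read input string mapped to the output it produces); the function names are mine:

```python
from fractions import Fraction
from itertools import product

# My own encoding of T's table from the example above.
outputs = {"": "", "0": "x", "1": "", "00": "x", "01": "x", "10": "xy", "11": ""}

def coinflips_prob(table, x, n):
    # Probability, over uniformly random length-n bit strings,
    # that the machine's output starts with x.
    flips = ["".join(bits) for bits in product("01", repeat=n)]
    return sum(Fraction(1, 2 ** n) for s in flips if table[s].startswith(x))

def minimal_programs_prob(table, x):
    # Sum of 2^{-|p|} over programs p that are minimal for x.
    total = Fraction(0)
    for p, out in table.items():
        if p and out.startswith(x):
            # minimal for x: no proper prefix of p already produces x*
            if not any(table[p[:k]].startswith(x) for k in range(len(p))):
                total += Fraction(1, 2 ** len(p))
    return total
```

Running both on this table gives 3/4 each way, matching the hand computation.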

Example 2

Let T be a monotone Turing machine whose behavior is as follows.

Line | Input | Behavior
-----|-------|------------------
1    | 0     | outputs x
2    | 1     | halts
3    | 00    | outputs x, halts
4    | 01    | outputs xy, halts

The table specifies the behavior in all cases, since 1, 00, and 01 cover all ways in which the input could begin, and the machine halts in all of them (so for longer input, it produces no more output).

Let us find the prior probability that the machine outputs something starting with x.

  • Random coinflips view: 1/4 + 1/4 = 1/2 (add lines 3, 4)
  • Minimal programs view: 1/2 (line 1)

A misconception I had when I first thought up this example is that I confused "is minimal" with "is minimal for x". The program 01 (line 4) is minimal for xy, but it isn't minimal for x. Making this mistake would lead to adding up 1/2 + 1/4 = 3/4 (lines 1, 4). Instead, one must specifically search for programs that are minimal for the desired output, rather than merely minimal at all. A hint that something is wrong with my misconception is that the set {0, 01} is not prefix-free.
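The distinction between "minimal for xy" and "minimal for x" can be made precise in code. Below is a sketch using my own encoding of the Example 2 table (the function name `is_minimal_for` is mine; inputs beginning with 1 halt after the first bit with no output):

```python
from fractions import Fraction

# My own encoding of the Example 2 table.
outputs = {"": "", "0": "x", "1": "", "00": "x", "01": "xy", "10": "", "11": ""}

def is_minimal_for(table, p, x):
    # p is minimal for x iff T(p) starts with x and no proper prefix of p
    # already produces something starting with x.
    return (table[p].startswith(x)
            and not any(table[p[:k]].startswith(x) for k in range(len(p))))

# "01" is minimal for "xy" but not for "x", since its prefix "0" outputs x.
# The correct minimal-programs probability for x therefore counts only "0":
prob_x = sum(Fraction(1, 2 ** len(p))
             for p in outputs if p and is_minimal_for(outputs, p, "x"))
```

The set of programs minimal for a fixed x is always prefix-free, which is why finding 0 and 01 together was a red flag.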
