Proportion of valid programs view of Solomonoff induction
Proportion of valid programs view of Solomonoff induction is one of several ways of viewing the deterministic variant of Solomonoff induction. This view roughly says that to get the prior probability of seeing some output string , one should take all programs in some relevant class and find the fraction of programs that output string . For example, if we consider 500 programs in our class and 10 of them output , the prior probability of seeing is 10/500 = 1/50.
This view is intuitive, assuming that one accepts the wikipedia:Principle of indifference. The idea is that one knows that the actual program is in the class, but one has no insight as to which program within that class is the actual one. Since the programs represent mutually exclusive and collectively exhaustive possibilities, if there are programs the prior probability assigned to each program is . If we want to know the prior probability of seeing the output , we can just add up the probabilities of the programs that output . If there are such programs, we end up with .
The terminology "proportion of valid programs view" is not standard in the literature; this "view" does not seem to have a name (none of the other views seem to have a name either).
How to decide on the class of programs
- all programs of length equal to n? -- this assumption is natural if we assume there is some deterministic program "out there" that the world is running. Then it must have some length n (since all programs are finite in length), even though we don't know what that n is. This turns out to correspond to the "random coinflips" view.
- all programs of length up to n? -- this assumption is natural if we assume there is some finite upper bound on how complicated our program can be. This turns out to correspond to the "add up the minimal programs" view
- all programs?
- all programs with valid syntax?
- all programs that halt and produce some output?
Let be a monotone Turing machine whose behavior is as follows. Note that isn't a universal machine, so this example isn't actually an instance of the equivalence, but is still illustrative and gives a feel for how these things work.
We can also implement as an interactive algorithm as follows:
def T: b := input() if b == 0: print("x") b := input() else: b := input() if b == 0: print("xy")
The above table exactly captures the behavior for since all four length-2 strings lead to a halting state, does nothing more when fed a longer input.
Suppose we are trying to look at all programs of length n=2. Here we have a total of four programs (lines 3--6) to consider. Three of them produce x, so the prior probability of string x is 3*(1/4) = 3/4.
Now suppose we are looking at all programs with length at most n=2. The program 0 (line 1) produces x. But now we have one extra bit left before we hit our upper bound, so we can "pad out" the program in all possible ways, namely by adding either a 0 or 1 to the end of the program. This is similar to how in a Python program, we can make a program longer without changing its behavior by adding comments or filler statements like