User:IssaRice/Computability and logic/Diagonalization lemma (Yanofsky method)
This page is a rewritten version of the proof of the diagonalization lemma from Yanofsky's paper. Some parts of the original were difficult for me to understand, and some gaps seemed worth filling, so this version will hopefully be much more complete.
Theorem statement and proof
Theorem (Diagonalization lemma). Let $T$ be some nice theory (I don't remember the exact conditions, but Peano Arithmetic or Robinson Arithmetic would work). For any single-place formula $E(x)$ with $x$ as its only free variable, there exists a sentence (closed formula) $C$ such that $T \vdash C \leftrightarrow E(\ulcorner C \urcorner)$.
Proof. We will use what's called the Lindenbaum–Tarski algebra in our proof, which the Yanofsky paper just calls Lindenbaum classes. The idea is to define an equivalence relation $\sim$ on formulas, so that $A \sim B$ iff our theory can prove that $A$ and $B$ are logically equivalent. In other words, we define $A \sim B$ iff $T \vdash A \leftrightarrow B$. This is easily verified to be an equivalence relation. One twist is that it doesn't make sense to compare two formulas with differing arity. For instance, what does it even mean to say "2+2=5" and "x+5=7" are or are not provably equivalent? For this reason, we make a separate equivalence relation for each arity, and so we end up with separate quotients for each arity. For $n \geq 0$, we let $\mathrm{Lind}^n$ be the set of Lindenbaum classes of formulas with $n$ free variables.
That means that e.g. each element of $\mathrm{Lind}^0$ is not a sentence, but rather a set of sentences that are all provably equivalent. But every sentence is either true or false, so doesn't this mean $\mathrm{Lind}^0$ has just two elements, $[\top]$ and $[\bot]$ (i.e. the class of true sentences and the class of false sentences)? No! That's because even if two sentences are both true, our theory may be unable to prove that they are equivalent! That's the whole point of Gödel's first incompleteness theorem, which shows us that there can be true sentences (such as the famous "I am not provable" sentence $G_T$) which are not provable (so that even though $G_T$ is true, $[G_T] \neq [\top]$).
In the paper, Yanofsky toggles fluidly between thinking of a formula like $B(x)$ as a formula vs a set of formulas whose representative is $B(x)$ (i.e. the class $[B(x)]$). To avoid confusion, we will take care to always denote sets of formulas using the equivalence class notation $[\cdot]$.
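The quotient construction can be seen concretely in a much simpler setting. The following is a minimal sketch, not part of Yanofsky's paper: it uses propositional logic over two variables, where (by completeness) two formulas are provably equivalent exactly when they have the same truth table. The names `VARS`, `truth_table`, and `lindenbaum_classes` are hypothetical helpers, and formulas are written as Python boolean expressions for convenience.

```python
from collections import defaultdict
from itertools import product

# Assumption: a toy propositional language with two variables.
VARS = ("p", "q")


def truth_table(formula: str) -> tuple:
    """Evaluate a formula (a Python boolean expression) on every assignment."""
    rows = product([False, True], repeat=len(VARS))
    return tuple(bool(eval(formula, {}, dict(zip(VARS, row)))) for row in rows)


def lindenbaum_classes(formulas):
    """Group formulas into classes of logically equivalent formulas."""
    classes = defaultdict(list)
    for phi in formulas:
        classes[truth_table(phi)].append(phi)
    return list(classes.values())


formulas = [
    "p and q",
    "q and p",
    "not (not p or not q)",
    "p or q",
    "p or not p",
    "q or not q",
]
classes = lindenbaum_classes(formulas)
for cls in classes:
    print(cls)
```

Here equivalence is decidable and there are only finitely many classes. For arithmetic neither holds, which is why the classes in $\mathrm{Lind}^0$ are far less tame than this toy suggests.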
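Now following the paper, we define a function $f : \mathrm{Lind}^1 \times \mathrm{Lind}^1 \to \mathrm{Lind}^0$ as follows:

$$f([B(x)], [H(x)]) = [H(\ulcorner B(x) \urcorner)]$$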
This is just the act of substituting the Gödel number of the first formula into the second formula. Notice a problem here! Since we have defined the function on the Lindenbaum classes rather than the bare set of formulas themselves, we must check that the result is not dependent on the choice of representatives. For $H(x)$ we are good, because if we picked some provably equivalent $H'(x)$ instead, the results $H(\ulcorner B(x) \urcorner)$ and $H'(\ulcorner B(x) \urcorner)$ are provably equivalent, so the two classes will be equal. But what about the choice of $B(x)$? Since $H(x)$ could be literally any one-place formula at all, and different choices of representatives will produce different Gödel numbers, substituting in these different Gödel numbers could produce totally different sentences. Imagine $B(x)$ has Gödel number 2 and is provably equivalent to $B'(x)$, and $B'(x)$ has Gödel number 11. If $H(x)$ is a formula like "x+5=7", then this will give two sentences "2+5=7" and "11+5=7", which are not provably equivalent!
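The failure just described can be replayed in a few lines. This is only a toy sketch: the table `godel_number` and the function `f_naive` are hypothetical, the Gödel numbers 2 and 11 are the made-up values from the example above, and the truth of the resulting sentence stands in for its Lindenbaum class.

```python
# Hypothetical toy data: two provably equivalent one-place formulas
# that happen to have different Gödel numbers.
godel_number = {"B(x)": 2, "B'(x)": 11}


def H(x: int) -> bool:
    """The one-place formula "x+5=7"."""
    return x + 5 == 7


def f_naive(representative: str) -> bool:
    """Substitute the Gödel number of the chosen representative into H."""
    return H(godel_number[representative])


print(f_naive("B(x)"))   # the sentence "2+5=7"
print(f_naive("B'(x)"))  # the sentence "11+5=7"
```

The two calls start from representatives of the same class but produce sentences with different truth values, so the naive definition does not descend to the quotient.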
So the function given above is not well-defined. To get around this problem, we can say that whenever we need to make a choice of formula from an equivalence class, we always pick the one with the smallest Gödel number. Now no matter which representative we choose, there will be a further step of standardizing on which Gödel number to use, so the function ends up well-defined. Picking the smallest Gödel number is well-defined by the well-ordering principle, since the Gödel numbers of the formulas in a class form a nonempty set of natural numbers. So our new definition of $f$ looks like this:

$$f([B(x)], [H(x)]) = [H(\min\{\ulcorner B'(x) \urcorner : B'(x) \in [B(x)]\})]$$
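The repair can be sketched in the same toy setting. Everything here is an assumption made for illustration: the tables `godel_number` and `class_of` and the helpers `least_code` and `f_fixed` are hypothetical, and the class is a finite set listed by hand, whereas a real Lindenbaum class is infinite and membership in it is not decidable in general.

```python
# Hypothetical toy data: one equivalence class with two members.
godel_number = {"B(x)": 2, "B'(x)": 11}
same_class = frozenset({"B(x)", "B'(x)"})
class_of = {"B(x)": same_class, "B'(x)": same_class}


def H(x: int) -> bool:
    """The one-place formula "x+5=7"."""
    return x + 5 == 7


def least_code(cls: frozenset) -> int:
    """Smallest Gödel number among the members of a class."""
    return min(godel_number[name] for name in cls)


def f_fixed(representative: str) -> bool:
    """Substitute the least Gödel number in the representative's class."""
    return H(least_code(class_of[representative]))


print(f_fixed("B(x)"), f_fixed("B'(x)"))
```

Both calls now substitute the same number, so the answer no longer depends on which representative we started from.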
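Now each single-place formula $E(x)$ induces a mapping $\Phi_E : \mathrm{Lind}^0 \to \mathrm{Lind}^0$ by $\Phi_E([A]) = [E(\min\{\ulcorner A' \urcorner : A' \in [A]\})]$. We again have to standardize by taking the smallest Gödel number in the class in order to ensure the mapping is well-defined.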
Drawing on Theorem 3 (diagonal theorem) earlier in the paper, we want to perform the familiar construction to get the function $g : \mathrm{Lind}^1 \to \mathrm{Lind}^0$.
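We define $g([B(x)]) = \Phi_E(f([B(x)], [B(x)]))$ as in the paper. Since $f([B(x)], [B(x)]) = [B(\min\{\ulcorner B'(x) \urcorner : B'(x) \in [B(x)]\})]$, this means we have $g([B(x)]) = \Phi_E([B(\min\{\ulcorner B'(x) \urcorner : B'(x) \in [B(x)]\})])$ by replacing $f([B(x)], [B(x)])$ inside $\Phi_E$. And now expanding the definition of $\Phi_E$ we get $g([B(x)]) = [E(\min\{\ulcorner A \urcorner : A \in [B(\min\{\ulcorner B'(x) \urcorner : B'(x) \in [B(x)]\})]\})]$. That is definitely way more complicated-looking than the $E(\ulcorner B(\ulcorner B(x) \urcorner) \urcorner)$ that Yanofsky gets at this point in the paper! But it also ensures everything is well-defined.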
To get the proof to work, we must show $g$ is representable. Recall that this means we must find some $[G(x)] \in \mathrm{Lind}^1$ such that $g([B(x)]) = f([B(x)], [G(x)])$ for every $[B(x)] \in \mathrm{Lind}^1$. How do we figure out which $G(x)$ works? It's a lot easier to work informally by eliding the distinction between formulas and equivalence classes of formulas (and between formulas and their Gödel numbers), so let's be a bit loose and be in "exploration mode" here. We will come back later and phrase things formally.
Well, we want to choose $G(x)$ such that $g(B(x)) = f(B(x), G(x))$ for every $B(x)$. Let's try to expand both LHS and RHS separately, and see where we get.
$g(B(x)) = \Phi_E(f(B(x), B(x))) = E(B(B(x)))$
and $f(B(x), G(x)) = G(B(x))$
so this means we need $E(B(B(x))) = G(B(x))$. So $G$ just needs to diagonalize $B(x)$, i.e. apply a map $D$ that sends $B(x)$ to $B(B(x))$, and then wrap $E$ around the result: $G(x) = E(D(x))$.
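The exploration above has a direct computational analogue, sketched here under loud assumptions: "formulas" are Python expression strings in the variable `x`, the "Gödel number" of a formula is simply its source text, and `E` is an arbitrary Python function of a string. This is an analogy, not the arithmetic construction. The helper names `D` and `fixed_point` are hypothetical, with `D` playing the role of the diagonalization map and `"E(D(x))"` the role of $G(x)$.

```python
def D(s: str) -> str:
    """Diagonalize: return the source of "s applied to its own quotation"."""
    return f"(lambda x: {s})({s!r})"


def fixed_point(E):
    """Build a sentence C whose value equals E applied to C's own source."""
    G = "E(D(x))"            # toy analogue of G(x) = E(D(x))
    C = D(G)                 # toy analogue of G applied to its own code
    env = {"E": E, "D": D}   # names the sentence refers to when evaluated
    return C, env


# Example: E measures the length of the source text it is given.
C, env = fixed_point(len)
print(C)
print(eval(C, env) == len(C))
```

Evaluating `C` binds `x` to the text `"E(D(x))"`, so `D(x)` rebuilds the text of `C` itself and `E` is applied to it. That is the toy counterpart of a sentence being equivalent to $E$ applied to its own code.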
Formally, we define $G(x)$ to be $E(D(x))$ as well.
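Now to really show that $g$ is representable, we must show that $g([B(x)]) = f([B(x)], [G(x)])$ for every $[B(x)] \in \mathrm{Lind}^1$.
We have
$$g([B(x)]) = [E(\min\{\ulcorner A \urcorner : A \in [B(\min\{\ulcorner B'(x) \urcorner : B'(x) \in [B(x)]\})]\})]$$
and
$$f([B(x)], [G(x)]) = [G(\min\{\ulcorner B'(x) \urcorner : B'(x) \in [B(x)]\})].$$
Expanding the definition of $G$, we get $f([B(x)], [G(x)]) = [E(D(\min\{\ulcorner B'(x) \urcorner : B'(x) \in [B(x)]\}))]$.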
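To do the next step, it is convenient to put $n = \min\{\ulcorner B'(x) \urcorner : B'(x) \in [B(x)]\}$, and let $B_0(x)$ be the formula such that $\ulcorner B_0(x) \urcorner = n$. Note that $[B_0(x)] = [B(x)]$.
So we want to compute $D(n)$. Using our new notation, this is $D(\ulcorner B_0(x) \urcorner)$.
So in the end we get $f([B(x)], [G(x)]) = [E(D(\ulcorner B_0(x) \urcorner))]$.
Now by definition of $D$, we get $f([B(x)], [G(x)]) = [E(\ulcorner B_0(\ulcorner B_0(x) \urcorner) \urcorner)]$.
[TODO] So actually to get the proof to work, we need $D(\ulcorner B_0(x) \urcorner) = \min\{\ulcorner A \urcorner : A \in [B_0(\ulcorner B_0(x) \urcorner)]\}$. But doesn't this make $D$ uncomputable?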
[TODO] Here we need to explain what $D(x)$ means and why it's valid.
[TODO] Another thing we ought to discuss in the proof: since we are looking for a fixed point for any $\mathrm{Lind}^0 \to \mathrm{Lind}^0$, that fixes what the set $Y$ could be. But it doesn't decide what the set $T$ could be. How do we know that $T = \mathrm{Lind}^1$?
Intuitions
Acknowledgments
Thanks to Rupert McCallum for helping me work through the proof.