User:IssaRice/Schröder–Bernstein theorem

From Machinelearning

questions:

  • what is the proof that uses axiom of choice, and how does choice simplify the proof? It's described here, but I don't know enough about ordinals/cardinals to understand it: https://math.stackexchange.com/questions/223587/looking-for-cantors-original-proof-of-the-cantor-bernstein-theorem-that-relies
  • why does the wikipedia proof (and some of the other proofs) assume that A and B are disjoint? what does this buy us? I still don't understand this, but it seems like a minor point.
  • what is the point of having the set C in exercise 8.3.2? i don't see how we need this extra level of generality when applying this result in the following exercise, and the picture is simpler to draw in 8.3.2 itself. (I have now commented on Tao's blog asking about this.)

In my proof at https://taoanalysis.wordpress.com/2020/09/01/exercise-8-3-2/ i think the proof of injectivity can be improved. consider dictinct x,y, and then split cases depending whether both are green, both are white, or one is green and the other is white. Also in the surjectivity proof, i am already doing this green/white split, so it would be good to mention those colors in the proof itself (just to help the reader).

Comparison of chains/ancestors vs union proofs

The two most common proofs of the theorem do one of the following:

  • We look at "chains" like f(x), g(f(x)), f(g(f(x))), ... and then extend these backwards as well, classifying elements of A and B depending on whether a chain going to the left stops in A, stops in B, loops around, or goes backwards forever without looping. Then we define the bijection h by picking different functions depending on the chain type. (Examples: Wikipedia, Cornell page)
  • We define some strange subset of A by taking the union of weird image sets of compositions of g and f, and then we define h differently depending on whether an element is in this union or not. (Examples: Velleman, Book of Proof, Tao)

It turns out that these proofs are essentially the same, and it is only in the notation/exposition style/visualization style that they differ. See the "Unified notation" section below for how the two kinds of proofs correspond to each other.

Unified notation

Let be sets. (In Tao's book we assume but this isn't the case in the other proofs.)

(In Tao's book we assume that is the inclusion map that sends each to itself.)

for all

for all (In Tao's book, since is just the inclusion map and maps into , we have which is why Tao can write . Note that Tao uses "f" instead of "g" because his notation is different from everybody else's.)

Chain types

Type 1: Loop

Type 2: infinitely goes backwards without repeating

Type 3: Chain stops in A, with eventually undefined.

  • 3a: Elements of ( immediately undefined)
  • 3b: Elements of ( is defined, but eventually if you keep going will be undefined)
  • proof of equivalence: In general, if then there is some and natural number k such that , so the chain will look like and will be undefined. This shows that all elements in the union of the C's are of chain type 3. To show the other direction, suppose x is of chain type 3. Then is eventually undefined, so there is some element such that for some natural number k. This shows that so x is in the union as well.

Type 4: Chain stops in B, with eventually undefined.

  • 4a: Elements of ( immediately undefined)
  • 4b: Elements of ( is defined, but eventually if you keep going will be undefined)

Definitions of the bijection in various books

  • Book of Proof: h(x) = f(x) if x is of type 3; if x is of type 1,2,4.
  • Cornell page: h(x) = f(x) if x is of type 1,2,3; if x is of type 4.
  • Tao: h(x) = f(x) if x is of type 1,2,3,4a (actually case 4a doesn't even exist in the Tao book since A is a subset of B); if x is of type 4b
  • Wikipedia: h(x) = f(x) if x is of type 3; if x is of type 4; h(x) can be defined either way if x is of type 1 or 2 (I think this is the key insight, and explains a lot of the variation between different proofs, since they just pick either f or g^-1 for types 1 and 2)

Viewing the theorem as a fixed point result

Is the result a fixed point result? (if so how can we phrase it as such?) or is it just that some of the proofs make use of the fixed point ideas? Yes, this can be viewed as a fixed point result. This page cleared things up for me: https://ncatlab.org/nlab/show/Cantor-Schroeder-Bernstein+theorem#proof

basically when we write the definition of the bijection h, we have two cases, and if we call one of those cases X (specifically, the case where we just apply f), then we can write X = g(f(X)^c)^c where ^c is set complementation, so X is a fixed point of the operation x -> g(f(x)^c)^c

This page explains the relevance of knaster-tarski better (HT Satira): https://proofwiki.org/wiki/Cantor-Bernstein-Schr%C3%B6der_Theorem/Proof_6

So it's something like:

  • first we notice that SB can be stated as a fixed point theorem
  • the operation that has a fixed point happens to fit some preconditions of knaster-tarski lemma, so we can use this lemma to produce a fixed point
  • given the fixed point, we can "unwrap" the operations to produce the subsets needed to define the bijection h

Exercise 9 here https://www.greaterwrong.com/posts/9a2asxypuNjCmga3p/iteration-fixed-point-exercises writes the same fixed point instead as B' = f(A') and A-A' = g(B-B') (but if you substitute in B' into the second equation and then take the complement of both sides, you get exactly the same fixed point.)

References