User:IssaRice/Extreme value theorem

From Machinelearning

Working through the proof in Pugh's book by filling in the parts he doesn't talk about.

For x[a,b], define Vx=f([a,x])={f(t):atx} to be the values that f takes on as the input ranges from a to x (inclusive).

Let M=sup{f(x):axb}=supVb (this number exists by the boundedness theorem) and X={x[a,b]:supVx<M}.[note 1]

Our goal now is to find some x such that f(x)=M. The idea now is to locate the leftmost point where f attains M by taking the supremum of X. But we have a small problem, which is that X might be empty (it is however always bounded, so we don't need to worry about that part). This can happen if f(a)=M, in which case supVa=M. But if that's the case, we have already found a point where f equals M, so we're actually done!

So now suppose f(a)<M. Then aX. We already know that X is bounded above, for instance by the number b. We can thus take the least upper bound of X, say c=supX. We already know f(c)M, so if we can just eliminate the possibility that f(c)<M, we will be done.

So suppose for sake of contradiction that f(c)<M. We want to find M<M such that f(t)M for all t[a,c]. That would mean that supVcM<M. To do this, we split the interval into two parts. Choose ϵ>0 with ϵ<Mf(c).[note 2] By continuity at c, there exists a δ>0 such that |tc|<δ implies |f(t)f(c)|<ϵ. So now pick a point like cδ/2, and split the interval into [a,cδ/2] and [cδ/2,c].

  • Since cδ/2<c, there exists xX such that cδ/2<x (otherwise cδ/2 would be a smaller upper bound for X). So supVcδ/2supVx<M. This means that f(t)supVcδ/2<M for all t[a,cδ/2].
  • But now if t[cδ/2,c], then |tc|<δ, so |f(t)f(c)|<ϵ. This means f(t)<f(c)+ϵ<M.

Now we can choose M=max{supVcδ/2,f(c)+ϵ}. Then whatever t[a,c] happens to be, we can say f(t)M.[note 3]

If c<b then by continuity we can find points t to the right of c where supVt<M, which contradicts the fact that c is an upper bound of such points.

Therefore, c=b, which implies that M=supVb=supVc<M, a contradiction. So the assumption that f(c)<M was false, and we conclude f(c)=M.

Takeaways

  • "less than" vs "bounded away from"

Notes

  1. If we had used "" in the definition of X, then when we take the supremum we would just end up with b, regardless of where f achieves the maximum.
  2. It is important here that ϵ does not equal Mf(c); choosing this ϵ would be too weak and we would not be able to conclude supVc<M, rather only that supVcM.
  3. This part of the proof uses quite a bit of "low-level" argumentation, so it can be easy to miss the broader point. The reason we split the interval [a,c] into two parts is that we know two facts about f: (1) near c, continuity shows that f must be close to the value of f(c); since we assumed f(c)<M, this means we can find a neighborhood around c where f is bounded away from M. (2) up to c, our choice of c means the value of f is bounded away from M. Then we pick cδ/2 as a "handing off point" to pass from one side to the other.