User:IssaRice/Extreme value theorem: Difference between revisions

From Machinelearning
No edit summary
 
(37 intermediate revisions by the same user not shown)
Line 1: Line 1:
Working through the proof in Pugh's book by filling in the parts he doesn't talk about.
Working through the proof in Pugh's book by filling in the parts he doesn't talk about.


Let <math>M = \sup\{f(x) : x \in [a,b]\}</math> and <math>X = \{x \in [a,b] : \sup V_x < M\}</math>.
For <math>x \in [a,b]</math>, define <math>V_x = f([a,x]) = \{f(t) : a \leq t \leq x\}</math> to be the values that <math>f</math> takes on as the input ranges from <math>a</math> to <math>x</math> (inclusive).


Now suppose <math>f(a) < M</math>. Then <math>a \in X</math>. We already know that <math>X</math> is bounded above, for instance by the number <math>b</math>. We can thus take the least upper bound of <math>X</math>, say <math>c = \sup X</math>. We already know <math>f(c) \leq M</math>, so if we can just eliminate the possibility that <math>f(c) < M</math>, we will be done.
Let <math>M = \sup\{f(x) : a \leq x \leq b\} = \sup V_b</math> (this number exists by the boundedness theorem) and <math>X = \{x \in [a,b] : \sup V_x < M\}</math>.<ref group="note">If we had used "<math>\leq</math>" in the definition of <math>X</math>, then when we take the supremum we would just end up with <math>b</math>, regardless of where <math>f</math> achieves the maximum.</ref>


So suppose <math>f(c) < M</math>. We can choose <math>\epsilon > 0</math> with <math>\epsilon < M - f(c)</math>.<ref group="note">it is important here that <math>\epsilon</math> does not equal <math>M - f(c)</math>; choosing this <math>\epsilon</math> would be too weak and we would not be able to conclude <math>\sup V_c < M</math>, rather only that <math>\sup V_c \leq M</math>.</ref> By continuity at <math>c</math>, there exists a <math>\delta > 0</math> such that <math>|t-c|<\delta</math> implies <math>|f(t)-f(c)|<\epsilon</math>. This means <math>f(t) < f(c) + \epsilon < M</math>. If <math>t \leq c-\delta/2</math> then there exists some <math>x \in X</math> such that <math>t < x</math>. This means <math>\sup V_t \leq \sup V_x < M</math> so <math>t \in X</math>. Otherwise if <math>c-\delta/2 \leq t \leq c</math> then <math>|t-c|<\delta</math> so <math>f(t) < f(c) + \epsilon < M</math>. So now what can we say about <math>\sup V_c</math>? We want to say <math>\sup V_c < M</math>. We can do this by showing that there exists a number <math>M' < M</math> such that <math>f(t) < M'</math> for all <math>t \in [a,c]</math>. That way, <math>\sup V_c \leq M' < M</math>. But <math>M' = \sup V_c</math> works.
Our goal now is to find some <math>x</math> such that <math>f(x) = M</math>. The idea now is to locate the leftmost point where <math>f</math> attains <math>M</math> by taking the supremum of <math>X</math>. But we have a small problem, which is that <math>X</math> might be empty (it is however always bounded, so we don't need to worry about that part). This can happen if <math>f(a) = M</math>, in which case <math>\sup V_a = M</math>. But if that's the case, we have already found a point where <math>f</math> equals <math>M</math>, so we're actually done!


<math>\sup V_{c-\delta/2} < M</math> so let <math>M' = (M+\sup V_{c-\delta/2})/2</math>. Then <math>\sup V_{c-\delta/2} < M' < M</math> and if <math>v \in V_{c-\delta/2}</math> we have <math>v \leq \sup V_{c-\delta/2} < M'</math>.
So now suppose <math>f(a) < M</math>. Then <math>a \in X</math>. We already know that <math>X</math> is bounded above, for instance by the number <math>b</math>. We can thus take the least upper bound of <math>X</math>, say <math>c = \sup X</math>. We already know <math>f(c) \leq M</math>, so if we can just eliminate the possibility that <math>f(c) < M</math>, we will be done.


Therefore, <math>c=b</math>, which implies that <math>M = \sup V_b = \sup V_c < M</math>, a contradiction.
So suppose for sake of contradiction that <math>f(c) < M</math>. Choose <math>\epsilon > 0</math> with <math>\epsilon < M - f(c)</math>.<ref group="note">It is important here that <math>\epsilon</math> does not equal <math>M - f(c)</math>; choosing this <math>\epsilon</math> would be too weak and we would not be able to conclude <math>\sup V_c < M</math>, rather only that <math>\sup V_c \leq M</math>.</ref> Continuity of <math>f</math> at <math>c</math> implies that there exists <math>\delta > 0</math> such that <math>x \in (c-\delta, c+\delta)\cap [a,b]</math> implies <math>|f(x)-f(c)|<\epsilon</math>. Now, <math>c-\delta</math> cannot be an upper bound of <math>X</math>, so there exists some <math>x_0 \in X</math> such that <math>c-\delta < x_0 \leq c</math>. Thus for <math>x \in [a,x_0]</math> we have <math>f(x) \leq \sup V_{x_0} < M</math>. Also, for <math>x \in [x_0, c] \subseteq (c-\delta, c+\delta)\cap [a,b]</math> we have <math>f(x) < f(c) + \epsilon < M</math>. So <math>f</math> is bounded above by <math>\max\{\sup V_{x_0}, f(c) + \epsilon\} < M</math> on <math>[a,c]</math>, which means <math>\sup V_c < M</math>.<ref group="note">This part of the proof uses quite a bit of "low-level" argumentation, so it can be easy to miss the broader point. The reason we split the interval <math>[a,c]</math> into two parts is that we know two facts about <math>f</math>: (1) near <math>c</math>, continuity shows that <math>f</math> must be close to the value of <math>f(c)</math>; since we assumed <math>f(c) < M</math>, this means we can find a neighborhood around <math>c</math> where <math>f</math> is bounded away from <math>M</math>. (2) up to <math>c</math>, our choice of <math>c</math> means the value of <math>f</math> is bounded away from <math>M</math>. Then we pick <math>x_0</math> as a "handing off point" to pass from one side to the other.</ref> If <math>c < b</math>, then there are points to the right of <math>c</math> where <math>f</math> continues to stay below <math>f(c) + \epsilon < M</math>, which contradicts the fact that <math>c</math> is an upper bound of <math>X</math>. So <math>c=b</math>, which means <math>M = \sup V_b = \sup V_c < M</math>, a contradiction. The assumption that <math>f(c) < M</math> was false, and we conclude <math>f(c) = M</math>.
 
==Takeaways==
 
* "less than" vs "bounded away from". If we have a sequence like <math>0.1, 0.01, 0.001, \ldots</math>, then we can say this is greater than 0. but we cannot say it is "bounded away from zero", because it gets closer and closer to it in the limit. similarly, in the proof above, we wanted the values of f to be less than M even after taking the supremum of values of f, which meant that we had to bound f from above, not by M, but by something less than M (say m). That way, we could say that <math>\sup \{\text{some set}\} \leq m < M</math>, so that even after taking sup, we are ''strictly'' less than M.
* there are "easier" ways to prove this theorem, for instance by using the machinery of sequences. but the nice thing about this proof is that it uses only the least upper bound property of real numbers. so you get to see how facts about continuity are derived "by hand".


==Notes==
==Notes==


<references group="note" />
<references group="note" />

Latest revision as of 05:18, 29 July 2019

Working through the proof in Pugh's book by filling in the parts he doesn't talk about.

For x[a,b], define Vx=f([a,x])={f(t):atx} to be the values that f takes on as the input ranges from a to x (inclusive).

Let M=sup{f(x):axb}=supVb (this number exists by the boundedness theorem) and X={x[a,b]:supVx<M}.[note 1]

Our goal now is to find some x such that f(x)=M. The idea now is to locate the leftmost point where f attains M by taking the supremum of X. But we have a small problem, which is that X might be empty (it is however always bounded, so we don't need to worry about that part). This can happen if f(a)=M, in which case supVa=M. But if that's the case, we have already found a point where f equals M, so we're actually done!

So now suppose f(a)<M. Then aX. We already know that X is bounded above, for instance by the number b. We can thus take the least upper bound of X, say c=supX. We already know f(c)M, so if we can just eliminate the possibility that f(c)<M, we will be done.

So suppose for sake of contradiction that f(c)<M. Choose ϵ>0 with ϵ<Mf(c).[note 2] Continuity of f at c implies that there exists δ>0 such that x(cδ,c+δ)[a,b] implies |f(x)f(c)|<ϵ. Now, cδ cannot be an upper bound of X, so there exists some x0X such that cδ<x0c. Thus for x[a,x0] we have f(x)supVx0<M. Also, for x[x0,c](cδ,c+δ)[a,b] we have f(x)<f(c)+ϵ<M. So f is bounded above by max{supVx0,f(c)+ϵ}<M on [a,c], which means supVc<M.[note 3] If c<b, then there are points to the right of c where f continues to stay below f(c)+ϵ<M, which contradicts the fact that c is an upper bound of X. So c=b, which means M=supVb=supVc<M, a contradiction. The assumption that f(c)<M was false, and we conclude f(c)=M.

Takeaways

  • "less than" vs "bounded away from". If we have a sequence like 0.1,0.01,0.001,, then we can say this is greater than 0. but we cannot say it is "bounded away from zero", because it gets closer and closer to it in the limit. similarly, in the proof above, we wanted the values of f to be less than M even after taking the supremum of values of f, which meant that we had to bound f from above, not by M, but by something less than M (say m). That way, we could say that sup{some set}m<M, so that even after taking sup, we are strictly less than M.
  • there are "easier" ways to prove this theorem, for instance by using the machinery of sequences. but the nice thing about this proof is that it uses only the least upper bound property of real numbers. so you get to see how facts about continuity are derived "by hand".

Notes

  1. If we had used "" in the definition of X, then when we take the supremum we would just end up with b, regardless of where f achieves the maximum.
  2. It is important here that ϵ does not equal Mf(c); choosing this ϵ would be too weak and we would not be able to conclude supVc<M, rather only that supVcM.
  3. This part of the proof uses quite a bit of "low-level" argumentation, so it can be easy to miss the broader point. The reason we split the interval [a,c] into two parts is that we know two facts about f: (1) near c, continuity shows that f must be close to the value of f(c); since we assumed f(c)<M, this means we can find a neighborhood around c where f is bounded away from M. (2) up to c, our choice of c means the value of f is bounded away from M. Then we pick x0 as a "handing off point" to pass from one side to the other.