Bellman equation derivation

From Machinelearning
Revision as of 00:58, 1 September 2019 by IssaRice (talk | contribs)

Bellman equation for .

We want to show for all states .

The core idea of the proof is to use the law of total probability to go from marginal to conditional probabilities, and then invoke the Markov assumption.

The law of total probability states that if is an event, and are events that partition the sample space, then .

For fixed event with non-zero probability, the mapping is another valid probability measure. In other words, define by for all events . Now the law of total probability for states that . We also have

So the law of total probability states that .

Now we see how the law of total probability interacts with conditional expectation. Let be a random variable. Then . Here the event is playing the role of in the statement of the conditional law of total probability.