Bellman equation derivation

Bellman equation for <math>v_\pi</math>.

We want to show <math>v_\pi(s) = \sum_a \pi(a \mid s) \sum_{s',r} p(s',r\mid s,a) [r + \gamma v_\pi(s')]</math> for all states <math>s</math>.
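The notation is not defined on this page; the sketch below assumes the standard (Sutton–Barto-style) definitions of the return, the state-value function, the policy, and the environment dynamics:

:<math>
\begin{align}
G_t &= R_{t+1} + \gamma G_{t+1}, \\
v_\pi(s) &= \mathbb{E}_\pi[G_t \mid S_t = s], \\
\pi(a \mid s) &= \Pr(A_t = a \mid S_t = s), \\
p(s', r \mid s, a) &= \Pr(S_{t+1} = s', R_{t+1} = r \mid S_t = s, A_t = a).
\end{align}
</math>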

The core idea of the proof is to use the law of total probability to go from marginal to conditional probabilities, and then invoke the Markov assumption.

The law of total probability states that if <math>B</math> is an event, and <math>C_1, \ldots, C_n</math> are events that partition the sample space, then <math>\Pr(B) = \sum_{j=1}^n \Pr(B \mid C_j)\Pr(C_j)</math>.
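The derivation works with expected returns rather than event probabilities, so what actually gets applied below is the analogous identity for expectations (stated here as an assumption, with each <math>\Pr(C_j) > 0</math>): for a random variable <math>X</math> with finite expectation,

:<math>\mathbb{E}[X] = \sum_{j=1}^n \mathbb{E}[X \mid C_j]\,\Pr(C_j).</math>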

For a fixed event <math>A</math> with <math>\Pr(A) > 0</math>, the mapping <math>B \mapsto \Pr(B \mid A)</math> is itself a valid probability measure, so applying the law of total probability to it gives <math>\Pr(B \mid A) = \sum_{j=1}^n \Pr(B \mid C_j,A)\Pr(C_j\mid A)</math>.
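With these identities in hand, here is a sketch of the remaining steps, assuming the definitions above and the expectation form of the two identities (conditioning throughout on <math>S_t = s</math> and partitioning on the realized values of <math>A_t</math> and of <math>(S_{t+1}, R_{t+1})</math>):

:<math>
\begin{align}
v_\pi(s) &= \mathbb{E}_\pi[G_t \mid S_t = s] \\
&= \mathbb{E}_\pi[R_{t+1} + \gamma G_{t+1} \mid S_t = s] \\
&= \sum_a \pi(a \mid s) \sum_{s',r} p(s',r \mid s,a)\; \mathbb{E}_\pi[R_{t+1} + \gamma G_{t+1} \mid S_t = s, A_t = a, S_{t+1} = s', R_{t+1} = r] \\
&= \sum_a \pi(a \mid s) \sum_{s',r} p(s',r \mid s,a)\, [r + \gamma\, \mathbb{E}_\pi[G_{t+1} \mid S_{t+1} = s']] \\
&= \sum_a \pi(a \mid s) \sum_{s',r} p(s',r \mid s,a)\, [r + \gamma v_\pi(s')].
\end{align}
</math>

In the third line the probability of each partition cell factors as <math>\Pr(A_t = a, S_{t+1} = s', R_{t+1} = r \mid S_t = s) = \pi(a \mid s)\, p(s',r \mid s,a)</math> by the chain rule and the definitions above. The fourth line is where the Markov assumption enters: conditional on <math>S_{t+1} = s'</math>, the distribution of <math>G_{t+1}</math> does not depend on <math>S_t</math>, <math>A_t</math>, or <math>R_{t+1}</math>, so the inner expectation collapses to <math>v_\pi(s')</math>.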