Bellman equation derivation

Revision as of 00:19, 1 September 2019 by IssaRice

Bellman equation for $v_\pi$.

We want to show $v_\pi(s) = \sum_a \pi(a \mid s) \sum_{s',r} p(s',r \mid s,a) [r + \gamma v_\pi(s')]$ for all states $s$.
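As a sanity check before the proof, the identity can be tested numerically on a small example. The sketch below is illustrative only: the two-state, two-action MDP, the policy, the discount `GAMMA`, and all function names are made-up assumptions, not part of the original page. It computes the value of each state directly from the definition of expected discounted return (pushing the state distribution forward in time and truncating the sum), then checks that the result satisfies the Bellman equation up to floating-point error.

```python
# Hypothetical two-state, two-action MDP; all numbers are invented for illustration.
GAMMA = 0.9
STATES = [0, 1]
ACTIONS = [0, 1]

# p[(s, a)] maps (s_next, r) -> probability, i.e. p(s', r | s, a).
p = {
    (0, 0): {(0, 1.0): 0.5, (1, 0.0): 0.5},
    (0, 1): {(1, 2.0): 1.0},
    (1, 0): {(0, -1.0): 0.3, (1, 1.0): 0.7},
    (1, 1): {(0, 0.0): 1.0},
}

# pi[s][a] is pi(a | s).
pi = {0: {0: 0.4, 1: 0.6}, 1: {0: 0.5, 1: 0.5}}


def expected_reward(s):
    """One-step expected reward E[R_{t+1} | S_t = s] under pi."""
    return sum(pi[s][a] * prob * r
               for a in ACTIONS
               for (_s_next, r), prob in p[(s, a)].items())


def step_distribution(dist):
    """Push a distribution over states one time step forward under pi."""
    new = {s: 0.0 for s in STATES}
    for s, mass in dist.items():
        for a in ACTIONS:
            for (s_next, _r), prob in p[(s, a)].items():
                new[s_next] += mass * pi[s][a] * prob
    return new


def value_from_definition(s, horizon=500):
    """Truncated E[sum_k gamma^k R_{t+k+1} | S_t = s], computed forward in time."""
    dist = {x: 1.0 if x == s else 0.0 for x in STATES}
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        total += discount * sum(dist[x] * expected_reward(x) for x in STATES)
        dist = step_distribution(dist)
        discount *= GAMMA
    return total


def bellman_rhs(s, v):
    """Right-hand side of the Bellman equation for v_pi at state s."""
    return sum(pi[s][a] * prob * (r + GAMMA * v[s_next])
               for a in ACTIONS
               for (s_next, r), prob in p[(s, a)].items())


v = {s: value_from_definition(s) for s in STATES}
residuals = {s: abs(v[s] - bellman_rhs(s, v)) for s in STATES}
print(v, residuals)
```

The residuals should be at the level of floating-point noise, since the truncation error after 500 steps is on the order of `GAMMA ** 500`. This is a check on one example, not a substitute for the derivation.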

The core idea of the proof is to use the law of total probability to go from marginal to conditional probabilities, and then invoke the Markov assumption.

The law of total probability states that if $B$ is an event and $C_1, \ldots, C_n$ are a partition of the sample space, then $\Pr(B) = \sum_{i=1}^n \Pr(B \mid C_i) \Pr(C_i)$.
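A small worked instance may make the law concrete. The sketch below assumes a fair six-sided die as the sample space; the event `B` ("the roll is even") and the particular three-block `partition` are illustrative choices, not taken from the original page. Exact fractions are used so the two sides can be compared without rounding.

```python
from fractions import Fraction

# Sample space: outcomes of one fair six-sided die (a hypothetical example).
omega = {1, 2, 3, 4, 5, 6}
B = {2, 4, 6}                          # event "the roll is even"
partition = [{1, 2}, {3, 4, 5}, {6}]   # disjoint sets whose union is omega


def pr(event):
    """Probability of an event under the uniform distribution on omega."""
    return Fraction(len(event), len(omega))


def pr_given(event, cond):
    """Conditional probability Pr(event | cond); cond must be non-empty."""
    return Fraction(len(event & cond), len(cond))


direct = pr(B)
via_partition = sum(pr_given(B, C) * pr(C) for C in partition)
print(direct, via_partition)
```

Each block of the partition contributes 1/6 here, so both sides equal 1/2. The same bookkeeping, with the action and then the next state and reward playing the role of the partition, is what the proof of the Bellman equation relies on.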