Bellman equation derivation

Bellman equation for $v_{π}$.

We want to show that $v_{π} (s) = \sum_{a} π (a ∣ s) \sum_{s^{'}, r} p (s^{'}, r ∣ s, a) [r + γ v_{π} (s^{'})]$ for all states $s$.
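As a sanity check on the statement, here is a minimal numerical sketch in plain Python. The two-state, two-action MDP, the policy, and the discount factor are all made up for illustration. It assumes the usual definition of $v_{π} (s)$ as the expected discounted return starting from $s$, computes that quantity by propagating the state distribution forward over a long truncated horizon, and then checks that the result satisfies the equation above up to floating-point error.

```python
# Hypothetical MDP: every number below is an assumption chosen for illustration.
GAMMA = 0.9
STATES = [0, 1]
ACTIONS = [0, 1]

# P[(s, a)] lists (next_state, reward, probability) triples, i.e. p(s', r | s, a).
P = {
    (0, 0): [(0, 1.0, 0.7), (1, 0.0, 0.3)],
    (0, 1): [(1, 2.0, 1.0)],
    (1, 0): [(0, -1.0, 0.4), (1, 0.5, 0.6)],
    (1, 1): [(0, 0.0, 0.5), (1, 1.0, 0.5)],
}

# PI[s][a] is pi(a | s).
PI = {0: [0.5, 0.5], 1: [0.2, 0.8]}


def expected_reward(s):
    """One-step expected reward from state s under the policy."""
    return sum(PI[s][a] * sum(prob * r for _, r, prob in P[(s, a)])
               for a in ACTIONS)


def step_distribution(dist):
    """Push a distribution over states one step forward under the policy."""
    new = {s: 0.0 for s in STATES}
    for s in STATES:
        for a in ACTIONS:
            for s_next, _, prob in P[(s, a)]:
                new[s_next] += dist[s] * PI[s][a] * prob
    return new


def value_from_definition(start, horizon=2000):
    """Truncated expected discounted return E[sum_t gamma^t R_{t+1} | S_0 = start]."""
    dist = {s: 1.0 if s == start else 0.0 for s in STATES}
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        total += discount * sum(dist[s] * expected_reward(s) for s in STATES)
        dist = step_distribution(dist)
        discount *= GAMMA
    return total


def bellman_rhs(v, s):
    """Right-hand side of the Bellman equation for v_pi at state s."""
    return sum(
        PI[s][a] * sum(prob * (r + GAMMA * v[s_next])
                       for s_next, r, prob in P[(s, a)])
        for a in ACTIONS
    )


v = {s: value_from_definition(s) for s in STATES}
residuals = {s: abs(v[s] - bellman_rhs(v, s)) for s in STATES}
print(v)
print(residuals)  # both residuals should be at floating-point noise level
```

This only confirms the identity on one example; the derivation is what establishes it for every finite MDP and policy.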

The core idea of the proof is to use the law of total probability to go from marginal to conditional probabilities, and then invoke the Markov assumption.

The law of total probability states that if $B$ is an event and $C_{1}, \dots, C_{n}$ are events that partition the sample space, each with $Pr (C_{j}) > 0$, then $Pr (B) = \sum_{j = 1}^{n} Pr (B ∣ C_{j}) Pr (C_{j})$.

For a fixed event $A$ with $Pr (A) > 0$, the mapping $B \mapsto Pr (B ∣ A)$ is itself a valid probability measure. Applying the law of total probability under this conditional measure gives $Pr (B ∣ A) = \sum_{j = 1}^{n} Pr (B ∣ C_{j}, A) Pr (C_{j} ∣ A)$, where terms with $Pr (C_{j} ∣ A) = 0$ are omitted.
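Both forms of the law can be checked exactly on a small finite sample space. The sketch below is an illustration under assumed choices: two fair dice as the sample space, and hypothetical events $B$ (the sum is at least 8), $A$ (the second die is even), and a partition $C_{1}, \dots, C_{6}$ by the value of the first die. It uses exact fractions so that equality can be tested without rounding.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two fair six-sided dice (assumed example).
OMEGA = list(product(range(1, 7), repeat=2))


def pr(event, given=lambda w: True):
    """Pr(event | given) under the uniform measure on OMEGA."""
    conditioned = [w for w in OMEGA if given(w)]
    return Fraction(sum(1 for w in conditioned if event(w)), len(conditioned))


def B(w):
    return w[0] + w[1] >= 8   # the sum is at least 8


def A(w):
    return w[1] % 2 == 0      # the second die is even


# Partition C_1, ..., C_6: the first die shows j.
C = [lambda w, j=j: w[0] == j for j in range(1, 7)]

# Unconditional form: Pr(B) = sum_j Pr(B | C_j) Pr(C_j).
lhs_marginal = pr(B)
rhs_marginal = sum(pr(B, C_j) * pr(C_j) for C_j in C)

# Conditional form: Pr(B | A) = sum_j Pr(B | C_j, A) Pr(C_j | A).
lhs_conditional = pr(B, A)
rhs_conditional = sum(
    pr(B, lambda w, C_j=C_j: C_j(w) and A(w)) * pr(C_j, A) for C_j in C
)

print(lhs_marginal, rhs_marginal)
print(lhs_conditional, rhs_conditional)
```

In the Bellman derivation, the partition plays the role of the possible actions (and then the possible next state and reward pairs), while $A$ plays the role of the conditioning event $S_{t} = s$.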