Bellman equation derivation: Difference between revisions

Revision as of 00:56, 1 September 2019

Bellman equation for $v_{π}$.

We want to show $v_{π} (s) = \sum_{a} π (a ∣ s) \sum_{s^{'}, r} p (s^{'}, r ∣ s, a) [r + γ v_{π} (s^{'})]$ for all states $s$.
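To make the shape of this equation concrete, here is a minimal numerical sketch. The two-state MDP, the policy, the rewards, and the names `policy`, `dynamics`, `bellman_rhs`, and `evaluate_policy` are all hypothetical and chosen only for illustration. The code repeatedly applies the right-hand side of the equation until the values stop changing, then checks that the result satisfies the equation. It illustrates what the equation says; it is not a proof.

```python
# Hypothetical 2-state, 2-action MDP; every number below is made up.
GAMMA = 0.9
STATES = [0, 1]
ACTIONS = [0, 1]

# policy[s][a] = pi(a | s)
policy = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.2, 1: 0.8}}

# dynamics[(s, a)] = list of (next_state, reward, probability),
# i.e. a tabulation of p(s', r | s, a).
dynamics = {
    (0, 0): [(0, 0.0, 0.7), (1, 1.0, 0.3)],
    (0, 1): [(1, 2.0, 1.0)],
    (1, 0): [(0, -1.0, 0.4), (1, 0.0, 0.6)],
    (1, 1): [(0, 0.5, 1.0)],
}


def bellman_rhs(v, s):
    """Right-hand side of the Bellman equation at state s for value table v."""
    return sum(
        policy[s][a]
        * sum(p * (r + GAMMA * v[s2]) for s2, r, p in dynamics[(s, a)])
        for a in ACTIONS
    )


def evaluate_policy(tol=1e-12, max_iters=10_000):
    """Apply the right-hand side repeatedly until the values stabilise."""
    v = {s: 0.0 for s in STATES}
    for _ in range(max_iters):
        new_v = {s: bellman_rhs(v, s) for s in STATES}
        if max(abs(new_v[s] - v[s]) for s in STATES) < tol:
            return new_v
        v = new_v
    return v


v_pi = evaluate_policy()
# Largest gap between the two sides of the equation over all states.
residual = max(abs(v_pi[s] - bellman_rhs(v_pi, s)) for s in STATES)
print(v_pi, residual)
```

The iteration converges here because $γ < 1$ makes each sweep a contraction; the printed residual should be close to zero.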

The core idea of the proof is to use the law of total probability to go from marginal to conditional probabilities, and then invoke the Markov assumption.

The law of total probability states that if $B$ is an event, and $C_{1}, \dots, C_{n}$ are events that partition the sample space, then $Pr (B) = \sum_{j = 1}^{n} Pr (B ∣ C_{j}) Pr (C_{j})$.
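As a small sanity check of this statement, the following sketch evaluates both sides exactly on a finite sample space. The fair die, the event `B`, and the three-block `partition` are assumptions made for illustration; `pr` and `pr_given` are hypothetical helper names.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die (illustrative assumption).
OMEGA = set(range(1, 7))


def pr(event):
    """Probability of an event (a set of outcomes) under the uniform measure."""
    return Fraction(len(event & OMEGA), len(OMEGA))


def pr_given(event, cond):
    """Conditional probability Pr(event | cond); requires Pr(cond) > 0."""
    return pr(event & cond) / pr(cond)


B = {2, 4, 6}                          # "the roll is even"
partition = [{1, 2}, {3, 4, 5}, {6}]   # C_1, C_2, C_3

# Right-hand side of the law of total probability.
total = sum(pr_given(B, C) * pr(C) for C in partition)
print(total, pr(B))
```

Using `Fraction` keeps the arithmetic exact, so the two printed values can be compared with `==` rather than a floating-point tolerance.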

For a fixed event $A$ with non-zero probability, the mapping $B \mapsto Pr (B ∣ A)$ is another valid probability measure. In other words, define ${Pr}_{A}$ by ${Pr}_{A} (B) := Pr (B ∣ A)$ for all events $B$. Now the law of total probability for ${Pr}_{A}$ states that ${Pr}_{A} (B) = \sum_{j = 1}^{n} {Pr}_{A} (B ∣ C_{j}) {Pr}_{A} (C_{j})$. We also have, for each $j$ with ${Pr}_{A} (C_{j}) > 0$,

$${Pr}_{A} (B ∣ C_{j}) = \frac{{Pr}_{A} (B \cap C_{j})}{{Pr}_{A} (C_{j})} = \frac{Pr (B \cap C_{j} ∣ A)}{Pr (C_{j} ∣ A)} = \frac{Pr (B \cap C_{j} \cap A) / Pr (A)}{Pr (C_{j} \cap A) / Pr (A)} = \frac{Pr (B \cap (C_{j} \cap A))}{Pr (C_{j} \cap A)} = Pr (B ∣ C_{j}, A)$$

So the law of total probability, applied to the conditional measure $Pr (\cdot ∣ A)$, states that $Pr (B ∣ A) = \sum_{j = 1}^{n} Pr (B ∣ C_{j}, A) Pr (C_{j} ∣ A)$.
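The conditional form can be checked the same way. The sketch below assumes a sample space of two fair dice and picks the events `A`, `B`, and the partition arbitrarily; these choices, and the helper names `pr`, `pr_given`, and `pr_A`, are hypothetical. It checks that conditioning inside the conditional measure agrees with conditioning on $C_{j} \cap A$, then compares the two sides of the conditional law of total probability.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered rolls of two fair dice (illustrative assumption).
OMEGA = set(product(range(1, 7), repeat=2))


def pr(event):
    """Probability of an event under the uniform measure on OMEGA."""
    return Fraction(len(event & OMEGA), len(OMEGA))


def pr_given(event, cond):
    """Conditional probability Pr(event | cond); requires Pr(cond) > 0."""
    return pr(event & cond) / pr(cond)


A = {w for w in OMEGA if w[0] + w[1] >= 6}   # conditioning event
B = {w for w in OMEGA if w[0] % 2 == 0}      # event of interest
# C_j groups outcomes by the second die modulo 3, so the C_j partition OMEGA.
partition = [{w for w in OMEGA if w[1] % 3 == j} for j in range(3)]


def pr_A(event):
    """The conditional measure: pr_A(event) = Pr(event | A)."""
    return pr_given(event, A)


# Conditioning on C_j inside pr_A equals conditioning on both C_j and A.
inner_matches = all(
    pr_A(B & C) / pr_A(C) == pr_given(B, C & A) for C in partition
)

lhs = pr_given(B, A)
rhs = sum(pr_given(B, C & A) * pr_given(C, A) for C in partition)
print(inner_matches, lhs, rhs)
```

Every block $C_{j}$ here has $Pr (C_{j} \cap A) > 0$, so none of the divisions is by zero; a partition with an empty intersection would need that term dropped from the sum.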

@@ Line 7: / Line 7: @@
 The law of total probability states that if <math>B</math> is an event, and <math>C_1, \ldots, C_n</math> are events that partition the sample space, then <math display="inline">\Pr(B) = \sum_{j=1}^n \Pr(B \mid C_j)\Pr(C_j)</math>.
-For fixed event <math>A</math>, the mapping <math>B \mapsto \Pr(B \mid A)</math> is another valid probability measure. So the law of total probability states that <math display="inline">\Pr(B \mid A) = \sum_{j=1}^n \Pr(B \mid C_j,A)\Pr(C_j\mid A)</math>.
+For fixed event <math>A</math> with non-zero probability, the mapping <math>B \mapsto \Pr(B \mid A)</math> is another valid probability measure. In other words, define <math>\Pr_A</math> by <math>\Pr_A(B) := \Pr(B \mid A)</math> for all events <math>B</math>. Now the law of total probability for <math>\Pr_A</math> states that <math display="inline">\Pr_A(B) = \sum_{j=1}^n \Pr_A(B \mid C_j)\Pr_A(C_j)</math>. We also have
+:<math>\Pr_A(B \mid C_j) = \frac{\Pr_A(B \cap C_j)}{\Pr_A(C_j)} = \frac{\Pr(B\cap C_j \mid A)}{\Pr(C_j\mid A)} = \frac{\Pr(B \cap C_j \cap A)/\Pr(A)}{\Pr(C_j \cap A)/\Pr(A)} = \frac{\Pr(B\cap (C_j\cap A))}{\Pr(C_j \cap A)} = \Pr(B \mid C_j,A)</math>
+So the law of total probability states that <math display="inline">\Pr(B \mid A) = \sum_{j=1}^n \Pr(B \mid C_j,A)\Pr(C_j\mid A)</math>.