Bellman equation derivation

Bellman equation for the state-value function <math>v_\pi</math>.

We want to show <math display="inline">v_\pi(s) = \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\bigl[r + \gamma v_\pi(s')\bigr]</math> for all states <math>s</math>.
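
As a concrete sanity check, here is a minimal Python sketch of the claim. The two-state MDP, its transition probabilities, rewards, policy, and all variable names are made up for illustration; the point is only that a value function computed directly from the definition of expected discounted return satisfies the equation above.

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical 2-state, 2-action MDP (all numbers invented for illustration).
# P[s, a, s'] = transition probability, R[s, a, s'] = reward for that transition.
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[[1.0, 0.0], [0.0, 2.0]],
              [[0.5, 0.5], [1.0, -1.0]]])
pi = np.array([[0.6, 0.4],   # pi[s, a] = probability of taking action a in state s
               [0.3, 0.7]])
gamma = 0.9

# Policy-averaged transition matrix and one-step expected reward.
P_pi = np.einsum('sa,sat->st', pi, P)          # Pr(S_{t+1} = s' | S_t = s)
r_pi = np.einsum('sa,sat,sat->s', pi, P, R)    # E[R_{t+1} | S_t = s]

# v_pi from the definition: expected discounted sum of rewards, computed by
# propagating the state distribution forward (no Bellman recursion used here).
v_def = np.zeros(2)
for s in range(2):
    d = np.zeros(2)
    d[s] = 1.0                                 # distribution of S_k given S_0 = s
    for k in range(2000):
        v_def[s] += gamma ** k * (d @ r_pi)
        d = d @ P_pi

# Right-hand side of the Bellman equation:
# sum_a pi(a|s) sum_{s',r} p(s',r|s,a) [r + gamma * v(s')].
rhs = np.einsum('sa,sat,sat->s', pi, P, R + gamma * v_def[None, None, :])

print(np.allclose(v_def, rhs))                 # True: v_pi satisfies the Bellman equation
</syntaxhighlight>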

The core idea of the proof is to use the law of total probability to go from marginal to conditional probabilities, and then invoke the Markov assumption.

The law of total probability states that if <math>B</math> is an event, and <math>C_1, \ldots, C_n</math> are events that partition the sample space, then <math display="inline">\Pr(B) = \sum_{j=1}^n \Pr(B \mid C_j)\Pr(C_j)</math>.
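
A quick numerical illustration (the partition, its probabilities, and the conditional probabilities of <math>B</math> below are all made up): summing <math>\Pr(B \mid C_j)\Pr(C_j)</math> over the partition recovers <math>\Pr(B)</math> as estimated by direct simulation.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Made-up discrete model: C_1, C_2, C_3 partition the sample space, and B is an
# event whose conditional probability depends on which C_j occurred.
p_C = np.array([0.2, 0.5, 0.3])          # Pr(C_j)
p_B_given_C = np.array([0.9, 0.4, 0.1])  # Pr(B | C_j)

# Law of total probability: Pr(B) = sum_j Pr(B | C_j) Pr(C_j).
total = (p_B_given_C * p_C).sum()

# Independent Monte Carlo estimate of Pr(B) from the same model.
C = rng.choice(3, size=500_000, p=p_C)
B = rng.random(500_000) < p_B_given_C[C]

print(total, B.mean())                   # both are roughly 0.41, up to sampling error
</syntaxhighlight>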

For a fixed event <math>A</math> with non-zero probability, the mapping <math>E \mapsto \Pr(E \mid A)</math> is another valid probability measure. In other words, define <math>\Pr_A</math> by <math>\Pr_A(E) := \Pr(E \mid A)</math> for all events <math>E</math>. Now the law of total probability for <math>\Pr_A</math> states that <math display="inline">\Pr_A(B) = \sum_{j=1}^n \Pr_A(B \mid C_j)\Pr_A(C_j)</math>. We also have <math display="inline">\Pr_A(B \mid C_j) = \frac{\Pr_A(B \cap C_j)}{\Pr_A(C_j)} = \frac{\Pr(B \cap C_j \mid A)}{\Pr(C_j \mid A)} = \frac{\Pr(B \cap C_j \cap A)}{\Pr(C_j \cap A)} = \Pr(B \mid C_j, A)</math>, and similarly <math>\Pr_A(C_j) = \Pr(C_j \mid A)</math>.
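
The claim that <math>E \mapsto \Pr(E \mid A)</math> is itself a probability measure can be checked mechanically on a small finite sample space. The outcomes, weights, and events below are arbitrary choices for illustration.

<syntaxhighlight lang="python">
import itertools

# Finite sample space: triples of coin flips, with made-up outcome weights.
outcomes = list(itertools.product([0, 1], repeat=3))
weights = [0.05, 0.10, 0.15, 0.05, 0.20, 0.10, 0.25, 0.10]
p = dict(zip(outcomes, weights))

def Pr(event):
    return sum(p[w] for w in event)

A = {w for w in outcomes if w[0] == 1}       # conditioning event with Pr(A) > 0

def Pr_A(event):                             # the mapping E -> Pr(E | A)
    return Pr(event & A) / Pr(A)

E1 = {w for w in outcomes if w[1] == 1}
E2 = {w for w in outcomes if w[1] == 0 and w[2] == 1}    # disjoint from E1

print(abs(Pr_A(set(outcomes)) - 1.0) < 1e-12)                  # total mass is 1
print(abs(Pr_A(E1 | E2) - (Pr_A(E1) + Pr_A(E2))) < 1e-12)      # additive on disjoint events
</syntaxhighlight>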

So the law of total probability states that <math display="inline">\Pr(B \mid A) = \sum_{j=1}^n \Pr(B \mid C_j,A)\Pr(C_j\mid A)</math>.
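
The conditioned identity can be verified on the same kind of finite example (again with arbitrary made-up weights): conditioning every probability on <math>A</math> and summing over the partition reproduces <math>\Pr(B \mid A)</math>.

<syntaxhighlight lang="python">
import itertools

# Finite sample space with made-up outcome weights, as before.
outcomes = list(itertools.product([0, 1], repeat=3))
p = dict(zip(outcomes, [0.05, 0.10, 0.15, 0.05, 0.20, 0.10, 0.25, 0.10]))

def Pr(event, given=None):
    if given is not None:
        return Pr(event & given) / Pr(given)
    return sum(p[w] for w in event)

A = {w for w in outcomes if w[0] == 1}                   # conditioning event
B = {w for w in outcomes if sum(w) >= 2}                 # event of interest
partition = [{w for w in outcomes if w[1] == j} for j in (0, 1)]   # C_1, C_2

lhs = Pr(B, given=A)
rhs = sum(Pr(B, given=C & A) * Pr(C, given=A) for C in partition)

print(abs(lhs - rhs) < 1e-12)    # the conditional law of total probability holds
</syntaxhighlight>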

Now we see how the law of total probability interacts with conditional expectation. Let <math>X</math> be a random variable. Then <math>\mathbb E[X \mid A] = \sum_x x \cdot \Pr(X = x \mid A) = \sum_x x \cdot \sum_j \Pr(X = x \mid C_j,A)\Pr(C_j\mid A)</math>. Here the event <math>X=x</math> is playing the role of <math>B</math> in the statement of the conditional law of total probability.
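
Finally, the conditional-expectation step can be checked on simulated data. The model below (a three-way partition <math>C</math>, a binary event <math>A</math>, and an integer-valued <math>X</math>) is entirely made up; the point is that the double sum reproduces the directly computed conditional mean, and in fact the identity holds exactly for the empirical distribution of the sample.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Made-up model: C picks one of three cells of the partition, A is a binary event
# whose probability depends on C, and X is an integer random variable depending on both.
C = rng.choice(3, size=n, p=[0.2, 0.5, 0.3])
A = rng.random(n) < np.array([0.3, 0.6, 0.8])[C]          # event A, with Pr(A | C_j) as given
X = rng.poisson(lam=np.array([1.0, 2.0, 4.0])[C] + A)     # X depends on C and on A

# Left side: E[X | A] computed directly.
lhs = X[A].mean()

# Right side: sum_x x * sum_j Pr(X = x | C_j, A) Pr(C_j | A),
# assembled from the empirical conditional probabilities.
rhs = 0.0
for j in range(3):
    pr_Cj_given_A = (C[A] == j).mean()                    # Pr(C_j | A)
    mask = A & (C == j)
    for x in np.unique(X):
        rhs += x * (X[mask] == x).mean() * pr_Cj_given_A  # x * Pr(X = x | C_j, A) * Pr(C_j | A)

print(lhs, rhs)   # identical up to floating-point rounding
</syntaxhighlight>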