Bellman equation

Plain English Explanation

In reinforcement learning, the Bellman equation is the following identity satisfied by the optimal Q-function: \[Q(s, a) = r(s, a) + \gamma \mathbb{E}_{s'|s,a} \max_{a'} Q(s', a')\] Here \(r(s, a)\) is the reward for taking action \(a\) in state \(s\), \(\gamma\) is the discount factor, and the expectation is over the next state \(s'\). Q-learning turns this identity into an update rule that moves the current estimate toward the right-hand side, evaluated at an observed next state \(s'\) with learning rate \(\alpha\): \[Q(s,a) \gets Q(s,a) + \alpha \left[r(s,a) + \gamma \max_{a'} Q(s',a') - Q(s,a) \right] \] Beyond reinforcement learning, the Bellman equation has applications to dynamic programming. See the Wikipedia entry for Bellman equation.
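
The Q-learning update rule above can be sketched in a few lines of Python. This is a minimal tabular illustration, not a reference implementation: the two-state, two-action deterministic environment (`next_state`, `reward`), the function names, and the uniform random exploration are all assumptions made up for the example.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the TD error."""
    td_target = r + gamma * np.max(Q[s_next])  # r(s,a) + gamma * max_a' Q(s',a')
    td_error = td_target - Q[s, a]             # bracketed term in the update rule
    Q[s, a] += alpha * td_error
    return td_error

# Hypothetical deterministic environment, indexed as [state, action].
# Action 0 always leads to state 0; action 1 always leads to state 1.
next_state = np.array([[0, 1],
                       [0, 1]])
reward = np.array([[0.0, 1.0],
                   [0.0, 2.0]])

def run_q_learning(n_steps=20000, alpha=0.1, gamma=0.9, seed=0):
    """Run Q-learning with uniform random exploration on the toy environment."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((2, 2))
    s = 0
    for _ in range(n_steps):
        a = rng.integers(2)            # explore: pick an action uniformly at random
        s_next = next_state[s, a]
        q_learning_update(Q, s, a, reward[s, a], s_next, alpha, gamma)
        s = s_next
    return Q
```

Because this toy environment is deterministic, the table returned by `run_q_learning()` should approach the unique solution of the Bellman equation for it, which with \(\gamma = 0.9\) works out to \(Q(0,0) = 17.1\), \(Q(0,1) = 19\), \(Q(1,0) = 17.1\), \(Q(1,1) = 20\). In a stochastic environment a constant learning rate would leave some residual noise in the estimates.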

How is it used?

Practitioners refer to the Bellman equation when building, training, or evaluating reinforcement learning systems, most directly when working with value-based methods such as Q-learning. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.