return

Plain English Explanation

In reinforcement learning, given a policy and a starting state, the return is the discounted sum of the rewards the agent receives when following the policy from that state to the end of the episode. The agent accounts for the delayed nature of rewards by discounting each reward according to the number of state transitions required to obtain it. Therefore, if the discount factor is \(\gamma\), and \(r_0, \ldots, r_{N}\) denote the rewards until the end of the episode, then the return \(G\) is calculated as follows:

\[ G = \sum_{t=0}^{N} \gamma^{t} r_t = r_0 + \gamma r_1 + \gamma^{2} r_2 + \cdots + \gamma^{N} r_N \]
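The discounted sum described above can be illustrated with a minimal Python sketch. It assumes the rewards of a single finished episode are available as a plain list; the function name `discounted_return` and the example reward values are hypothetical and chosen only for illustration.

```python
def discounted_return(rewards, gamma):
    """Return the sum of gamma**t * rewards[t] for t = 0..N.

    `rewards` is assumed to hold r_0, ..., r_N for one episode,
    and `gamma` is the discount factor.
    """
    total = 0.0
    # Walk the episode backwards: each step multiplies the running
    # total by gamma once, so r_t ends up scaled by gamma**t.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Illustrative episode with three rewards and gamma = 0.5:
# 1.0 + 0.5 * 0.0 + 0.25 * 2.0 = 1.5
print(discounted_return([1.0, 0.0, 2.0], gamma=0.5))
```

With a discount factor of 1 the function reduces to a plain sum of rewards, while smaller values of the discount factor shrink the contribution of rewards that arrive later in the episode.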

How is it used?

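Practitioners refer to the return when building, training, or evaluating reinforcement learning agents: training typically aims to maximize the expected return, and evaluation often reports the average return per episode. The term appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.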