What is reinforcement learning?
A family of algorithms that learn an optimal policy, whose goal is to maximize return when interacting with an environment.
Reinforcement learning explained in plain English
In plain terms, a reinforcement learning system learns by trial and error which actions lead to the best long-run outcome. For example, the ultimate reward of most games is victory. Reinforcement learning systems can become expert at playing complex games by evaluating sequences of previous game moves that ultimately led to wins and sequences that ultimately led to losses.
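To make "return" concrete, here is a minimal sketch that scores one winning and one losing sequence of moves. The reward values (+1 for a win, -1 for a loss, 0 for every other move), the discount factor `gamma`, and the function name `discounted_return` are assumptions chosen for illustration, not part of the definition above.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, where a reward k steps in the future is scaled by gamma**k."""
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future total.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical three-move games: only the final move is rewarded.
winning_game = [0.0, 0.0, 1.0]
losing_game = [0.0, 0.0, -1.0]

win_return = discounted_return(winning_game)
loss_return = discounted_return(losing_game)
print(win_return, loss_return)
```

A learning system that compares returns like these across many games can start to favor the moves that tend to appear in winning sequences.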
Example
Practitioners refer to reinforcement learning when building, training, or evaluating machine learning systems. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.
People also read
- action
In reinforcement learning, the mechanism by which the agent transitions between states of the environment.
- Bellman equation
In reinforcement learning, the following identity satisfied by the optimal Q-function: \[Q(s, a) = r(s, a) + \gamma \mathbb{E}_{s'|s,a} \max_{a'} Q(s', a')\] Q-learning applies this identity through the following update rule, where \(\alpha\) is the learning rate: \[Q(s,a) \gets Q(s,a) + \alpha \left[r(s,a) + \gamma \max_{a'} Q(s',a') - Q(s,a)\right]\]
- candidate sampling
A training-time optimization that calculates a probability for all the positive labels, using, for example, softmax, but only for a random sample of negative labels.
- Deep Q-Network
In Q-learning, a deep neural network that predicts Q-functions.
- environment
In reinforcement learning, the world that contains the agent and allows the agent to observe that world's state.
- episode
In reinforcement learning, each of the repeated attempts by the agent to learn an environment.
- epsilon greedy policy
In reinforcement learning, a policy that either follows a random policy with epsilon probability or a greedy policy otherwise.
- experience replay
In reinforcement learning, a DQN technique used to reduce temporal correlations in training data.
- greedy policy
In reinforcement learning, a policy that always chooses the action with the highest expected return.
- Markov decision process
A graph representing the decision-making model where decisions (or actions) are taken to navigate a sequence of states under the assumption that the Markov property holds.
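Several of the terms above (episode, epsilon greedy policy, greedy policy, the Bellman equation, and Markov decision process) fit together in tabular Q-learning. The following is a minimal sketch, not a reference implementation: the five-state corridor environment, the `step` function, and the values chosen for `GAMMA`, `ALPHA`, and `EPSILON` are all hypothetical and exist only for illustration.

```python
import random

N_STATES = 5       # states 0..4; state 4 is the terminal goal (assumed toy MDP)
ACTIONS = (0, 1)   # 0 = move left, 1 = move right
GAMMA = 0.9        # discount factor (assumed value)
ALPHA = 0.5        # learning rate (assumed value)
EPSILON = 0.1      # probability of taking a random action (assumed value)


def step(state, action):
    """Hypothetical deterministic environment: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def greedy_action(q, state, rng):
    """Greedy policy: pick the action with the highest Q-value, ties broken at random."""
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def epsilon_greedy_action(q, state, rng):
    """Epsilon greedy policy: random action with probability EPSILON, else greedy."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    return greedy_action(q, state, rng)


def train(episodes=500, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):           # each pass through this loop is one episode
        state = 0
        for _ in range(max_steps):
            action = epsilon_greedy_action(q, state, rng)
            next_state, reward, done = step(state, action)
            # Bellman-style target: reward plus discounted best next-state value.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Read off the greedy policy for every non-terminal state.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

In this toy corridor the learned greedy policy should move right from every non-terminal state, since that is the shortest path to the only reward.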