What is Q-learning?
In reinforcement learning, an algorithm that allows an agent to learn the optimal Q-function of a Markov decision process by applying the Bellman equation.
Q-learning explained in plain English
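Q-learning is a way for an agent to learn, by trial and error, how good each action is in each state. The agent keeps an estimate, called the Q-function, of the return it can expect after taking a given action in a given state. After every step, it moves that estimate toward the reward it just received plus the value of the best action available in the next state. This update comes from the Bellman equation, and the estimate approaches the optimal Q-function as the agent gains experience. The environment the agent learns in is modeled as a Markov decision process.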
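The Bellman-based update that Q-learning applies is commonly written as shown here. The symbols are notation assumed for this sketch and do not appear in the entry itself: s and a are the current state and action, r is the reward, s' is the next state, α is a learning rate, and γ is a discount factor.

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```

The bracketed term is the gap between the new estimate (reward plus discounted best next value) and the old one; α controls how much of that gap is closed on each step.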
Example
Practitioners refer to Q-learning when building, training, or evaluating reinforcement learning systems. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.
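For a concrete picture, here is a minimal sketch of tabular Q-learning on a toy problem. Everything in it is an assumption made for illustration: the five-state chain environment, the `step` and `train` helpers, and the values of `ALPHA`, `GAMMA`, and `EPSILON`. It is not a reference implementation.

```python
import random

N_STATES = 5       # hypothetical chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)   # 0 = move left, 1 = move right
ALPHA = 0.5        # learning rate (assumed value)
GAMMA = 0.9        # discount factor (assumed value)
EPSILON = 0.1      # exploration probability (assumed value)


def step(state, action):
    """Toy environment: reward 1.0 only when the terminal state is reached."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    # Q-table: one row per state, one column per action, initialised to zero.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy selection: explore with probability EPSILON,
            # otherwise pick a best-valued action (ties broken at random).
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Bellman target: reward plus discounted best next value.
            # A terminal next state contributes no future value.
            best_next = 0.0 if done else max(q[next_state])
            target = reward + GAMMA * best_next
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy policy for the non-terminal states, read off the learned Q-table.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

With these settings the learned greedy policy should choose "move right" in every non-terminal state, since that is the shortest path to the only reward.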
People also read
- action
In reinforcement learning, the mechanism by which the agent transitions between states of the environment.
- environment
In reinforcement learning, the world that contains the agent and allows the agent to observe that world's state.
- episode
In reinforcement learning, each of the repeated attempts by the agent to learn an environment.
- epsilon greedy policy
In reinforcement learning, a policy that either follows a random policy with epsilon probability or a greedy policy otherwise.
- experience replay
In reinforcement learning, a DQN technique used to reduce temporal correlations in training data.
- policy
In reinforcement learning, an agent's probabilistic mapping from states to actions.
- replay buffer
In DQN-like algorithms, the memory used by the agent to store state transitions for use in experience replay.
- return
In reinforcement learning, given a certain policy and a certain state, the return is the sum of all rewards that the agent expects to receive when following the policy from the state to the end of the episode.
- state
In reinforcement learning, the parameter values that describe the current configuration of the environment, which the agent uses to choose an action.
- termination condition
In agentic AI, the predefined criteria that tell the agent to stop iterating.
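
Two of the related terms above, experience replay and replay buffer, can be illustrated with a short sketch. The `ReplayBuffer` class, the `Transition` tuple, and their field names are hypothetical, chosen for this illustration only; real DQN implementations differ in detail.

```python
import random
from collections import deque, namedtuple

# One stored state transition (field names are assumptions for this sketch).
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"])


class ReplayBuffer:
    """Fixed-size memory of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity, seed=None):
        self._memory = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self._memory.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks up the temporal
        # ordering in which the transitions were collected.
        return self._rng.sample(list(self._memory), batch_size)

    def __len__(self):
        return len(self._memory)


buffer = ReplayBuffer(capacity=3, seed=0)
for t in range(5):
    buffer.push(t, 1, 0.0, t + 1, False)
batch = buffer.sample(2)
```

Because the capacity here is 3, only the three most recent transitions remain after five pushes, and each sampled batch is drawn from those.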