What is an epsilon greedy policy?
In reinforcement learning, a policy that follows a random policy with probability epsilon and a greedy policy otherwise.
Epsilon greedy policy explained in plain English
An epsilon greedy policy picks a random action with probability epsilon and otherwise picks the action that a greedy policy would choose. For example, if epsilon is 0.9, the policy acts randomly 90% of the time and greedily 10% of the time. Over successive episodes, the algorithm reduces epsilon so that the agent shifts from following a random policy to following a greedy policy. As a result, the agent first explores the environment at random and then greedily exploits what that exploration revealed.
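The behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes the agent's value estimates for the current state are held in a plain list, and the names `epsilon_greedy_action` and `decayed_epsilon`, along with the geometric decay schedule and its constants, are hypothetical choices made for this example. The definition only says that epsilon is reduced over episodes, not how.

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Return an action index given estimated values for each action."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decayed_epsilon(episode, start=0.9, end=0.05, decay=0.99):
    """Assumed schedule: shrink epsilon geometrically, never below `end`."""
    return max(end, start * decay ** episode)


if __name__ == "__main__":
    q_values = [0.1, 0.7, 0.3]  # made-up estimates for three actions
    for episode in (0, 100, 500):
        eps = decayed_epsilon(episode)
        action = epsilon_greedy_action(q_values, eps)
        print(f"episode={episode} epsilon={eps:.3f} action={action}")
```

With epsilon at 0 the function always returns the highest-valued action, and with epsilon at 1 it always acts randomly. The decay schedule moves the agent between those two extremes as training proceeds.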
Example
Practitioners use the term epsilon greedy policy when building, training, or evaluating reinforcement learning agents. It appears in research papers, product documentation, and technical discussions of how an agent balances exploration against exploitation.
People also read
- action
In reinforcement learning, the mechanism by which the agent transitions between states of the environment.
- candidate sampling
A training-time optimization that calculates a probability for all the positive labels, using, for example, softmax, but only for a random sample of negative labels.
- environment
In reinforcement learning, the world that contains the agent and allows the agent to observe that world's state.
- episode
In reinforcement learning, each of the repeated attempts by the agent to learn an environment.
- experience replay
In reinforcement learning, a DQN technique used to reduce temporal correlations in training data.
- policy
In reinforcement learning, an agent's probabilistic mapping from states to actions.
- Q-learning
In reinforcement learning, an algorithm that allows an agent to learn the optimal Q-function of a Markov decision process by applying the Bellman equation.
- replay buffer
In DQN-like algorithms, the memory used by the agent to store state transitions for use in experience replay.
- return
In reinforcement learning, given a certain policy and a certain state, the return is the sum of all rewards that the agent expects to receive when following the policy from the state to the end of the episode.
- state
In reinforcement learning, the parameter values that describe the current configuration of the environment, which the agent uses to choose an action.