What is a greedy policy?
In reinforcement learning, a policy that always chooses the action with the highest expected return.
Greedy policy explained in plain English
A greedy policy never explores. In every state, the agent looks up its current estimate of how much reward each available action will eventually bring and takes the action with the best estimate.
Example
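Suppose an agent's estimated returns in some state are 1.0 for moving left, 2.5 for moving right, and 0.3 for staying put. A greedy policy picks moving right every time that state comes up. Because it never tries the other actions, a greedy policy is often combined during training with some exploration, as in an epsilon greedy policy.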
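The selection rule in the definition amounts to an argmax over estimated action values. The following is a minimal sketch in plain Python, not a reference implementation. The `greedy_action` helper, the action names, and the numbers are all hypothetical and chosen only for illustration.

```python
# Hypothetical action-value estimates Q(s, a) for a single state.
# The action names and numbers are made up for illustration.
q_values = {"left": 1.0, "right": 2.5, "stay": 0.3}


def greedy_action(q_values):
    """Return the action with the highest estimated return.

    Ties go to the action that appears first in the dict.
    """
    return max(q_values, key=q_values.get)


chosen = greedy_action(q_values)
print(chosen)  # right
```

In practice the estimates would come from a learned Q-function rather than a hand-written dict, but the selection step stays the same.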
People also read
- action
In reinforcement learning, the mechanism by which the agent transitions between states of the environment.
- Bellman equation
In reinforcement learning, the following identity satisfied by the optimal Q-function: \[Q(s, a) = r(s, a) + \gamma \mathbb{E}_{s'|s,a} \max_{a'} Q(s', a')\] Reinforcement learning algorithms apply this identity to create Q-learning using the following update rule: \[Q(s,a) \gets
- candidate sampling
A training-time optimization that calculates a probability for all the positive labels, using, for example, softmax, but only for a random sample of negative labels.
- Deep Q-Network
In Q-learning, a deep neural network that predicts Q-functions.
- environment
In reinforcement learning, the world that contains the agent and allows the agent to observe that world's state.
- episode
In reinforcement learning, each of the repeated attempts by the agent to learn an environment.
- epsilon greedy policy
In reinforcement learning, a policy that follows a random policy with probability epsilon and a greedy policy otherwise.
- experience replay
In reinforcement learning, a DQN technique used to reduce temporal correlations in training data.
- Markov decision process
A graph representing the decision-making model where decisions (or actions) are taken to navigate a sequence of states under the assumption that the Markov property holds.
- Neural Architecture Search
A technique for automatically designing the architecture of a neural network.
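Of the related terms above, the epsilon greedy policy is the one most directly built on the greedy policy. Here is a rough sketch of how the two relate, assuming a uniform random choice for the exploratory step. The `epsilon_greedy_action` function, the action names, and the values are hypothetical.

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))  # explore: uniform random action
    return max(q_values, key=q_values.get)  # exploit: greedy action


q_values = {"left": 1.0, "right": 2.5, "stay": 0.3}  # made-up estimates
rng = random.Random(0)  # seeded so repeated runs behave the same

# With epsilon = 0 the policy reduces to a purely greedy policy.
print(epsilon_greedy_action(q_values, epsilon=0.0, rng=rng))  # right
```

Setting `epsilon` to 1.0 gives the opposite extreme, a fully random policy. Values in between trade off exploration against exploitation.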