What is a random policy?
In reinforcement learning, a policy that chooses an action at random.
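Random policy explained in plain English
A random policy picks each action by chance instead of using what the agent has learned. In its most common form every available action is equally likely, and the current state has no influence on the choice. Random policies often serve as a baseline for comparison and as the exploration component of an epsilon greedy policy.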
Example
Consider an agent in a grid world with four actions: up, down, left, and right. Under a uniform random policy, the agent picks each action with probability 0.25 at every step, regardless of where it currently stands or what rewards it has collected.
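A minimal sketch of the definition, assuming a uniform distribution over a small discrete action set. The names `ACTIONS` and `random_policy`, and the four-action grid world, are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical action set for a four-action grid world (an assumption).
ACTIONS = ["up", "down", "left", "right"]


def random_policy(state, actions=ACTIONS, rng=random):
    """Return an action chosen uniformly at random.

    The state argument is accepted but deliberately ignored: a random
    policy does not condition its choice on the state.
    """
    return rng.choice(actions)


# Sample the policy many times with a seeded generator and count the picks.
rng = random.Random(0)
counts = {action: 0 for action in ACTIONS}
for _ in range(10_000):
    counts[random_policy(state=None, rng=rng)] += 1

print(counts)
```

With enough samples, each action should appear in roughly a quarter of the picks, which is one quick way to check that a policy is actually uniform.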
People also read
- action
In reinforcement learning, the mechanism by which the agent transitions between states of the environment.
- Bellman equation
In reinforcement learning, the following identity satisfied by the optimal Q-function: \[Q(s, a) = r(s, a) + \gamma \mathbb{E}_{s'|s,a} \max_{a'} Q(s', a')\] Reinforcement learning algorithms apply this identity to create Q-learning using the following update rule: \[Q(s,a) \gets \dots\]
- candidate sampling
A training-time optimization that calculates a probability for all the positive labels, using, for example, softmax, but only for a random sample of negative labels.
- Deep Q-Network
In Q-learning, a deep neural network that predicts Q-functions.
- environment
In reinforcement learning, the world that contains the agent and allows the agent to observe that world's state.
- episode
In reinforcement learning, each of the repeated attempts by the agent to learn an environment.
- epsilon greedy policy
In reinforcement learning, a policy that either follows a random policy with epsilon probability or a greedy policy otherwise.
- experience replay
In reinforcement learning, a DQN technique used to reduce temporal correlations in training data.
- greedy policy
In reinforcement learning, a policy that always chooses the action with the highest expected return.
- Markov decision process
A graph representing the decision-making model where decisions (or actions) are taken to navigate a sequence of states under the assumption that the Markov property holds.
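The epsilon greedy policy entry above combines a random policy with a greedy policy. The relationship can be sketched as follows, assuming action values are stored in a plain dictionary. The function names and the sample values in `q_values` are hypothetical, and the greedy step here uses estimated values as a stand-in for expected return.

```python
import random


def greedy_policy(q_values):
    """Return the action with the highest estimated value."""
    return max(q_values, key=q_values.get)


def epsilon_greedy_policy(q_values, epsilon, rng=random):
    """With probability epsilon act randomly, otherwise act greedily."""
    if rng.random() < epsilon:
        # Random policy branch: every action is equally likely.
        return rng.choice(list(q_values))
    # Greedy policy branch: exploit the current value estimates.
    return greedy_policy(q_values)


# Hypothetical value estimates for three actions (an assumption).
q_values = {"left": 0.1, "right": 0.7, "stay": 0.3}

rng = random.Random(0)
picks = [epsilon_greedy_policy(q_values, epsilon=0.3, rng=rng) for _ in range(10_000)]
print(picks.count("right") / len(picks))
```

Setting `epsilon` to 1 reduces this sketch to a pure random policy, and setting it to 0 reduces it to a pure greedy policy.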