What is a Q-function?
In reinforcement learning, the function that predicts the expected return from taking an action in a state and then following a given policy.
Q-function explained in plain English
A Q-function scores each action available in a given state: it estimates the total reward the agent can expect if it takes that action now and follows a given policy afterwards. The Q-function is also known as the state-action value function.
Example
Practitioners refer to the Q-function when building, training, or evaluating reinforcement learning systems. The term appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.
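For a concrete picture, a Q-function over a small, discrete problem can be stored as a lookup table keyed by (state, action). The sketch below is illustrative only: the states, actions, and values are made up, and the helper names `q_value` and `greedy_action` are hypothetical rather than taken from any library.

```python
# Hypothetical Q-table: maps (state, action) to an estimated expected return.
# All states, actions, and numbers here are invented for illustration.
Q = {
    ("start", "left"): 0.2,
    ("start", "right"): 0.8,
    ("goal", "left"): 0.0,
    ("goal", "right"): 0.0,
}


def q_value(state, action):
    """Return the estimated expected return for taking `action` in `state`."""
    return Q[(state, action)]


def greedy_action(state, actions=("left", "right")):
    """Return the action with the highest Q-value in `state`."""
    return max(actions, key=lambda a: q_value(state, a))


best = greedy_action("start")
```

Here `best` is the action a greedy policy would choose in the `start` state, because it is the action whose table entry is largest.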
People also read
- action
In reinforcement learning, the mechanism by which the agent transitions between states of the environment.
- Bellman equation
In reinforcement learning, the following identity satisfied by the optimal Q-function: \[Q(s, a) = r(s, a) + \gamma \mathbb{E}_{s'|s,a} \max_{a'} Q(s', a')\] Reinforcement learning algorithms apply this identity to create Q-learning using the following update rule: \[Q(s,a) \gets \dots\]
- candidate sampling
A training-time optimization that calculates a probability for all the positive labels, using, for example, softmax, but only for a random sample of negative labels.
- Deep Q-Network
In Q-learning, a deep neural network that predicts Q-functions.
- environment
In reinforcement learning, the world that contains the agent and allows the agent to observe that world's state.
- episode
In reinforcement learning, each of the repeated attempts by the agent to learn an environment.
- epsilon greedy policy
In reinforcement learning, a policy that follows a random policy with probability epsilon and a greedy policy otherwise.
- experience replay
In reinforcement learning, a DQN technique used to reduce temporal correlations in training data.
- greedy policy
In reinforcement learning, a policy that always chooses the action with the highest expected return.
- Markov decision process
A graph representing the decision-making model where decisions (or actions) are taken to navigate a sequence of states under the assumption that the Markov property holds.
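Several of the entries above (Bellman equation, greedy policy, epsilon greedy policy) meet in tabular Q-learning. The following is a minimal sketch, assuming the common textbook form of the update, in which Q(s, a) is moved a fraction `alpha` of the way toward `reward + gamma * max Q(next_state, a')`. The function names, the default values of `alpha` and `gamma`, and the states and actions are assumptions made for illustration.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, state, actions, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def q_learning_update(Q, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=0.9):
    """Move Q(state, action) toward reward + gamma * max over a' of Q(next_state, a')."""
    best_next = max(Q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    return Q[(state, action)]


# Hypothetical usage: unseen (state, action) pairs default to 0.0.
Q = defaultdict(float)
actions = ["left", "right"]
updated = q_learning_update(Q, "s0", "right", reward=1.0,
                            next_state="s1", actions=actions)
choice = epsilon_greedy(Q, "s0", actions, epsilon=0.0)
```

With `epsilon=0.0` the policy is purely greedy, so `choice` is whichever action currently has the largest table entry for `s0`. Raising `epsilon` trades some of that exploitation for random exploration.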