What is an activation function?
A function that enables neural networks to learn nonlinear (complex) relationships between features and the label.
Activation function explained in plain English
Popular activation functions include:
- ReLU
- Sigmoid
The plots of activation functions are never single straight lines. For example, the plot of the ReLU activation function consists of two straight lines: a flat line at 0 for negative inputs and a line with slope 1 for positive inputs. The plot of the sigmoid activation function is a smooth S-shaped curve whose output always lies between 0 and 1.
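The two functions named above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not how any particular framework implements them; the names `relu` and `sigmoid` are chosen here for readability.

```python
import math


def relu(x: float) -> float:
    """Return x for positive inputs and 0.0 otherwise (two straight lines)."""
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """Map any real input into the range 0 to 1 along an S-shaped curve."""
    # Branch on the sign so math.exp never receives a large positive
    # argument, which would raise OverflowError.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Evaluate both functions at a few sample points.
for x in (-2.0, 0.0, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):.2f}  sigmoid={sigmoid(x):.2f}")
```

Neither function is a single straight line: `relu` bends at 0, and `sigmoid` curves everywhere.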
In a neural network, activation functions manipulate the weighted sum of all the inputs to a neuron. To calculate a weighted sum, the neuron multiplies each input value by its corresponding weight and adds up the products. For example, suppose the input values and weights of a neuron produce a weighted sum of -2.0.
Suppose the designer of this neural network chooses the sigmoid function to be the activation function. In that case, the neuron calculates the sigmoid of -2.0, which is approximately 0.12. Therefore, the neuron passes 0.12 (rather than -2.0) to the next layer in the neural network.
See Neural networks: Activation functions in Machine Learning Crash Course for more information.
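The weighted-sum-then-activate step described above can be traced in code. The input values and weights below are hypothetical, chosen only so that the weighted sum comes out to -2.0 as in the text; they are not taken from the source.

```python
import math

# Hypothetical input values and weights (an assumption for illustration),
# picked so that the weighted sum is approximately -2.0.
inputs = [2.0, -1.0, 3.0]
weights = [-1.3, 0.6, 0.4]


def weighted_sum(values, weights):
    """Add up the products of each input value and its weight."""
    return sum(v * w for v, w in zip(values, weights))


def sigmoid(x):
    """Sigmoid activation; fine for the small inputs used here."""
    return 1.0 / (1.0 + math.exp(-x))


z = weighted_sum(inputs, weights)  # 2(-1.3) + (-1)(0.6) + 3(0.4)
activation = sigmoid(z)            # value passed to the next layer

print(f"weighted sum: {z:.2f}")
print(f"activation:   {activation:.2f}")
```

With these assumed values, the weighted sum is about -2.0 and the sigmoid output is about 0.12, matching the numbers in the text.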
Example
Practitioners refer to activation functions when building, training, or evaluating machine learning systems. The term appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.
People also read
- Backpropagation
The process that tells a neural network which internal settings caused an error and how to adjust them, working backwards through layers.
- batch
The set of examples used in one training iteration.
- batch normalization
Normalizing the input or output of the activation functions in a hidden layer.
- batch size
The number of examples in a batch.
- Bayesian neural network
A probabilistic neural network that accounts for uncertainty in weights and outputs.
- co-adaptation
An undesirable behavior in which neurons predict patterns in training data by relying almost exclusively on outputs of specific other neurons instead of relying on the network's behavior as a whole.
- convergence
A state reached when loss values change very little or not at all with each iteration.
- deep model
A neural network containing more than one hidden layer.
- depth
The sum of the following in a neural network: the number of hidden layers, the number of output layers (typically 1), and the number of any embedding layers. For example, a neural network with five hidden layers and one output layer has a depth of 6.
- dropout regularization
A form of regularization useful in training neural networks.