What is Mean Squared Error?
The average loss per example when L2 loss is used.
Mean Squared Error explained in plain English
Calculate Mean Squared Error as follows:

1. Calculate the L2 loss for a batch.
2. Divide the L2 loss by the number of examples in the batch.

$$\text{Mean Squared Error} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

where:

- $n$ is the number of examples.
- $y_i$ is the actual value of the label for example $i$.
- $\hat{y}_i$ is the model's prediction for $y_i$.

For example, consider the loss on the following batch of five examples:

| Loss | Squared loss |
| --- | --- |
| 1 | 1 |
| 1 | 1 |
| 3 | 9 |
| 2 | 4 |
| 1 | 1 |
| | 16 = L2 loss |

Therefore, the Mean Squared Error is:

$$\text{Mean Squared Error} = \frac{16}{5} = 3.2$$
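The two calculation steps can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation: the function name `mean_squared_error` and the label and prediction values are assumptions, chosen only so that the per-example losses come out as 1, 1, 3, 2, 1, matching the batch of five examples above.

```python
def mean_squared_error(labels, predictions):
    """Return the average squared difference between labels and predictions."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        raise ValueError("at least one example is required")
    # Step 1: L2 loss for the batch (sum of squared per-example errors).
    l2_loss = sum((y - y_hat) ** 2 for y, y_hat in zip(labels, predictions))
    # Step 2: divide the L2 loss by the number of examples.
    return l2_loss / len(labels)


# Hypothetical values (not from the source) that yield losses of 1, 1, 3, 2, 1.
labels = [7, 5, 8, 4, 9]
predictions = [6, 4, 11, 6, 8]

mse = mean_squared_error(labels, predictions)
print(mse)  # 3.2
```

Only the differences between labels and predictions matter here, so any other pair of lists with the same per-example losses would give the same result.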
Mean Squared Error is a popular loss function for training, particularly for linear regression. Contrast Mean Squared Error with Mean Absolute Error and Root Mean Squared Error. TensorFlow Playground uses Mean Squared Error to calculate loss values.
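To make the contrast concrete, the following sketch computes all three metrics on the per-example losses from the batch above (1, 1, 3, 2, 1). It assumes the standard definitions: Mean Absolute Error is the mean of the absolute losses, and Root Mean Squared Error is the square root of Mean Squared Error. The helper name `error_metrics` is hypothetical.

```python
import math


def error_metrics(losses):
    """Return (MSE, MAE, RMSE) for a list of per-example losses."""
    n = len(losses)
    mse = sum(e ** 2 for e in losses) / n   # mean of squared losses
    mae = sum(abs(e) for e in losses) / n   # mean of absolute losses
    rmse = math.sqrt(mse)                   # same units as the label
    return mse, mae, rmse


losses = [1, 1, 3, 2, 1]
mse, mae, rmse = error_metrics(losses)
print(f"MSE={mse:.3f} MAE={mae:.3f} RMSE={rmse:.3f}")
# MSE=3.200 MAE=1.600 RMSE=1.789
```

Root Mean Squared Error is expressed in the same units as the label, which often makes it easier to interpret than Mean Squared Error.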
Example
Outliers strongly influence Mean Squared Error. For example, a loss of 1 is a squared loss of 1, but a loss of 3 is a squared loss of 9. In the preceding table, the example with a loss of 3 accounts for ~56% of the Mean Squared Error, while each of the examples with a loss of 1 accounts for only ~6% of the Mean Squared Error.

Outliers don't influence Mean Absolute Error as strongly as Mean Squared Error. For example, a loss of 3 accounts for only ~38% of the Mean Absolute Error.

Clipping is one way to prevent extreme outliers from damaging your model's predictive ability.
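The percentages quoted here can be checked with a short sketch. It uses the per-example losses from the table (1, 1, 3, 2, 1) and computes each example's share of the total squared loss and of the total absolute loss. The helper name `shares` is an assumption introduced for illustration.

```python
def shares(values):
    """Return each value's fraction of the total."""
    total = sum(values)
    return [v / total for v in values]


losses = [1, 1, 3, 2, 1]

# Share of the L2 loss (and therefore of Mean Squared Error) per example.
squared_share = shares([e ** 2 for e in losses])
# Share of the total absolute loss (and therefore of Mean Absolute Error).
absolute_share = shares([abs(e) for e in losses])

print(f"{squared_share[2]:.2%}")   # 56.25%  (the example with a loss of 3)
print(f"{squared_share[0]:.2%}")   # 6.25%   (an example with a loss of 1)
print(f"{absolute_share[2]:.2%}")  # 37.50%
```

Squaring raises the largest loss's share from 37.5% to 56.25%, which is why a single outlier can dominate Mean Squared Error.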