What is one-hot encoding?
Representing categorical data as a vector in which one element is set to 1 and all other elements are set to 0.
One-hot encoding explained in plain English
Representing categorical data as a vector in which:
- One element is set to 1.
- All other elements are set to 0.

One-hot encoding is commonly used to represent strings or identifiers that have a finite set of possible values. For example, suppose a certain categorical feature named `Scandinavia` has five possible values:
- "Denmark"
- "Sweden"
- "Norway"
- "Finland"
- "Iceland"

One-hot encoding could represent each of the five values as follows:
"Denmark" | 1 | 0 | 0 | 0 | 0
"Sweden" | 0 | 1 | 0 | 0 | 0
"Norway" | 0 | 0 | 1 | 0 | 0
"Finland" | 0 | 0 | 0 | 1 | 0
"Iceland" | 0 | 0 | 0 | 0 | 1

Thanks to one-hot encoding, a model can learn different connections based on each of the five countries.

Representing a feature as numerical data is an alternative to one-hot encoding. Unfortunately, representing the Scandinavian countries numerically is not a good choice. For example, consider the following numeric representation:
- "Denmark" is 0
- "Sweden" is 1
- "Norway" is 2
- "Finland" is 3
- "Iceland" is 4

With numeric encoding, a model would interpret the raw numbers mathematically and would try to train on those numbers. However, Iceland isn't actually twice as much (or half as much) of something as Norway, so the model would come to some strange conclusions.

See Categorical data: Vocabulary and one-hot encoding in Machine Learning Crash Course for more information.
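The two encodings of the `Scandinavia` feature can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the names `VOCABULARY`, `one_hot`, and `integer_code` are hypothetical, and the vocabulary order is assumed to match the order in which the five countries are listed.

```python
# Hypothetical vocabulary for the `Scandinavia` feature.
# The order is an assumption: it follows the order of the list in the text.
VOCABULARY = ["Denmark", "Sweden", "Norway", "Finland", "Iceland"]


def one_hot(value, vocabulary=VOCABULARY):
    """Return a list with a 1 at the position of `value` and 0 elsewhere."""
    if value not in vocabulary:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if category == value else 0 for category in vocabulary]


def integer_code(value, vocabulary=VOCABULARY):
    """Return the position of `value` in the vocabulary (the numeric encoding)."""
    return vocabulary.index(value)


# Show both encodings side by side for each country.
for country in VOCABULARY:
    print(country, one_hot(country), integer_code(country))
```

With the numeric encoding, `integer_code("Iceland")` is exactly twice `integer_code("Norway")`, which is the kind of accidental arithmetic relationship a model might pick up. The one-hot vectors carry no such ordering: each country simply gets its own position. In practice, a library encoder such as scikit-learn's `OneHotEncoder` would typically be used instead of hand-written code.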
Example
Practitioners refer to one-hot encoding when building, training, or evaluating machine learning systems. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.
People also read
- AUC
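A number between 0.0 and 1.0 that represents a binary classification model's ability to separate positive classes from negative classes.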
- Backpropagation
The process that tells a neural network which internal settings caused an error and how to adjust them, working backwards through layers.
- Bayesian neural network
A probabilistic neural network that accounts for uncertainty in weights and outputs.
- Bayesian optimization
A probabilistic regression model technique for optimizing computationally expensive objective functions by instead optimizing a surrogate that quantifies the uncertainty using a Bayesian learning technique.
- classification threshold
In a binary classification, a number between 0 and 1 that converts the raw output of a logistic regression model into a prediction of either the positive class or the negative class.
- configuration
The process of assigning the initial property values used to train a model, including hyperparameters such as learning rate, iterations, optimizer, and loss function.
- confusion matrix
An NxN table that summarizes the number of correct and incorrect predictions that a classification model made.
- cross-entropy
A generalization of Log Loss to multi-class classification problems.
- discriminative model
A model that predicts labels from a set of one or more features.
- embedding layer
A special hidden layer that trains on a high-dimensional categorical feature to gradually learn a lower dimension embedding vector.