What is bucketing?
Converting a single feature into multiple binary features called buckets or bins, typically based on a value range.
Bucketing explained in plain English
The feature being bucketed is typically continuous. For example, instead of representing temperature as a single continuous floating-point feature, you could chop ranges of temperatures into discrete buckets, such as:

- <= 10 degrees Celsius would be the "cold" bucket.
- 11 - 24 degrees Celsius would be the "temperate" bucket.
- >= 25 degrees Celsius would be the "warm" bucket.

The model will treat every value in the same bucket identically. For example, the values `13` and `22` are both in the temperate bucket, so the model treats the two values identically.
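The temperature example can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the names `BUCKETS`, `bucket_index`, and `bucketize` are hypothetical, and because the listed ranges leave gaps for fractional values (such as 10.5 or 24.5), the sketch assumes that anything strictly between 10 and 25 counts as temperate.

```python
# Hypothetical names; bucket edges follow the cold/temperate/warm example.
BUCKETS = ("cold", "temperate", "warm")


def bucket_index(temp_c: float) -> int:
    """Map a temperature in degrees Celsius to a bucket index."""
    if temp_c <= 10:
        return 0  # cold
    if temp_c >= 25:
        return 2  # warm
    # Assumption: fractional values between 10 and 25 are temperate.
    return 1


def bucketize(temp_c: float) -> list[int]:
    """Return one binary feature per bucket (a one-hot vector)."""
    one_hot = [0] * len(BUCKETS)
    one_hot[bucket_index(temp_c)] = 1
    return one_hot


# 13 and 22 land in the same bucket, so the model sees identical inputs.
same_bucket = bucketize(13) == bucketize(22)
```

After bucketing, the single floating-point temperature is replaced by three binary features, exactly one of which is set for any given example.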
If you represent temperature as a continuous feature, then the model treats temperature as a single feature. If you represent temperature as three buckets, then the model treats each bucket as a separate feature. That is, a model can learn separate relationships of each bucket to the label. For example, a linear regression model can learn separate weights for each bucket.

Increasing the number of buckets makes your model more complicated by increasing the number of relationships that your model must learn. For example, the cold, temperate, and warm buckets are essentially three separate features for your model to train on. If you decide to add two more buckets (for example, freezing and hot), your model would now have to train on five separate features.

How do you know how many buckets to create, or what the ranges for each bucket should be? The answers typically require a fair amount of experimentation.

See Numerical data: Binning in Machine Learning Crash Course for more information.
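To make the "separate weight per bucket" idea concrete, here is a small sketch under stated assumptions. For a linear model with one-hot bucket features and no bias term, the least-squares weight for each bucket is simply the mean label of the training examples in that bucket. The function names and the toy data below are made up for illustration and do not come from any real dataset.

```python
def bucket_index(temp_c: float) -> int:
    """Same hypothetical cold/temperate/warm split as the example ranges."""
    if temp_c <= 10:
        return 0  # cold
    if temp_c >= 25:
        return 2  # warm
    return 1      # temperate (assumed for fractional values in between)


def fit_bucket_weights(temps, labels, num_buckets=3):
    """Least-squares weights for one-hot bucket features, no bias term.

    With one-hot inputs this reduces to the mean label per bucket.
    An empty bucket gets weight 0.0 (an arbitrary choice for this sketch).
    """
    sums = [0.0] * num_buckets
    counts = [0] * num_buckets
    for temp_c, label in zip(temps, labels):
        k = bucket_index(temp_c)
        sums[k] += label
        counts[k] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def predict(temp_c, weights):
    """A one-hot input selects exactly one weight."""
    return weights[bucket_index(temp_c)]


# Made-up toy data: temperatures and an arbitrary numeric label.
temps = [2, 8, 13, 22, 27, 31]
labels = [1.0, 3.0, 5.0, 7.0, 10.0, 12.0]

weights = fit_bucket_weights(temps, labels)  # one weight per bucket
```

The model learns one weight per bucket, so going from three buckets to five would mean five weights to estimate from the same data. Any two temperatures in the same bucket, such as `13` and `22`, receive the same prediction.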
Example
Practitioners refer to bucketing when building, training, or evaluating machine learning systems. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.
People also read
- A/B testing
A statistical way of comparing two (or more) techniques—the A and the B.
- ablation
A technique for evaluating the importance of a feature or component by temporarily removing it from a model.
- accuracy
The number of correct classification predictions divided by the total number of predictions.
- activation function
A function that enables neural networks to learn nonlinear (complex) relationships between features and the label.
- active learning
A training approach in which the algorithm chooses some of the data it learns from.
- adaptation
Synonym for tuning or fine-tuning.
- agglomerative clustering
See hierarchical clustering.
- anomaly detection
The process of identifying outliers.
- area under the PR curve
See PR AUC (Area under the PR Curve).
- area under the ROC curve
See AUC (Area under the ROC curve).