AIExplainer
AI Hardware Intermediate

quantization

An overloaded term. In this entry it means reducing the number of bits used to store a model's parameters. For example, suppose a model's parameters are stored as 32-bit floating-point numbers. Quantization converts those parameters from 32 bits down to 16, 8, or 4 bits, usually at the cost of some numerical precision.

Quantization reduces the following:

- Compute, memory, disk, and network usage
- Time to infer a prediction
- Power consumption
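The conversion from 32-bit floats to 8-bit integers can be sketched in a few lines. This is a minimal illustration, assuming a symmetric scheme with one shared scale for all parameters; the entry does not specify a scheme, and real toolkits offer several. The names `quantize_int8` and `dequantize` are hypothetical helpers written for this example, not functions from any library.

```python
from array import array
import random

def quantize_int8(weights):
    """Map float weights to int8 codes using one shared scale (assumed symmetric scheme)."""
    max_abs = max(abs(w) for w in weights)
    # The scale maps the largest magnitude onto 127; fall back to 1.0 for all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float32 weights from the int8 codes."""
    return array("f", (c * scale for c in codes))

# Stand-in "model parameters": 1000 random 32-bit floats.
random.seed(0)
weights = array("f", (random.gauss(0.0, 1.0) for _ in range(1000)))

codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

bytes_before = len(weights) * weights.itemsize  # storage as 32-bit floats
bytes_after = len(codes) * codes.itemsize       # storage as 8-bit integers
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("bytes before:", bytes_before, "bytes after:", bytes_after)
print("largest rounding error:", max_error, "scale:", scale)
```

On typical platforms the 8-bit copy takes a quarter of the storage of the 32-bit original, and each restored value differs from its original by at most about half the scale. That rounding error is the precision traded away for the smaller footprint.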

Practitioners refer to quantization when building, training, or evaluating machine learning systems. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.