AI Hardware Intermediate

quantization

An overloaded term. In the sense covered here, it means reducing the number of bits used to store a model's parameters.

Plain English Explanation

Quantization is an overloaded term. In the sense covered here, it means storing a model's parameters with fewer bits. For example, suppose a model's parameters are stored as 32-bit floating-point numbers. Quantization converts those parameters from 32 bits down to 4, 8, or 16 bits. Quantization reduces the following:

- Compute, memory, disk, and network usage
- Time to infer a prediction
- Power consumption
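To make the 32-bit to 8-bit case concrete, here is a minimal sketch of one common scheme: symmetric, per-tensor quantization with a single scale factor. The scheme and the names quantize_int8 and dequantize are assumptions chosen for illustration, not something this entry prescribes, and real toolchains typically offer several schemes. The example uses only the Python standard library.

```python
from array import array
import random


def quantize_int8(weights):
    """Map float weights to signed 8-bit integers using one shared scale.

    Illustrative assumption: symmetric quantization, where the largest
    absolute weight maps to 127 and zero maps to zero.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step, then clamp into the int8 range.
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the 8-bit integers."""
    return [v * scale for v in q]


random.seed(0)
# array("f") stores each value as a 32-bit float; array("b") uses 8 bits.
weights = array("f", (random.gauss(0.0, 1.0) for _ in range(1000)))

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

bytes_fp32 = len(weights) * weights.itemsize
bytes_int8 = len(q) * q.itemsize
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("32-bit storage in bytes:", bytes_fp32)
print("8-bit storage in bytes:", bytes_int8)
print("largest rounding error:", max_error)
```

The 8-bit copy needs one quarter of the storage of the 32-bit original, at the cost of a rounding error of at most about half of one scale step per parameter. That trade between size and precision is what the reductions listed above rest on.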

How is it used?

Practitioners refer to quantization when building, training, or evaluating machine learning systems. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.