
What is quantization?


Quantization explained in plain English

Quantization is an overloaded term. In the sense covered here, it means reducing the number of bits used to store a model's parameters. For example, suppose a model's parameters are stored as 32-bit floating-point numbers. Quantization converts those parameters from 32 bits down to 4, 8, or 16 bits.

Quantization reduces the following:

- Compute, memory, disk, and network usage
- Time to infer a prediction
- Power consumption

Example

Practitioners refer to quantization when building, training, or evaluating machine learning systems. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.
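
As a concrete illustration of the 32-bit to 8-bit case described earlier, the following is a minimal sketch in plain Python. It assumes one simple scheme (symmetric, per-tensor scaling to signed 8-bit integers); real frameworks offer several schemes and this is not meant to mirror any particular library. The function names `quantize_int8` and `dequantize` and the sample weights are hypothetical, chosen only for this example.

```python
import struct


def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to signed 8-bit integers.

    Assumption for this sketch: a single scale factor is shared by all weights.
    """
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude to 127; fall back to 1.0 if every weight is zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp into the int8 range used here.
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


# Hypothetical sample parameters.
weights = [0.5, -1.27, 0.02, 1.0]

quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Storage cost: 4 bytes per 32-bit float versus 1 byte per 8-bit integer.
float32_bytes = len(struct.pack(f"{len(weights)}f", *weights))      # 16 bytes
int8_bytes = len(struct.pack(f"{len(quantized)}b", *quantized))     # 4 bytes

print("quantized:", quantized)
print("restored: ", restored)
print("bytes as float32:", float32_bytes, "| bytes as int8:", int8_bytes)
```

The restored values are close to the originals but not identical: each one can be off by up to half a quantization step (`scale / 2`). That small loss of precision is the price paid for the fourfold reduction in storage in this sketch.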


