
distillation

The process of compressing one model (known as the teacher) into a smaller model (known as the student) that emulates the teacher's predictions as faithfully as possible.

Plain English Explanation

Distillation compresses a large model (the teacher) into a smaller model (the student) that mimics the teacher's predictions as closely as possible. Distillation is useful because the student has two key benefits over the teacher:

- Faster inference time
- Reduced memory and energy usage

However, the student's predictions are typically not as good as the teacher's predictions.

Distillation trains the student model to minimize a loss function based on the difference between the student's predictions and the teacher's predictions.

Compare and contrast distillation with the following terms:

- fine-tuning
- prompt-based learning

See LLMs: Fine-tuning, distillation, and prompt engineering in Machine Learning Crash Course for more information.
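To make the loss described above concrete, here is a minimal sketch of one common choice: the KL divergence between temperature-softened teacher and student output distributions. The entry itself does not specify a particular loss, so the KL formulation, the `temperature` parameter, the function names, and the example logits are all illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities, softened by `temperature`."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.

    Assumption: KL divergence is only one possible measure of the
    difference between teacher and student predictions.
    """
    p = softmax(teacher_logits, temperature)  # teacher (target) distribution
    q = softmax(student_logits, temperature)  # student distribution
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits for a 3-class problem.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]  # nearly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # disagrees with the teacher

loss_close = distillation_loss(close_student, teacher)
loss_far = distillation_loss(far_student, teacher)
print(loss_close < loss_far)
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, so minimizing it pushes the student toward the teacher's predictions. In practice this term is often combined with an ordinary loss on ground-truth labels, but that combination is outside what this entry describes.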

How is it used?

Practitioners turn to distillation when a large model is too slow or too resource-intensive to serve, and a smaller model that approximates its predictions is an acceptable trade-off. The term appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.