Low-Rank Adaptability

Plain English Explanation

A parameter-efficient technique for fine tuning that "freezes" the model's pre-trained weights (such that they can no longer be modified) and then inserts a small set of trainable weights into the model. This set of trainable weights (also known as "update matrixes") is considerably smaller than the base model and is therefore much faster to train. LoRA provides the following benefits: - Improves the quality of a model's predictions for the domain where the fine tuning is applied. - Fine-tunes faster than techniques that require fine-tuning all of a model's parameters. - Reduces the computational cost of inference by enabling concurrent serving of multiple specialized models sharing the same base model.

The update matrixes used in LoRA consist of rank decomposition matrixes, which are derived from the base model to help filter out noise and focus training on the most important features of the model. ---

How is it used?

Practitioners refer to low-rank adaptability when building, training, or evaluating machine learning systems. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.