average precision at k

A metric for summarizing a model's performance on a single prompt that generates ranked results, such as a numbered list of book recommendations.

Plain English Explanation

Average precision at k is the average of the precision at k values for each relevant result. The formula for average precision at k is therefore:

\[{\text{average precision at k}} = \frac{1}{n} \sum_{i=1}^n {\text{precision at k for relevant item } i} \]

where:

- \(n\) is the number of relevant items in the list.

Contrast with recall at k.
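One way to make the formula above more explicit uses notation that is assumed here rather than taken from the definition. Let \(\text{rel}(j)\) be 1 if the result at rank \(j\) is relevant and 0 otherwise, and let \(P(j)\) be the precision over the top \(j\) results. Then:

\[ P(j) = \frac{1}{j} \sum_{m=1}^{j} \text{rel}(m), \qquad {\text{average precision at k}} = \frac{1}{n} \sum_{j=1}^{k} P(j) \, \text{rel}(j) \]

where \(n = \sum_{j=1}^{k} \text{rel}(j)\) is the number of relevant items among the top \(k\) results, matching the definition of \(n\) above, and is assumed to be greater than zero. Ranks holding irrelevant results contribute nothing to the sum because \(\text{rel}(j) = 0\) there.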

Suppose a large language model is given a query asking for a ranked list of funny movies.

And the large language model returns the following list:

1. The General
2. Mean Girls
3. Platoon
4. Bridesmaids
5. Citizen Kane
6. This is Spinal Tap

Four of the movies in the returned list are very funny (that is, they are relevant) but two movies are dramas (not relevant). The following table details the results:

| Rank | Movie | Relevant? | Precision at k |
| --- | --- | --- | --- |
| 1 | The General | Yes | 1.0 |
| 2 | Mean Girls | Yes | 1.0 |
| 3 | Platoon | No | not relevant |
| 4 | Bridesmaids | Yes | 0.75 |
| 5 | Citizen Kane | No | not relevant |
| 6 | This is Spinal Tap | Yes | 0.67 |

The number of relevant results is 4. Therefore, you can calculate the average precision at 6 as follows:

\[{\text{average precision at 6}} = \frac{1}{4} (1.0 + 1.0 + 0.75 + 0.67) \approx 0.85 \]
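The calculation for the movie list can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `average_precision_at_k`, its boolean-list input, and the choice to return 0.0 when no result is relevant are all assumptions made for this example.

```python
from typing import Sequence


def average_precision_at_k(relevant: Sequence[bool], k: int) -> float:
    """Average precision over the top-k results of one ranked list.

    relevant[i] is True when the result at rank i + 1 is relevant.
    Returns 0.0 when none of the top-k results is relevant
    (an assumed convention, since the formula is undefined for n = 0).
    """
    hits = 0             # relevant results seen so far
    precision_sum = 0.0  # sum of precision values at each relevant rank
    for rank, is_relevant in enumerate(relevant[:k], start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank: relevant results so far / results so far.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Relevance flags for the six movies in the example, in ranked order.
flags = [True, True, False, True, False, True]
print(round(average_precision_at_k(flags, k=6), 2))  # prints 0.85
```

Irrelevant results are skipped in the sum but still count toward the rank, which is why Bridesmaids at rank 4 scores 3/4 = 0.75 rather than 1.0.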

How is it used?

Practitioners use average precision at k when building and evaluating machine learning systems that return ranked results, such as recommendation lists generated by a large language model. Because the metric summarizes a single prompt, it is typically averaged over many prompts when comparing systems.