AIExplainer

What is a skip-gram?

An n-gram that may omit (or "skip") words from the original context, meaning its n words might not have been adjacent in the original text.

More precisely, a "k-skip-n-gram" is an n-gram for which up to k words may have been skipped. For example, "the quick brown fox" has the following possible 2-grams:

- "the quick"
- "quick brown"
- "brown fox"

A "1-skip-2-gram" is a pair of words that have at most 1 word between them. Allowing one skip therefore adds the following pairs for "the quick brown fox":

- "the brown"
- "quick fox"

All the 2-grams are also 1-skip-2-grams, since skipping zero words is allowed. The full set of 1-skip-2-grams therefore has five members.

Skip-grams are useful for capturing more of a word's surrounding context. In the example, "fox" is directly associated with "quick" in the set of 1-skip-2-grams, but not in the set of 2-grams. Skip-grams are used to help train word embedding models.
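The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `skip_grams` is hypothetical, and it assumes that "up to k words skipped" means at most k skipped words in total between the first and last word of the n-gram.

```python
from itertools import combinations


def skip_grams(tokens, n, k):
    """Return all k-skip-n-grams of `tokens` as tuples.

    Assumption: at most k words in total are skipped between the
    first and last word of each n-gram.
    """
    grams = []
    for start in range(len(tokens) - n + 1):
        # A gram starting here spans at most n + k tokens:
        # n kept words plus up to k skipped ones.
        window_end = min(start + n + k, len(tokens))
        # Always keep the word at `start`, then pick the remaining
        # n - 1 words from the rest of the window, preserving order.
        for rest in combinations(range(start + 1, window_end), n - 1):
            grams.append(tuple(tokens[i] for i in (start, *rest)))
    return grams


words = "the quick brown fox".split()
bigrams = skip_grams(words, n=2, k=0)
one_skip_bigrams = skip_grams(words, n=2, k=1)

print(bigrams)
print(one_skip_bigrams)
```

With k=0 the function returns the three ordinary 2-grams. With k=1 it returns those three plus ("the", "brown") and ("quick", "fox"), matching the example in the text.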

Practitioners refer to skip-grams when building, training, or evaluating machine learning systems that work with text. The term appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.