bag of words
A representation of the words in a phrase or passage, irrespective of order.
Plain English Explanation
A representation of the words in a phrase or passage, irrespective of order. For example, bag of words represents the following three phrases identically: - the dog jumps - jumps the dog - dog jumps the Each word is mapped to an index in a sparse vector, where the vector has an index for every word in the vocabulary. For example, the phrase the dog jumps is mapped into a feature vector with non-zero values at the three indexes corresponding to the words the, dog, and jumps. The non-zero value can be any of the following: - A 1 to indicate the presence of a word. - A count of the number of times a word appears in the bag. For example, if the phrase were the maroon dog is a dog with maroon fur, then both maroon and dog would be represented as 2, while the other words would be represented as 1. - Some other value, such as the logarithm of the count of the number of times a word appears in the bag.
How is it used?
Practitioners refer to bag of words when building, training, or evaluating machine learning systems. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.