Machine Learning Ethics & Safety Intermediate

toxicity

The degree to which content is abusive, threatening, or offensive.

Plain English Explanation

The degree to which content is abusive, threatening, or offensive. Many machine learning models can identify, measure, and classify toxicity. Most of these models identify toxicity along multiple parameters, such as the level of abusive language and the level of threatening language.

How is it used?

Practitioners refer to toxicity when building, training, or evaluating machine learning systems. It appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.