What are guardrails?
Any software or process that prevents harm to humans or systems.
Guardrails explained in plain English
A guardrail is any software or process that prevents harm to humans or systems. Guardrails can take many forms, such as preventing data leaks or unauthorized access, or ensuring that an LLM's responses don't contain offensive material.
Example
Practitioners refer to guardrails when building, training, or evaluating machine learning systems. The term appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.
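To make the definition concrete, here is a minimal sketch of an output guardrail in Python. It covers two of the harms mentioned above: leaking data (here, email addresses) and offensive material in an LLM's response. The function name `apply_guardrails`, the `BLOCKED_TERMS` list, and the email pattern are hypothetical choices made for illustration, not part of any particular library. Production systems typically rely on maintained classifiers and policies rather than a hand-written word list.

```python
import re

# Hypothetical blocklist for illustration only; real systems usually use
# a maintained list or a trained classifier instead.
BLOCKED_TERMS = {"idiot", "stupid"}

# Simplified email pattern (an assumption; it does not cover every valid address).
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def apply_guardrails(response: str) -> str:
    """Return a version of an LLM response that passes two simple checks."""
    # Guardrail 1: withhold the whole response if it contains a blocked term.
    words = set(re.findall(r"[a-z']+", response.lower()))
    if words & BLOCKED_TERMS:
        return "[response withheld: offensive content]"

    # Guardrail 2: redact email addresses to reduce the risk of a data leak.
    return EMAIL_PATTERN.sub("[redacted email]", response)


if __name__ == "__main__":
    print(apply_guardrails("Contact alice@example.com for help."))
    print(apply_guardrails("You are an idiot."))
```

In this sketch the guardrail runs after the model has produced a response and before the user sees it. The same idea can be applied to inputs, for example by checking a prompt before it reaches the model.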
People also read
- Confabulation
When an AI produces a confident, fluent answer that sounds true but is factually wrong — generating plausible language without a reliable link to reality.
- evaluation
The process of measuring a model's quality or comparing different models against each other.
- hallucination
The production of plausible-seeming but factually incorrect output by a generative AI model that purports to be making an assertion about the real world.
- preprocessing
Processing data before it's used to train a model.
- prompt set
A group of prompts for evaluating a large language model.
- reporting bias
The fact that the frequency with which people write about actions, outcomes, or properties is not a reflection of their real-world frequencies or the degree to which a property is characteristic of a class of individuals.
- agent orchestration
The centralized management and routing of tasks across multiple sub-agents or LLM calls.
- AI slop
Output from a generative AI system that favors quantity over quality.
- Attention
A mechanism that lets a model focus on the most relevant parts of its input when producing an output, weighting what matters most in context.
- attribute
Synonym for feature.