guardrails
Any software or process that prevents harm to humans or systems.
Plain English Explanation
Any software or process that prevents harm to humans or systems. Guardrails can take many forms, including preventing data leaks or unauthorized access, or ensuring that an LLM's responses don't contain offensive material.
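As a rough illustration of the LLM case, an output guardrail can be sketched as a function that inspects a response before it reaches the user. This is a minimal sketch, not a production design: the names `apply_guardrail`, `EMAIL_PATTERN`, and `BLOCKED_TERMS` are hypothetical, the blocklist holds a single placeholder term, and a real system would more likely rely on trained classifiers or policy engines than on simple pattern matching.

```python
import re

# Hypothetical pattern for email-like strings, used here to stand in for
# "data that should not leak". Real systems would cover many more data types.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Placeholder blocklist (an assumption for illustration only).
BLOCKED_TERMS = {"badword"}


def apply_guardrail(response: str) -> str:
    """Return a version of an LLM response that passes two simple checks."""
    # Check 1: withhold the whole response if it contains a blocked term.
    words = set(re.findall(r"[a-z']+", response.lower()))
    if words & BLOCKED_TERMS:
        return "[response withheld]"

    # Check 2: redact email-like strings instead of rejecting the response.
    return EMAIL_PATTERN.sub("[redacted email]", response)


if __name__ == "__main__":
    print(apply_guardrail("Contact alice@example.com for details."))
    print(apply_guardrail("This contains a badword."))
```

The two checks show two common responses to a violation: blocking the output entirely, or editing it so that the rest can still be delivered.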
How is it used?
Practitioners refer to guardrails when building, training, or evaluating machine learning systems. The term appears in research papers, product documentation, and technical discussions about AI capabilities and limitations.