
Core guardrail types: risk severity levels, risk assessments, guardrail results, error handling, and the abstract backend interface.

Module Path

python
from exo.guardrail.types import (
    RiskLevel,
    RiskAssessment,
    GuardrailResult,
    GuardrailError,
    GuardrailBackend,
)

RiskLevel

Severity level of a detected risk. Inherits from StrEnum.

python
class RiskLevel(StrEnum):
    SAFE = "safe"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"

| Value | Description |
| --- | --- |
| SAFE | No risk detected |
| LOW | Minor concern, does not trigger blocking |
| MEDIUM | Moderate concern, does not trigger blocking |
| HIGH | Serious risk, triggers automatic blocking |
| CRITICAL | Severe risk, triggers automatic blocking |

Blocking behavior: BaseGuardrail automatically raises GuardrailError for HIGH and CRITICAL levels. LOW and MEDIUM results are logged but do not block execution.
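The blocking rule above can be sketched as a simple membership check. This is only an illustration, not BaseGuardrail's actual implementation: RiskLevel is redefined locally as a stand-in so the snippet runs without exo installed, and BLOCKING_LEVELS and should_block are hypothetical names introduced here.

```python
from enum import Enum


class RiskLevel(str, Enum):
    """Local stand-in mirroring the documented values (the real class is a StrEnum)."""

    SAFE = "safe"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


# Levels the documentation describes as triggering automatic blocking.
BLOCKING_LEVELS = frozenset({RiskLevel.HIGH, RiskLevel.CRITICAL})


def should_block(level: RiskLevel) -> bool:
    """Hypothetical helper: True when the level is documented as blocking."""
    return level in BLOCKING_LEVELS


# String values parse back into members, which helps when loading thresholds from config.
parsed = RiskLevel("medium")

# Blocking decision for every level, keyed by its string value.
decisions = {level.value: should_block(level) for level in RiskLevel}
```

Keeping the blocking set explicit, rather than comparing levels by order, avoids relying on any ordering between members, which the enum itself does not define.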


RiskAssessment

Result of a backend’s risk analysis. Returned by GuardrailBackend.analyze().

Base class: pydantic.BaseModel (frozen)

Constructor

python
RiskAssessment(
    has_risk: bool,
    risk_level: RiskLevel,
    risk_type: str | None = None,
    confidence: float = 1.0,
    details: dict[str, Any] = {},
)

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| has_risk | bool | (required) | Whether any risk was detected |
| risk_level | RiskLevel | (required) | Severity of the detected risk |
| risk_type | str \| None | None | Category of risk (e.g., "prompt_injection", "pii_leak") |
| confidence | float | 1.0 | Backend’s confidence in the assessment (0.0 to 1.0) |
| details | dict[str, Any] | {} | Additional metadata for logging and auditing |

Example

python
from exo.guardrail import RiskAssessment, RiskLevel

# Safe assessment
safe = RiskAssessment(has_risk=False, risk_level=RiskLevel.SAFE)

# Risk detected
risky = RiskAssessment(
    has_risk=True,
    risk_level=RiskLevel.HIGH,
    risk_type="prompt_injection",
    confidence=0.95,
    details={"matched_patterns": ["instruction_override"]},
)

GuardrailResult

Outcome of a guardrail check, optionally carrying a sanitised copy of the checked data. Returned by BaseGuardrail.detect().

Base class: pydantic.BaseModel (frozen)

Constructor

python
GuardrailResult(
    is_safe: bool,
    risk_level: RiskLevel,
    risk_type: str | None = None,
    details: dict[str, Any] = {},
    modified_data: dict[str, Any] | None = None,
)

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| is_safe | bool | (required) | Whether the data passed the guardrail check |
| risk_level | RiskLevel | (required) | Severity of the detected risk |
| risk_type | str \| None | None | Category of risk (e.g., "prompt_injection", "pii_leak") |
| details | dict[str, Any] | {} | Additional metadata for logging and auditing |
| modified_data | dict[str, Any] \| None | None | Optionally sanitised version of the original data |

Class Methods

safe()

python
@classmethod
def safe(cls) -> GuardrailResult

Create a result indicating the data is safe. Returns a GuardrailResult with is_safe=True and risk_level=RiskLevel.SAFE.

block()

python
@classmethod
def block(
    cls,
    risk_level: RiskLevel,
    risk_type: str,
    details: dict[str, Any] | None = None,
) -> GuardrailResult

Create a result indicating the data should be blocked.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| risk_level | RiskLevel | (required) | Severity of the detected risk |
| risk_type | str | (required) | Category of the detected risk |
| details | dict[str, Any] \| None | None | Additional context for logging and auditing |

Returns: GuardrailResult with is_safe=False and the given risk info.

Example

python
from exo.guardrail import GuardrailResult, RiskLevel

# Safe result
result = GuardrailResult.safe()
assert result.is_safe is True
assert result.risk_level == RiskLevel.SAFE

# Blocked result
result = GuardrailResult.block(
    risk_level=RiskLevel.HIGH,
    risk_type="prompt_injection",
    details={"matched_patterns": ["instruction_override"]},
)
assert result.is_safe is False
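One way to picture how a backend's RiskAssessment might be turned into a GuardrailResult is sketched below. It is an assumption-laden illustration, not the library's code: frozen dataclasses stand in for the pydantic models so the snippet runs without exo or pydantic, to_result is a hypothetical helper, and treating every detected risk as a block() result (and copying confidence into details) is a choice made here for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class RiskLevel(str, Enum):
    """Local stand-in for the documented enum."""

    SAFE = "safe"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


@dataclass(frozen=True)
class RiskAssessment:
    """Stand-in with the documented fields (the real class is a frozen pydantic model)."""

    has_risk: bool
    risk_level: RiskLevel
    risk_type: str | None = None
    confidence: float = 1.0
    details: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class GuardrailResult:
    """Stand-in with the documented fields and class methods."""

    is_safe: bool
    risk_level: RiskLevel
    risk_type: str | None = None
    details: dict[str, Any] = field(default_factory=dict)
    modified_data: dict[str, Any] | None = None

    @classmethod
    def safe(cls) -> GuardrailResult:
        return cls(is_safe=True, risk_level=RiskLevel.SAFE)

    @classmethod
    def block(
        cls,
        risk_level: RiskLevel,
        risk_type: str,
        details: dict[str, Any] | None = None,
    ) -> GuardrailResult:
        return cls(
            is_safe=False,
            risk_level=risk_level,
            risk_type=risk_type,
            details=details or {},
        )


def to_result(assessment: RiskAssessment) -> GuardrailResult:
    """Hypothetical helper: map a backend assessment onto a guardrail result."""
    if not assessment.has_risk:
        return GuardrailResult.safe()
    return GuardrailResult.block(
        risk_level=assessment.risk_level,
        # block() requires a string, so fall back to a placeholder category.
        risk_type=assessment.risk_type or "unknown",
        # Carry the backend's confidence along for auditing.
        details={**assessment.details, "confidence": assessment.confidence},
    )
```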

GuardrailError

Exception raised when a guardrail blocks an operation. Inherits from ExoError.

Constructor

python
GuardrailError(
    message: str,
    *,
    risk_level: RiskLevel,
    risk_type: str | None = None,
    details: dict[str, Any] | None = None,
)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| message | str | (required) | Human-readable error message |
| risk_level | RiskLevel | (required) | The risk level that triggered the block |
| risk_type | str \| None | None | Category of the detected risk |
| details | dict[str, Any] \| None | None | Additional context from the risk assessment |

Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| risk_level | RiskLevel | The risk level that triggered the block |
| risk_type | str \| None | Category of the detected risk |
| details | dict[str, Any] | Additional context (defaults to {} if None) |

Example

python
from exo.guardrail import GuardrailError, RiskLevel

try:
    raise GuardrailError(
        "Prompt injection detected",
        risk_level=RiskLevel.HIGH,
        risk_type="prompt_injection",
        details={"matched_patterns": ["instruction_override"]},
    )
except GuardrailError as e:
    print(e.risk_level)   # RiskLevel.HIGH
    print(e.risk_type)    # "prompt_injection"
    print(e.details)      # {"matched_patterns": ["instruction_override"]}
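Because the error carries structured attributes, a caller can convert it into an audit record instead of parsing the message. The sketch below assumes a stand-in GuardrailError that subclasses Exception directly (the real class inherits from ExoError) so it runs without exo installed; to_audit_record is a hypothetical helper.

```python
from __future__ import annotations

from enum import Enum
from typing import Any


class RiskLevel(str, Enum):
    """Local stand-in for the documented enum."""

    SAFE = "safe"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


class GuardrailError(Exception):
    """Stand-in with the documented constructor and attributes."""

    def __init__(
        self,
        message: str,
        *,
        risk_level: RiskLevel,
        risk_type: str | None = None,
        details: dict[str, Any] | None = None,
    ) -> None:
        super().__init__(message)
        self.risk_level = risk_level
        self.risk_type = risk_type
        # Documented behavior: details defaults to {} when None is passed.
        self.details = details or {}


def to_audit_record(error: GuardrailError) -> dict[str, Any]:
    """Hypothetical helper: flatten the error into a JSON-friendly dict."""
    return {
        "message": str(error),
        "risk_level": error.risk_level.value,
        "risk_type": error.risk_type,
        "details": error.details,
    }


try:
    raise GuardrailError("Prompt injection detected", risk_level=RiskLevel.CRITICAL)
except GuardrailError as exc:
    record = to_audit_record(exc)
```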

GuardrailBackend (ABC)

Abstract base class for pluggable guardrail detection backends. Subclasses implement analyze() to inspect data and return a RiskAssessment.

Abstract Methods

analyze()

python
async def analyze(self, data: dict[str, Any]) -> RiskAssessment

Analyze data for potential risks.

| Parameter | Type | Description |
| --- | --- | --- |
| data | dict[str, Any] | Arbitrary data to inspect (e.g., messages, tool arguments) |

Returns: A RiskAssessment describing the detected risk level.

Example

python
from typing import Any

from exo.guardrail import GuardrailBackend, RiskAssessment, RiskLevel

class MyBackend(GuardrailBackend):
    async def analyze(self, data: dict[str, Any]) -> RiskAssessment:
        text = str(data.get("messages", ""))
        if "forbidden" in text.lower():
            return RiskAssessment(
                has_risk=True,
                risk_level=RiskLevel.HIGH,
                risk_type="forbidden_content",
            )
        return RiskAssessment(has_risk=False, risk_level=RiskLevel.SAFE)
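Since analyze() is a coroutine, it has to be awaited or driven by an event loop. The following sketch shows one way to exercise a backend on its own, outside any guardrail. It uses local stand-ins for GuardrailBackend, RiskAssessment, and RiskLevel so it runs without exo installed; KeywordBackend and main are hypothetical names introduced for the example.

```python
from __future__ import annotations

import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import Any


class RiskLevel(str, Enum):
    """Local stand-in for the documented enum."""

    SAFE = "safe"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


@dataclass(frozen=True)
class RiskAssessment:
    """Trimmed stand-in carrying only the fields this example needs."""

    has_risk: bool
    risk_level: RiskLevel
    risk_type: str | None = None


class GuardrailBackend(ABC):
    """Stand-in for the abstract backend interface."""

    @abstractmethod
    async def analyze(self, data: dict[str, Any]) -> RiskAssessment: ...


class KeywordBackend(GuardrailBackend):
    """Hypothetical backend that flags any configured keyword."""

    def __init__(self, keywords: list[str]) -> None:
        self._keywords = tuple(k.lower() for k in keywords)

    async def analyze(self, data: dict[str, Any]) -> RiskAssessment:
        text = str(data.get("messages", "")).lower()
        if any(keyword in text for keyword in self._keywords):
            return RiskAssessment(True, RiskLevel.HIGH, "forbidden_content")
        return RiskAssessment(False, RiskLevel.SAFE)


async def main() -> list[RiskAssessment]:
    backend = KeywordBackend(["forbidden"])
    # Independent inputs can be analyzed concurrently; results keep input order.
    return await asyncio.gather(
        backend.analyze({"messages": ["hello there"]}),
        backend.analyze({"messages": ["this is FORBIDDEN"]}),
    )


clean, flagged = asyncio.run(main())
```

Testing a backend in isolation like this keeps detection logic separate from the blocking policy, which the documentation assigns to BaseGuardrail.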