Rails System Design — Typed Agent Lifecycle Guards
Status: Proposed Epic: 6 — Typed Agent Rails Date: 2026-03-10
1. Motivation
Exo’s HookManager provides lifecycle interception via async callables
registered at HookPoint enum values. While functional, it lacks:
- Typed event inputs: hooks receive untyped **kwargs, making it easy to mishandle data or miss available fields.
- Priority ordering: hooks execute in registration order only.
- Cross-hook state: no mechanism for hooks in the same invocation to share intermediate state.
- Retry mechanism: a failing hook aborts the run; there is no way for a guard to request a retry with a delay.
Agent-core’s rail system (openjiuwen/core/single_agent/rail/) addresses all
four gaps with 10 lifecycle events, typed input models, RetryRequest,
priority-based execution, and a shared extra dict.
This document proposes porting the rail concept into Exo as an extension of the existing hook system, not a replacement.
2. Key Decision: Rails Extend Hooks
Option A — Rails as a parallel system (rejected)
A separate RailManager with its own lifecycle, independent of HookManager.
This duplicates the dispatch logic and forces consumers to choose between
hooks and rails.
Option B — Rails extend hooks (chosen)
Rails are a new optional typed interface layered on top of HookManager:
- Rail is an abstract class with a handle(ctx: RailContext) method.
- RailManager collects rails, sorts by priority, and exposes an async callable with the Hook signature.
- RailManager registers itself as a single hook on the agent’s HookManager for each relevant HookPoint.
- Existing hook_manager.add(HookPoint.X, my_func) calls continue to work unchanged: rails are just another hook in the list.
Why Option B:
- Zero breaking changes to the existing hook API.
- Rails and plain hooks coexist on the same HookManager.
- HookManager remains the single source of truth for lifecycle dispatch.
- Rails are opt-in: agents without rails behave identically to today.
3. Typed Event Input Models
Three Pydantic models capture the data available at each lifecycle point. Models are not frozen because hooks/rails may need to mutate inputs (e.g., redacting messages before an LLM call).
from typing import Any

from pydantic import BaseModel


class InvokeInputs(BaseModel):
    """Data for START / FINISHED / ERROR events."""
    input: str
    messages: list[Any] | None = None
    result: Any | None = None


class ModelCallInputs(BaseModel):
    """Data for PRE_LLM_CALL / POST_LLM_CALL events."""
    messages: list[Any]
    tools: list[dict] | None = None
    response: Any | None = None
    usage: Any | None = None


class ToolCallInputs(BaseModel):
    """Data for PRE_TOOL_CALL / POST_TOOL_CALL events."""
    tool_name: str
    arguments: dict[str, Any]
    result: Any | None = None
    metadata: Any | None = None

A RailContext model bundles the agent reference, event type, typed inputs,
and the shared extra dict:
class RailContext(BaseModel):
    """Context passed to each rail's handle() method."""
    model_config = {"arbitrary_types_allowed": True}
    agent: Any  # Agent instance (Any to avoid circular imports)
    event: HookPoint
    inputs: InvokeInputs | ModelCallInputs | ToolCallInputs
    extra: dict[str, Any]  # Shared cross-rail state

Mapping HookPoints to Input Models
| HookPoint | Input Model | Key fields populated |
|---|---|---|
| START | InvokeInputs | input, messages |
| FINISHED | InvokeInputs | input, result |
| ERROR | InvokeInputs | input, result (exception) |
| PRE_LLM_CALL | ModelCallInputs | messages, tools |
| POST_LLM_CALL | ModelCallInputs | messages, response, usage |
| PRE_TOOL_CALL | ToolCallInputs | tool_name, arguments |
| POST_TOOL_CALL | ToolCallInputs | tool_name, result |
4. Rail ABC
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import StrEnum


class RailAction(StrEnum):
    CONTINUE = "continue"  # Proceed normally
    SKIP = "skip"  # Skip this step (e.g., skip a tool call)
    RETRY = "retry"  # Retry the operation (with RetryRequest)
    ABORT = "abort"  # Abort the agent run


@dataclass
class RetryRequest:
    delay: float = 0.0
    max_retries: int = 1
    reason: str = ""


class RailAbortError(ExoError):
    """Raised when a rail returns ABORT."""


class Rail(ABC):
    name: str
    priority: int = 50  # Lower = runs first

    @abstractmethod
    async def handle(self, ctx: RailContext) -> RailAction | None:
        """Process the event. Return None or CONTINUE to proceed."""
        ...

Priority Ordering
Rails are sorted by priority ascending (lower value = higher priority).
Default priority is 50. Security rails (guardrails) should use priority
10-20; logging/observability rails should use 80-90.
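
As an illustration of the ordering rule, the sketch below sorts a few stand-in rails by priority. RailStub and the rail names are hypothetical. The tie-breaking behavior (equal priorities keep registration order) follows from Python's stable sort and is an assumption here, since the proposal does not specify it.

```python
from dataclasses import dataclass


@dataclass
class RailStub:
    # Minimal stand-in for a Rail: only the attributes that ordering needs.
    name: str
    priority: int = 50  # same default as the proposed Rail ABC


rails = [
    RailStub("audit_log", priority=85),  # observability range
    RailStub("pii_guard", priority=10),  # security range
    RailStub("custom_a"),                # default priority
    RailStub("custom_b"),                # default priority
]

# sorted() is stable, so rails with equal priority keep registration order.
ordered = sorted(rails, key=lambda r: r.priority)
names = [r.name for r in ordered]
print(names)  # ['pii_guard', 'custom_a', 'custom_b', 'audit_log']
```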
Action Semantics
| Action | Behavior |
|---|---|
| None / CONTINUE | Proceed to next rail, then normal execution |
| SKIP | Stop remaining rails, skip the operation |
| RETRY | Attach a RetryRequest to context, re-execute the operation |
| ABORT | Raise RailAbortError immediately |
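
One way to read the table is as a short-circuiting loop. The sketch below is an assumption-laden illustration rather than the proposed RailManager: rails are plain async functions, the context is a dict, a plain Enum replaces StrEnum, and RailAbortError derives from Exception instead of ExoError. It treats RETRY like SKIP for the purpose of stopping the remaining rails, and leaves attaching a RetryRequest out of scope.

```python
import asyncio
from enum import Enum


class RailAction(Enum):
    CONTINUE = "continue"
    SKIP = "skip"
    RETRY = "retry"
    ABORT = "abort"


class RailAbortError(Exception):
    """Stand-in for the proposed RailAbortError (which subclasses ExoError)."""


async def dispatch(rails, ctx) -> RailAction:
    """Run rails in order and stop at the first non-CONTINUE action."""
    for rail in rails:
        action = await rail(ctx)
        if action is None or action is RailAction.CONTINUE:
            continue  # fall through to the next rail
        if action is RailAction.ABORT:
            raise RailAbortError(f"aborted by {rail.__name__}")
        return action  # SKIP or RETRY: remaining rails do not run
    return RailAction.CONTINUE


async def allow(ctx):
    ctx["seen"].append("allow")
    return None


async def block_shell(ctx):
    # Hypothetical guard: refuse to run a tool named "shell".
    ctx["seen"].append("block_shell")
    return RailAction.SKIP if ctx["tool_name"] == "shell" else None


async def always_abort(ctx):
    ctx["seen"].append("always_abort")
    return RailAction.ABORT


ctx = {"tool_name": "shell", "seen": []}
result = asyncio.run(dispatch([allow, block_shell, always_abort], ctx))
```

In this run block_shell returns SKIP, so always_abort never executes. With a different tool_name the loop would reach always_abort and raise RailAbortError.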
5. RailManager
class RailManager:
    def __init__(self) -> None:
        self._rails: list[Rail] = []

    def add(self, rail: Rail) -> None: ...
    def remove(self, rail: Rail) -> None: ...
    def clear(self) -> None: ...

    async def run(self, event: HookPoint, **data: Any) -> RailAction:
        """Build RailContext, run rails in priority order, return first
        non-CONTINUE action (or CONTINUE if all pass)."""
        ...

Cross-Rail State
Each run() invocation creates a fresh extra: dict[str, Any] on the
RailContext. All rails in that invocation share the same dict, allowing
upstream rails to pass data to downstream ones.
Example: a rate-limit rail sets extra["rate_limit_remaining"] = 5 and a
logging rail reads it.
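
The rate-limit example can be sketched as follows. To keep it short, the rails here are plain async functions that receive the extra dict directly rather than a full RailContext, and run_once is a hypothetical stand-in for a single run() invocation.

```python
import asyncio
from typing import Any


async def rate_limit_rail(extra: dict[str, Any]) -> None:
    # Upstream rail: records how many calls remain in the current window.
    extra["rate_limit_remaining"] = 5


async def logging_rail(extra: dict[str, Any]) -> None:
    # Downstream rail: reads what the upstream rail stored, if anything.
    remaining = extra.get("rate_limit_remaining")
    extra.setdefault("log", []).append(f"remaining={remaining}")


async def run_once(rails) -> dict[str, Any]:
    extra: dict[str, Any] = {}  # fresh dict for every invocation
    for rail in rails:
        await rail(extra)
    return extra


first = asyncio.run(run_once([rate_limit_rail, logging_rail]))
second = asyncio.run(run_once([logging_rail]))
```

The second invocation starts from an empty dict, so nothing written during the first one leaks into it.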
HookManager Compatibility
RailManager can be registered as a plain hook on HookManager:
agent.hook_manager.add(HookPoint.PRE_LLM_CALL, rail_manager.run)

Because RailManager.run matches the Hook signature
(async (**data) -> None), it integrates seamlessly. The RailManager
internally handles priority ordering, typed inputs, and action dispatch.
When a RailManager is registered this way:
- It runs alongside any other hooks at that HookPoint.
- Registration order determines when the RailManager runs relative to other hooks (typically registered first).
- If a rail returns ABORT, the RailAbortError propagates through HookManager as any exception would.
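
The registration pattern can be tried against toy stand-ins. Everything below is hypothetical scaffolding, not Exo's real classes: ToyHookManager is assumed to call hooks in registration order, to pass the HookPoint as an event keyword, and to ignore return values; the real HookManager may differ on each of these points.

```python
import asyncio
from collections import defaultdict
from enum import Enum
from typing import Any, Awaitable, Callable


class HookPoint(Enum):
    PRE_LLM_CALL = "pre_llm_call"  # placeholder value


class ToyHookManager:
    """Hypothetical stand-in: runs hooks for a point in registration order."""

    def __init__(self) -> None:
        self._hooks = defaultdict(list)

    def add(self, point: HookPoint, hook: Callable[..., Awaitable[Any]]) -> None:
        self._hooks[point].append(hook)

    async def run(self, point: HookPoint, **data: Any) -> None:
        for hook in self._hooks[point]:
            # Assumption: the event travels as a keyword argument and any
            # return value is discarded, as for a (**data) -> None hook.
            await hook(event=point, **data)


order: list[str] = []


async def plain_hook(**data: Any) -> None:
    order.append("plain_hook")


class ToyRailManager:
    async def run(self, event: HookPoint, **data: Any) -> str:
        # A real manager would build a RailContext and run sorted rails here.
        order.append(f"rails@{event.name}")
        return "continue"


hooks = ToyHookManager()
hooks.add(HookPoint.PRE_LLM_CALL, plain_hook)
hooks.add(HookPoint.PRE_LLM_CALL, ToyRailManager().run)
asyncio.run(hooks.run(HookPoint.PRE_LLM_CALL, messages=[]))
```

Because the toy manager discards return values, only exceptions such as an abort would reach the caller in this setup.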
6. Agent Integration
class Agent:
    def __init__(
        self,
        *,
        # ... existing params ...
        rails: list[Rail] | None = None,  # NEW
    ) -> None:
        # ... existing init ...
        # Rails (optional typed lifecycle guards)
        if rails:
            self._rail_manager = RailManager()
            for rail in rails:
                self._rail_manager.add(rail)
            # Register rail_manager.run as a hook for all HookPoints
            for point in HookPoint:
                self.hook_manager.add(point, self._rail_manager.run)

Backward Compatibility Guarantees
- No rails = no change. If rails is None (default), no RailManager is created, no hooks are added. Behavior is identical to the current implementation.
- Existing hooks preserved. Rails register via hook_manager.add(), so they append to the hook list. Previously registered hooks continue to run in their original order.
- Existing tests pass unchanged. The Agent constructor signature is additive-only (new optional keyword arg). No existing call sites break.
- Serialization. Agents with rails cannot be serialized (same as agents with hooks today). This is consistent with existing behavior.
7. Event Flow Diagram
Agent.run(input)
│
├─ hook_manager.run(START, ...)
│ ├─ [plain hooks]
│ └─ [RailManager.run → sorted rails → action]
│
├─ Agent._call_llm()
│ ├─ hook_manager.run(PRE_LLM_CALL, ...)
│ │ ├─ [plain hooks]
│ │ └─ [RailManager.run → sorted rails → action]
│ │ ├─ CONTINUE → proceed to LLM call
│ │ ├─ SKIP → skip this LLM call
│ │ ├─ RETRY → re-attempt with delay
│ │ └─ ABORT → raise RailAbortError
│ │
│ ├─ provider.complete(...)
│ │
│ └─ hook_manager.run(POST_LLM_CALL, ...)
│ ├─ [plain hooks]
│ └─ [RailManager.run → sorted rails]
│
├─ Agent._execute_tools()
│ ├─ hook_manager.run(PRE_TOOL_CALL, ...)
│ │ └─ [RailManager.run → sorted rails → action]
│ │ ├─ ABORT → raise RailAbortError
│ │ └─ SKIP → skip tool execution
│ │
│ ├─ tool.execute(...)
│ │
│ └─ hook_manager.run(POST_TOOL_CALL, ...)
│ └─ [RailManager.run → sorted rails]
│
└─ hook_manager.run(FINISHED, ...)
├─ [plain hooks]
        └─ [RailManager.run → sorted rails]

8. File Layout
All new files live in packages/exo-core/src/exo/:
| File | Contents |
|---|---|
| rail_types.py | InvokeInputs, ModelCallInputs, ToolCallInputs, RailContext |
| rail.py | Rail, RailAction, RetryRequest, RailAbortError, RailManager |
Tests in packages/exo-core/tests/:
| File | Contents |
|---|---|
| test_rail_types.py | Model creation and validation |
| test_rail.py | Rail actions, priority ordering, cross-rail state, abort propagation |
9. Open Questions
- RETRY semantics in _call_llm. The agent already has max_retries on _call_llm. Should rail-requested retries decrement the same counter or have their own? Recommendation: Separate counter via RetryRequest.max_retries.
- SKIP semantics for LLM calls. Skipping an LLM call means no response is generated. Should this return a sentinel output or raise? Recommendation: Return AgentOutput(text="", tool_calls=[]); the loop will terminate since there are no tool calls.
- Exception events. Agent-core has ON_MODEL_EXCEPTION and ON_TOOL_EXCEPTION. Exo has ERROR but it is not currently fired. Recommendation: Defer exception-specific events to a follow-up story; wire up ERROR as a general exception hook first.
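
If the separate-counter recommendation for RETRY is adopted, the retry loop might look roughly like this. It is a sketch under stated assumptions, not a settled design: call_with_rail_retries, flaky_llm, and non_empty_guard are hypothetical names, the guard is modeled as a function that inspects the result after the call, and the delay is passed straight to asyncio.sleep.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class RetryRequest:
    delay: float = 0.0
    max_retries: int = 1
    reason: str = ""


async def call_with_rail_retries(operation, check):
    """Re-run `operation` while `check` returns a RetryRequest.

    The rail budget comes from RetryRequest.max_retries and is tracked
    separately from any provider-level retry counter.
    """
    rail_retries = 0
    while True:
        result = await operation()
        request = check(result)
        if request is None or rail_retries >= request.max_retries:
            return result, rail_retries
        rail_retries += 1
        await asyncio.sleep(request.delay)


state = {"attempts": 0}


async def flaky_llm():
    # Hypothetical operation: empty output on the first two attempts.
    state["attempts"] += 1
    return "" if state["attempts"] < 3 else "hello"


def non_empty_guard(result):
    # Hypothetical guard: ask for a retry when the response is empty.
    if result == "":
        return RetryRequest(delay=0.0, max_retries=2, reason="empty response")
    return None


result, used = asyncio.run(call_with_rail_retries(flaky_llm, non_empty_guard))
```

Once the rail budget is exhausted the loop returns the last result rather than raising; whether that or an abort is preferable is part of the open question.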
10. Summary
- Rails extend the existing hook system — they do not replace it.
- RailManager registers as a single hook on HookManager.
- Typed inputs provide structured data; priority ordering ensures deterministic execution; extra dict enables cross-rail coordination.
- Zero breaking changes to existing APIs, tests, or behavior.
- Implementation spans 2 new files (~400 lines total) + tests.