Deep Agent Toolkit Design — Built-in Agent Tools
Status: Proposed Epic: 8 — Deep Agent Toolkit Date: 2026-03-10
Status: Proposed Epic: 8 — Deep Agent Toolkit Date: 2026-03-10
1. Motivation
Exo agents can invoke arbitrary tools via the Tool ABC, but there is
no curated toolkit for autonomous task execution — the pattern where an
agent independently reads files, runs commands, executes code, and tracks
progress without human intervention.
Agent-core’s deepagents/tools/ provides four built-in tools for this
purpose:
| Agent-Core Tool | Purpose |
|---|---|
TodoTool | CRUD todo list with status tracking |
FileSystemTool | File I/O (read, write, edit, glob, list, grep) |
ShellTool | Shell command execution |
CodeTool | Python code execution in isolated environment |
Exo already has equivalents for three of these, spread across two packages:
| Capability | Exo Package | Module |
|---|---|---|
| Todo / Planning | exo-context | context/tools.py — get_planning_tools() |
| File I/O | exo-sandbox | sandbox/tools.py — FilesystemTool |
| Shell | exo-sandbox | sandbox/tools.py — TerminalTool |
| Code execution | — | Not yet implemented |
The gap is a CodeExecutionTool for sandboxed Python execution, plus a
convenience API that bundles all four tool categories into a single
get_deep_agent_tools() call. Additionally, the existing FilesystemTool
lacks agent-core’s edit, glob, and grep operations, and the
exo-context planning tools lack status tracking (PENDING /
IN_PROGRESS / COMPLETED).
2. Key Decision: Extend Existing Packages, No New Package
Option A — New package exo-toolkit (rejected)
A separate package adds dependency overhead and forces a third import for what is fundamentally a composition of existing tools. The tools themselves belong logically to their current packages.
Option B — All tools in exo-core (rejected)
Moving sandbox tools into core would create a circular dependency (core
depends on sandbox’s Sandbox for execution context). It also collapses
the separation of concerns between core abstractions and execution
environments.
Option C — Extend exo-sandbox + exo-context, aggregate in core (chosen)
- Code execution → new
CodeExecutionToolinexo-sandbox/tools.py(alongsideFilesystemToolandTerminalTool), backed byLocalSandbox. - File tool extensions (edit, glob, grep) → extend
FilesystemToolinexo-sandbox/tools.py. - Todo status tracking → extend planning tools in
exo-context/tools.pywith status field. - Aggregator →
get_deep_agent_tools()function inexo-core/src/exo/toolkit.pythat imports and bundles all four tool categories.
Why Option C:
- Keeps tools co-located with their execution environments (sandbox tools with sandbox, context tools with context).
exo-corealready depends on both packages, so the aggregator introduces no new dependency edges.- Follows the established pattern:
get_planning_tools(),get_context_tools(),get_file_tools()are already per-package convenience functions.
3. Component Design
3.1 CodeExecutionTool (new)
class CodeExecutionTool(Tool):
"""Execute Python code in a sandboxed environment."""
name = "execute_code"
description = "Execute Python code and return stdout, stderr, and result."
parameters = {
"code": {"type": "string", "description": "Python code to execute"},
"timeout": {"type": "number", "description": "Max seconds (default 30)"},
}
def __init__(
self,
*,
sandbox: Sandbox | None = None,
allowed_imports: frozenset[str] | None = None,
timeout: float = 30.0,
) -> None: ...
async def execute(self, **kwargs) -> dict[str, Any]:
# Returns: {"stdout": str, "stderr": str, "result": str | None,
# "success": bool, "execution_time_ms": float}Execution strategy:
- If a
Sandboxinstance is provided, delegate tosandbox.run_tool(). - Otherwise, use
asyncio.create_subprocess_exec()with a temporary script file, import allowlist enforcement via AST inspection, andasyncio.wait_for()timeout. - The tool does not manage sandbox lifecycle — the caller (agent or builder) is responsible for starting/stopping sandboxes.
3.2 FilesystemTool Extensions
Add three new actions to the existing FilesystemTool:
| Action | Parameters | Description |
|---|---|---|
edit | path, old_text, new_text | Replace first occurrence of old_text |
glob | pattern, path (optional) | Find files matching glob pattern |
grep | pattern, path (optional) | Search file contents with regex |
These mirror agent-core’s FileSystemTool operations and follow the
existing action-dispatch pattern in FilesystemTool.execute().
3.3 Planning Tool Status Extension
Extend the todo state model in exo-context/tools.py:
# Current: {"item": str, "done": bool}
# Extended: {"item": str, "done": bool, "status": str}
# status ∈ {"pending", "in_progress", "completed"}add_todosetsstatus = "pending"by default.- New
update_todo_status(ctx, index, status)tool transitions status. complete_todosets bothdone = Trueandstatus = "completed".- Single-active constraint: at most one item may be
"in_progress". - Backward compatible: existing callers that only check
doneare unaffected.
3.4 Toolkit Aggregator
# exo-core/src/exo/toolkit.py
def get_deep_agent_tools(
*,
sandbox: Sandbox | None = None,
workspace: list[str] | None = None,
allowed_imports: frozenset[str] | None = None,
terminal_blacklist: frozenset[str] | None = None,
terminal_timeout: float = 30.0,
code_timeout: float = 30.0,
include_planning: bool = True,
include_filesystem: bool = True,
include_terminal: bool = True,
include_code: bool = True,
) -> list[Tool]:
"""Return a curated set of tools for autonomous agent execution."""This function assembles tools from both packages, passing through configuration. Agents opt in with:
agent = Agent(
name="deep-agent",
tools=[*get_deep_agent_tools(sandbox=my_sandbox), custom_tool],
)4. Security Model
4.1 Filesystem Security
- Path traversal prevention:
FilesystemTool._validate_path()resolves symlinks and rejects paths outsideallowed_directories. This already exists and applies to all new actions (edit, glob, grep). - Glob restriction: patterns are resolved relative to allowed directories only; absolute patterns that escape are rejected.
- Write protection: optional
read_only=Truemode disables write and edit actions.
4.2 Terminal Security
- Command blacklist:
TerminalToolblocks destructive commands (rm,mkfs,dd,shutdown,kill, etc.) via configurableblacklistparameter. Default blacklist covers 14 dangerous commands. - Timeout enforcement:
asyncio.wait_for()kills long-running processes. Default 30s, configurable per-tool. - No shell expansion: commands run via
asyncio.create_subprocess_exec()(notshell=True) to prevent injection.
4.3 Code Execution Security
- Import allowlist: AST inspection of the code before execution blocks
imports outside the allowed set. Default allowlist includes safe stdlib
modules (
math,json,datetime,re,collections,itertools,functools,typing,dataclasses,pathlib,os.path). - No network by default:
socket,urllib,http,requestsare excluded from the default allowlist. - Subprocess isolation: code runs in a separate process with its own memory space; parent process is unaffected by crashes.
- Timeout: hard timeout via
asyncio.wait_for()on the subprocess. - Sandbox delegation: when a
Sandbox(e.g.,KubernetesSandbox) is provided, all isolation is delegated to the sandbox environment, which may enforce container-level restrictions.
4.4 Planning Tools Security
- No security concerns — planning tools operate on in-memory state only, with no I/O side effects.
5. File Layout
packages/exo-sandbox/src/exo/sandbox/
├── tools.py # + CodeExecutionTool, FilesystemTool extensions
└── ...
packages/exo-context/src/exo/context/
├── tools.py # + update_todo_status, status field
└── ...
packages/exo-core/src/exo/
├── toolkit.py # NEW — get_deep_agent_tools() aggregator
└── ...
packages/exo-sandbox/tests/
├── test_sandbox_tools.py # + tests for CodeExecutionTool, edit/glob/grep
└── ...
packages/exo-context/tests/
├── test_context_tools.py # + tests for todo status tracking
└── ...
packages/exo-core/tests/
├── test_toolkit.py # NEW — aggregator tests
└── ...6. Backward Compatibility
- FilesystemTool: new actions are additive. Existing
read,write,listactions are unchanged. Theparametersschema adds new enum values — callers that only use existing actions are unaffected. - TerminalTool: no changes to the existing interface.
- Planning tools: the
statusfield is added with a default value. Existing code that checkstodo["done"]continues to work. Thecomplete_todofunction continues to setdone = True. - Public API:
get_deep_agent_tools()is a new export. No existing exports are modified or removed. - Agent class: no changes. Tools are passed via the existing
tools=[]parameter.
7. Implementation Plan
| Story | Title | Est. LOC |
|---|---|---|
| US-059 | CodeExecutionTool implementation | ~150 |
| US-060 | FilesystemTool edit/glob/grep | ~120 |
| US-061 | Planning tool status tracking | ~60 |
| US-062 | Toolkit aggregator | ~80 |
| US-063 | Integration tests | ~120 |
8. Open Questions
-
Code execution output capture: should
CodeExecutionToolcapture the return value of the last expression (like a REPL) or only stdout/stderr? Recommendation: capture both — useexec()for statements andeval()for expressions, with stdout capture viacontextlib.redirect_stdout. -
File size limits: should
FilesystemTool.readandgrepenforce maximum file size? Recommendation: yes, configurable with a sensible default (e.g., 1 MB for read, 10 MB for grep). This prevents agents from accidentally loading huge files into context. -
Sandbox requirement for CodeExecutionTool: should code execution require an explicit
Sandboxinstance or allow bare subprocess execution? Recommendation: allow both, with a warning log when running without a sandbox in production mode.
9. Summary
The deep agent toolkit extends Exo’s existing tool infrastructure rather than introducing a parallel system:
- CodeExecutionTool joins
FilesystemToolandTerminalToolinexo-sandbox, providing sandboxed Python execution. - FilesystemTool gains
edit,glob, andgrepactions to match agent-core’s file tool capabilities. - Planning tools gain status tracking for autonomous task management.
get_deep_agent_tools()inexo-coreaggregates all tools behind a single convenience function.
Security is enforced at every layer: path validation for files, command
blacklists for terminal, import allowlists for code, and timeouts
everywhere. When a Sandbox is provided, container-level isolation adds
a further defense boundary.