exo.search
AI-powered search engine with deep research, citation verification, contradiction detection, confidence scoring, and multi-turn conversation support. Supports four quality modes (speed, balanced, q...
AI-powered search engine with deep research, citation verification, contradiction detection, confidence scoring, and multi-turn conversation support. Supports four quality modes (speed, balanced, quality, deep) with adaptive research strategies.
Installation
Part of the exo-ai monorepo:
git clone https://github.com/Midsphere-AI/exo-ai.git && cd exo-ai
uv syncOptional extras can be enabled from the packages/exo-search directory:
uv sync --extra server # FastAPI server
uv sync --extra embeddings # Embedding-based reranking (Gemini, Vertex AI, OpenAI)
uv sync --extra pdf # PDF content extractionModule Path
import exo.searchPublic Exports (8)
| Export | Type | Description |
|---|---|---|
search | async function | Run a search query and return the answer string with inline citations |
search_with_details | async function | Run a search query and return the full SearchResponse with sources, suggestions, verification, and confidence |
stream | function | Stream the search pipeline, yielding PipelineEvent, StreamEvent, and a final SearchResponse |
SearchConfig | dataclass | Configuration for models, backends, tuning parameters, and verification settings |
SearchResponse | BaseModel | Final structured response with answer, sources, suggestions, verification stats, contradictions, and confidence score |
PipelineEvent | BaseModel | Pipeline stage transition event (stage name + started/completed status) |
ResearchMode | StrEnum | Quality mode enum: SPEED, BALANCED, QUALITY, DEEP |
ConversationManager | class | Multi-turn conversation history tracker with context injection |
Import Patterns
# High-level search (most common)
from exo.search import search, search_with_details, stream
# Configuration
from exo.search import SearchConfig, ResearchMode
# Streaming events
from exo.search import PipelineEvent, SearchResponse
# Multi-turn conversation
from exo.search import ConversationManagerArchitecture
exo.search
__init__.py Public API: search(), search_with_details(), stream()
config.py SearchConfig, context budget computation
types.py All Pydantic models and enums
pipeline.py Orchestration: run_search_pipeline(), stream_search_pipeline()
conversation.py ConversationManager, format_chat_history()
focus_modes.py Focus mode to source-type mapping
server.py FastAPI server with /search, /stream, /chat endpoints
__main__.py CLI: python -m exo.search
agents/
classifier.py Query classification (skip/academic/social/sequential/complexity)
researcher.py Iterative tool-calling research loop
query_planner.py Adaptive hybrid research with sub-question decomposition
deep_researcher.py Sequential multi-step research for complex queries (DAG-based)
writer.py Citation-aware answer generation with claim-first writing
suggestion_generator.py Follow-up suggestion generation
tools/
searxng.py web_search(), academic_search(), social_search() via Serper/SearXNG
web_fetcher.py Page scraping, content enrichment, Jina Reader integration
embeddings.py Semantic reranking via cosine similarity (Gemini/Vertex/OpenAI)
citation_verifier.py Three-phase citation verification (keyword + LLM spot-check)
contradiction_detector.py Cross-source contradiction detection
confidence.py Heuristic confidence scoring (citation rate, authority, richness, coverage)
researcher_tools.py Agent tools: done() signal, reasoning_preamble()
jina.py Jina Reader/Search API integration
serper.py Serper (Google Search) API client
compute.py Shared computation utilitiesQuick Example
import asyncio
from exo.search import search
async def main():
answer = await search("What is quantum computing?")
print(answer)
asyncio.run(main())Detailed Examples
Search with Full Details
import asyncio
from exo.search import search_with_details
async def main():
result = await search_with_details(
"What are the latest advances in CRISPR gene editing?",
mode="quality",
)
print(result.answer)
print("\nSources:")
for i, source in enumerate(result.sources, 1):
print(f" [{i}] {source.title} ({source.url})")
print(f"\nConfidence: {result.confidence:.0%}")
if result.verification:
v = result.verification
print(f"Citations: {v.verified}/{v.total_citations} verified, {v.removed} removed")
if result.contradictions and result.contradictions.has_contradictions:
print(f"Contradictions found: {len(result.contradictions.contradictions)}")
print("\nSuggested follow-ups:")
for s in result.suggestions:
print(f" - {s}")
asyncio.run(main())Streaming Pipeline
import asyncio
from exo.search import stream, PipelineEvent, SearchResponse
from exo.types import TextEvent
async def main():
async for event in stream("latest AI news", mode="balanced"):
if isinstance(event, PipelineEvent):
print(f"[{event.stage}] {event.status}")
elif isinstance(event, TextEvent):
print(event.text, end="", flush=True)
elif isinstance(event, SearchResponse):
print(f"\n\n{len(event.sources)} sources cited")
asyncio.run(main())Multi-turn Conversation
import asyncio
from exo.search import search_with_details, ConversationManager
async def main():
conv = ConversationManager(max_turns=10)
# First query
result = await search_with_details(
"What is transformer architecture?",
chat_history=conv.turns,
)
conv.add_turn("What is transformer architecture?", result.answer)
print(result.answer[:200])
# Follow-up (classifier resolves "it" from history)
result = await search_with_details(
"How does it compare to RNNs?",
chat_history=conv.turns,
)
conv.add_turn("How does it compare to RNNs?", result.answer)
print(result.answer[:200])
asyncio.run(main())Custom Configuration
from exo.search import search, SearchConfig
config = SearchConfig(
model="anthropic:claude-sonnet-4",
fast_model="openai:gpt-4o-mini",
max_results=15,
max_iterations=10,
max_writer_sources=20,
llm_verification=True,
max_revision_rounds=3,
serper_api_key="sk-...", # Google Search via Serper
jina_api_key="jina_...", # Full-page content extraction
)
answer = await search("complex multi-step question", mode="quality", config=config)Research Modes
| Mode | Iterations | Enrichment | Verification | Use Case |
|---|---|---|---|---|
speed | 2 | None | Keyword only | Quick factual lookups, simple questions |
balanced | 6 | 80% of sources | Keyword only | General research, default for most queries |
quality | 25 | 90% of sources | Keyword + LLM | In-depth research, complex topics |
deep | 25 | 90% of sources | Keyword + LLM + contradiction detection | Multi-step questions requiring sequential reasoning |
The pipeline automatically routes to deep sequential research when the classifier detects queries that require finding information in step N before knowing what to search for in step N+1.
Pipeline Stages
The search pipeline executes these stages in order:
-
Classifier — Analyzes the query to determine search intent, generates sub-questions, detects if sequential research is needed, and rewrites follow-up queries using conversation context.
-
Researcher — Iterative tool-calling loop that searches the web, academic databases, and social media. Uses adaptive query planning with sub-question decomposition. Deep mode uses a DAG-based multi-step research planner.
-
Reranking — Semantic reranking of search results using embedding cosine similarity (Gemini, Vertex AI, or OpenAI). Falls back to keyword overlap when no embedding API is available.
-
Enrichment — Fetches full page content for top results via Jina Reader or direct scraping. Budget-aware: computes how many sources and characters fit in the model’s context window.
-
Writer — Generates the cited answer. Quality/deep modes use claim-first writing: extract claims from sources, then compose the answer grounded in verified claims.
-
Citation Verification — Three-phase post-hoc verification: keyword filter, LLM spot-check for passed citations, LLM second-chance for failed citations. Removes unsupported citations.
-
Revision Loop — If too many citations were removed (above
revision_threshold), the answer is revised and re-verified, up tomax_revision_rounds. -
Contradiction Detection — Cross-checks factual claims against source content to surface genuine disagreements (quality/deep modes only).
-
Confidence Scoring — Heuristic 0—1 score from four components: citation verification rate (40%), source authority (20%), content richness (20%), sub-question coverage (20%).
-
Suggestions — Generates follow-up question suggestions (runs concurrently with writing).
Configuration Reference
SearchConfig fields with their defaults and environment variable overrides:
| Field | Type | Default | Env Var | Description |
|---|---|---|---|---|
model | str | "openai:gpt-4o" | EXO_SEARCH_MODEL | Primary LLM for research and writing |
fast_model | str | "openai:gpt-4o-mini" | EXO_SEARCH_FAST_MODEL | Fast LLM for classification and suggestions |
embedding_model | str | "gemini-embedding-2-preview" | EXO_SEARCH_EMBEDDING_MODEL | Model for semantic reranking |
searxng_url | str | "http://localhost:8888" | SEARXNG_URL | SearXNG instance URL (fallback search backend) |
serper_api_key | str | "" | SERPER_API_KEY | Serper API key for Google Search (preferred backend) |
jina_api_key | str | "" | JINA_API_KEY | Jina Cloud API key for page content extraction |
jina_reader_url | str | "http://127.0.0.1:3000" | JINA_READER_URL | Local Jina Reader URL |
max_results | int | 10 | — | Maximum search results per query |
max_iterations | int | None | None (auto) | — | Override per-mode iteration limit |
max_writer_sources | int | 15 | — | Cap on sources passed to writer |
max_writer_words | int | None | None | — | Override quality mode 2000-word target |
max_content_chars | int | 10000 | — | Max characters per page for enrichment |
max_deep_research_steps | int | 7 | — | Max steps in sequential research plan |
llm_verification | bool | False | — | Enable LLM-based claim verification |
claim_first_writing | bool | True | — | Use claim-first writing in quality/deep modes |
max_revision_rounds | int | 2 | — | Max write-verify-revise rounds |
revision_threshold | float | 0.3 | — | Revise if removed/total citations exceeds this |
context_window_tokens | int | None | None (auto) | — | Override model context window size |
system_instructions | str | "" | — | Custom instructions injected into the writer prompt |
CLI Usage
# Single query
python -m exo.search "What is quantum computing?"
# Quality mode
python -m exo.search --quality quality "CRISPR gene editing"
# Streaming output
python -m exo.search --stream "latest AI news"
# Interactive chat
python -m exo.search --chat
# Start FastAPI server
python -m exo.search --serve --port 8000
# With verbose logging
python -m exo.search -v "test query"Server Endpoints
When running with --serve or exo-search --serve, the FastAPI server exposes:
| Method | Path | Description |
|---|---|---|
GET | /search | Run a search query, returns SearchResponse JSON |
GET | /stream | Stream search pipeline as server-sent events |
POST | /chat | Multi-turn chat with session management |
Key Types
SearchResponse
class SearchResponse(BaseModel):
answer: str # Cited answer text with [N] markers
sources: list[Source] # Cited sources (title, url, content)
suggestions: list[str] # Follow-up question suggestions
query: str # Effective query (after rewriting)
mode: str # Research mode used
verification: CitationVerification | None # Citation verification stats
contradictions: ContradictionReport | None # Cross-source contradictions
confidence: float | None # 0-1 confidence score
confidence_breakdown: dict | None # Component-level breakdownPipelineEvent
class PipelineEvent(BaseModel):
type: str = "pipeline"
stage: str # "classifier", "researcher", "deep_research", "enrichment",
# "writer", "revision", "suggestions"
status: str # "started", "completed"
message: str # Optional detail messageCitationVerification
class CitationVerification(BaseModel):
total_citations: int # Total [N] markers found
verified: int # Citations confirmed by source content
removed: int # Citations removed as unsupported
flagged: int # Citations flagged but kept
llm_verified: int # Citations verified via LLM spot-check
revision_count: int # Number of revision rounds performedClassification
class Classification(BaseModel):
skip_search: bool # True for greetings, math, etc.
personal_search: bool # User-specific queries
academic_search: bool # Scholarly content needed
discussion_search: bool # Social/forum content needed
show_weather_widget: bool # Weather intent detected
show_stock_widget: bool # Stock/finance intent detected
show_calculation_widget: bool # Calculation intent detected
requires_sequential_research: bool # Multi-step research needed
estimated_complexity: str # "simple", "moderate", "complex"ResearchMode
class ResearchMode(StrEnum):
SPEED = "speed"
BALANCED = "balanced"
QUALITY = "quality"
DEEP = "deep"