
Epic 3 — RAG/Retrieval Pipeline
Date: 2026-03-11

This document maps agent-core’s (openJiuwen) RAG pipeline to Exo’s exo-retrieval package, helping contributors familiar with either framework navigate both.


1. Agent-Core Overview

Agent-core’s RAG system lives in openjiuwen/core/rag/ and provides a full retrieval-augmented generation pipeline covering document processing, embedding, vector storage, multiple retrieval strategies, reranking, and knowledge graph construction.

Key Components

EmbeddingProvider ABC — Base class for turning text into dense vectors. Three built-in implementations:

| Provider | Backend |
|---|---|
| OpenAIEmbeddingProvider | OpenAI embeddings API |
| VertexEmbeddingProvider | Google Vertex AI text-embedding |
| HTTPEmbeddingProvider | Any HTTP endpoint returning vectors |
```python
# agent-core pattern
from abc import ABC, abstractmethod

class EmbeddingProvider(ABC):
    @abstractmethod
    async def embed(self, text: str) -> list[float]: ...

    @abstractmethod
    async def embed_batch(self, texts: list[str]) -> list[list[float]]: ...

    @property
    @abstractmethod
    def dimension(self) -> int: ...
```

VectorStore ABC — Persistence layer for chunks and their embeddings. Supports add, search (by embedding), delete, and clear. Backends include pgvector (PostgreSQL) and ChromaDB.
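
As a concrete reference, the four store operations might be sketched like this (the signatures, parameter names, and the minimal Chunk type here are illustrative assumptions, not the actual agent-core API):

```python
# Hypothetical sketch of the VectorStore interface described above.
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Chunk:
    id: str
    content: str

class VectorStore(ABC):
    @abstractmethod
    async def add(self, chunks: list[Chunk], embeddings: list[list[float]]) -> None:
        """Persist chunks alongside their embedding vectors."""

    @abstractmethod
    async def search(self, embedding: list[float], top_k: int = 10) -> list[Chunk]:
        """Return the chunks most similar to the query embedding."""

    @abstractmethod
    async def delete(self, ids: list[str]) -> None:
        """Remove chunks by id."""

    @abstractmethod
    async def clear(self) -> None:
        """Drop all stored chunks."""
```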

5 Retriever Types:

| Retriever | Strategy |
|---|---|
| VectorRetriever | Dense semantic search via embedding similarity |
| SparseRetriever | BM25 keyword matching with TF-IDF scoring |
| HybridRetriever | Reciprocal Rank Fusion of dense + sparse results |
| AgenticRetriever | Multi-round LLM-driven query refinement |
| KnowledgeGraphRetriever | Graph traversal expanding initial results via triples |
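
The HybridRetriever's Reciprocal Rank Fusion can be illustrated with a toy weighted-RRF function; the k=60 constant and the two-list signature are common RRF conventions, not either framework's actual code:

```python
# Toy weighted Reciprocal Rank Fusion over two ranked ID lists.
def rrf_fuse(dense_ids: list[str], sparse_ids: list[str],
             vector_weight: float = 0.6, k: int = 60) -> list[str]:
    """Fuse two rankings; each list contributes weight / (k + rank) per doc."""
    scores: dict[str, float] = {}
    for weight, ranking in ((vector_weight, dense_ids),
                            (1.0 - vector_weight, sparse_ids)):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)
```

Documents ranked well by both retrievers rise to the top even when neither ranking alone puts them first.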

Reranker ABC — Reorders retrieval results by relevance. The built-in LLMReranker sends passages to an LLM that returns a ranked ordering.
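
Parsing the LLM's ranked ordering defensively might look like this hypothetical helper; the JSON-list-of-indices format and the keep-original-order fallback are assumptions, not the library's implementation:

```python
import json

def parse_ranking(raw: str, n: int) -> list[int]:
    """Parse an LLM response as a permutation of range(n);
    fall back to the original order on malformed output."""
    try:
        order = json.loads(raw)
        if isinstance(order, list) and sorted(order) == list(range(n)):
            return order
    except (json.JSONDecodeError, TypeError):
        pass
    return list(range(n))  # fallback: keep the original order
```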

Document Processing Pipeline — Converts raw documents into indexed chunks:

  1. Parser — extracts text from source formats (plain text, Markdown, JSON, PDF)
  2. Chunker — splits text into overlapping or paragraph-aligned segments
  3. EmbeddingProvider — vectorizes each chunk
  4. VectorStore — persists chunks with their embeddings

QueryRewriter — LLM-based query expansion that adds synonyms, resolves pronouns from conversation history, and disambiguates terms before retrieval.

Knowledge Graph Support — TripleExtractor uses an LLM to extract subject–predicate–object triples from chunks, enabling the KnowledgeGraphRetriever to traverse relationships.
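
The triple shape and a one-hop expansion can be sketched as follows; the field names and the neighbors() helper are illustrative assumptions based on the subject–predicate–object description, not the real API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    object: str

def neighbors(triples: list[Triple], entity: str) -> set[str]:
    """Entities one hop away from `entity`, following edges in either direction."""
    out: set[str] = set()
    for t in triples:
        if t.subject == entity:
            out.add(t.object)
        elif t.object == entity:
            out.add(t.subject)
    return out
```

A graph retriever can apply this repeatedly to expand an initial set of hits along extracted relationships.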


2. Exo Equivalent

Exo’s RAG system lives in the exo-retrieval package (packages/exo-retrieval/) as a separate installable package.

Mapping Summary

| Agent-Core | Exo | Notes |
|---|---|---|
| EmbeddingProvider ABC | Embeddings ABC | Renamed; same embed(), embed_batch(), dimension interface |
| OpenAIEmbeddingProvider | OpenAIEmbeddings | Uses httpx directly (no SDK dependency) |
| VertexEmbeddingProvider | VertexEmbeddings | Configurable location & output dimensionality |
| HTTPEmbeddingProvider | HTTPEmbeddings | Dot-path field extraction for flexible API shapes |
| VectorStore ABC | VectorStore ABC | Same interface: add(), search(), delete(), clear() |
| pgvector backend | PgVectorStore | asyncpg-based; in backends/pgvector.py |
| ChromaDB backend | ChromaVectorStore | Supports persistent and ephemeral modes; in backends/chroma.py |
| (no equivalent) | InMemoryVectorStore | New: pure-Python store for dev/testing |
| VectorRetriever | VectorRetriever | Same pattern: embed query → search store |
| SparseRetriever / BM25 | SparseRetriever | Pure-Python BM25 with configurable k1 and b |
| HybridRetriever / RRF | HybridRetriever | Concurrent dense+sparse with weighted RRF fusion |
| AgenticRetriever | AgenticRetriever | Multi-round with sufficiency threshold judging |
| KnowledgeGraphRetriever | GraphRetriever | Renamed; beam-search traversal with hop decay |
| Reranker ABC | Reranker ABC | Same rerank() interface |
| LLMReranker | LLMReranker | JSON index parsing with fallback handling |
| QueryRewriter | QueryRewriter | History-aware rewriting via LLM |
| Document / Chunk types | Document / Chunk | Pydantic models with metadata dicts |
| retrieval result | RetrievalResult | Immutable: chunk, score, metadata |
| retrieval error | RetrievalError | Carries operation and details |
| (inline in retrievers) | Chunker ABC | New: extracted text chunking hierarchy |
| | CharacterChunker | Fixed-size with overlap |
| | ParagraphChunker | Splits at blank lines |
| | TokenChunker | tiktoken-based token counting |
| (inline parsers) | Parser ABC | New: extracted document parsing hierarchy |
| | TextParser, MarkdownParser, JSONParser, PDFParser | Format-specific extractors |
| Triple extraction | TripleExtractor + Triple | LLM-based knowledge graph construction |
| (no equivalent) | retrieve_tool() / index_tool() | New: agent FunctionTool factories for retrieval |
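
The SparseRetriever rows above mention BM25 with configurable k1 and b. A toy scorer illustrating what those parameters control (term-frequency saturation and length normalization); this is a sketch over pre-tokenized documents, not exo's implementation:

```python
import math

def bm25_scores(query_terms: list[str], docs: list[list[str]],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each tokenized document against the query terms (toy BM25)."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n  # average document length
    scores = [0.0] * n
    for term in set(query_terms):
        df = sum(1 for d in docs if term in d)  # document frequency
        if df == 0:
            continue
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        for i, d in enumerate(docs):
            tf = d.count(term)
            if tf == 0:
                continue
            # k1 caps the benefit of repeated terms; b scales length penalty.
            denom = tf + k1 * (1 - b + b * len(d) / avgdl)
            scores[i] += idf * tf * (k1 + 1) / denom
    return scores
```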

Architecture Difference

Agent-core bundles RAG inside the core framework. Exo extracts it into a standalone package (exo-retrieval) that depends only on exo-core for the FunctionTool type used by the agent integration helpers. All LLM calls go through exo.models.get_provider(), keeping the retrieval package model-agnostic.

All embedding and retriever methods are async-first. Chunkers and parsers are synchronous since they operate on local data.
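
Because retrieval is async-first, a hybrid retriever can fan out its dense and sparse legs concurrently. A minimal sketch with asyncio.gather, assuming the async retrieve() interface described above (the retriever objects here are stand-ins, not exo's classes):

```python
import asyncio

async def retrieve_concurrently(dense, sparse, query: str, top_k: int = 10):
    """Run dense and sparse retrieval in parallel and return both result lists."""
    dense_results, sparse_results = await asyncio.gather(
        dense.retrieve(query, top_k=top_k),
        sparse.retrieve(query, top_k=top_k),
    )
    return dense_results, sparse_results
```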


3. Side-by-Side Code Examples

Building a Hybrid Retrieval Pipeline

Agent-core:

```python
from openjiuwen.core.rag import (
    OpenAIEmbeddingProvider,
    VectorStore,
    VectorRetriever,
    SparseRetriever,
    HybridRetriever,
)

embeddings = OpenAIEmbeddingProvider(api_key="sk-...", model="text-embedding-3-small")
store = VectorStore.create("pgvector", dsn="postgresql://...")
dense = VectorRetriever(embeddings=embeddings, store=store)
sparse = SparseRetriever()
sparse.index(chunks)
hybrid = HybridRetriever(dense=dense, sparse=sparse, vector_weight=0.6)

results = await hybrid.retrieve("How does authentication work?", top_k=10)
```

Exo:

```python
from exo.retrieval import (
    OpenAIEmbeddings,
    HybridRetriever,
    VectorRetriever,
    SparseRetriever,
)
from exo.retrieval.backends.pgvector import PgVectorStore

embeddings = OpenAIEmbeddings(api_key="sk-...", model="text-embedding-3-small")
store = PgVectorStore(dsn="postgresql://...", dimensions=1536)
await store.initialize()

dense = VectorRetriever(embeddings=embeddings, store=store)
sparse = SparseRetriever()
await sparse.index(chunks)
hybrid = HybridRetriever(
    vector_retriever=dense,
    sparse_retriever=sparse,
    vector_weight=0.6,
)

results = await hybrid.retrieve("How does authentication work?", top_k=10)
```

Document Ingestion Pipeline

Agent-core:

```python
from openjiuwen.core.rag import (
    OpenAIEmbeddingProvider,
    VectorStore,
    Document,
)

embeddings = OpenAIEmbeddingProvider(api_key="sk-...")
store = VectorStore.create("chroma", path="./chroma_data")

doc = Document(id="doc-1", content=open("paper.txt").read())
chunks = doc.chunk(size=500, overlap=50)  # inline chunking
vectors = await embeddings.embed_batch([c.content for c in chunks])
await store.add(chunks, vectors)
```

Exo:

```python
from exo.retrieval import (
    OpenAIEmbeddings,
    Document,
    CharacterChunker,
)
from exo.retrieval.backends.chroma import ChromaVectorStore

embeddings = OpenAIEmbeddings(api_key="sk-...")
store = ChromaVectorStore(collection_name="papers", path="./chroma_data")
chunker = CharacterChunker(chunk_size=500, chunk_overlap=50)

doc = Document(id="doc-1", content=open("paper.txt").read(), metadata={})
chunks = chunker.chunk(doc)
vectors = await embeddings.embed_batch([c.content for c in chunks])
await store.add(chunks, vectors)
```

Agentic Retrieval with Query Rewriting

Agent-core:

```python
from openjiuwen.core.rag import (
    AgenticRetriever,
    QueryRewriter,
    VectorRetriever,
)

rewriter = QueryRewriter(model="gpt-4o")
base = VectorRetriever(embeddings=emb, store=store)
agentic = AgenticRetriever(
    base_retriever=base,
    rewriter=rewriter,
    model="gpt-4o",
    max_rounds=3,
)
results = await agentic.retrieve("What are the auth options?")
```

Exo:

```python
from exo.retrieval import (
    AgenticRetriever,
    QueryRewriter,
    VectorRetriever,
)

rewriter = QueryRewriter(model="openai:gpt-4o")
base = VectorRetriever(embeddings=emb, store=store)
agentic = AgenticRetriever(
    base_retriever=base,
    rewriter=rewriter,
    model="openai:gpt-4o",  # provider:model format
    max_rounds=3,
    sufficiency_threshold=0.7,
)
results = await agentic.retrieve("What are the auth options?")
```

Giving an Agent Retrieval Tools

```python
from exo.retrieval import (
    retrieve_tool,
    index_tool,
    VectorRetriever,
    CharacterChunker,
    OpenAIEmbeddings,
    InMemoryVectorStore,
)
from exo.agent import Agent

embeddings = OpenAIEmbeddings(api_key="sk-...")
store = InMemoryVectorStore()
retriever = VectorRetriever(embeddings=embeddings, store=store)
chunker = CharacterChunker()

agent = Agent(
    name="rag-agent",
    model="openai:gpt-4o",
    tools=[
        retrieve_tool(retriever),
        index_tool(chunker, store, embeddings),
    ],
)
```

4. Migration Table

| Agent-Core Path | Exo Import | Notes |
|---|---|---|
| openjiuwen.core.rag.EmbeddingProvider | exo.retrieval.embeddings.Embeddings | ABC: embed(), embed_batch(), dimension |
| openjiuwen.core.rag.OpenAIEmbeddingProvider | exo.retrieval.openai_embeddings.OpenAIEmbeddings | httpx-based, no SDK |
| openjiuwen.core.rag.VertexEmbeddingProvider | exo.retrieval.vertex_embeddings.VertexEmbeddings | GCP Vertex AI |
| openjiuwen.core.rag.HTTPEmbeddingProvider | exo.retrieval.http_embeddings.HTTPEmbeddings | Generic HTTP endpoint |
| openjiuwen.core.rag.VectorStore | exo.retrieval.vector_store.VectorStore | ABC: add(), search(), delete(), clear() |
| openjiuwen.core.rag.PgVectorStore | exo.retrieval.backends.pgvector.PgVectorStore | asyncpg + pgvector |
| openjiuwen.core.rag.ChromaStore | exo.retrieval.backends.chroma.ChromaVectorStore | Persistent or ephemeral |
| (no equivalent) | exo.retrieval.vector_store.InMemoryVectorStore | Pure-Python dev/test store |
| openjiuwen.core.rag.VectorRetriever | exo.retrieval.retriever.VectorRetriever | Dense semantic search |
| openjiuwen.core.rag.SparseRetriever | exo.retrieval.sparse_retriever.SparseRetriever | BM25 keyword matching |
| openjiuwen.core.rag.HybridRetriever | exo.retrieval.hybrid_retriever.HybridRetriever | RRF fusion |
| openjiuwen.core.rag.AgenticRetriever | exo.retrieval.agentic_retriever.AgenticRetriever | Multi-round LLM-driven |
| openjiuwen.core.rag.KnowledgeGraphRetriever | exo.retrieval.graph_retriever.GraphRetriever | Beam-search graph traversal |
| openjiuwen.core.rag.Reranker | exo.retrieval.reranker.Reranker | ABC: rerank() |
| openjiuwen.core.rag.LLMReranker | exo.retrieval.reranker.LLMReranker | LLM-based passage ranking |
| openjiuwen.core.rag.QueryRewriter | exo.retrieval.query_rewriter.QueryRewriter | LLM query expansion |
| openjiuwen.core.rag.Document | exo.retrieval.types.Document | Pydantic model |
| openjiuwen.core.rag.Chunk | exo.retrieval.types.Chunk | Immutable chunk slice |
| (inline result type) | exo.retrieval.types.RetrievalResult | Scored chunk with metadata |
| (inline error) | exo.retrieval.types.RetrievalError | operation + details |
| (inline chunking) | exo.retrieval.chunker.Chunker | ABC: chunk(document) |
| (inline chunking) | exo.retrieval.chunker.CharacterChunker | Fixed-size with overlap |
| (inline chunking) | exo.retrieval.chunker.ParagraphChunker | Blank-line splitting |
| (inline chunking) | exo.retrieval.chunker.TokenChunker | tiktoken-based |
| (inline parsing) | exo.retrieval.parsers.Parser | ABC: parse(source) |
| (inline parsing) | exo.retrieval.parsers.TextParser | Passthrough |
| (inline parsing) | exo.retrieval.parsers.MarkdownParser | Strip formatting |
| (inline parsing) | exo.retrieval.parsers.JSONParser | Flatten to key-paths |
| (inline parsing) | exo.retrieval.parsers.PDFParser | pymupdf extraction |
| openjiuwen.core.rag.TripleExtractor | exo.retrieval.triple_extractor.TripleExtractor | LLM knowledge-graph extraction |
| openjiuwen.core.rag.Triple | exo.retrieval.triple_extractor.Triple | Frozen dataclass |
| (no equivalent) | exo.retrieval.tools.retrieve_tool | FunctionTool factory for retrieval |
| (no equivalent) | exo.retrieval.tools.index_tool | FunctionTool factory for indexing |

All public symbols are also re-exported from exo.retrieval (the package __init__.py), so from exo.retrieval import VectorRetriever works as a convenience import.