NestedRAG: A Hierarchical Retrieval Architecture for AI-Powered Video Call Analysis
Traditional RAG systems face a critical trade-off: larger chunks provide more context but include irrelevant information, while smaller chunks are more focused but may lack necessary context. NestedRAG solves this by dynamically selecting optimally-sized chunks through hierarchical semantic chunking and graph-based context exclusion.
View on GitHub
Internal testing: improved accuracy in answering user queries compared to industry-standard RAG implementations.
The Problem with Traditional RAG
Retrieving large chunks supplies plenty of context but dilutes it with irrelevant text; retrieving small chunks stays focused but can strip away the context needed to interpret them. No fixed chunk size resolves this tension. NestedRAG was developed specifically to serve as the retrieval engine for the AI-powered Q&A bot for video call analysis at Darwin AI Lab.
Key Features
Hierarchical Tree Structure
Documents are recursively split into semantic units, creating a tree where each branch represents nested text segments at different granularities.
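As a rough illustration of the structure (not NestedRAG's actual chunker), the sketch below builds such a tree with a naive stand-in splitter; split_semantic and build_chunk_tree are hypothetical names, and the two-way split mirrors the num_semantic_chunks=2 setting used in the code example further down:

import networkx as nx

def split_semantic(text: str) -> list:
    # Naive stand-in for a semantic splitter: cut the text in two at the
    # whitespace nearest its midpoint. The real chunker splits at semantic
    # boundaries instead.
    if len(text) < 200:  # leaf: too short to split further
        return []
    mid = text.rfind(" ", 0, len(text) // 2 + 1)
    return [text[:mid], text[mid + 1:]] if mid > 0 else []

def build_chunk_tree(text: str, max_depth: int, node_id: str = "root", tree=None):
    # Recursively split text, recording each parent/child nesting as an edge.
    if tree is None:
        tree = nx.DiGraph()
        tree.add_node(node_id, text=text)
    if max_depth == 0:
        return tree
    for i, segment in enumerate(split_semantic(text)):
        child_id = f"{node_id}/{i}"
        tree.add_node(child_id, text=segment)
        tree.add_edge(node_id, child_id)
        build_chunk_tree(segment, max_depth - 1, child_id, tree)
    return tree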
Branch-Level Selection
During retrieval, only one chunk per branch is selected, ensuring the result set contains the most relevant segment from each hierarchical chain.
Graph-Based Context Exclusion
Uses NetworkX graph algorithms to identify and exclude ancestors/descendants, preventing overlapping or nested chunks in results.
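Because a parent chunk contains its children verbatim, picking any chunk makes its whole root-to-leaf chain redundant. With the tree stored as a NetworkX DiGraph (edges pointing parent to child), that chain is two library calls away. excluded_nodes below is an illustrative helper, not part of the NestedRAG API:

import networkx as nx

def excluded_nodes(tree: nx.DiGraph, selected: str) -> set:
    # Everything that overlaps the selected chunk: ancestors contain it,
    # descendants are contained in it.
    return {selected} | nx.ancestors(tree, selected) | nx.descendants(tree, selected)

tree = nx.DiGraph()
tree.add_edges_from([("root", "A"), ("root", "B"), ("A", "A1"), ("A", "A2")])
print(excluded_nodes(tree, "A"))  # {'root', 'A', 'A1', 'A2'} -- branch B survives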
Optimized Chunk Sizing
Each query gets optimally-sized chunks based on where relevance lies in the hierarchy, maximizing the relevant-to-total information ratio.
How It Works
Root (Full Document)
├── Level 1 Chunk A
│   ├── Level 2 Chunk A1
│   │   └── Level 3 Chunk A1a
│   └── Level 2 Chunk A2
└── Level 1 Chunk B
    ├── Level 2 Chunk B1
    └── Level 2 Chunk B2

1. Search: find the most semantically similar chunk via vector search.
2. Identify: get all ancestors and descendants of this chunk in the graph.
3. Exclude: mark these nodes as excluded from future searches.
4. Repeat: continue until the desired number of chunks has been retrieved.
5. Result: return diverse chunks drawn from different document branches.
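Putting the steps together, here is a minimal sketch of the loop. The search_top1 callback and its excluded-set handling are assumptions about the vector-search layer, not NestedRAG's actual internals:

import networkx as nx

def nested_retrieve(tree: nx.DiGraph, search_top1, query: str, limit: int) -> list:
    # search_top1(query, excluded) is assumed to return the id of the most
    # similar chunk outside `excluded`, or None once the index is exhausted.
    excluded, results = set(), []
    while len(results) < limit:
        node = search_top1(query, excluded)        # 1. search
        if node is None:
            break
        chain = ({node}
                 | nx.ancestors(tree, node)        # 2. identify the chain
                 | nx.descendants(tree, node))
        excluded |= chain                          # 3. exclude it
        results.append(node)                       # 4. repeat
    return results                                 # 5. diverse branches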
Application Area
This architecture was developed for use cases where the source data is long, unstructured text, such as conversation transcripts and long-form articles. It is particularly well suited to analyzing video call transcripts, where context and relevance are crucial.
Code Example
from nested_rag import NestedRAG
from langchain_qdrant import QdrantVectorStore
from langchain_openai import OpenAIEmbeddings
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

# Initialize embeddings and an in-memory Qdrant instance
embeddings = OpenAIEmbeddings()
client = QdrantClient(":memory:")

# Create the collection up front; 1536 matches the dimensionality of the
# default OpenAI embedding model
client.create_collection(
    collection_name="my_documents",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)
vector_store = QdrantVectorStore(
    client=client,
    collection_name="my_documents",
    embedding=embeddings,
)

# Create NestedRAG instance
rag = NestedRAG(
    vector_store=vector_store,
    embedding=embeddings,
    max_depth=6,
    num_semantic_chunks=2,
)

# Ingest a document and retrieve diverse, optimally-sized chunks
document_text = open("transcript.txt").read()  # any long, unstructured text
rag.ingest_document(document_text, "my_doc_1")
results = rag.retrieve("What is the main contribution?", limit=5)

Ready to try it?
Clone the repository and start building with NestedRAG today.