NestedRAG: A Hierarchical Retrieval Architecture for AI-Powered Video Call Analysis
Traditional RAG systems face a critical trade-off: larger chunks provide more context but include irrelevant information, while smaller chunks are more focused but may lack necessary context. NestedRAG solves this by dynamically selecting optimally-sized chunks through hierarchical semantic chunking and graph-based context exclusion.
View on GitHub
Internal testing: improved accuracy in answering user queries compared to industry-standard RAG implementations.
The Problem with Traditional RAG
Retrieving large chunks supplies plenty of context but dilutes it with irrelevant text; retrieving small chunks stays focused but can strip away the context needed to interpret them. No fixed chunk size resolves this tension. NestedRAG was developed specifically to serve as the retrieval engine for the AI-powered Q&A bot for video call analysis at Darwin AI Lab.
Key Features
Hierarchical Tree Structure
Documents are recursively split into semantic units, creating a tree where each branch represents nested text segments at different granularities.
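As a rough illustration of the structure (not NestedRAG's actual chunker), the sketch below builds such a tree with a naive stand-in splitter; split_semantic and build_chunk_tree are hypothetical names, and the two-way split mirrors the num_semantic_chunks=2 setting used in the code example further down:

import networkx as nx

def split_semantic(text: str) -> list:
    # Naive stand-in for a semantic splitter: cut the text in two at the
    # whitespace nearest its midpoint. The real chunker splits at semantic
    # boundaries instead.
    if len(text) < 200:  # leaf: too short to split further
        return []
    mid = text.rfind(" ", 0, len(text) // 2 + 1)
    return [text[:mid], text[mid + 1:]] if mid > 0 else []

def build_chunk_tree(text: str, max_depth: int, node_id: str = "root", tree=None):
    # Recursively split text, recording each parent/child nesting as an edge.
    if tree is None:
        tree = nx.DiGraph()
        tree.add_node(node_id, text=text)
    if max_depth == 0:
        return tree
    for i, segment in enumerate(split_semantic(text)):
        child_id = f"{node_id}/{i}"
        tree.add_node(child_id, text=segment)
        tree.add_edge(node_id, child_id)
        build_chunk_tree(segment, max_depth - 1, child_id, tree)
    return tree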
Branch-Level Selection
During retrieval, only one chunk per branch is selected, ensuring the result set contains the most relevant segment from each hierarchical chain.
Graph-Based Context Exclusion
Uses NetworkX graph algorithms to identify and exclude ancestors/descendants, preventing overlapping or nested chunks in results.
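Because a parent chunk contains its children verbatim, picking any chunk makes its whole root-to-leaf chain redundant. With the tree stored as a NetworkX DiGraph (edges pointing parent to child), that chain is two library calls away. excluded_nodes below is an illustrative helper, not part of the NestedRAG API:

import networkx as nx

def excluded_nodes(tree: nx.DiGraph, selected: str) -> set:
    # Everything that overlaps the selected chunk: ancestors contain it,
    # descendants are contained in it.
    return {selected} | nx.ancestors(tree, selected) | nx.descendants(tree, selected)

tree = nx.DiGraph()
tree.add_edges_from([("root", "A"), ("root", "B"), ("A", "A1"), ("A", "A2")])
print(excluded_nodes(tree, "A"))  # {'root', 'A', 'A1', 'A2'} -- branch B survives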
Optimized Chunk Sizing
Each query gets optimally-sized chunks based on where relevance lies in the hierarchy, maximizing the relevant-to-total information ratio.
How It Works
Root (Full Document)
├── Level 1 Chunk A
│   ├── Level 2 Chunk A1
│   │   └── Level 3 Chunk A1a
│   └── Level 2 Chunk A2
└── Level 1 Chunk B
    ├── Level 2 Chunk B1
    └── Level 2 Chunk B2

1. Search: find the most semantically similar chunk via vector search.
2. Identify: get all ancestors and descendants of this chunk in the graph.
3. Exclude: mark these nodes as excluded from future searches.
4. Repeat: continue until the desired number of chunks has been retrieved.
5. Result: return diverse chunks drawn from different document branches.
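Putting the steps together, here is a minimal sketch of the loop. The search_top1 callback and its excluded-set handling are assumptions about the vector-search layer, not NestedRAG's actual internals:

import networkx as nx

def nested_retrieve(tree: nx.DiGraph, search_top1, query: str, limit: int) -> list:
    # search_top1(query, excluded) is assumed to return the id of the most
    # similar chunk outside `excluded`, or None once the index is exhausted.
    excluded, results = set(), []
    while len(results) < limit:
        node = search_top1(query, excluded)        # 1. search
        if node is None:
            break
        chain = ({node}
                 | nx.ancestors(tree, node)        # 2. identify the chain
                 | nx.descendants(tree, node))
        excluded |= chain                          # 3. exclude it
        results.append(node)                       # 4. repeat
    return results                                 # 5. diverse branches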
Application Area
This architecture was developed for use cases where the source data is long, unstructured text, such as conversation transcripts and long-form articles. It is particularly well suited to analyzing video call transcripts, where context and relevance are crucial.
Code Example
from nested_rag import NestedRAG
from langchain_qdrant import QdrantVectorStore
from langchain_openai import OpenAIEmbeddings
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

# Initialize embeddings and an in-memory Qdrant instance
embeddings = OpenAIEmbeddings()
client = QdrantClient(":memory:")

# Create the collection up front; 1536 matches the dimensionality of the
# default OpenAI embedding model
client.create_collection(
    collection_name="my_documents",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)
vector_store = QdrantVectorStore(
    client=client,
    collection_name="my_documents",
    embedding=embeddings,
)

# Create NestedRAG instance
rag = NestedRAG(
    vector_store=vector_store,
    embedding=embeddings,
    max_depth=6,
    num_semantic_chunks=2,
)

# Ingest a document and retrieve diverse, optimally-sized chunks
document_text = open("transcript.txt").read()  # any long, unstructured text
rag.ingest_document(document_text, "my_doc_1")
results = rag.retrieve("What is the main contribution?", limit=5)

Ready to try it?
Clone the repository and start building with NestedRAG today.