NestedRAG: A Hierarchical Retrieval Architecture for AI-Powered Video Call Analysis

Traditional RAG systems face a critical trade-off: larger chunks provide more context but include irrelevant information, while smaller chunks are more focused but may lack necessary context. NestedRAG solves this by dynamically selecting optimally sized chunks through hierarchical semantic chunking and graph-based context exclusion.

View on GitHub

Internal Testing Results

22% Higher Relevancy Score

Compared to industry-standard RAG implementations

19% Better Answer Correctness

Improved accuracy in answering user queries

The Problem with Traditional RAG

Retrieving larger chunks supplies more context but drags in irrelevant information, while smaller chunks stay focused yet may lack the context needed to answer the question. This architecture was developed specifically to serve as the engine behind the AI-powered Q&A bot for video call analysis at Darwin AI Lab.

Key Features

01

Hierarchical Tree Structure

Documents are recursively split into semantic units, creating a tree where each branch represents nested text segments at different granularities (a minimal sketch follows this feature list).

02

Branch-Level Selection

During retrieval, only one datapoint per branch is selected, ensuring we get the most relevant segment from each hierarchical chain.

03

Graph-Based Context Exclusion

Uses NetworkX graph algorithms to identify and exclude ancestors/descendants, preventing overlapping or nested chunks in results.

04

Optimized Chunk Sizing

Each query gets optimally sized chunks based on where relevance lies in the hierarchy, maximizing the relevant-to-total information ratio.
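
To make the tree construction concrete, here is a minimal sketch of the idea. The ChunkNode layout and the split_semantically helper are illustrative assumptions, not NestedRAG's actual internals; a real splitter would cut at semantic boundaries (e.g., embedding-distance jumps between sentences) rather than at a character midpoint.

Python
from dataclasses import dataclass, field

@dataclass
class ChunkNode:
    text: str
    depth: int
    children: list["ChunkNode"] = field(default_factory=list)

def split_semantically(text: str) -> list[str]:
    # Illustrative stand-in for a real semantic splitter.
    if len(text) <= 200:
        return []
    mid = len(text) // 2
    return [text[:mid], text[mid:]]

def build_tree(text: str, depth: int = 0, max_depth: int = 6) -> ChunkNode:
    # Each node holds one nested segment; recursion produces branches
    # of progressively finer granularity.
    node = ChunkNode(text=text, depth=depth)
    if depth < max_depth:
        node.children = [
            build_tree(part, depth + 1, max_depth)
            for part in split_semantically(text)
        ]
    return node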

How It Works

Each document is first split into a nested tree of chunks:
Root (Full Document)
├── Level 1 Chunk A
│   ├── Level 2 Chunk A1
│   │   └── Level 3 Chunk A1a
│   └── Level 2 Chunk A2
└── Level 1 Chunk B
    ├── Level 2 Chunk B1
    └── Level 2 Chunk B2
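
Graph-based exclusion (Feature 03) is easy to see if you encode this tree as a NetworkX directed graph: given a search hit, ancestors() and descendants() return the rest of its hierarchical chain. The node names below mirror the diagram above; NestedRAG's actual graph construction may differ.

Python
import networkx as nx

# Parent -> child edges matching the tree above.
G = nx.DiGraph([
    ("Root", "A"), ("Root", "B"),
    ("A", "A1"), ("A", "A2"), ("A1", "A1a"),
    ("B", "B1"), ("B", "B2"),
])

# Suppose vector search returns chunk A1: its whole chain is excluded,
# so the next hit must come from a different branch (A2, B1, B2, ...).
hit = "A1"
excluded = nx.ancestors(G, hit) | nx.descendants(G, hit) | {hit}
print(sorted(excluded))  # ['A', 'A1', 'A1a', 'Root']

Retrieval repeats this search-and-exclude cycle over the graph, step by step:
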
1

Search

Find the most semantically similar chunk via vector search

2

Identify

Get all ancestors and descendants of this chunk in the graph

3

Exclude

Mark these nodes as excluded for future searches

4

Repeat

Continue until the desired number of chunks has been retrieved

5

Result

Return diverse chunks drawn from different document branches (a sketch of this loop follows)
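
Put together, the five steps amount to a loop like the sketch below. This is a sketch under assumptions: the node_id metadata key and the over-fetching heuristic are stand-ins rather than NestedRAG's exact API; only the ancestors/descendants exclusion mirrors the behavior described above.

Python
import networkx as nx

def retrieve_diverse(query, vector_store, graph, limit=5):
    selected, excluded = [], set()
    while len(selected) < limit:
        # Step 1: vector search, over-fetching so excluded nodes can be skipped.
        hits = vector_store.similarity_search(query, k=limit + len(excluded))
        hit = next((h for h in hits if h.metadata["node_id"] not in excluded), None)
        if hit is None:
            break  # every remaining candidate overlaps an earlier pick
        selected.append(hit)
        # Steps 2-3: exclude the hit plus its full chain of ancestors/descendants.
        node = hit.metadata["node_id"]
        excluded |= nx.ancestors(graph, node) | nx.descendants(graph, node) | {node}
    # Steps 4-5 fall out of the loop: results come from distinct branches,
    # so no returned chunk nests inside another.
    return selected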

Application Area

This architecture was developed for use cases where the source data is long, unstructured text, such as conversation transcripts and long-form articles. It's particularly well-suited for analyzing video call transcripts, where context and relevance are crucial.

Code Example

Python
from nested_rag import NestedRAG
from langchain_qdrant import QdrantVectorStore
from langchain_openai import OpenAIEmbeddings
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

# Initialize embeddings and vector store
embeddings = OpenAIEmbeddings()
client = QdrantClient(":memory:")

# The collection must exist before QdrantVectorStore can attach to it;
# 1536 matches the dimensionality of OpenAI's default embedding model.
client.create_collection(
    collection_name="my_documents",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

vector_store = QdrantVectorStore(
    client=client,
    collection_name="my_documents",
    embedding=embeddings,
)

# Create NestedRAG instance
rag = NestedRAG(
    vector_store=vector_store,
    embedding=embeddings,
    max_depth=6,
    num_semantic_chunks=2,
)

# Ingest and retrieve
document_text = "..."  # your long-form text, e.g. a call transcript
rag.ingest_document(document_text, "my_doc_1")
results = rag.retrieve("What is the main contribution?", limit=5)

Ready to try it?

Clone the repository and start building with NestedRAG today.

View on GitHub