What is Chunking?
Chunking is the process of splitting long documents or data sources into smaller retrievable units that preserve enough semantic context for embedding, indexing, retrieval, and grounded generation.
How It Works
Chunking is one of the highest-leverage design choices in retrieval-augmented generation (RAG). It determines what unit the retriever can return to the model: a paragraph, a heading section, a page, a table, a code block, or a sliding text window. Good chunking preserves meaning and source traceability; bad chunking creates fragments that are either too small to answer questions or too large to retrieve precisely. Production chunking should account for document structure, language, tables, code, permissions, citations, token budgets, and evaluation results rather than relying on a single fixed character count.
Key Characteristics
- Retrieval unit design: defines the smallest indexed unit a retriever can return
- Structure-aware when possible: respects headings, paragraphs, tables, lists, code blocks, and page boundaries
- Context preservation: balances precise retrieval with enough surrounding information to answer correctly
- Citation impact: affects whether generated answers can point back to reliable source spans
- Evaluation-dependent: optimal strategy depends on query patterns, corpus type, model context window, and latency budget
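To make the citation point concrete, here is a minimal sketch of a chunk record that keeps the character offsets of its source span, so an answer can point back to the exact passage. The `Chunk` class, its field names, and the `paragraph_chunks` helper are hypothetical names chosen for illustration, and splitting on blank lines is only one simple way to find paragraph boundaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    start: int  # character offset in the source text (inclusive)
    end: int    # character offset in the source text (exclusive)
    text: str


def paragraph_chunks(doc_id: str, text: str) -> list[Chunk]:
    """Split on blank lines and record where each paragraph sits in the source."""
    chunks = []
    pos = 0  # offset of the current block within the source text
    for block in text.split("\n\n"):
        stripped = block.strip()
        if stripped:
            # Locate the trimmed paragraph inside its own block.
            start = text.index(stripped, pos)
            chunks.append(Chunk(doc_id, start, start + len(stripped), stripped))
        pos += len(block) + 2  # skip the block and the "\n\n" separator
    return chunks


policy = "Refunds are issued within 14 days.\n\nShipping fees are not refundable."
for chunk in paragraph_chunks("policy-001", policy):
    # The stored offsets always reproduce the chunk text from the source.
    assert policy[chunk.start:chunk.end] == chunk.text
```

Because each chunk carries `doc_id`, `start`, and `end`, a citation can be resolved to a specific span rather than to a whole document.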
Common Use Cases
- Splitting product documentation into heading-aware sections for RAG
- Creating paragraph-level chunks for policy question answering
- Preserving code blocks and surrounding explanation in developer documentation
- Separating tables or forms into retrievable records with metadata
- Testing different chunking strategies to improve retrieval precision and recall
Example
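The following is a minimal sketch of heading-aware chunking, not a production splitter. It assumes the input uses Markdown-style `#` headings and triple-backtick code fences; the function name `chunk_by_heading` and the `heading_path` field are hypothetical. Each chunk holds the text under one heading and is tagged with the trail of headings above it, so a retrieved chunk still carries its place in the document.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
FENCE = "`" * 3  # marker that opens or closes a fenced code block


def chunk_by_heading(markdown: str) -> list[dict]:
    """Return one chunk per heading section, tagged with its heading path."""
    chunks = []
    path = []  # stack of (level, title) for the current heading trail
    body = []  # lines collected since the last heading
    in_fence = False

    def flush():
        text = "\n".join(body).strip()
        if text:
            trail = " > ".join(title for _, title in path)
            chunks.append({"heading_path": trail, "text": text})
        body.clear()

    for line in markdown.splitlines():
        if line.strip().startswith(FENCE):
            in_fence = not in_fence
        # Lines inside a code fence are never treated as headings.
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop headings at the same or a deeper level before descending.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2).strip()))
        else:
            body.append(line)
    flush()
    return chunks


doc = "# Guide\nIntro text.\n## Install\nRun the installer.\n## Usage\nCall the API."
for chunk in chunk_by_heading(doc):
    print(chunk["heading_path"], "|", chunk["text"])
```

Sections that exceed the token budget would still need a secondary split, for example by paragraph; that step is left out here.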
Frequently Asked Questions
Why is chunking important in RAG?
Chunking defines what the retriever can return. If chunks are poorly shaped, the system may retrieve incomplete evidence, mix unrelated topics, lose citation boundaries, or waste context-window budget.
Is fixed-size chunking enough?
It can be a useful baseline, but it is rarely optimal for production. Structure-aware chunking usually performs better for documents with headings, tables, lists, code, or policy sections.
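For comparison, a fixed-size baseline can be sketched as a sliding window with overlap. This sketch assumes whitespace-separated words as a rough stand-in for model tokens; the function name and the `size` and `overlap` parameters are illustrative, and a real pipeline would count tokens with the tokenizer of its embedding model.

```python
def sliding_window_chunks(words: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split a word list into windows of `size` words that share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


words = "chunking splits long documents into smaller retrievable units for search".split()
for window in sliding_window_chunks(words, size=4, overlap=1):
    print(" ".join(window))
```

Note that the window boundaries ignore headings, tables, and sentence ends, which is the weakness structure-aware chunking is meant to address.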
How should chunking be evaluated?
Evaluate whether the expected evidence appears in top-k retrieval results, whether chunks contain enough context to answer, and whether citations remain specific and auditable.
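The first of these checks can be sketched as a hit rate at k: the fraction of test queries for which at least one expected chunk appears in the top-k results. The function name, the dictionary shapes, and the chunk IDs below are assumptions made for illustration; the retrieval results would come from whatever retriever is under test.

```python
def hit_rate_at_k(
    results: dict[str, list[str]],
    expected: dict[str, set[str]],
    k: int,
) -> float:
    """Fraction of queries whose top-k chunk IDs include an expected chunk."""
    if not expected:
        return 0.0
    hits = sum(
        1
        for query, gold in expected.items()
        if gold & set(results.get(query, [])[:k])
    )
    return hits / len(expected)


# Ranked chunk IDs returned per query (made-up data for illustration).
results = {"refund window": ["c3", "c1", "c9"], "shipping fees": ["c4", "c5", "c6"]}
# Chunk IDs a reviewer marked as containing the needed evidence.
expected = {"refund window": {"c1"}, "shipping fees": {"c7"}}

print(hit_rate_at_k(results, expected, k=3))
```

Running the same labeled queries against each candidate chunking strategy gives a like-for-like comparison; answer sufficiency and citation quality still need separate checks.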
Can chunking leak restricted data?
Yes. If chunks merge content across permission boundaries, retrieval may expose text a user should not see. Chunking and metadata assignment should respect access-control boundaries.
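One way to respect those boundaries is to merge adjacent sections only when they share the same access-control label, and to filter by the user's groups at retrieval time. The sketch below assumes each section already carries an `allowed_groups` set; the `Section` class and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    text: str
    allowed_groups: frozenset[str]


def merge_within_acl(sections: list[Section], max_chars: int) -> list[Section]:
    """Merge neighbouring sections only if their access groups are identical."""
    merged: list[Section] = []
    for section in sections:
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev.allowed_groups == section.allowed_groups
            and len(prev.text) + 1 + len(section.text) <= max_chars
        ):
            merged[-1] = Section(prev.text + "\n" + section.text, prev.allowed_groups)
        else:
            # A change in permissions always starts a new chunk.
            merged.append(section)
    return merged


def visible_chunks(chunks: list[Section], user_groups: set[str]) -> list[Section]:
    """Keep only chunks that share at least one group with the user."""
    return [c for c in chunks if c.allowed_groups & user_groups]


everyone = frozenset({"all-staff"})
hr_only = frozenset({"hr"})
sections = [
    Section("Holiday policy.", everyone),
    Section("Office hours.", everyone),
    Section("Salary bands.", hr_only),
]
chunks = merge_within_acl(sections, max_chars=200)
print([c.text for c in visible_chunks(chunks, {"all-staff"})])
```

Filtering after retrieval is shown here for brevity; applying the same group filter inside the index query avoids ever ranking restricted chunks for an unauthorized user.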