What is Chunk Size?
Chunk size is the length, measured in tokens, characters, or structural units, chosen for each document unit indexed in a retrieval-augmented generation (RAG) system.
How It Works
Chunk size controls how much evidence a retriever returns at once. Smaller chunks often improve precision because each result focuses on a narrow idea, but they can remove context needed to answer a question. Larger chunks preserve surrounding context and citation continuity, but they may dilute embedding similarity, consume more context-window budget, and make top-k results less specific. In production RAG, chunk size should usually be measured in model tokens, tested against real queries, and adjusted by document type instead of treated as a universal constant.
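The context-window trade-off can be checked with simple arithmetic. The sketch below is illustrative only: the function names and the idea of reserving part of the window for the prompt and the generated answer are assumptions for this example, not part of any particular framework.

```python
def retrieval_payload_tokens(chunk_size: int, top_k: int) -> int:
    """Upper bound on tokens the retriever adds to the prompt."""
    return chunk_size * top_k


def fits_context(
    chunk_size: int,
    top_k: int,
    context_window: int,
    prompt_tokens: int,
    reserved_output_tokens: int,
) -> bool:
    """Return True if top_k chunks of chunk_size tokens fit the remaining budget.

    ASSUMPTION: the budget is the context window minus the fixed prompt
    (instructions plus question) and the tokens reserved for the answer.
    """
    available = context_window - prompt_tokens - reserved_output_tokens
    return retrieval_payload_tokens(chunk_size, top_k) <= available
```

With a hypothetical 8,192-token window, 600 prompt tokens, and 1,024 tokens reserved for output, eight 500-token chunks fit, while eight 1,500-token chunks do not. Either the chunk size or top-k has to give.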
Key Characteristics
- Usually measured in tokens for LLM workflows, though ingestion pipelines may start from characters or words
- Directly affects retrieval precision, context-window usage, and indexing volume
- Interacts with chunk overlap, document structure, top-k, reranking, and citation requirements
- Often varies by content type, such as policies, API docs, code, legal contracts, or support tickets
- Should be tuned with retrieval evaluation rather than selected by intuition alone
Common Use Cases
- Setting 300- to 600-token sections for product documentation RAG
- Using larger chunks for legal clauses that require surrounding definitions
- Keeping code examples and explanations together in developer search
- Reducing context cost by shrinking overly broad chunks
- Running offline experiments to compare answer quality across chunk sizes
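An offline comparison like the last item can be sketched in a few lines. Everything here is a toy stand-in: whitespace words approximate tokens, a word-overlap count replaces a real embedding retriever, and the sample text, queries, and function names are invented for illustration.

```python
from typing import Dict, List, Tuple


def split_words(text: str, chunk_size: int) -> List[str]:
    # ASSUMPTION: whitespace words stand in for model tokens.
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def retrieve(query: str, chunks: List[str], top_k: int) -> List[str]:
    # Toy lexical scorer: count distinct query words that appear in the chunk.
    query_words = set(query.lower().split())
    ranked = sorted(
        chunks,
        key=lambda c: len(query_words & set(c.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def evaluate(
    text: str,
    cases: List[Tuple[str, str]],
    chunk_sizes: List[int],
    top_k: int = 1,
) -> Dict[int, Dict[str, float]]:
    """For each chunk size, report hit rate and average retrieved payload."""
    results: Dict[int, Dict[str, float]] = {}
    for size in chunk_sizes:
        chunks = split_words(text, size)
        hits = 0
        payload = 0
        for query, expected in cases:
            retrieved = retrieve(query, chunks, top_k)
            # A hit means the expected evidence appears in a retrieved chunk.
            hits += any(expected.lower() in c.lower() for c in retrieved)
            payload += sum(len(c.split()) for c in retrieved)
        results[size] = {
            "hit_rate": hits / len(cases),
            "avg_payload_tokens": payload / len(cases),
        }
    return results


# Hypothetical sample data for the experiment.
SAMPLE_TEXT = (
    "Refunds are issued within 14 days of purchase. "
    "Shipping takes 5 business days. Support is available by email only."
)
SAMPLE_CASES = [
    ("refunds issued within", "14 days"),
    ("shipping takes", "5 business"),
]

report = evaluate(SAMPLE_TEXT, SAMPLE_CASES, chunk_sizes=[4, 8])
```

On this toy data the 4-word chunks miss the refund question, because the query words and the answer fall into different chunks, while the 8-word chunks answer both at twice the payload. A real experiment would swap in the production tokenizer, retriever, and a labeled query set.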
Example
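A minimal sketch of fixed-size chunking with overlap, assuming whitespace splitting as a stand-in for a real model tokenizer. The function names `chunk_tokens` and `chunk_text` are hypothetical, and a production pipeline would typically also respect headings, paragraphs, or code boundaries.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int, overlap: int = 0) -> List[List[str]]:
    """Split tokens into windows of at most chunk_size tokens.

    Consecutive windows share `overlap` tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and less than chunk_size")

    step = chunk_size - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


def chunk_text(text: str, chunk_size: int, overlap: int = 0) -> List[str]:
    # ASSUMPTION: whitespace tokens approximate model tokens; replace
    # text.split() with the target model's tokenizer for real sizing.
    tokens = text.split()
    return [" ".join(chunk) for chunk in chunk_tokens(tokens, chunk_size, overlap)]
```

Calling `chunk_text(document, chunk_size=400, overlap=40)` would yield overlapping chunks of up to 400 whitespace tokens. Both values are placeholders to be tuned against real queries.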
Frequently Asked Questions
What is a good default chunk size for RAG?
There is no universal default, but many text-heavy RAG systems start around a few hundred tokens and then tune using retrieval and answer-quality evaluation.
Are smaller chunks always better?
No. Smaller chunks can retrieve precise snippets, but they may omit definitions, assumptions, tables, or preceding context needed for a correct answer.
Why measure chunk size in tokens?
LLMs operate under token budgets. Token-based sizing makes retrieval payloads easier to compare against context-window limits and generation cost.
Should every document use the same chunk size?
Usually not. A codebase, a policy manual, and a FAQ page have different structure and evidence needs, so they often require different sizing rules.
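One way to express per-type sizing is a small lookup table. The sketch below is a hedged illustration: the document-type keys, the `ChunkRule` name, and every number are hypothetical starting points to be replaced by values found through evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRule:
    chunk_size: int  # maximum tokens per chunk
    overlap: int     # tokens shared between consecutive chunks


# HYPOTHETICAL values for illustration only; tune each against real queries.
CHUNK_RULES = {
    "faq": ChunkRule(chunk_size=150, overlap=0),
    "policy": ChunkRule(chunk_size=500, overlap=50),
    "code": ChunkRule(chunk_size=800, overlap=100),
}
DEFAULT_RULE = ChunkRule(chunk_size=400, overlap=40)


def rule_for(doc_type: str) -> ChunkRule:
    """Look up the sizing rule for a document type, falling back to the default."""
    return CHUNK_RULES.get(doc_type.lower(), DEFAULT_RULE)
```

Keeping the rules in one table makes it easy to re-run retrieval evaluation after changing a single content type without touching the others.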