What is Context Compression?
Context compression is the process of reducing the amount of context sent to a large language model (LLM) while preserving the information the task needs.
How It Works
Context compression is used when raw context is too long, expensive, noisy, or poorly structured for a model request. It may summarize conversation history, extract relevant spans, remove duplicates, compress retrieved chunks, or transform documents into structured notes. Compression should be evaluated carefully: it can lower latency and improve focus, but it can also delete caveats, citations, permissions, or minority evidence. Good compression preserves traceability back to original sources.
Key Characteristics
- Reduces prompt length while trying to preserve task-relevant information
- Can use summarization, extraction, filtering, deduplication, or structured rewriting
- Frees context budget, lowers latency, and sharpens focus when done well
- Can harm grounding if evidence, citations, or caveats are removed
- Should preserve links or offsets to original sources when used in RAG
Common Use Cases
- Summarizing long chat history before a new request
- Compressing retrieved RAG chunks into source-grounded notes
- Removing duplicate evidence from a prompt
- Extracting only relevant contract clauses for review
- Reducing prompt cost while preserving answer quality
Example
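A minimal sketch of extractive compression, assuming a naive punctuation-based sentence splitter and keyword overlap as the relevance score. The names `Span`, `split_sentences`, and `compress`, and the sample documents, are hypothetical and for illustration only. It drops duplicate sentences, keeps those that share terms with the query, and records character offsets so each kept sentence can be traced back to its source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A kept sentence plus a pointer back to where it came from."""
    source_id: str
    start: int  # character offset into the original document
    end: int
    text: str


def split_sentences(text: str) -> list[tuple[int, int]]:
    """Naive splitter (assumption): cut after '.', '!' or '?', then trim whitespace."""
    raw, start = [], 0
    for i, ch in enumerate(text):
        if ch in ".!?":
            raw.append((start, i + 1))
            start = i + 1
    if start < len(text):
        raw.append((start, len(text)))
    trimmed = []
    for s, e in raw:
        while s < e and text[s].isspace():
            s += 1
        while e > s and text[e - 1].isspace():
            e -= 1
        if s < e:
            trimmed.append((s, e))
    return trimmed


def compress(docs: dict[str, str], query: str, max_spans: int = 3) -> list[Span]:
    """Deduplicate sentences, keep those overlapping the query, preserve offsets."""
    query_terms = set(query.lower().split())
    seen: set[str] = set()
    scored: list[tuple[int, Span]] = []
    for source_id, text in docs.items():
        for start, end in split_sentences(text):
            sentence = text[start:end]
            key = " ".join(sentence.lower().split())  # normalized form for dedup
            if key in seen:
                continue
            seen.add(key)
            words = {w.strip(".,;:!?") for w in key.split()}
            score = len(query_terms & words)
            if score > 0:
                scored.append((score, Span(source_id, start, end, sentence)))
    # sort is stable, so ties keep their original document order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [span for _, span in scored[:max_spans]]


docs = {
    "contract.txt": (
        "Payment is due within 30 days. "
        "The supplier may terminate on 60 days notice. "
        "Payment is due within 30 days."
    ),
    "memo.txt": "Lunch is at noon. Termination requires written notice.",
}
kept = compress(docs, "terminate termination notice")
for span in kept:
    print(f"[{span.source_id}:{span.start}-{span.end}] {span.text}")
```

Because each `Span` carries its source ID and offsets, a reviewer can check any compressed sentence against the original text. A production system would likely replace the keyword score with an embedding or reranker score and use a proper sentence tokenizer.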
Frequently Asked Questions
Is context compression the same as summarization?
Summarization is one method. Compression can also use extraction, filtering, deduplication, ranking, and structured rewriting.
What is the main risk of context compression?
It may remove evidence, caveats, minority viewpoints, permissions, or citation details needed for a correct answer.
How should compression be evaluated?
Evaluate answer quality, evidence recall, citation faithfulness, token savings, and whether original-source traceability is preserved.
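Two of these measures, token savings and evidence recall, can be approximated with a short script. The sketch below assumes whitespace-delimited words as a rough stand-in for model tokens and verbatim substring matching as a stand-in for evidence recall; the function names and sample text are hypothetical. Answer quality and citation faithfulness usually need a labeled evaluation set and are not covered here.

```python
def token_savings(original: str, compressed: str) -> float:
    """Fraction of whitespace-delimited tokens removed (rough proxy for model tokens)."""
    original_count = len(original.split())
    if original_count == 0:
        return 0.0
    return 1.0 - len(compressed.split()) / original_count


def evidence_recall(compressed: str, required_evidence: list[str]) -> float:
    """Share of required evidence strings that still appear verbatim after compression."""
    if not required_evidence:
        return 1.0
    haystack = compressed.lower()
    found = sum(1 for item in required_evidence if item.lower() in haystack)
    return found / len(required_evidence)


original = (
    "The drug reduced symptoms in most patients. "
    "However, two patients reported severe side effects. "
    "See Table 3."
)
compressed = "The drug reduced symptoms in most patients."
required = ["reduced symptoms", "severe side effects"]

savings = token_savings(original, compressed)
recall = evidence_recall(compressed, required)
print(f"token savings: {savings:.2f}, evidence recall: {recall:.2f}")
```

In this sample the compressed text is much shorter but has lost the caveat about side effects, which is the kind of trade-off the evaluation should surface.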
When should context be compressed?
Compress when raw context exceeds budget, creates noise, raises latency, or buries important evidence in long prompts.
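The budget and noise conditions can be checked before a request is sent. The following is a minimal sketch under stated assumptions: whitespace word counts approximate tokens, exact duplicates (after case and whitespace normalization) approximate noise, and the `should_compress` function and its `max_duplicate_ratio` threshold are hypothetical. Latency and buried evidence are not measured here.

```python
def should_compress(
    chunks: list[str],
    token_budget: int,
    max_duplicate_ratio: float = 0.2,  # assumed threshold, tune per application
) -> tuple[bool, list[str]]:
    """Return (decision, reasons) based on approximate size and duplication."""
    reasons: list[str] = []

    # Approximate token count with whitespace-delimited words.
    total_tokens = sum(len(chunk.split()) for chunk in chunks)
    if total_tokens > token_budget:
        reasons.append("over budget")

    # Treat chunks that match after lowercasing and whitespace cleanup as duplicates.
    normalized = [" ".join(chunk.lower().split()) for chunk in chunks]
    if normalized:
        duplicate_ratio = 1.0 - len(set(normalized)) / len(normalized)
        if duplicate_ratio > max_duplicate_ratio:
            reasons.append("too many duplicates")

    return bool(reasons), reasons


chunks = [
    "Refunds are issued within 14 days.",
    "refunds are issued  within 14 days.",
    "Shipping takes 3 to 5 days.",
]
decision, reasons = should_compress(chunks, token_budget=50)
print(decision, reasons)
```

Returning the reasons alongside the decision makes it easier to pick a matching method, for example deduplication for repeated chunks and extraction or summarization for an over-budget prompt.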