What is Dense Retrieval?
Dense retrieval is a semantic search method that represents queries and documents as dense embedding vectors and retrieves results by vector similarity.
How It Works
Dense retrieval is the foundation of many modern retrieval-augmented generation (RAG) systems because it can find semantically related passages even when the query and document use different words. A query encoder maps the user request into a vector, a document encoder maps chunks into vectors, and a vector database or approximate nearest neighbor (ANN) index returns the chunks whose vectors lie nearest to the query vector. Its strength is semantic matching; its weakness is that exact identifiers, rare names, numbers, and strict filters can be missed unless metadata filtering, hybrid search, or reranking is added.
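The scoring step can be written compactly. The notation is an assumption made for illustration and does not come from any particular library: E_Q and E_D denote the query and document encoders, q is the query, d_i is a chunk, and cosine similarity is used as one common choice of similarity measure.

```latex
s(q, d_i) = \frac{E_Q(q) \cdot E_D(d_i)}{\lVert E_Q(q) \rVert \, \lVert E_D(d_i) \rVert},
\qquad
\text{results} = \operatorname{top-}k_{\,i} \; s(q, d_i)
```

When all vectors are L2-normalized, the denominator equals 1 and the score reduces to a plain dot product.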
Key Characteristics
- Uses embedding vectors rather than exact term matching as the primary retrieval signal
- Supports semantic matches across paraphrases, synonyms, and multilingual content when the model is trained for it
- Typically relies on vector databases or approximate nearest neighbor indexes
- Sensitive to embedding model quality, chunking, normalization, and domain mismatch
- Often paired with sparse retrieval, metadata filters, and rerankers in production
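The sensitivity to normalization can be illustrated with a small sketch. The two-dimensional vectors and the names `dot` and `l2_normalize` are made up for this example; real embeddings have hundreds of dimensions, but the effect is the same: a raw dot product rewards vector length, while a dot product over L2-normalized vectors (cosine similarity) rewards direction only.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


# Toy vectors (assumption: stand-ins for real encoder output).
query = [1.0, 0.0]
docs = {
    "aligned_short": [0.9, 0.1],  # nearly the same direction, small magnitude
    "off_axis_long": [3.0, 3.0],  # 45 degrees away, large magnitude
}

# Raw dot product: the long vector wins because of its magnitude.
raw_best = max(docs, key=lambda name: dot(query, docs[name]))

# Normalized dot product (cosine): the better-aligned vector wins.
unit_query = l2_normalize(query)
normalized_best = max(
    docs, key=lambda name: dot(unit_query, l2_normalize(docs[name]))
)

print("raw dot product picks:", raw_best)
print("normalized picks:", normalized_best)
```

Whether to normalize depends on how the embedding model was trained, so the model's documentation is the place to check which similarity measure it expects.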
Common Use Cases
- Finding documentation passages that answer a natural-language question
- Retrieving semantically similar support tickets
- Powering RAG over knowledge bases, policies, and developer docs
- Searching across multilingual content with a compatible embedding model
- Providing candidate passages before cross-encoder reranking
Example
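A minimal sketch of the retrieval flow described under How It Works, assuming the embeddings are already computed. The three-dimensional vectors are invented for illustration and stand in for the output of a trained encoder; `cosine` and `dense_search` are hypothetical helper names, and the brute-force scan stands in for a vector database or ANN index.

```python
import math

# Hand-written toy vectors standing in for encoder output (assumption:
# a real system would get these from a trained embedding model).
DOC_VECTORS = {
    "How to reset your password": [0.9, 0.1, 0.0],
    "Quarterly billing and invoices": [0.0, 0.2, 0.9],
    "Recovering a locked account": [0.8, 0.3, 0.1],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def dense_search(query_vector, doc_vectors, k=2):
    """Exact brute-force search; an ANN index approximates this at scale."""
    scored = [
        (cosine(query_vector, vec), text) for text, vec in doc_vectors.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:k]


# Pretend this is the encoded form of "I forgot my login credentials".
query_vector = [0.85, 0.2, 0.05]

results = dense_search(query_vector, DOC_VECTORS)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

The query shares no keywords with the two account-related passages, yet they score highest because their vectors point in a similar direction; the billing passage is left out of the top two.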
Frequently Asked Questions
How is dense retrieval different from keyword search?
Dense retrieval compares vectors produced by a trained embedding model, while keyword search compares lexical terms. Dense retrieval can therefore match meaning even when the words differ.
Does dense retrieval replace BM25?
Not always. BM25, a sparse lexical ranking function, remains strong for exact terms, identifiers, and rare phrases, so many production systems use hybrid retrieval that combines dense and sparse results.
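One way to merge the two result lists is reciprocal rank fusion (RRF), sketched below. RRF is only one possible fusion method, chosen here for illustration; the document IDs and both rankings are invented, and `k=60` is a commonly used default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each document earns 1 / (k + rank) per list."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical rankings from the two retrievers for the same query.
dense_ranking = ["doc_a", "doc_b", "doc_c"]
bm25_ranking = ["doc_d", "doc_a", "doc_b"]  # doc_d contains the exact identifier

fused = reciprocal_rank_fusion([dense_ranking, bm25_ranking])
print(fused)
```

Documents that appear in both lists rise to the top, and `doc_d`, which the dense retriever missed entirely, still reaches the fused list because BM25 ranked it first.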
What causes poor dense retrieval results?
Common causes include weak embeddings, bad chunking, domain mismatch, missing metadata filters, and query patterns that require exact matching.
Why use reranking after dense retrieval?
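Dense retrieval is efficient for candidate generation because document vectors are computed once, ahead of time. A reranker, typically a cross-encoder, reads the query and each candidate passage together, which is slower but usually improves the final ordering.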
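The two-stage pattern can be sketched as follows. The scoring function here is a deliberately crude stand-in based on word overlap, not a real cross-encoder; `toy_rerank_score`, `rerank`, and the sample passages are all hypothetical and serve only to show the shape of the pipeline.

```python
def toy_rerank_score(query, passage):
    """Stand-in for a cross-encoder: fraction of query words found in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    return len(query_words & passage_words) / len(query_words)


def rerank(query, candidates, score_fn, top_n=2):
    """Re-order first-stage candidates by a more expensive pairwise score."""
    ordered = sorted(candidates, key=lambda p: score_fn(query, p), reverse=True)
    return ordered[:top_n]


# Candidates as they might come back from the dense stage, in dense-score order.
candidates = [
    "Rotate credentials regularly to stay secure",
    "Reset your api key from the dashboard settings",
    "Contact support to reset a password",
]
query = "reset api key"

final = rerank(query, candidates, toy_rerank_score)
for passage in final:
    print(passage)
```

In a real system `score_fn` would wrap a trained reranking model, and the candidate list would be larger than the handful that is finally passed to the generator.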