What is a Bi-Encoder?
A bi-encoder is a retrieval model architecture that encodes queries and documents separately into embedding vectors, so that they can be compared efficiently with vector similarity search.
How It Works
A bi-encoder uses one encoder for the query and one for the document (the two often share weights) and compares the resulting embeddings with dot product, cosine similarity, or another vector metric. Because document embeddings can be computed ahead of time, only the query has to be encoded at search time, which makes bi-encoders efficient enough for large-scale retrieval and a common choice for the first-stage retriever in retrieval-augmented generation (RAG). The tradeoff is that the model never jointly inspects the full query-document pair during retrieval, so subtle relevance judgments are often delegated to a cross-encoder reranker.
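The comparison step can be illustrated with a minimal sketch. The two vectors below are made-up stand-ins for a query embedding and a document embedding, not the output of any real model. The sketch shows the two metrics named above, and that cosine similarity equals the dot product once both vectors are L2-normalized.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

# Hypothetical embeddings, chosen only to keep the arithmetic easy to follow.
query_vec = [3.0, 4.0]
doc_vec = [4.0, 3.0]

print(dot(query_vec, doc_vec))     # 24.0
print(cosine(query_vec, doc_vec))  # 0.96

# After normalization, the plain dot product gives the cosine similarity.
normalized_score = dot(l2_normalize(query_vec), l2_normalize(doc_vec))
print(math.isclose(normalized_score, cosine(query_vec, doc_vec)))  # True
```

This is one reason embeddings are often normalized before indexing: the index can then use the cheaper dot product while still ranking by cosine similarity.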
Key Characteristics
- Encodes queries and documents independently into the same vector space
- Allows document embeddings to be precomputed and stored in a vector database
- Scales well for first-stage candidate retrieval over large corpora
- Less precise than cross-encoders for fine-grained relevance judgments
- Commonly used before reranking in production RAG pipelines
Common Use Cases
- Generating embeddings for document chunks in a RAG index
- Retrieving top-k candidates from a vector database
- Serving low-latency semantic search over large knowledge bases
- Building multilingual retrieval systems whose embeddings share one vector space across languages
- Pairing fast candidate recall with slower cross-encoder reranking
Example
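The following is a minimal, self-contained sketch of the bi-encoder pattern, not a production implementation. The `encode` function is a hypothetical stand-in for a trained encoder (it hashes words into a fixed-size vector), and `DIM`, `documents`, and `retrieve` are illustrative names. What it does show faithfully is the structure: documents are embedded once offline, and at query time only the query is embedded and scored against the stored vectors.

```python
import hashlib
import math

DIM = 64  # toy embedding size, chosen arbitrarily for this sketch

def encode(text):
    """Toy stand-in for a trained encoder: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

documents = [
    "bi-encoders embed queries and documents separately",
    "cross-encoders score a query and document jointly",
    "vector databases store precomputed embeddings",
]

# Offline step: embed every document once and keep the vectors.
index = [encode(doc) for doc in documents]

def retrieve(query, k=2):
    """Query-time step: embed the query, score it against the index, return the top k."""
    q = encode(query)
    scores = [sum(a * b for a, b in zip(q, d)) for d in index]
    ranked = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
    return [(documents[i], scores[i]) for i in ranked[:k]]

for doc, score in retrieve("how do bi-encoders embed documents"):
    print(f"{score:.3f}  {doc}")
```

In a real system the stand-in encoder would be replaced by a trained embedding model and the linear scan by a vector index, but the offline/online split stays the same.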
Frequently Asked Questions
Why are bi-encoders fast?
Document embeddings are computed offline. At query time, the system only embeds the query and performs vector similarity search.
Are query and document encoders always the same model?
Not always. Some systems use shared weights, while others use asymmetric encoders trained for different query and document distributions.
What is the main limitation of a bi-encoder?
It compares compressed vector representations rather than jointly reading the full query and document, which can miss fine-grained relevance signals.
How does a bi-encoder relate to a cross-encoder?
A bi-encoder is usually used for fast recall, while a cross-encoder reranks a smaller candidate set with more precise pairwise scoring.
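The two-stage pattern can be sketched as follows. Both scorers are toy stand-ins and assumptions of this sketch: `embed` uses plain word counts in place of a trained bi-encoder, and `pair_score` imitates a cross-encoder only in the sense that it reads the query and the document together (here by rewarding an exact phrase match). The function names are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy bi-encoder side: word counts as a sparse vector, computed per text in isolation."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def first_stage(query, doc_vecs, k):
    """Fast recall: rank all documents by embedding similarity, keep the top k indices."""
    q = embed(query)
    ranked = sorted(range(len(doc_vecs)), key=lambda i: cosine(q, doc_vecs[i]), reverse=True)
    return ranked[:k]

def pair_score(query, doc):
    """Toy cross-encoder side: sees the pair jointly and rewards an exact phrase match."""
    bonus = 1.0 if query.lower() in doc.lower() else 0.0
    return cosine(embed(query), embed(doc)) + bonus

def retrieve_then_rerank(query, docs, doc_vecs, k=2):
    """Rerank only the k candidates returned by the first stage."""
    candidates = first_stage(query, doc_vecs, k)
    reranked = sorted(candidates, key=lambda i: pair_score(query, docs[i]), reverse=True)
    return [docs[i] for i in reranked]

docs = [
    "the snake charmer learned python programming",
    "a python snake can grow very long",
    "java programming tutorial",
]
doc_vecs = [embed(d) for d in docs]  # precomputed offline

print(first_stage("python snake", doc_vecs, k=2))              # [0, 1]
print(retrieve_then_rerank("python snake", docs, doc_vecs)[0])  # a python snake can grow very long
```

The first stage ranks the shorter document higher because word counts ignore word order. The pairwise scorer, which is run on only two candidates, promotes the document that actually contains the phrase.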