What is a Cross-Encoder?

A cross-encoder is a ranking model architecture that jointly encodes a query and a candidate document or passage to produce a relevance score.

How It Works

A cross-encoder evaluates the query and passage together, allowing attention across both texts. This usually yields better relevance judgments than comparing independent embeddings, especially for nuanced questions, negation, constraints, and short passages. The cost is latency: every query-passage pair must be scored at request time. For that reason, cross-encoders are commonly used as rerankers after a faster retriever such as BM25, dense retrieval, or hybrid search has produced a manageable candidate set.
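To make the latency trade-off concrete, the sketch below counts how many query-passage pairs one query requires with and without a first-stage retriever. The corpus size, candidate count, and per-pair cost are illustrative assumptions rather than measurements, and the function names are hypothetical.

```python
from typing import Optional


def pairs_scored(corpus_size: int, top_k: Optional[int] = None) -> int:
    """Query-passage pairs a cross-encoder must score for one query."""
    if top_k is None:
        # No first stage: every passage is paired with the query.
        return corpus_size
    # With a first stage, only the retrieved candidates are scored.
    return min(top_k, corpus_size)


def sequential_latency_ms(num_pairs: int, ms_per_pair: float) -> float:
    """Rough estimate that ignores batching and hardware parallelism."""
    return num_pairs * ms_per_pair


CORPUS_SIZE = 1_000_000  # assumed number of passages in the corpus
TOP_K = 50               # assumed candidate set size from the first stage
MS_PER_PAIR = 5.0        # assumed cost of one forward pass, not a benchmark

for label, n in [
    ("full corpus", pairs_scored(CORPUS_SIZE)),
    ("top-k rerank", pairs_scored(CORPUS_SIZE, TOP_K)),
]:
    print(f"{label}: {n} pairs, ~{sequential_latency_ms(n, MS_PER_PAIR):.0f} ms")
```

Real systems batch pairs on accelerators, so absolute numbers will differ, but the gap between scoring the whole corpus and scoring only the top-k candidates remains several orders of magnitude.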

Key Characteristics

  • Jointly reads the query and candidate passage before scoring relevance
  • More precise than bi-encoders for many reranking tasks
  • Computationally expensive because each query-document pair is evaluated separately
  • Useful for handling negation, constraints, and subtle wording differences
  • Typically applied to top-k candidates rather than the entire corpus

Common Use Cases

  1. Reranking the top 50 dense-retrieval results before sending context to an LLM
  2. Improving RAG answers where first-stage retrieval returns noisy candidates
  3. Ranking policy passages against detailed compliance questions
  4. Scoring query-document pairs for retrieval evaluation
  5. Combining BM25 and vector candidates into a final ordered list

Example
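A minimal sketch of the reranking step, assuming a scoring function that takes a (query, passage) pair and returns a relevance score. The `overlap_score` function is a toy lexical stand-in so the example runs without a model; it is not a cross-encoder. In practice a trained model would take its place, for example the `CrossEncoder` class in the sentence-transformers library, whose `predict` method accepts a list of (query, passage) pairs. The names `rerank`, `overlap_score`, and the sample passages are hypothetical.

```python
from typing import Callable, List, Tuple

# Any function that maps (query, passage) to a relevance score.
PairScorer = Callable[[str, str], float]


def overlap_score(query: str, passage: str) -> float:
    """Toy stand-in scorer: fraction of query tokens found in the passage."""
    query_tokens = set(query.lower().split())
    passage_tokens = set(passage.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & passage_tokens) / len(query_tokens)


def rerank(
    query: str,
    candidates: List[str],
    score_pair: PairScorer,
    top_n: int,
) -> List[Tuple[str, float]]:
    """Score each (query, passage) pair and keep the top_n highest-scoring passages."""
    scored = [(passage, score_pair(query, passage)) for passage in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


query = "refund policy for annual plans"
candidates = [  # assumed output of a first-stage retriever
    "Monthly plans can be cancelled at any time.",
    "The refund policy for annual plans allows cancellation within 30 days.",
    "Our office is closed on public holidays.",
]

top = rerank(query, candidates, overlap_score, top_n=2)
for passage, score in top:
    print(f"{score:.2f}  {passage}")
```

Keeping the scorer as a parameter means the same `rerank` function works with any pairwise model, and the model is called once per candidate, which is the cost described above.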


Frequently Asked Questions

Why are cross-encoders more accurate than bi-encoders?

They jointly process the query and passage, so the model can directly compare constraints, entities, and wording before producing a relevance score.

Why not use a cross-encoder for all retrieval?

It is too expensive for large corpora because every query would need to be paired with every document or chunk at request time.

Where does a cross-encoder fit in RAG?

It usually reranks a candidate set produced by BM25, dense retrieval, or hybrid retrieval before final context is assembled.

Can an LLM act as a cross-encoder reranker?

An LLM can score or compare passages, but dedicated cross-encoder rerankers are often cheaper and more consistent for high-volume retrieval.

Related Terms

Rerank

Reranking is a second stage in information retrieval and RAG (Retrieval-Augmented Generation) pipelines. After a fast initial retrieval (e.g., vector cosine similarity or BM25 keyword search) recalls a broad set of candidate documents, the reranking stage applies a more computationally expensive but more accurate model, typically a cross-encoder. This model reads the user's query and each candidate document together, computes a relevance score, and reorders the candidates so that the most relevant snippets come first in the context the LLM uses to generate the final answer.

Bi-Encoder

A bi-encoder is a retrieval model architecture that encodes queries and documents separately into embedding vectors so they can be compared efficiently by similarity search.
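For contrast with the joint scoring described earlier, the sketch below shows the bi-encoder pattern: document vectors are computed once ahead of time, and only the query is encoded at request time. The `embed` function is a hypothetical bag-of-words stand-in for a trained encoder, used only to keep the example runnable; a real bi-encoder would return dense vectors.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]


def embed(text: str) -> Vector:
    """Toy stand-in encoder: term-count vector keyed by token."""
    return {token: float(count) for token, count in Counter(text.lower().split()).items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(value * b.get(token, 0.0) for token, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


documents: List[str] = [
    "cross encoders rerank candidates",
    "bm25 is a keyword ranking function",
]

# Offline step: documents are encoded independently of any query.
doc_vectors = [embed(doc) for doc in documents]

# Request time: encode the query once, then compare against stored vectors.
query_vector = embed("rerank candidates")
scores = [cosine(query_vector, doc_vector) for doc_vector in doc_vectors]
best = max(range(len(documents)), key=scores.__getitem__)
print(documents[best], round(scores[best], 3))
```

Because the query and document never pass through the model together, the comparison is cheap, but the model cannot attend across both texts the way a cross-encoder does.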

Dense Retrieval

Dense Retrieval is a semantic search method that represents queries and documents as dense embedding vectors and retrieves results by vector similarity.

Hybrid Search

Hybrid Search is a technique in information retrieval and RAG (Retrieval-Augmented Generation) systems that combines multiple search algorithms. The most common combination fuses dense vector retrieval, which captures contextual and conceptual meaning, with sparse keyword retrieval (typically the BM25 algorithm), which focuses on exact lexical matching and finding specific entities. The system runs both searches in parallel and then merges their results using a fusion algorithm such as Reciprocal Rank Fusion (RRF). This helps the system capture user intent while reducing the chance of missing documents that contain specific product names, IDs, or industry jargon.
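The fusion step can be sketched with Reciprocal Rank Fusion, in which each document's fused score is the sum of 1 / (k + rank) over every result list it appears in. The constant k = 60 is a commonly used default and is an assumption here, as are the document IDs and the function name.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of document IDs into one list ordered by fused score."""
    fused: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top result contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda doc_id: fused[doc_id], reverse=True)


bm25_hits = ["doc_a", "doc_b", "doc_c"]   # assumed keyword-search results
dense_hits = ["doc_b", "doc_d", "doc_a"]  # assumed vector-search results

fused_order = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused_order)
```

RRF uses only rank positions, so it needs no score normalization between retrievers. The fused list can then be passed to a cross-encoder reranker as the candidate set.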