What is HyDE?

HyDE is a retrieval technique that asks a language model to generate a hypothetical document or answer, embeds that generated text, and uses it to retrieve real documents.

Quick Facts

Full Name: Hypothetical Document Embeddings

How It Works

HyDE stands for Hypothetical Document Embeddings. Instead of embedding only the user's short or ambiguous query, the system first generates a plausible answer-like document and embeds that richer text for retrieval. This can improve dense retrieval when the query lacks the vocabulary used in the corpus. The generated document is not treated as evidence; it is only a search aid. Production systems must avoid letting hallucinated details from the hypothetical text leak into the final answer without support from retrieved sources.
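
The flow described above can be sketched as a single function. This is a minimal illustration, not a reference implementation: `hyde_search`, `generate`, and `embed` are hypothetical names, and the toy stand-ins at the bottom replace what would normally be a real LLM call and a real embedding model.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hyde_search(
    query: str,
    corpus: Sequence[str],
    generate: Callable[[str], str],   # assumed wrapper around an LLM
    embed: Callable[[str], Vector],   # assumed wrapper around an embedding model
    k: int = 3,
) -> List[str]:
    # 1. Draft an answer-like passage; it may contain invented details.
    hypothetical = generate(query)
    # 2. Embed the draft instead of the raw query.
    probe = embed(hypothetical)
    # 3. Rank real documents by similarity to the draft's embedding.
    ranked = sorted(corpus, key=lambda doc: cosine(probe, embed(doc)), reverse=True)
    # 4. Return only real documents; the draft itself is discarded.
    return ranked[:k]

# Toy stand-ins so the sketch runs without any external service.
VOCAB = ["dough", "gluten", "cats", "sleep"]

def toy_embed(text: str) -> List[float]:
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]

def toy_generate(query: str) -> str:
    return "weak gluten in the dough"

docs = ["cats sleep all day", "knead dough to build gluten"]
top = hyde_search("why is my loaf heavy", docs, toy_generate, toy_embed, k=1)
print(top)
```

Passing `generate` and `embed` as parameters keeps the retrieval logic independent of any particular model provider, and returning only corpus documents makes it harder for the draft to be mistaken for evidence downstream.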

Key Characteristics

  • Uses an LLM-generated hypothetical passage as an intermediate retrieval representation
  • Can improve dense retrieval for short or vague queries and for queries whose wording differs from the corpus
  • The hypothetical text should guide search, not serve as factual evidence
  • Adds generation latency and cost before retrieval
  • Requires grounding checks so generated assumptions do not contaminate the answer

Common Use Cases

  1. Improving retrieval for short natural-language questions
  2. Searching specialized corpora when users lack domain vocabulary
  3. Generating richer semantic queries for dense retrieval
  4. Comparing baseline query embeddings against hypothetical-document embeddings
  5. Handling exploratory research questions in RAG systems

Example
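
A small, self-contained sketch that contrasts plain query retrieval with HyDE on a three-document toy corpus. Everything here is an assumption made for illustration: the bag-of-words `embed` stands in for a dense encoder, `generate_hypothetical` returns a canned passage instead of calling an LLM, and the corpus and query are invented.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words term counts (stand-in for a dense encoder)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def generate_hypothetical(query: str) -> str:
    """Hypothetical stub for the LLM call; returns a canned answer-like passage."""
    return ("Dense bread usually results from underproofed dough, weak gluten "
            "development, or an inactive starter during fermentation.")

CORPUS = [
    "Underproofed dough and weak gluten development produce a dense crumb.",
    "Why cats sleep so much during the day and stay awake at night.",
    "Laminated pastry needs cold butter and repeated folds.",
]

def retrieve(text: str, corpus: list, k: int = 1) -> list:
    """Rank corpus documents by similarity to the given text."""
    probe = embed(text)
    return sorted(corpus, key=lambda doc: cosine(probe, embed(doc)), reverse=True)[:k]

def hyde_retrieve(query: str, corpus: list, k: int = 1) -> list:
    """Retrieve with the hypothetical passage; the passage is a search aid only."""
    return retrieve(generate_hypothetical(query), corpus, k)

query = "why is my loaf so heavy"
baseline_hit = retrieve(query, CORPUS)[0]
hyde_hit = hyde_retrieve(query, CORPUS)[0]
print("baseline:", baseline_hit)
print("HyDE:    ", hyde_hit)
```

In this toy setup the raw query shares only incidental words ("why", "so") with the corpus, so the baseline ranks the unrelated document about cats first, while the hypothetical passage supplies the domain vocabulary that matches the bread document. Real dense encoders behave less literally than word counts, so the size of the effect will differ in practice.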


Frequently Asked Questions

Is HyDE's generated text used as evidence?

No. The generated text is a retrieval aid. Final answers should be grounded in real retrieved documents, not in the hypothetical passage.

When does HyDE help most?

It tends to help when user queries are short or vague, or when they use vocabulary that differs from the indexed documents.

What is the main risk of HyDE?

The hypothetical document may contain false assumptions. Systems must prevent those assumptions from being treated as facts.
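
One way to approximate such a safeguard is to check each sentence of the final answer against the retrieved documents only, never against the hypothetical passage. The sketch below is a deliberately naive lexical heuristic; the function name `unsupported_sentences`, the 0.6 threshold, and the sample texts are all assumptions for illustration, and a production system would likely need a stronger support check.

```python
import re

def tokens(text: str) -> set:
    """Lowercased word and number tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def unsupported_sentences(answer: str, retrieved: list, threshold: float = 0.6) -> list:
    """Return answer sentences whose tokens are mostly absent from every retrieved doc.

    The hypothetical passage is intentionally not an input: support must come
    from real documents.
    """
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = tokens(sentence)
        if not words:
            continue
        # Best token coverage of this sentence by any single retrieved document.
        best = max((len(words & tokens(doc)) / len(words) for doc in retrieved), default=0.0)
        if best < threshold:
            flagged.append(sentence)
    return flagged

retrieved_docs = ["Underproofed dough produces a dense crumb."]
draft_answer = "Underproofed dough produces a dense crumb. Baking at 300 degrees fixes it."
flagged = unsupported_sentences(draft_answer, retrieved_docs)
print(flagged)
```

Here the second sentence is flagged because none of its words appear in the retrieved document, which is the kind of detail that could have leaked in from a hypothetical passage. Token overlap misses paraphrases and can pass false claims that reuse the same words, so it is best treated as a first filter.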

How is HyDE different from normal query rewriting?

Query rewriting usually produces search queries, while HyDE produces an answer-like or document-like passage that is embedded and used for retrieval.
