What is GraphRAG?
GraphRAG (Graph Retrieval-Augmented Generation) is an advanced retrieval architecture for AI systems. During data ingestion, it uses LLMs to extract entities and relationships from text and build a knowledge graph; at query time, it combines graph retrieval with vector retrieval, significantly improving an LLM's accuracy on tasks involving complex logic, cross-document reasoning, and global summarization.
Quick Facts
| Fact | Detail |
|---|---|
| Full Name | Graph Retrieval-Augmented Generation |
| Created | Introduced by Microsoft Research in 2024; popularized alongside the evolution of LLM architectures |
How It Works
Traditional "naive" RAG relies mainly on chunking documents and vectorizing (embedding) them, recalling relevant text snippets by vector similarity. This approach works well for fact extraction but often performs poorly when logical reasoning across multiple documents is required, or when concepts are ambiguous. GraphRAG addresses this pain point by introducing a knowledge graph. Its core process includes:
1. Entity Extraction: transforming unstructured text into structured triplets (entity, relationship, entity)
2. Community Detection: dividing the graph into communities at different levels and generating a summary for each
3. Hybrid Search: when a user asks a question, retrieving not only similar text snippets but also related entities, relationships, and community summaries from the graph
This mechanism supplies the LLM with rich, global, structured context, greatly reducing hallucinations and strengthening complex reasoning.
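The indexing steps above can be sketched in a few lines of Python. The triplets below are hand-written stand-ins for what an LLM would extract (all entity names are made up), and "community detection" is approximated with connected components; production GraphRAG typically uses the Leiden algorithm plus LLM-generated community summaries.

```python
from collections import defaultdict

# Hypothetical triplets an LLM might extract during ingestion.
triplets = [
    ("Acme Corp", "acquired", "Widget Inc"),
    ("Widget Inc", "founded_by", "Jane Doe"),
    ("Jane Doe", "advises", "Beta LLC"),
    ("Gamma Ltd", "partners_with", "Delta GmbH"),
]

# Step 1: build an adjacency-list knowledge graph from the triplets.
graph = defaultdict(set)
for head, _, tail in triplets:
    graph[head].add(tail)
    graph[tail].add(head)

# Step 2: stand-in for community detection -- connected components
# found by depth-first search (real systems use Leiden clustering).
def communities(g):
    seen, result = set(), []
    for node in list(g):
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(g[n] - comp)
        seen |= comp
        result.append(comp)
    return result

for comp in communities(graph):
    print(sorted(comp))
```

In a full pipeline, each community would then be summarized by the LLM, and those summaries would be retrieved at query time alongside raw chunks.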
Key Characteristics
- Entity and Relationship Extraction: Uses LLMs to structure unstructured text
- Graph Database Driven: Usually relies on graph databases like Neo4j for storage and querying
- Community Summaries: Provides global perspectives at different levels
- Hybrid Search: Combines vector matching with graph relationship traversal
- Solves Cross-Document Reasoning: Excels at complex queries that require integrating scattered information
- High Construction Cost: The indexing phase requires frequent LLM calls, resulting in higher computational costs
Common Use Cases
- Complex QA Systems: Answering complex questions involving the interrelationships of multiple people, events, or concepts
- Global Document Summarization: Generating structured, high-level summaries for ultra-large corpora
- Anti-Fraud and Risk Control: Discovering hidden fraud patterns through relationship networks in the financial sector
- Medical and Research Assistance: Mining potential links between proteins, genes, and diseases across different literature
- Enterprise Knowledge Bases: Providing internal QA assistants with deep reasoning capabilities for enterprises
Example
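A minimal, self-contained sketch of the GraphRAG idea. The triplets are hand-written stand-ins for what an LLM would extract during ingestion; a real system would add embeddings and community summaries on top.

```python
# Hand-written triplets standing in for LLM extraction output.
triplets = [
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "married", "Pierre Curie"),
    ("Pierre Curie", "won", "Nobel Prize in Physics"),
]

# Index: adjacency list keyed by entity (inverse edges kept for reachability).
graph = {}
for h, r, t in triplets:
    graph.setdefault(h, []).append((r, t))
    graph.setdefault(t, []).append((r, h))

def graph_context(entity):
    """Collect one-hop facts to prepend to the LLM prompt as context."""
    return [f"{entity} {r} {t}" for r, t in graph.get(entity, [])]

print(graph_context("Marie Curie"))
# -> ['Marie Curie won Nobel Prize in Physics', 'Marie Curie married Pierre Curie']
```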
Frequently Asked Questions
What is the difference between GraphRAG and Naive RAG?
Naive RAG simply chunks documents, vectorizes them, and performs similarity comparisons. GraphRAG adds a Knowledge Graph layer on top of this. By extracting entities and relationships, it enables the AI to understand logical connections between concepts, excelling at handling complex cross-document reasoning problems.
Is the cost of building GraphRAG high?
Yes. During the data ingestion phase, GraphRAG uses an LLM to traverse all text and extract entities and relationships, a process that consumes a large number of tokens. It is therefore usually reserved for scenarios with extremely high requirements for accuracy and complex reasoning.
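The cost claim above can be made concrete with back-of-envelope arithmetic. Every number below (corpus size, chunk size, prompt overhead, output length) is an assumption chosen for illustration, not a measurement.

```python
# Assumed (hypothetical) figures for a mid-sized corpus.
corpus_tokens = 5_000_000   # total tokens across all documents
chunk_size = 1_200          # tokens per chunk sent for extraction
prompt_overhead = 800       # extraction prompt template per call
output_per_chunk = 400      # triplet output returned per chunk

# Each chunk requires at least one LLM call during indexing.
num_calls = corpus_tokens // chunk_size
input_tokens = num_calls * (chunk_size + prompt_overhead)
output_tokens = num_calls * output_per_chunk

print(num_calls, input_tokens, output_tokens)
# -> 4166 8332000 1666400
```

Even under these modest assumptions, indexing consumes several times the corpus size in LLM tokens, before any re-extraction passes or community summarization.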
What is Hybrid Search in GraphRAG?
Hybrid search means that during the query phase, the system performs two types of search simultaneously: a vector search over embeddings to recall specific text chunks, and a graph search based on entity matching to recall relationship networks. Both result sets are then combined into the context fed to the LLM.
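The two legs described above can be sketched as follows, with hand-coded 2-D "embeddings" and a tiny edge list standing in for a real vector index and graph database; all names and vectors are illustrative.

```python
import math

# Toy in-memory index: chunk id -> (text, embedding). Hypothetical data.
chunks = {
    "c1": ("Acme Corp acquired Widget Inc in 2020.", [0.9, 0.1]),
    "c2": ("Jane Doe founded Widget Inc.",           [0.8, 0.3]),
    "c3": ("The weather was mild this spring.",      [0.1, 0.9]),
}
# Toy knowledge-graph edges: entity -> neighboring entities.
edges = {"Acme Corp": ["Widget Inc"], "Widget Inc": ["Jane Doe"]}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def hybrid_search(query_vec, query_entities, k=2):
    # Leg 1: vector search -- rank chunks by cosine similarity.
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, chunks[c][1]),
                    reverse=True)
    text_hits = [chunks[c][0] for c in ranked[:k]]
    # Leg 2: graph search -- one-hop traversal from matched entities.
    graph_hits = [f"{e} -> {n}" for e in query_entities
                  for n in edges.get(e, [])]
    # Both legs are concatenated into the context handed to the LLM.
    return text_hits + graph_hits

print(hybrid_search([0.9, 0.2], ["Acme Corp"]))
```

A production system would replace the cosine loop with a vector database query and the edge dictionary with a graph-database traversal (e.g. Cypher in Neo4j), but the merge step is the same idea.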