Retrieval-Augmented Generation (RAG) has significantly improved the accuracy and contextual relevance of large language model responses by incorporating external knowledge sources. However, traditional RAG approaches can overlook the structural relationships between entities when dealing with complex, heterogeneous information. For instance, a vector database might incorrectly associate “employee” more closely with “employer” than with “information.”
To address this limitation, knowledge graphs have emerged as an effective solution. By utilizing a triplet structure of nodes and edges, such as “employer – submits – claim,” knowledge graphs clearly express relationships between entities. This structured approach enables more precise and efficient handling of complex data searches.
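To make this concrete, here is a tiny, purely illustrative sketch in plain Python (not part of the pipeline built below; the second triplet is hypothetical) of how such triplets can be represented:
triplets = [
    ("employer", "submits", "claim"),   # the example triplet above
    ("claim", "concerns", "employee"),  # hypothetical triplet for illustration
]
for subj, rel, obj in triplets:
    print(f"{subj} --{rel}--> {obj}")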
Technical Implementation
Setting Up the Environment
To begin building our knowledge graph using LlamaIndex and local PDF documents, we need to install several dependencies:
!pip install -q pypdf python-dotenv pyvis
!pip install -q transformers einops accelerate langchain bitsandbytes sentence_transformers langchain-community langchain-core
!pip install -q llama-index llama-index-llms-huggingface llama-index-embeddings-langchain llama-index-embeddings-huggingface
These installations provide:
- LlamaIndex: A flexible data framework for connecting custom data sources to LLMs
- SimpleDirectoryReader: A LlamaIndex reader for loading local files (such as our PDFs)
- KnowledgeGraphIndex: A LlamaIndex index that automatically extracts triplets from unstructured text and builds a knowledge graph
- SimpleGraphStore: A simple in-memory graph store for the extracted triplets
- PyVis: A Python library for building and visualizing interactive graph networks
Enabling Diagnostic Logging
To gain valuable insights into code execution, we’ll enable diagnostic logging:
import os, logging, sys
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
Connecting to Hugging Face API
To utilize Hugging Face models, you’ll need to set up your API access:
from huggingface_hub import login
os.environ["HF_KEY"] = "Your Hugging Face access token goes here"
login(token=os.environ.get('HF_KEY'), add_to_git_credential=True)
Loading PDF Documents
We’ll use SimpleDirectoryReader to load our PDF documents:
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader(input_dir="/content/", required_exts=[".pdf"]).load_data()
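As an optional sanity check, you can confirm the PDFs were picked up; the exact metadata keys depend on the loader, but file name and page label are typical:
print(f"Loaded {len(documents)} document objects")
print(documents[0].metadata)  # typically includes the file name and page label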
Building the Knowledge Graph Index
Creating Local Embeddings with Hugging Face
We’ll use the HuggingFaceEmbedding class to create text embeddings:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
EMBEDDING_MODEL_NAME = "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
embed_model = HuggingFaceEmbedding(model_name=EMBEDDING_MODEL_NAME, embed_batch_size=10)
Configuring Global Settings
LlamaIndex v0.10.0 introduced a new global Settings object to replace the previous ServiceContext configuration:
from llama_index.core import Settings
Settings.embed_model = embed_model
Settings.chunk_size = 256
Settings.chunk_overlap = 50
Defining Custom Prompts
We’ll set up custom prompts for our AI assistant:
from llama_index.core import PromptTemplate
system_prompt = """<|SYSTEM|># You are an AI-enabled admin assistant.
Your goal is to answer questions accurately using only the context provided.
"""
query_wrapper_prompt = PromptTemplate("<|USER|>{query_str}<|ASSISTANT|>")
LLM_MODEL_NAME = "meta-llama/Llama-2-7b-chat-hf"
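To see how a user question will be wrapped before it reaches the model, you can render the template against a sample query (illustrative only):
print(query_wrapper_prompt.format(query_str="Who submits a claim?"))
# -> <|USER|>Who submits a claim?<|ASSISTANT|>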
Setting Up the Language Model
We’ll configure our language model using HuggingFaceLLM:
import torch
from llama_index.llms.huggingface import HuggingFaceLLM
llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=512,
    generate_kwargs={"temperature": 0.1, "do_sample": False},
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name=LLM_MODEL_NAME,
    model_name=LLM_MODEL_NAME,
    device_map="auto",
    model_kwargs={"torch_dtype": torch.float16, "load_in_8bit": True},
)
Settings.llm = llm
Constructing the Knowledge Graph Index
Now, we’ll build our knowledge graph index:
from llama_index.core.storage.storage_context import StorageContext
from llama_index.core import KnowledgeGraphIndex
from llama_index.core.graph_stores import SimpleGraphStore
graph_store = SimpleGraphStore()
storage_context = StorageContext.from_defaults(graph_store=graph_store)
index = KnowledgeGraphIndex.from_documents(
    documents=documents,
    max_triplets_per_chunk=3,
    storage_context=storage_context,
    embed_model=embed_model,
    include_embeddings=True,
)
Visualizing the Knowledge Graph
We can visualize our knowledge graph using PyVis:
from pyvis.network import Network
g = index.get_networkx_graph()
net = Network(notebook=True, cdn_resources="in_line", directed=True)
net.from_nx(g)
net.save_graph("rag_graph.html")
from IPython.display import HTML, display
display(HTML(filename="rag_graph.html"))
This knowledge graph visualization helps you understand the complex relationships between entities in your data.
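Beyond the interactive HTML view, the same NetworkX graph can be inspected programmatically. The sketch below assumes the relation label is stored in each edge's "title" attribute, which PyVis uses for hover tooltips:
print(f"{g.number_of_nodes()} nodes, {g.number_of_edges()} edges")
for subj, obj, data in list(g.edges(data=True))[:10]:
    # relation label assumed to be in the "title" edge attribute
    print(subj, "-", data.get("title", "related"), "->", obj)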
Querying the Knowledge Graph
Finally, we can set up a query engine to interact with our knowledge graph:
query_engine = index.as_query_engine(llm=llm, similarity_top_k=5)
done = False
while not done:
    print("*" * 30)
    question = input("Enter your question: ")
    response = query_engine.query(question)
    print(response)
    done = input("End the chat? (y/n): ") == "y"
Conclusion
Traditional vector-based RAG and graph RAG approaches have distinct strengths in how they store and represent data. Vector databases excel at similarity comparison, measuring numerical distances between embeddings. Knowledge graphs, on the other hand, focus on revealing complex relationships and dependencies between entities, enabling deep semantic analysis and logical reasoning through nodes and edges.
Each method is suited to different application scenarios. Vector-based RAG is particularly effective for tasks requiring quick similarity comparisons, such as content recommendation or semantic search. Graph RAG, with its ability to capture and navigate intricate relationships, is well-suited for tasks that require understanding complex interconnections, like advanced question-answering systems or decision support tools in domains with highly interrelated data.
By combining these approaches, as demonstrated in this article, we can leverage the strengths of both vector embeddings and graph structures. This hybrid approach allows for more nuanced and context-aware information retrieval and generation, potentially leading to more accurate and insightful AI-powered applications.
The implementation described here, using LlamaIndex and local PDF documents, provides a practical starting point for developers and researchers looking to explore the potential of GraphRAG in their own projects. As the field of AI continues to evolve, such integrated approaches are likely to play an increasingly important role in creating more sophisticated and capable AI systems.