Retrieval-Augmented Generation (RAG) has significantly improved the accuracy and contextual relevance of large language model responses by incorporating external knowledge sources. However, traditional RAG approaches can overlook the structural relationships between entities when dealing with complex, heterogeneous information. For instance, because retrieval relies purely on embedding similarity, a vector database might associate “employee” more closely with “employer” than with “information.”

To address this limitation, knowledge graphs have emerged as an effective solution. By representing facts as subject-relation-object triplets, such as “employer – submits – claim,” in which entities become nodes and relations become edges, knowledge graphs make the relationships between entities explicit. This structured approach enables more precise and efficient handling of complex data searches.
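
To make the triplet idea concrete, here is a minimal illustrative sketch; the facts are invented examples rather than output from any real extraction step:

# A knowledge-graph triplet is simply (subject, relation, object):
# subjects and objects become nodes, relations become edges.
triplets = [
    ("employer", "submits", "claim"),
    ("claim", "names", "employee"),
]

for subject, relation, obj in triplets:
    print(f"{subject} --{relation}--> {obj}")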

Technical Implementation

Setting Up the Environment

To begin building our knowledge graph using LlamaIndex and local PDF documents, we need to install several dependencies:

!pip install -q pypdf python-dotenv pyvis
!pip install -q transformers einops accelerate langchain bitsandbytes sentence_transformers langchain-community langchain-core
!pip install -q llama-index llama-index-llms-huggingface llama-index-embeddings-langchain llama-index-embeddings-huggingface

The key components these packages provide include:

  • LlamaIndex: A flexible data framework for connecting custom data sources to LLMs
  • SimpleDirectoryReader: A LlamaIndex reader for loading local files (here, our PDFs)
  • KnowledgeGraphIndex: Automatically constructs a knowledge graph from unstructured text
  • SimpleGraphStore: A simple in-memory store for the extracted graph
  • PyVis: A Python library for building and visualizing interactive network graphs

Enabling Diagnostic Logging

To gain valuable insights into code execution, we’ll enable diagnostic logging:

import os, logging, sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

Connecting to Hugging Face API

To utilize Hugging Face models, you’ll need to set up your API access:

from huggingface_hub import login

os.environ["HF_KEY"] = "Your Hugging Face access token goes here"
login(token=os.environ.get('HF_KEY'), add_to_git_credential=True)
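
If you would rather not hardcode the token in the notebook, a small alternative sketch prompts for it at runtime instead:

from getpass import getpass
from huggingface_hub import login

# Prompt for the token interactively so it never appears in the notebook source.
login(token=getpass("Hugging Face access token: "))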

Loading PDF Documents

We’ll use SimpleDirectoryReader to load our PDF documents:

from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader(input_dir="/content/", required_exts=[".pdf"]).load_data()
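
An optional sanity check confirms what was loaded; the exact output depends on your PDFs:

# Each PDF page becomes one Document with text plus metadata such as the file name.
print(f"Loaded {len(documents)} document objects")
print(documents[0].metadata)
print(documents[0].text[:200])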

Building the Knowledge Graph Index

Creating Local Embeddings with Hugging Face

We’ll use the HuggingFaceEmbedding class to create text embeddings:

from llama_index.embeddings.huggingface import HuggingFaceEmbedding

EMBEDDING_MODEL_NAME = "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
embed_model = HuggingFaceEmbedding(model_name=EMBEDDING_MODEL_NAME, embed_batch_size=10)
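
As an optional check, you can embed a short string and inspect the resulting vector; the sample text is purely illustrative:

# multi-qa-MiniLM-L6-cos-v1 produces 384-dimensional embeddings.
vector = embed_model.get_text_embedding("employer submits a claim")
print(len(vector))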

Configuring Global Settings

LlamaIndex v0.10.0 introduced a new global Settings object to replace the previous ServiceContext configuration:

from llama_index.core import Settings

Settings.embed_model = embed_model
Settings.chunk_size = 256
Settings.chunk_overlap = 50
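
Setting chunk_size and chunk_overlap configures LlamaIndex's default node parser, which splits documents into chunks before triplet extraction. If you prefer to be explicit, a roughly equivalent sketch is:

from llama_index.core.node_parser import SentenceSplitter

# Equivalent to setting Settings.chunk_size / Settings.chunk_overlap above.
Settings.node_parser = SentenceSplitter(chunk_size=256, chunk_overlap=50)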

Defining Custom Prompts

We’ll set up custom prompts for our AI assistant:

from llama_index.core import PromptTemplate

system_prompt = """<|SYSTEM|># You are an AI-enabled admin assistant.
Your goal is to answer questions accurately using only the context provided.
"""

query_wrapper_prompt = PromptTemplate("<|USER|>{query_str}<|ASSISTANT|>")

LLM_MODEL_NAME = "meta-llama/Llama-2-7b-chat-hf"

Setting Up the Language Model

We’ll configure our language model using HuggingFaceLLM:

import torch
from llama_index.llms.huggingface import HuggingFaceLLM

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=512,
    # do_sample=False means greedy decoding, so the temperature has little practical effect
    generate_kwargs={"temperature": 0.1, "do_sample": False},
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name=LLM_MODEL_NAME,
    model_name=LLM_MODEL_NAME,
    device_map="auto",
    # load the 7B model in 8-bit to fit on a single consumer GPU
    model_kwargs={"torch_dtype": torch.float16, "load_in_8bit": True}
)

Settings.llm = llm

Constructing the Knowledge Graph Index

Now, we’ll build our knowledge graph index:

from llama_index.core.storage.storage_context import StorageContext
from llama_index.core import KnowledgeGraphIndex
from llama_index.core.graph_stores import SimpleGraphStore

graph_store = SimpleGraphStore()
storage_context = StorageContext.from_defaults(graph_store=graph_store)

index = KnowledgeGraphIndex.from_documents(
    documents=documents,
    max_triplets_per_chunk=3,        # extract at most 3 triplets from each text chunk
    storage_context=storage_context,
    embed_model=embed_model,
    include_embeddings=True          # also store embeddings, enabling hybrid retrieval later
)
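
Triplet extraction runs the LLM over every chunk, so building the index can take a while. It may therefore be worth persisting the result; a minimal sketch, using an arbitrary ./kg_storage directory:

# Save the graph store and index metadata to disk.
index.storage_context.persist(persist_dir="./kg_storage")

# Later, reload the index instead of rebuilding it.
from llama_index.core import StorageContext, load_index_from_storage

storage_context = StorageContext.from_defaults(persist_dir="./kg_storage")
index = load_index_from_storage(storage_context)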

Visualizing the Knowledge Graph

We can visualize our knowledge graph using PyVis:

from pyvis.network import Network

g = index.get_networkx_graph()
net = Network(notebook=True, cdn_resources="in_line", directed=True)
net.from_nx(g)
net.save_graph("rag_graph.html")

from IPython.display import HTML, display
display(HTML(filename="rag_graph.html"))

This knowledge graph visualization helps you understand the complex relationships between entities in your data.

Querying the Knowledge Graph

Finally, we can set up a query engine to interact with our knowledge graph:

query_engine = index.as_query_engine(llm=llm, similarity_top_k=5)

done = False
while not done:
    print("*" * 30)
    question = input("Enter your question: ")
    response = query_engine.query(question)
    print(response)
    done = input("End the chat? (y/n): ") == "y"
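
Because the index was built with include_embeddings=True, the knowledge graph query engine can also combine keyword-based triplet lookup with embedding similarity. A variant sketch; the sample question is illustrative:

# Hybrid retrieval over the knowledge graph, returning source text alongside triplets.
kg_query_engine = index.as_query_engine(
    llm=llm,
    include_text=True,
    embedding_mode="hybrid",
    similarity_top_k=5,
    response_mode="tree_summarize",
)
print(kg_query_engine.query("Who submits a claim?"))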

Conclusion

Traditional vector-based RAG and graph RAG approaches have distinct strengths in data storage and representation. Vector databases excel at comparing objects through similarity, using numerical values to measure distances between objects. On the other hand, knowledge graphs focus on revealing complex relationships and dependencies between objects, enabling deep semantic analysis and logical reasoning through nodes and edges.

Each method is suited to different application scenarios. Vector-based RAG is particularly effective for tasks requiring quick similarity comparisons, such as content recommendation or semantic search. Graph RAG, with its ability to capture and navigate intricate relationships, is well-suited for tasks that require understanding complex interconnections, like advanced question-answering systems or decision support tools in domains with highly interrelated data.

By combining these approaches, as demonstrated in this article, we can leverage the strengths of both vector embeddings and graph structures. This hybrid approach allows for more nuanced and context-aware information retrieval and generation, potentially leading to more accurate and insightful AI-powered applications.

The implementation described here, using LlamaIndex and local PDF documents, provides a practical starting point for developers and researchers looking to explore the potential of GraphRAG in their own projects. As the field of AI continues to evolve, such integrated approaches are likely to play an increasingly important role in creating more sophisticated and capable AI systems.
