In an era where data is abundant, the ability to efficiently search and extract insights from local files is paramount. This article presents an innovative open-source generative AI search engine that utilizes the Llama 3 model to facilitate intelligent semantic searches of local files. This project not only serves as a robust alternative to existing tools like Microsoft Copilot but also champions the ethos of technology sharing and innovation.
System Architecture
To construct a local generative search engine or assistant, several components are necessary:
- Content Indexing System: This component is responsible for storing the content of local files and is equipped with an information retrieval engine to efficiently search for the most relevant documents related to user queries.
- Language Model: The Llama 3 model analyzes the selected local document content and generates concise summary answers based on it.
- User Interface: An intuitive interface that allows users to easily query and obtain information.
Interaction Between Components
The interaction between these components is illustrated as follows:
- Qdrant is employed as the vector store, while Streamlit serves as the user interface. The Llama 3 model can be accessed via the NVIDIA NIM API (70B version) or downloaded from HuggingFace (8B version). Document chunking is handled by LangChain. A minimal sketch of the interface layer follows.
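The user interface layer can stay very thin. As a rough illustration, a minimal Streamlit page might simply forward the question to the generative API described later in this article (the endpoint name and local port are assumptions):
import requests
import streamlit as st

st.title("Local GenAI Search")
question = st.text_input("Ask a question about your local files", "")
if st.button("Ask"):
    # Assumed endpoint: the FastAPI service described later, running locally on port 8000
    resp = requests.post("http://127.0.0.1:8000/ask_localai",
                         json={"query": question},
                         timeout=120)
    st.markdown(resp.json().get("response", ""))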
Semantic Indexing
Semantic indexing is crucial for surfacing the most relevant documents by comparing the similarity between file content and the query. Qdrant serves as the vector store; it can run in embedded (local) mode, which allows efficient similarity search against a local index without requiring a full server-side installation.
Initializing Qdrant
When initializing Qdrant, the vector size and distance metric must be defined up front. Here’s how to set it up:
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(path="qdrant/")
collection_name = "MyCollection"

# Recreate the collection from scratch on every indexing run
if client.collection_exists(collection_name):
    client.delete_collection(collection_name)
client.create_collection(collection_name, vectors_config=VectorParams(size=768, distance=Distance.DOT))
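The vector size of 768 and the DOT distance are not arbitrary: they match the embedding model introduced in the next section, which produces 768-dimensional BERT embeddings tuned for dot-product similarity.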
Document Embedding
To build the vector index, documents on the hard drive must undergo embedding processing. Selecting an appropriate embedding method and vector comparison metric is crucial, as different paragraph, sentence, or word embedding techniques yield varying results.
One of the main challenges in document vector searches is the asymmetric search problem, which is prevalent in information retrieval, especially when matching short queries with long documents.
In this implementation, we selected a model fine-tuned on the MSMARCO dataset, sentence-transformers/msmarco-bert-base-dot-v5. This model is based on the BERT architecture and is specifically optimized for dot-product similarity measures.
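The chunking code in the next section calls qdrant.add_texts, which assumes a LangChain vector-store wrapper around the raw Qdrant client. A minimal sketch of wiring the embedding model to the collection created above might look like this (the variable names hf and qdrant are assumptions, not necessarily the project's own):
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_qdrant import Qdrant

# Embedding model fine-tuned on MSMARCO for dot-product retrieval (assumed wiring)
hf = HuggingFaceEmbeddings(model_name="sentence-transformers/msmarco-bert-base-dot-v5")

# Wrap the existing client and collection so LangChain can embed and upsert text chunks
qdrant = Qdrant(client=client, collection_name=collection_name, embeddings=hf)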
Chunking Documents
To address the limitations of BERT models, which can only handle a maximum of 512 tokens, we opted for document chunking. This process utilizes LangChain’s built-in chunking tool:
from langchain_text_splitters import TokenTextSplitter

text_splitter = TokenTextSplitter(chunk_size=500, chunk_overlap=50)
texts = text_splitter.split_text(file_content)
# Attach the source file path to every chunk so answers can be traced back to the file
metadata = [{"path": file} for _ in texts]
qdrant.add_texts(texts, metadatas=metadata)
This code splits the text into segments of 500 tokens, with a 50-token overlap to maintain contextual continuity.
Generating the Index
Before indexing file content, it is essential to read these files. The project simplifies this process by allowing users to specify the folder they wish to index. The indexer recursively searches through the specified folder and its subfolders for all supported file types, such as PDF, Word, PPT, and TXT formats.
Retrieving Files
Here’s a recursive method to retrieve all files within a given folder:
import os

def get_files(dir):
    file_list = []
    for f in os.listdir(dir):
        if os.path.isfile(os.path.join(dir, f)):
            file_list.append(os.path.join(dir, f))
        elif os.path.isdir(os.path.join(dir, f)):
            file_list += get_files(os.path.join(dir, f))
    return file_list
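The indexing loop further below iterates over a variable named onlyfiles, which is simply the result of this helper (the folder path here is only an illustration):
# Collect every supported file under the folder chosen by the user (illustrative path)
onlyfiles = get_files("local_files/")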
Reading File Content
The project supports reading various formats, including MS Word documents (.docx), PDF documents, MS PowerPoint presentations (.pptx), and plain text files (.txt). Below are examples of how to read these formats:
For MS Word documents:
import docx

def getTextFromWord(filename):
    doc = docx.Document(filename)
    fullText = [para.text for para in doc.paragraphs]
    return '\n'.join(fullText)
For PDF files:
import PyPDF2

def getTextFromPDF(filename):
    reader = PyPDF2.PdfReader(filename)
    return " ".join([reader.pages[i].extract_text() for i in range(len(reader.pages))])
Complete Indexing Function
The complete indexing routine ties these pieces together:
for file in onlyfiles:
    file_content = ""
    if file.endswith(".pdf"):
        print("Indexing " + file)
        file_content = getTextFromPDF(file)
    elif file.endswith(".txt"):
        print("Indexing " + file)
        with open(file, 'r') as f:
            file_content = f.read()
    elif file.endswith(".docx"):
        print("Indexing " + file)
        file_content = getTextFromWord(file)
    elif file.endswith(".pptx"):
        print("Indexing " + file)
        file_content = getTextFromPPTX(file)
    else:
        continue
    texts = text_splitter.split_text(file_content)
    metadata = [{"path": file} for _ in texts]
    qdrant.add_texts(texts, metadatas=metadata)
print("Finished indexing!")
Generative Search API
The web service is built with the FastAPI framework and hosts the generative search engine. It connects to the Qdrant index created earlier, runs a vector similarity search for each query, and uses the Llama 3 model to generate a precise answer from the most relevant chunks.
Setting Up the API
Here’s how to configure the key components of the generative search service:
from fastapi import FastAPI
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_qdrant import Qdrant
from qdrant_client import QdrantClient
from pydantic import BaseModel

app = FastAPI()

class Item(BaseModel):
    query: str

@app.get("/")
async def root():
    return {"message": "Hello World"}
Search Functionality
To make the API useful, two endpoints are exposed: one that performs plain semantic search, and one that passes the top 10 most relevant chunks to Llama 3 as context and generates a referenced answer.
@app.post("/search")
def search(Item: Item):
query = Item.query
search_result = qdrant.similarity_search(query=query, k=10)
list_res = [{"id": i, "path": res.metadata.get("path"), "content": res.page_content} for i, res in enumerate(search_result)]
return list_res
@app.post("/ask_localai")
async def ask_localai(Item: Item):
query = Item.query
search_result = qdrant.similarity_search(query=query, k=10)
context = ""
mappings = {}
for i, res in enumerate(search_result):
context += f"{i}n{res.page_content}nn"
mappings[i] = res.metadata.get("path")
rolemsg = {
"role": "system",
"content": "Answer user's question using documents given in the context. Please always reference document id (in square brackets, for example [0],[1]) of the document that was used to make a claim."
}
messages = [rolemsg, {"role": "user", "content": f"Documents:n{context}nnQuestion: {query}"}]
# Call to Llama 3 model for generating the answer
completion = client_ai.chat.completions.create(
model="meta/llama3-70b-instruct",
messages=messages,
temperature=0.5,
top_p=1,
max_tokens=1024,
stream=False
)
response = completion.choices[0].message.content
return {"response": response}
Conclusion
This article has outlined the process of building a generative AI search engine for local files by integrating Qdrant’s semantic search with the Llama 3 language model. The resulting system implements a retrieval-augmented generation (RAG) workflow with referenced answers, allowing users to search through local files and receive concise, cited responses to their queries.
The project illustrates the potential of open-source AI tools to enhance productivity and knowledge discovery. By leveraging large language models like Llama 3, developers can create intelligent search solutions that go beyond simple keyword matching, truly understanding the meaning behind user requests.
As AI continues to advance, we can expect to see more innovative applications like this generative search engine emerge, empowering users to effortlessly navigate and extract insights from their local data. The future of AI-powered search is bright, and projects like this one are leading the way.