A new project has taken GitHub by storm. Kotaemon, an open-source, production-oriented UI for document QA that ships with a GraphRAG pipeline, has garnered an impressive 6.6K stars within just two days of its latest update, 1.3K of them added yesterday alone, putting it at the top of the GitHub Trending list. After spending some time over the weekend deploying it, I found the process quite straightforward and highly recommend it to everyone.
While many users have found GraphRAG effective, several have reported challenges with deployment and usage. The Kotaemon UI aims to resolve these issues. In this article, I will use the paper introduced in my previous article as an example to guide you through the deployment and setup process.
GitHub: https://github.com/Cinnamon/kotaemon
Key Features of Kotaemon
- Host your own document QA (RAG) web UI. Supports multi-user login; organize your files in private or public collections; collaborate and share your favorite chats with others.
- Organize your LLM and embedding models. Supports both local LLMs and popular API providers (OpenAI, Azure, Ollama, Groq).
- Hybrid RAG pipeline. A sane default RAG pipeline with a hybrid (full-text and vector) retriever plus re-ranking to ensure the best retrieval quality (a minimal sketch of this idea follows the list).
- Multi-modal QA support. Perform question answering over multiple documents, with support for figures and tables and multi-modal document parsing (selectable options in the UI).
- Advanced citations with document preview. By default, the system provides detailed citations to ensure the correctness of LLM answers. View your citations (including relevance scores) directly in the in-browser PDF viewer, with highlights, and get a warning when the retrieval pipeline returns low-relevance articles.
- Complex reasoning methods. Use question decomposition to answer complex, multi-hop questions; agent-based reasoning with ReAct, ReWOO, and other agents is supported.
- Configurable settings UI. You can adjust the most important aspects of the retrieval and generation process in the UI (including prompts).
- Extensible. Built on Gradio, so you are free to customize or add any UI elements as you like. The project also aims to support multiple strategies for document indexing and retrieval; a GraphRAG indexing pipeline is provided as an example.
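To make the hybrid-retrieval point concrete, here is a minimal Python sketch of the underlying idea: fuse a full-text ranking and a vector ranking with reciprocal rank fusion, then keep the top results for re-ranking. This is an illustration only, not Kotaemon's actual code; the function and document ids below are made up.

```
# Minimal sketch of hybrid retrieval (NOT Kotaemon's actual implementation):
# combine a full-text ranking and a vector ranking with reciprocal rank fusion.

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids; k dampens rank differences."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical inputs: ids ranked by BM25 (full-text) and by embedding similarity.
fulltext_hits = ["doc3", "doc1", "doc7"]
vector_hits = ["doc1", "doc9", "doc3"]
print(reciprocal_rank_fusion([fulltext_hits, vector_hits])[:3])
# doc1 and doc3 rise to the top because both retrievers agree on them
```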
Deploying and Setting Up Kotaemon
For end users
This document is intended for developers. If you just want to install and use the app as it is, please follow the non-technical User Guide (WIP).
For developers
With Docker (recommended)
- Use this command to launch the server:

```
docker run \
  -e GRADIO_SERVER_NAME=0.0.0.0 \
  -e GRADIO_SERVER_PORT=7860 \
  -p 7860:7860 -it --rm \
  taprosoft/kotaemon:v1.0
```
Navigate to http://localhost:7860/ to access the web UI.
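If the page does not load right away, you can poll the port with a few lines of plain Python (standard library only; the URL is the one mapped above):

```
# Quick reachability check for the Kotaemon UI (no extra dependencies).
import urllib.request

try:
    with urllib.request.urlopen("http://localhost:7860/", timeout=5) as resp:
        print("Kotaemon UI is up, HTTP status:", resp.status)
except OSError as err:
    print("Server not reachable yet:", err)
```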
Without Docker
- Clone the repo and install the required packages in a fresh Python environment:

```
# optional (set up the environment)
conda create -n kotaemon python=3.10
conda activate kotaemon

# clone this repo
git clone https://github.com/Cinnamon/kotaemon
cd kotaemon

pip install -e "libs/kotaemon[all]"
pip install -e "libs/ktem"
```
- View and edit your environment variables (API keys, endpoints) in .env.
- (Optional) To enable the in-browser PDF_JS viewer, download PDF_JS_DIST and extract it to libs/ktem/ktem/assets/prebuilt.
- Start the web server:

```
python app.py
```

The app will open automatically in your browser. The default username and password are admin / admin. You can set up additional users directly in the UI.
Set up local models (for local / private RAG)
See Local model setup.
Customize your application
By default, all application data is stored in the ./ktem_app_data folder. You can back up or copy this folder to move your installation to a new machine.
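For example, a quick way to snapshot that folder before migrating (the destination name below is arbitrary):

```
# One way to snapshot the app data folder before moving machines
# (path taken from the docs above; backup name is arbitrary).
import shutil
from datetime import datetime

backup_name = f"ktem_app_data_backup_{datetime.now():%Y%m%d_%H%M%S}"
shutil.copytree("./ktem_app_data", backup_name)
print("Backed up to", backup_name)
```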
For advanced users or specific use cases, you can customize these files:
- flowsettings.py
- .env
flowsettings.py
This file contains the configuration of your application. You can use the example here as the starting point.
Notable settings
```
# setup your preferred document store (with full-text search capabilities)
KH_DOCSTORE=(Elasticsearch | LanceDB | SimpleFileDocumentStore)

# setup your preferred vectorstore (for vector-based search)
KH_VECTORSTORE=(ChromaDB | LanceDB | InMemory)

# Enable / disable multimodal QA
KH_REASONINGS_USE_MULTIMODAL=True

# Setup your new reasoning pipeline or modify existing one.
KH_REASONINGS = [
    "ktem.reasoning.simple.FullQAPipeline",
    "ktem.reasoning.simple.FullDecomposeQAPipeline",
    "ktem.reasoning.react.ReactAgentPipeline",
    "ktem.reasoning.rewoo.RewooAgentPipeline",
]
```
.env
This file provides another way to configure your models and credentials.
Configure model via the .env file
Alternatively, you can configure the models via the .env file with the information needed to connect to the LLMs. This file is located in the application folder; if you don't see it, you can create one.
Currently, the following providers are supported:
OpenAI
In the .env file, set the OPENAI_API_KEY variable with your OpenAI API key to enable access to OpenAI's models. Other variables can also be modified; feel free to edit them to fit your case. Otherwise, the default parameters should work for most people.
```
OPENAI_API_BASE=https://api.openai.com/v1
OPENAI_API_KEY=<your OpenAI API key here>
OPENAI_CHAT_MODEL=gpt-3.5-turbo
OPENAI_EMBEDDINGS_MODEL=text-embedding-ada-002
```
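Before launching Kotaemon, you can sanity-check the key with a short standalone script. This assumes the official openai and python-dotenv packages are installed (pip install openai python-dotenv) and is independent of Kotaemon itself:

```
# Optional sanity check that your OpenAI key works; run it from the app
# folder so the same .env file is picked up.
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # reads OPENAI_API_KEY (and friends) from .env
client = OpenAI()

reply = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say 'ok' if you can hear me."}],
)
print(reply.choices[0].message.content)
```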
Azure OpenAI
For OpenAI models via the Azure platform, you need to provide your Azure endpoint and API key. You might also need to provide your deployment names for the chat model and the embedding model, depending on how you set up your Azure deployments.
```
AZURE_OPENAI_ENDPOINT=
AZURE_OPENAI_API_KEY=
OPENAI_API_VERSION=2024-02-15-preview
AZURE_OPENAI_CHAT_DEPLOYMENT=gpt-35-turbo
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT=text-embedding-ada-002
```
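A similar standalone check works for Azure (again assuming the openai v1 package; the deployment name must match what you created in the Azure portal):

```
# Sanity check against Azure OpenAI; assumes `pip install openai` (v1+).
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-15-preview",
)
reply = client.chat.completions.create(
    model="gpt-35-turbo",  # your chat *deployment* name, not the model family
    messages=[{"role": "user", "content": "ping"}],
)
print(reply.choices[0].message.content)
```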
Local models
Using ollama's OpenAI-compatible server
Install ollama and start the application.

Pull your models (e.g.):

```
ollama pull llama3.1:8b
ollama pull nomic-embed-text
```

Set the model names in the web UI and mark them as default.
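Because ollama exposes an OpenAI-compatible endpoint (by default at http://localhost:11434/v1), you can verify the local server with the same openai client; the api_key value is required by the client but ignored by ollama:

```
# Verify the local ollama server through its OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
reply = client.chat.completions.create(
    model="llama3.1:8b",  # the model pulled above
    messages=[{"role": "user", "content": "Reply with one word: ready?"}],
)
print(reply.choices[0].message.content)
```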
Using GGUF with llama-cpp-python
You can search for and download an LLM to run locally from the Hugging Face Hub. Currently, the following model format is supported:
- GGUF
You should choose a model whose size is less than your device's available memory, leaving about 2 GB of headroom. For example, if you have 16 GB of RAM in total, of which 12 GB is available, then you should choose a model that takes up at most 10 GB of RAM. Bigger models tend to give better generations but also take more processing time.
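If you want to apply that rule to your own machine, a tiny helper like the following will do; it assumes psutil is installed (pip install psutil):

```
# Apply the sizing rule above: usable model size = available RAM minus ~2 GB.
import psutil

available_gb = psutil.virtual_memory().available / 1024**3
max_model_gb = max(available_gb - 2, 0)
print(f"Available RAM: {available_gb:.1f} GB -> pick a GGUF under {max_model_gb:.1f} GB")
```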
Here are some recommendations and their size in memory:
- Qwen1.5-1.8B-Chat-GGUF: around 2 GB
Add a new LlamaCpp model with the provided model name in the web UI.
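Before wiring the model into Kotaemon, you can smoke-test the downloaded GGUF file directly with llama-cpp-python (assuming pip install llama-cpp-python; the model path below is a placeholder for wherever you saved the file):

```
# Standalone smoke test for a downloaded GGUF file with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(model_path="./models/qwen1_5-1_8b-chat-q4_k_m.gguf", n_ctx=2048)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in five words."}],
    max_tokens=32,
)
print(out["choices"][0]["message"]["content"])
```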
Adding your own RAG pipeline
Custom reasoning pipeline
First, check the default pipeline implementation here. You can make quick adjustments to how the default QA pipeline works.
Next, if you feel comfortable adding a new pipeline, add a new .py implementation in libs/ktem/ktem/reasoning/ and later include it in flowsettings.py to enable it in the UI; a hypothetical skeleton follows below.
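To give a feel for the shape of such a pipeline, here is a purely illustrative skeleton. It is not the real ktem base class or interface; check libs/ktem/ktem/reasoning/simple.py for the actual one and adapt accordingly:

```
# Hypothetical skeleton only -- the real base class and method names live in
# libs/ktem/ktem/reasoning/ (see simple.py); adapt this shape to that interface.

class MyReasoningPipeline:
    """Toy QA pipeline: retrieve evidence, then prompt an LLM with it."""

    def __init__(self, retriever, llm):
        self.retriever = retriever  # anything returning ranked text chunks
        self.llm = llm              # anything mapping a prompt to a string

    def run(self, question: str) -> str:
        chunks = self.retriever(question)        # 1. retrieval
        context = "\n\n".join(chunks[:5])        # 2. keep top evidence
        prompt = (                               # 3. grounded generation
            f"Answer using only this context:\n{context}\n\nQuestion: {question}"
        )
        return self.llm(prompt)
```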
Custom indexing pipeline
Check the sample implementation in libs/ktem/ktem/index/file/graph (more instructions WIP).
Advanced Features of Kotaemon
Modifying the RAG Pipeline
One of Kotaemon’s significant advantages is its extensibility. Developers can modify the default RAG pipeline or implement custom logic in Python to create entirely new pipelines. This flexibility allows for the integration of customized retrieval and reasoning methods tailored to specific use cases. (On a related note, Haystack is another advanced RAG framework, one used by many large enterprises and leading AI companies in industrial production environments; I will introduce it in a future article.)
Extending the Index Pipeline
Kotaemon includes an example implementation of a graph-based indexing pipeline (GraphRAG), allowing developers to customize it or replace it with other indexing strategies. This feature is particularly important for developers handling large, complex document libraries.
UI Customization
Due to its Gradio-based architecture, Kotaemon’s UI is highly customizable. Developers can add new components, adjust layouts, and integrate additional features as needed. This capability is crucial for creating user experiences that meet specific application requirements. I am still exploring this UI customization.
Best Practices for Kotaemon
Enterprise Document Management
Kotaemon is highly suitable for enterprise applications where managing and querying large document libraries is a common requirement. Its multi-user support, advanced citation features, and configurable retrieval settings make it an ideal tool for industries such as legal, finance, and healthcare.
Research and Academia
For researchers and academic institutions, Kotaemon provides a powerful platform for exploring large datasets and conducting literature reviews. Its ability to handle complex, multi-hop queries and provide detailed citations ensures the accuracy and traceability of responses.
Custom AI Solutions
AI developers can leverage Kotaemon as a foundation for creating customized RAG pipelines. Whether developing customer support chatbots or building technical document analysis tools, Kotaemon offers the flexibility to adapt to various use cases.
If you encounter any issues with installation or setup, feel free to join the community to discuss with me.
What is GraphRAG and how does it enhance AI applications?
GraphRAG is a retrieval-augmented generation framework that leverages knowledge graphs to improve data retrieval and response generation. By creating a structured representation of data, it allows for more accurate and context-aware answers, making it ideal for complex queries in various domains like finance and healthcare. For more detailed insights, you can visit the official GraphRAG documentation.
Who should use Kotaemon’s GraphRAG UI?
Kotaemon’s GraphRAG UI is suitable for both beginners and advanced users. It is particularly beneficial for data scientists, researchers, and developers looking to streamline their data analysis and retrieval processes. The user-friendly interface simplifies complex tasks, making it accessible for those new to AI tools. For more information, check out the Kotaemon GitHub page.
Are there any costs associated with using GraphRAG?
GraphRAG is open-source and free to use, but users should be aware of potential costs related to cloud services or APIs if they choose to deploy it in a cloud environment. Monitoring usage is advisable to avoid unexpected charges, especially when processing large datasets. For further details, refer to the official GraphRAG resources.
How does GraphRAG compare to traditional data retrieval methods?
GraphRAG significantly improves upon traditional methods by utilizing knowledge graphs, which provide a more comprehensive view of data relationships. This approach enhances the accuracy and relevance of responses, especially for complex queries that involve multiple entities. For a deeper understanding, visit the GraphRAG blog.
Can I integrate GraphRAG with existing software applications?
Yes, GraphRAG can be easily integrated into various applications through its API, allowing developers to enhance their software with advanced data retrieval capabilities. This flexibility makes it a powerful tool for businesses looking to leverage AI in their operations. For integration guides, check the official GraphRAG documentation.