RAGFlow: Ultimate Open-Source RAG Engine for 2024 | 7.8K Stars

July 7, 2024

by kevin

RAGFlow is an innovative open-source Retrieval-Augmented Generation (RAG) engine that leverages deep document understanding to provide a powerful solution for information retrieval and question-answering tasks. By combining large language models (LLMs) with advanced document parsing techniques, RAGFlow offers a streamlined workflow for businesses of all sizes to extract valuable insights from complex, unstructured data.

Core Functionality: DeepDoc

At the heart of RAGFlow lies DeepDoc, a document processing system that combines visual processing with format-specific parsing to extract meaningful information from various document formats.

Visual Processing

DeepDoc’s visual processing component utilizes three key technologies:

Optical Character Recognition (OCR): RAGFlow employs state-of-the-art OCR techniques to accurately convert typed, handwritten, or printed text into machine-encoded text.

python deepdoc/vision/t_ocr.py --inputs=path_to_images_or_pdfs --output_dir=path_to_store_result

This fundamental step enables the system to work with a wide range of document types, including scanned images and PDFs.

Layout Recognition: The system can identify and analyze different document layouts, such as those found in newspapers, magazines, books, and resumes.

This capability allows RAGFlow to understand the structural relationships between different elements in a document, including:

Text and titles
Figures and captions
Tables and table captions
Headers and footers
References and formulas

You can view the layout detection results with the following command:

python deepdoc/vision/t_recognizer.py --inputs=path_to_images_or_pdfs --threshold=0.2 --mode=layout --output_dir=path_to_store_result

The input can be a directory of images or PDFs, or a single image or PDF file. After the command finishes, the folder path_to_store_result contains images showing the detection results.

Table Structure Recognition (TSR): RAGFlow excels at identifying and extracting information from complex table structures, including those with hierarchical headers, spanning cells, and projected row headers. The TSR component can recognize:

Columns and rows
Column and row headers
Merged cells

You can view the table structure recognition results with the following command:

python deepdoc/vision/t_recognizer.py --inputs=path_to_images_or_pdfs --threshold=0.2 --mode=tsr --output_dir=path_to_store_result

The input can be a directory of images or PDFs, or a single image or PDF file. After the command finishes, the folder path_to_store_result contains images and HTML pages showing the detection results.
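The three commands above share the same shape, so they can be assembled by one small helper. The following is a minimal sketch, not part of RAGFlow itself: the function name deepdoc_command is hypothetical, and it uses only the script paths and flags that appear in the commands above.

```python
from typing import List, Optional


def deepdoc_command(script: str, inputs: str, output_dir: str,
                    mode: Optional[str] = None,
                    threshold: Optional[float] = None) -> List[str]:
    """Build the argument list for one of the DeepDoc demo scripts.

    Hypothetical helper: it only mirrors the flags shown in the article
    (--inputs, --threshold, --mode, --output_dir).
    """
    cmd = ["python", script, f"--inputs={inputs}"]
    if threshold is not None:
        cmd.append(f"--threshold={threshold}")
    if mode is not None:
        # The article shows exactly two modes for t_recognizer.py.
        if mode not in ("layout", "tsr"):
            raise ValueError(f"unsupported mode: {mode}")
        cmd.append(f"--mode={mode}")
    cmd.append(f"--output_dir={output_dir}")
    return cmd


# Example: table structure recognition over a folder of PDFs.
tsr_cmd = deepdoc_command("deepdoc/vision/t_recognizer.py",
                          "path_to_images_or_pdfs", "path_to_store_result",
                          mode="tsr", threshold=0.2)
# To execute it from the repository root:
#   import subprocess; subprocess.run(tsr_cmd, check=True)
```

Returning a list instead of a single string avoids shell quoting problems when paths contain spaces.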

Parser Capabilities

RAGFlow’s parser is designed to handle multiple document formats, including PDF, DOCX, Excel, and PPT. The PDF parser is the most involved, because PDF is such a flexible format. It can extract:

Text blocks with precise positioning information (page number and rectangular coordinates)
Tables, including cropped images and content translated into natural language sentences
Figures with captions and embedded text
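To make the kinds of output listed above more concrete, here is a minimal sketch of how a text block with position information and a table row rendered as a sentence might be represented. All names (TextBlock, reading_order, row_to_sentence) and the "header: value" sentence format are assumptions for illustration, not RAGFlow's actual data model. Coordinates are assumed to have their origin at the top-left corner of the page.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TextBlock:
    """A parsed text block with its position (hypothetical field names)."""
    text: str
    page: int      # 1-based page number
    x0: float      # left edge of the bounding rectangle
    top: float     # upper edge (origin assumed at the top-left of the page)
    x1: float      # right edge
    bottom: float  # lower edge


def reading_order(blocks: List[TextBlock]) -> List[TextBlock]:
    """Sort blocks by page, then top-to-bottom, then left-to-right."""
    return sorted(blocks, key=lambda b: (b.page, b.top, b.x0))


def row_to_sentence(headers: List[str], row: List[str]) -> str:
    """Render one table row as a sentence of 'header: value' pairs."""
    return "; ".join(f"{h}: {v}" for h, v in zip(headers, row)) + "."
```

Keeping the page number and rectangle next to the text is what makes it possible to point back to the exact place in the source document later on.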

Getting Started with RAGFlow

To begin using RAGFlow, users need to meet the following system requirements:

CPU: 4 cores or more
RAM: 16 GB or more
Disk space: 50 GB or more
Docker: version 24.0.0 or later
Docker Compose: version 2.26.1 or later
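The requirements above can be checked with a short script before installing. This is a sketch under stated assumptions: the helper names are hypothetical, and the measured values are passed in as arguments so that the comparison logic stays independent of the platform.

```python
import re
from typing import List, Tuple

# Minimums taken from the requirements list above.
MIN_CPU_CORES = 4
MIN_RAM_GB = 16
MIN_DISK_GB = 50
MIN_DOCKER = (24, 0, 0)
MIN_COMPOSE = (2, 26, 1)


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract (major, minor, patch) from strings like '24.0.7' or 'v2.26.1'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def unmet_requirements(cpu_cores: int, ram_gb: float, disk_gb: float,
                       docker: str, compose: str) -> List[str]:
    """Return a human-readable message for every requirement that is not met."""
    problems = []
    if cpu_cores < MIN_CPU_CORES:
        problems.append(f"CPU: {cpu_cores} cores, need {MIN_CPU_CORES} or more")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb} GB, need {MIN_RAM_GB} GB or more")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"Disk: {disk_gb} GB, need {MIN_DISK_GB} GB or more")
    if parse_version(docker) < MIN_DOCKER:
        problems.append(f"Docker: {docker}, need 24.0.0 or later")
    if parse_version(compose) < MIN_COMPOSE:
        problems.append(f"Docker Compose: {compose}, need 2.26.1 or later")
    return problems
```

On a real machine, the inputs could come from os.cpu_count(), shutil.disk_usage(), and the output of docker --version and docker compose version.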

The installation process involves cloning the RAGFlow repository and using Docker to set up the environment. Detailed instructions are provided for configuring the necessary parameters and starting the server.

Start the server

Make sure vm.max_map_count >= 262144

Check the value of vm.max_map_count:

$ sysctl vm.max_map_count

If the value is lower than 262144, set it to at least 262144.

$ sudo sysctl -w vm.max_map_count=262144

This change is lost after a system reboot. To make it permanent, add or update the value of vm.max_map_count in /etc/sysctl.conf accordingly:

vm.max_map_count=262144
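The same check can be scripted. The sketch below assumes a Linux host, where the current value is exposed at /proc/sys/vm/max_map_count. The function names are hypothetical.

```python
from pathlib import Path

REQUIRED_MAX_MAP_COUNT = 262144


def parse_max_map_count(text: str) -> int:
    """Parse the content of the procfs file, e.g. '65530\\n'."""
    return int(text.strip())


def read_max_map_count(path: str = "/proc/sys/vm/max_map_count") -> int:
    """Read the current kernel setting (Linux only)."""
    return parse_max_map_count(Path(path).read_text())


def is_sufficient(value: int, required: int = REQUIRED_MAX_MAP_COUNT) -> bool:
    """True when the value meets the minimum stated above."""
    return value >= required


# Usage on a Linux host:
#   if not is_sufficient(read_max_map_count()):
#       print("run: sudo sysctl -w vm.max_map_count=262144")
```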

Clone the project

$ git clone https://github.com/infiniflow/ragflow.git

Start the server using the pre-built Docker image:

Running the following command automatically downloads the development version of the RAGFlow Docker image. To download and run a specific version instead, set RAGFLOW_VERSION in docker/.env to the desired version (for example, RAGFLOW_VERSION=v0.6.0) before running the command.

$ cd ragflow/docker
$ chmod +x ./entrypoint.sh
$ docker compose up -d

The core image is approximately 9 GB in size and may take a while to load.
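Pinning a version in docker/.env, as described above, can also be done programmatically. This is a minimal sketch: the function name is hypothetical, and it assumes the file uses plain KEY=VALUE lines, as the RAGFLOW_VERSION=v0.6.0 example suggests.

```python
import re


def set_ragflow_version(env_text: str, version: str) -> str:
    """Return env_text with RAGFLOW_VERSION set to the given version."""
    line = f"RAGFLOW_VERSION={version}"
    pattern = re.compile(r"^RAGFLOW_VERSION=.*$", flags=re.MULTILINE)
    if pattern.search(env_text):
        # Replace the existing assignment in place, keeping all other lines.
        return pattern.sub(lambda _match: line, env_text, count=1)
    # Variable not present yet: append it on its own line.
    separator = "" if env_text.endswith("\n") or not env_text else "\n"
    return f"{env_text}{separator}{line}\n"


# Usage from the repository root:
#   from pathlib import Path
#   env_file = Path("docker/.env")
#   env_file.write_text(set_ragflow_version(env_file.read_text(), "v0.6.0"))
```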

Once the server is up and running, check the server status:

$ docker logs -f ragflow-server

The following output confirms that the system has started successfully:

    ____                 ______ __
   / __ \ ____ _ ____ _ / ____// /____  _      __
  / /_/ // __ `// __ `// /_   / // __ \| | /| / /
 / _, _// /_/ // /_/ // __/  / // /_/ /| |/ |/ /
/_/ |_| \__,_/ \__, //_/    /_/ \____/ |__/|__/
              /____/

 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:9380
 * Running on http://x.x.x.x:9380
 INFO:werkzeug:Press CTRL+C to quit

If you skip this confirmation step and log in to RAGFlow right away, your browser may report a network error because RAGFlow may not be fully initialized yet.
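The confirmation step can be automated by scanning the log text for the "Running on http://" lines shown above. This is a sketch with hypothetical function names. It assumes that the presence of such a line is a good enough readiness signal.

```python
from typing import List

PREFIX = "Running on "


def listening_urls(log_text: str) -> List[str]:
    """Collect the URLs from ' * Running on http://...' log lines."""
    urls = []
    for line in log_text.splitlines():
        index = line.find(PREFIX + "http")
        if index != -1:
            urls.append(line[index + len(PREFIX):].strip())
    return urls


def server_is_ready(log_text: str) -> bool:
    """True once the log contains at least one 'Running on http...' line."""
    return bool(listening_urls(log_text))


# Usage: capture the container log and test it. docker logs replays the
# container's stdout and stderr separately, so both are checked here.
#   import subprocess
#   result = subprocess.run(["docker", "logs", "ragflow-server"],
#                           capture_output=True, text=True)
#   print(server_is_ready(result.stdout + result.stderr))
```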

In the web browser, enter the IP address of the server and log in to RAGFlow.

With the default settings, you only need to enter http://IP_OF_YOUR_MACHINE (without a port number), because the HTTP service listens on port 80, the default port for HTTP.

Configuring LLMs and Creating Knowledge Bases

RAGFlow supports integration with various Large Language Models (LLMs), including:

OpenAI
Tongyi Qianwen
Moonshot
DeepSeek-V2

Users can easily configure their preferred LLM by updating API keys and selecting default models for chat, embedding, and image-to-text tasks.

Creating knowledge bases in RAGFlow is a straightforward process:

Upload documents in supported formats (PDF, DOC, DOCX, TXT, MD, CSV, XLSX, XLS, JPEG, JPG, PNG, TIF, GIF, PPT, PPTX)
Choose an embedding model and chunking template
Parse the uploaded files
Review and optionally intervene in the parsing results

Click on the top right corner of the page > Model Providers:

Every RAGFlow account can use Tongyi-Qianwen’s text-embedding-v2 model for free. That’s why you can see Tongyi-Qianwen in the list of added models. You may need to update your Tongyi-Qianwen API key later.

Click on the desired LLM and update the corresponding API key (in this case, DeepSeek-V2):

The model you added then appears in the list of added models.

Click System Model Settings to select the default model:

Chat model
Embedding model
Image-to-text model

Some models, such as the image-to-text model qwen-vl-max, are tied to a specific LLM. You may need to update the API key for that LLM to use these models.

Create your first knowledge base

You can upload files to a knowledge base in RAGFlow and parse them into datasets. A knowledge base is actually a collection of datasets. Question answering in RAGFlow can be based on a specific knowledge base or multiple knowledge bases. The file formats supported by RAGFlow include documents (PDF, DOC, DOCX, TXT, MD), tables (CSV, XLSX, XLS), images (JPEG, JPG, PNG, TIF, GIF), and slides (PPT, PPTX).
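The format groups listed above translate directly into a lookup that can screen files before upload. This is an illustrative sketch: the names SUPPORTED_FORMATS and classify are hypothetical, and only the extensions come from the list above.

```python
from pathlib import Path
from typing import Optional

# Extension groups exactly as listed in the article.
SUPPORTED_FORMATS = {
    "documents": {"pdf", "doc", "docx", "txt", "md"},
    "tables": {"csv", "xlsx", "xls"},
    "images": {"jpeg", "jpg", "png", "tif", "gif"},
    "slides": {"ppt", "pptx"},
}


def classify(filename: str) -> Optional[str]:
    """Return the format group for a file name, or None if unsupported."""
    extension = Path(filename).suffix.lower().lstrip(".")
    for category, extensions in SUPPORTED_FORMATS.items():
        if extension in extensions:
            return category
    return None
```

Matching on the lowercased extension means that names such as Report.PDF are treated the same as report.pdf.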

In the top middle of the page, click the Knowledge Base tab > Create Knowledge Base.

Enter the knowledge base name and click the OK button. You will be taken to the configuration page of the knowledge base:

RAGFlow provides a variety of chunking templates to accommodate different document layouts and file formats. Select the embedding model and chunk template for your knowledge base.

Note: Once you have selected an embedding model and used it to parse files, you can no longer change it. All files in a knowledge base must be parsed with the same embedding model so that their vectors are compared in the same embedding space.
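A small sketch shows why mixing embedding models breaks retrieval. Similarity search compares vectors coordinate by coordinate, which only makes sense when both vectors come from the same model. The function below is a generic cosine similarity, not RAGFlow code. It can only catch the obvious case of mismatched dimensions. Vectors of equal length from two different models would still pass the check while producing meaningless scores.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError(
            f"dimension mismatch ({len(a)} vs {len(b)}): "
            "the vectors were probably produced by different embedding models"
        )
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm
```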

Click + Add file > Local files to start uploading the specified file to the knowledge base.

In the uploaded file entry, click the Play button to start parsing the file:

After the file is parsed, the parsing status changes to SUCCESS.

Intervention in File Parsing

RAGFlow makes its chunking visible and explainable, allowing you to review the results and intervene if necessary. To do this:

Click on the file that has finished parsing to see the chunking results:

Hover over each snapshot to quickly see each chunk.

Double-click on the chunked text to add keywords or make manual changes if necessary:

In the retrieval test, ask a quick question to double-check that your configuration is working. RAGFlow should answer with quotes taken from the parsed file.

AI Chat Functionality

RAGFlow’s AI chat feature allows users to interact with their knowledge bases through natural language queries. To set up an AI assistant:

Create a new assistant and specify the knowledge base(s) to use
Configure the assistant’s behavior, including how to handle empty responses
Update the prompt engine and model settings as needed

RAGFlow also provides an API for integrating chat functionality into external applications.

Advanced Features and Future Developments

RAGFlow is continuously evolving to meet the challenges of long-context RAG systems. Some advanced features and future developments include:

Long-context RAG based on RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval): This approach builds a hierarchical tree of document summaries, enabling flexible answering of both low-level questions from single documents and high-level questions requiring information across multiple documents.
Improved handling of multi-lingual documents: While RAGFlow currently focuses on English and Chinese, there are plans to expand language support for broader international use.
Enhanced visualization tools: Future updates may include more advanced tools for visualizing document structure and RAG processes, making it easier for users to understand and fine-tune their systems.
Integration with more LLMs and embedding models: As the field of AI rapidly evolves, RAGFlow aims to stay current by supporting a wider range of models and technologies.
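The hierarchical tree of summaries behind the RAPTOR item above can be illustrated with a toy sketch. It is deliberately simplified and is not RAGFlow's implementation: RAPTOR groups chunks by clustering their embeddings and summarizes each cluster with an LLM, whereas this sketch groups neighbouring chunks in fixed-size batches and takes the summarizer as a plain function argument. All names are hypothetical.

```python
from typing import Callable, List


def build_summary_tree(chunks: List[str],
                       summarize: Callable[[List[str]], str],
                       group_size: int = 2) -> List[List[str]]:
    """Return the tree levels, from the leaf chunks (level 0) up to one root.

    Simplification: neighbours are grouped in fixed-size batches instead of
    being clustered by embedding similarity as in RAPTOR.
    """
    if not chunks:
        raise ValueError("need at least one chunk")
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    levels = [list(chunks)]
    while len(levels[-1]) > 1:
        current = levels[-1]
        groups = [current[i:i + group_size]
                  for i in range(0, len(current), group_size)]
        levels.append([summarize(group) for group in groups])
    return levels


def toy_summarize(texts: List[str]) -> str:
    """Stand-in for an LLM summarization call: just brackets the inputs."""
    return "(" + " | ".join(texts) + ")"


levels = build_summary_tree(["a", "b", "c"], toy_summarize)
# levels[0] holds the original chunks, levels[-1] the single root summary.
```

Indexing the nodes of every level together is what lets retrieval return a leaf chunk for a detailed question and a higher-level summary for a broad one.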

Conclusion

RAGFlow represents a significant advancement in the field of document AI and information retrieval. By combining deep document understanding with flexible RAG workflows, it offers a powerful solution for businesses and researchers looking to extract valuable insights from complex, unstructured data.

The open-source nature of RAGFlow encourages community contributions and adaptations, ensuring that the tool will continue to evolve and improve over time. As organizations increasingly rely on AI-powered document processing and question-answering systems, RAGFlow is well-positioned to become a key player in this rapidly growing field.

What is RAGFlow and how does it work?

RAGFlow is an open-source Retrieval-Augmented Generation (RAG) engine designed to enhance document understanding and question-answering capabilities. It integrates with large language models (LLMs) to provide accurate responses, leveraging advanced document parsing techniques to extract relevant information from various formats.

How does RAGFlow compare to other RAG engines?

RAGFlow differentiates itself through fine-grained document parsing and traceable answers. It reduces the risk of hallucinations by attaching citations to its responses, so users can verify the source and reliability of the information.

What types of data can RAGFlow process?

RAGFlow can process a wide variety of data types, including text documents, images, and tables. Its advanced parsing capabilities allow it to handle complex layouts, making it suitable for diverse applications such as legal documents, academic papers, and technical reports.

Is RAGFlow suitable for enterprise use?

Yes, RAGFlow is designed for scalability and can be integrated into enterprise environments. Its API allows for seamless integration with existing systems, enabling businesses to leverage its advanced retrieval capabilities to enhance their data processing workflows.

Where can I find official documentation and support for RAGFlow?

Official documentation for RAGFlow can be found on its website, which includes installation guides, API references, and troubleshooting tips. For community support, users can access forums and discussion groups dedicated to RAGFlow and its applications.
