BukeLy/rag-api

🚀 RAG API

Multi-tenant Multimodal Document Intelligent Retrieval System

Enterprise-grade RAG service built on RAG-Anything and LightRAG


English | 简体中文

Features | Quick Start | Architecture | API Documentation | Deployment


📖 Introduction

RAG API is an enterprise-grade Retrieval-Augmented Generation (RAG) service that combines the powerful document parsing capabilities of RAG-Anything with the efficient knowledge graph retrieval technology of LightRAG, providing intelligent Q&A capabilities for your documents.

🎯 Key Highlights

  • 🏢 Multi-tenant Isolation - Complete tenant data isolation for enterprise multi-tenant scenarios
  • 🎨 Multimodal Parsing - Support for PDF, Word, images and more, with full coverage of OCR, tables, and formulas
  • ⚡ High-performance Retrieval - Knowledge graph-based hybrid retrieval with 6-15 second query response
  • 🔄 Flexible Deployment - Support for production and development modes with one-click switching
  • 📦 Ready to Use - One-click Docker deployment, service starts in 3 minutes
  • 🎛️ Multiple Parsing Engines - DeepSeek-OCR (Remote API) + MinerU (Local/Remote API) + Docling (Fast)
  • 🎨 RAG-Anything VLM Enhancement - Three modes (off/selective/full) for deep chart understanding
  • 💾 Task Persistence - Redis storage support, tasks recoverable after container restart/instance rebuild

✨ Features

📄 Document Processing

  • Multiple Format Support

    • PDF, Word, Excel, PPT
    • PNG, JPG, WebP images
    • TXT, Markdown text
  • Intelligent Parsing

    • Plain text (.txt, .md) → Direct insertion (ultra-fast ~1s, skip parser)
    • OCR text recognition
    • Structured table extraction
    • Mathematical formula recognition
    • Layout analysis
  • RAG-Anything VLM Enhancement 🆕

    • off - Markdown only (fastest)
    • selective - Selective processing of important charts
    • full - Complete context enhancement processing
    • Smart filtering: prioritizes charts that have titles, are large, or appear on the first page
    • ⚠️ Supported only in remote MinerU mode; local mode uses RAG-Anything native methods
  • Batch Processing

    • Up to 100 files per batch
    • Async task queue
    • Real-time progress tracking

🔍 Intelligent Retrieval

  • Multi-mode Query

    • naive - Vector retrieval (fastest)
    • local - Local graph
    • global - Global graph
    • hybrid - Hybrid retrieval
    • mix - Full retrieval (most accurate)
  • Knowledge Graph

    • Automatic entity extraction
    • Relationship reasoning
    • Semantic understanding
    • Context enhancement
  • External Storage

    • DragonflyDB (KV storage + task storage)
    • Qdrant (vector storage)
    • Memgraph (graph database)
    • Task persistence (Redis mode)

🏗️ Architecture

System Architecture Diagram

graph TB
    subgraph "Client Layer"
        Client[Client Application]
        WebUI[Web Interface]
    end
    
    subgraph "API Gateway Layer"
        FastAPI[FastAPI Service]
        Auth[Tenant Authentication]
    end
    
    subgraph "Business Logic Layer"
        TenantMgr[Tenant Manager]
        TaskQueue[Task Queue]
        
        subgraph "Document Processing"
            DeepSeekOCR[DeepSeek-OCR<br/>Fast OCR 80% cases]
            MinerU[MinerU Parser<br/>Complex multimodal]
            Docling[Docling Parser<br/>Fast lightweight]
            FileRouter[Smart Router<br/>Complexity scoring]
        end
        
        subgraph "RAG Engine"
            LightRAG[LightRAG Instance Pool<br/>LRU Cache 50]
            KG[Knowledge Graph Engine]
            Vector[Vector Retrieval Engine]
        end
    end
    
    subgraph "Storage Layer"
        DragonflyDB[(DragonflyDB<br/>KV Storage)]
        Qdrant[(Qdrant<br/>Vector Database)]
        Memgraph[(Memgraph<br/>Graph Database)]
        Local[(Local Files<br/>Temp Storage)]
    end
    
    subgraph "External Services"
        LLM[LLM<br/>Entity Extraction/Generation]
        Embedding[Embedding<br/>Vectorization]
        Rerank[Rerank<br/>Reranking]
    end
    
    Client --> FastAPI
    WebUI --> FastAPI
    FastAPI --> Auth
    Auth --> TenantMgr
    TenantMgr --> TaskQueue
    TenantMgr --> LightRAG
    
    TaskQueue --> FileRouter
    FileRouter --> DeepSeekOCR
    FileRouter --> MinerU
    FileRouter --> Docling
    DeepSeekOCR --> LightRAG
    MinerU --> LightRAG
    Docling --> LightRAG
    
    LightRAG --> KG
    LightRAG --> Vector
    
    KG --> DragonflyDB
    KG --> Memgraph
    Vector --> Qdrant
    LightRAG --> Local
    
    LightRAG --> LLM
    LightRAG --> Embedding
    Vector --> Rerank
    
    style FastAPI fill:#00C7B7
    style LightRAG fill:#FF6B6B
    style DeepSeekOCR fill:#5DADE2
    style MinerU fill:#4ECDC4
    style Docling fill:#95E1D3
    style TenantMgr fill:#F38181

Multi-tenant Architecture

graph TB
    subgraph "Tenant A"
        A_Config[Tenant A Config<br/>Independent API Key]
        A_Instance[LightRAG Instance A<br/>Dedicated LLM/Embedding]
        A_Data[(Tenant A Data<br/>Fully Isolated)]
        A_Config --> A_Instance
        A_Instance --> A_Data
    end

    subgraph "Tenant B"
        B_Config[Tenant B Config<br/>Independent API Key]
        B_Instance[LightRAG Instance B<br/>Dedicated LLM/Embedding]
        B_Data[(Tenant B Data<br/>Fully Isolated)]
        B_Config --> B_Instance
        B_Instance --> B_Data
    end

    subgraph "Tenant C"
        C_Config[Using Global Config]
        C_Instance[LightRAG Instance C<br/>Shared LLM/Embedding]
        C_Data[(Tenant C Data<br/>Fully Isolated)]
        C_Config --> C_Instance
        C_Instance --> C_Data
    end

    Pool[Instance Pool Manager<br/>LRU Cache + Config Isolation]
    Global[Global Config<br/>Default API Key]

    Pool --> A_Instance
    Pool --> B_Instance
    Pool --> C_Instance

    C_Config -.fallback.-> Global

    style Pool fill:#F38181
    style Global fill:#95E1D3
    style A_Config fill:#FFD93D
    style B_Config fill:#FFD93D
    style C_Config fill:#E8E8E8

Core Technology Stack

🔧 Frameworks & Runtime

  • FastAPI 0.115+
  • Python 3.11+
  • Uvicorn
  • Docker & Docker Compose

🧠 AI & RAG

  • LightRAG 1.4.9.4
  • RAG-Anything
  • MinerU (PDF-Extract-Kit)
  • Docling

💾 Storage & Database

  • DragonflyDB (Redis compatible)
  • Qdrant (Vector Database)
  • Memgraph (Graph Database)
  • Local filesystem

🚀 Quick Start

Option 1: One-click Deployment (Recommended)

Suitable for production and testing environments:

# 1. Clone the project
git clone https://github.com/BukeLy/rag-api.git
cd rag-api

# 2. Configure environment variables
cp env.example .env
nano .env  # Fill in your API keys

# 3. Run deployment script
chmod +x deploy.sh
./deploy.sh

# Select deployment mode:
# 1) Production Mode - Standard container deployment
# 2) Development Mode - Code hot-reload

# 4. Verify service
curl http://localhost:8000/

Access Swagger Documentation: http://localhost:8000/docs

Option 2: Docker Compose

Production Mode

# Configure environment variables
cp env.example .env
nano .env

# Start services
docker compose -f docker-compose.yml up -d

# View logs
docker compose -f docker-compose.yml logs -f

Development Mode (Code Hot-reload)

# Start development environment
docker compose -f docker-compose.dev.yml up -d

# Or use quick script
./scripts/dev.sh

# Code changes will auto-reload without restart

Option 3: Local Development

# Install uv (Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies
uv sync

# Configure environment variables
cp env.example .env
nano .env

# Start services
uv run uvicorn main:app --host 0.0.0.0 --port 8000 --reload

Environment Variable Configuration

Minimum configuration (required):

# LLM Configuration (Function-oriented naming)
LLM_API_KEY=your_llm_api_key
LLM_BASE_URL=https://ark.cn-beijing.volces.com/api/v3
LLM_MODEL=ep-xxx-xxx
# LLM_REQUESTS_PER_MINUTE=800        # Rate limit (optional)
# LLM_TOKENS_PER_MINUTE=40000        # Rate limit (optional)
# LLM_MAX_ASYNC=8                    # [Optional, expert mode] Manual concurrency control
#                                    # Auto-calculated when unset: min(RPM, TPM/3500) ≈ 11

# Embedding Configuration (Function-oriented naming)
EMBEDDING_API_KEY=your_embedding_api_key
EMBEDDING_BASE_URL=https://api.siliconflow.cn/v1
EMBEDDING_MODEL=Qwen/Qwen3-Embedding-0.6B
EMBEDDING_DIM=1024
# EMBEDDING_MAX_ASYNC=32             # [Optional, expert mode] Auto-calculated when unset: 800

# MinerU Mode (Remote recommended)
MINERU_MODE=remote
MINERU_API_TOKEN=your_token
MINERU_HTTP_TIMEOUT=60              # MinerU download timeout (seconds, default 60)
FILE_SERVICE_BASE_URL=http://your-ip:8000

# VLM Chart Enhancement Configuration 🆕
# ⚠️ Note: Only effective in MINERU_MODE=remote
RAG_VLM_MODE=off                    # off / selective / full
RAG_IMPORTANCE_THRESHOLD=0.5        # Importance threshold (selective mode)
RAG_CONTEXT_WINDOW=2                # Context window (full mode)
RAG_CONTEXT_MODE=page               # page / chunk
RAG_MAX_CONTEXT_TOKENS=3000         # Max context tokens

# Task Storage Configuration 🆕
TASK_STORE_STORAGE=redis            # memory / redis (production recommends redis)

# Document Insert Verification Configuration 🆕
DOC_INSERT_VERIFICATION_TIMEOUT=300        # Verification timeout (seconds, default 5 minutes)
DOC_INSERT_VERIFICATION_POLL_INTERVAL=0.5  # Poll interval (seconds, default 500ms)

# Model Call Timeout Configuration 🆕
MODEL_CALL_TIMEOUT=90               # Model call max timeout (seconds, default 90)

⚡ Auto Concurrency Calculation:

  • LLM: When LLM_MAX_ASYNC is unset, auto-calculated as min(RPM, TPM/3500) ≈ 11
  • Embedding: When EMBEDDING_MAX_ASYNC is unset, auto-calculated as min(RPM, TPM/500) ≈ 800
  • Rerank: When RERANK_MAX_ASYNC is unset, auto-calculated as min(RPM, TPM/500) ≈ 800

✅ Recommended: Leave *_MAX_ASYNC unset and let the system auto-calculate concurrency from your RPM/TPM limits, which helps avoid 429 errors

See env.example for complete configuration.
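The auto-calculation rule above can be illustrated with a minimal sketch. The function name `auto_max_async`, the use of floor division, and the lower bound of 1 are assumptions made for illustration; only the `min(RPM, TPM / tokens-per-request)` formula and the divisors 3500 and 500 come from this README.

```python
def auto_max_async(rpm: int, tpm: int, tokens_per_request: int) -> int:
    """Estimate a concurrency level from provider rate limits.

    Hypothetical helper: mirrors min(RPM, TPM / tokens_per_request)
    from the README; rounding down and the floor of 1 are assumptions.
    """
    if rpm <= 0 or tpm <= 0 or tokens_per_request <= 0:
        raise ValueError("rpm, tpm and tokens_per_request must be positive")
    by_tokens = tpm // tokens_per_request  # requests that fit in the token budget
    return max(1, min(rpm, by_tokens))


# LLM example using the sample limits from the configuration block above
print(auto_max_async(rpm=800, tpm=40_000, tokens_per_request=3500))
```

With the sample limits (RPM 800, TPM 40000) the token budget is the binding constraint, which is why the LLM value lands near 11 rather than 800.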


📚 API Documentation

Core Endpoints

1️⃣ Upload Document

# Single file upload (default mode)
curl -X POST "http://localhost:8000/insert?tenant_id=your_tenant&doc_id=doc1" \
  -F "file=@document.pdf" \
  -F "parser=auto"

# VLM chart enhancement mode 🆕
# off: Markdown only (fastest, default)
curl -X POST "http://localhost:8000/insert?tenant_id=your_tenant&doc_id=doc2&vlm_mode=off" \
  -F "file=@document.pdf"

# selective: Selective processing of important charts (balance performance and quality)
curl -X POST "http://localhost:8000/insert?tenant_id=your_tenant&doc_id=doc3&vlm_mode=selective" \
  -F "file=@document.pdf"

# full: Complete RAG-Anything processing (highest quality, context enhancement enabled)
curl -X POST "http://localhost:8000/insert?tenant_id=your_tenant&doc_id=doc4&vlm_mode=full" \
  -F "file=@document.pdf"

# Response
{
  "task_id": "task-xxx-xxx",
  "doc_id": "doc1",
  "filename": "document.pdf",
  "vlm_mode": "off",
  "status": "pending"
}

2️⃣ Batch Upload

curl -X POST "http://localhost:8000/batch?tenant_id=your_tenant" \
  -F "files=@doc1.pdf" \
  -F "files=@doc2.docx" \
  -F "files=@image.png"

# Response
{
  "batch_id": "batch-xxx-xxx",
  "total_files": 3,
  "accepted_files": 3,
  "tasks": [...]
}
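Because the feature list states a limit of 100 files per batch, a client with more files needs to split them before calling `/batch`. The following is a sketch of that client-side split; `chunk_files` and `MAX_FILES_PER_BATCH` are hypothetical names, not part of the service.

```python
from typing import Iterator, List, Sequence

MAX_FILES_PER_BATCH = 100  # limit stated in the feature list of this README


def chunk_files(paths: Sequence[str], size: int = MAX_FILES_PER_BATCH) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` file paths."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(paths), size):
        yield list(paths[start:start + size])


# 250 files become three batch requests
batches = list(chunk_files([f"doc{i}.pdf" for i in range(250)]))
print([len(batch) for batch in batches])
```

Each yielded group can then be sent as one `/batch` request with repeated `files` form fields, as in the cURL example above.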

3️⃣ Intelligent Query (Query API v2.0)

New Advanced Features:

  • Conversation History: Support for multi-turn conversation context
  • Custom Prompts: Customize response style
  • Response Format Control: paragraph/list/json
  • Keyword Precision Retrieval: hl_keywords/ll_keywords
  • Streaming Output: Real-time generation viewing

# Basic query
curl -X POST "http://localhost:8000/query?tenant_id=your_tenant" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the core viewpoints in the document?",
    "mode": "hybrid"
  }'

# Advanced query (multi-turn dialogue + custom prompt)
curl -X POST "http://localhost:8000/query?tenant_id=your_tenant" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Can you elaborate on the second point?",
    "mode": "hybrid",
    "conversation_history": [
      {"role": "user", "content": "What are the key points?"},
      {"role": "assistant", "content": "There are mainly three points..."}
    ],
    "user_prompt": "Please answer in professional academic language",
    "response_type": "list"
  }'

# Streaming query (SSE)
curl -N -X POST "http://localhost:8000/query/stream?tenant_id=your_tenant" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the core viewpoints in the document?",
    "mode": "hybrid"
  }'

# Response (real-time streaming output)
data: {"chunk": "Based on", "done": false}
data: {"chunk": "document content", "done": false}
data: {"done": true}
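A client can reassemble the streamed answer by reading the `data:` lines shown above. This is a minimal parsing sketch that assumes each event is a single `data:` line carrying a JSON object with `chunk` and `done` fields, as in the sample; `iter_stream_chunks` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_stream_chunks(lines: Iterable[str]) -> Iterator[str]:
    """Yield text chunks from SSE lines until an event reports done=true."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comments
        event = json.loads(line[len("data:"):].strip())
        if event.get("done"):
            break
        chunk = event.get("chunk")
        if chunk:
            yield chunk


sample = [
    'data: {"chunk": "Based on", "done": false}',
    'data: {"chunk": "document content", "done": false}',
    'data: {"done": true}',
]
print("".join(iter_stream_chunks(sample)))
```

With the `requests` library, the lines could come from `requests.post(..., stream=True)` followed by `response.iter_lines(decode_unicode=True)`.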

4️⃣ Task Status Query

curl "http://localhost:8000/task/task-xxx-xxx?tenant_id=your_tenant"

# Response
{
  "task_id": "task-xxx-xxx",
  "status": "completed",
  "progress": 100,
  "result": {...}
}
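Since uploads are processed asynchronously, clients usually poll this endpoint until the task finishes. The sketch below shows one way to do that with a timeout; `wait_for_task`, the 300-second default, and the injectable `fetch_status` callable are assumptions for illustration, while the `completed` and `failed` status values come from the examples in this README.

```python
import time
from typing import Callable, Dict

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_task(
    fetch_status: Callable[[], Dict],
    timeout: float = 300.0,   # assumed default, not defined by the API
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Call fetch_status() until the task reaches a terminal status."""
    deadline = clock() + timeout
    while True:
        task = fetch_status()
        if task.get("status") in TERMINAL_STATUSES:
            return task
        if clock() >= deadline:
            raise TimeoutError(f"task still {task.get('status')!r} after {timeout}s")
        sleep(interval)
```

In practice `fetch_status` could be `lambda: requests.get(f"{BASE_URL}/task/{task_id}", params={"tenant_id": TENANT_ID}).json()`; keeping it injectable makes the loop easy to test without a running server.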

5️⃣ Tenant Management

# Get tenant statistics
curl "http://localhost:8000/tenants/stats?tenant_id=your_tenant"

# Clear tenant cache
curl -X DELETE "http://localhost:8000/tenants/cache?tenant_id=your_tenant"

# View instance pool status (admin)
curl "http://localhost:8000/tenants/pool/stats"

VLM Mode Comparison 🆕

| Mode | Speed | Quality | Resource Usage | Use Case |
|------|-------|---------|----------------|----------|
| off | ⚡⚡⚡⚡⚡ | ⭐⭐⭐ | Very Low | Plain text documents, fast batch processing |
| selective | ⚡⚡⚡⚡ | ⭐⭐⭐⭐ | Low | Documents with key charts (recommended) |
| full | ⚡⚡ | ⭐⭐⭐⭐⭐ | High | Chart-intensive research reports, papers |

Processing Time Estimate (20-page PDF example):

  • off: ~10 seconds (Markdown only)
  • selective: ~30 seconds (5-10 important charts)
  • full: ~120 seconds (complete context processing)

Query Mode Comparison

| Mode | Speed | Accuracy | Use Case |
|------|-------|----------|----------|
| naive | ⚡⚡⚡⚡⚡ | ⭐⭐⭐ | Simple Q&A, fast retrieval |
| local | ⚡⚡⚡⚡ | ⭐⭐⭐⭐ | Local entity relationship queries |
| global | ⚡⚡⚡ | ⭐⭐⭐⭐ | Global knowledge graph reasoning |
| hybrid | ⚡⚡⚡ | ⭐⭐⭐⭐⭐ | Hybrid retrieval (recommended) |
| mix | ⚡⚡ | ⭐⭐⭐⭐⭐ | Complex questions, deep analysis |

Query API v2.0 Advanced Parameters

| Parameter | Type | Description | Example |
|-----------|------|-------------|---------|
| conversation_history | List[Dict] | Multi-turn conversation context | [{"role": "user", "content": "..."}] |
| user_prompt | str | Custom prompt | "Please answer in professional academic language" |
| response_type | str | Response format | "paragraph", "list", "json" |
| hl_keywords | List[str] | High priority keywords | ["artificial intelligence", "machine learning"] |
| ll_keywords | List[str] | Low priority keywords | ["application", "case study"] |
| only_need_context | bool | Return context only (debug) | true |
| max_entity_tokens | int | Entity token limit | 6000 |

Complete API documentation: http://localhost:8000/docs
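The advanced parameters can be assembled into a request body with a small helper. This is a sketch only: `build_query_payload` is a hypothetical function, and the client-side validation of `mode` and `response_type` is an assumption based on the values listed in this README (the server may accept others).

```python
from typing import Any, Dict, List, Optional

QUERY_MODES = {"naive", "local", "global", "hybrid", "mix"}
RESPONSE_TYPES = {"paragraph", "list", "json"}


def build_query_payload(
    query: str,
    mode: str = "hybrid",
    *,
    conversation_history: Optional[List[Dict[str, str]]] = None,
    user_prompt: Optional[str] = None,
    response_type: Optional[str] = None,
    hl_keywords: Optional[List[str]] = None,
    ll_keywords: Optional[List[str]] = None,
    only_need_context: Optional[bool] = None,
    max_entity_tokens: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a /query JSON body, omitting parameters that were not set."""
    if mode not in QUERY_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if response_type is not None and response_type not in RESPONSE_TYPES:
        raise ValueError(f"unknown response_type: {response_type!r}")
    optional = {
        "conversation_history": conversation_history,
        "user_prompt": user_prompt,
        "response_type": response_type,
        "hl_keywords": hl_keywords,
        "ll_keywords": ll_keywords,
        "only_need_context": only_need_context,
        "max_entity_tokens": max_entity_tokens,
    }
    payload: Dict[str, Any] = {"query": query, "mode": mode}
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload
```

The result can be passed as `json=` to `requests.post(f"{BASE_URL}/query", params={"tenant_id": ...})`.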


🎯 Usage Examples

Python SDK

import requests

# Configuration
BASE_URL = "http://localhost:8000"
TENANT_ID = "your_tenant"

# Upload document
with open("document.pdf", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/insert",
        params={"tenant_id": TENANT_ID, "doc_id": "doc1"},
        files={"file": f}
    )
response.raise_for_status()  # Stop early on HTTP errors (e.g. 400 for unsupported files)
task_id = response.json()["task_id"]
print(f"Task ID: {task_id}")

# Query (wait until the upload task reports "completed" before querying)
response = requests.post(
    f"{BASE_URL}/query",
    params={"tenant_id": TENANT_ID},
    json={
        "query": "What is the main content of the document?",
        "mode": "hybrid",
        "top_k": 10
    }
)
response.raise_for_status()
result = response.json()
print(f"Answer: {result['answer']}")

Complete cURL Example

# 1. Upload PDF document
TASK_ID=$(curl -s -X POST "http://localhost:8000/insert?tenant_id=demo&doc_id=report" \
  -F "file=@report.pdf" | jq -r '.task_id')

echo "Task ID: $TASK_ID"

# 2. Wait for processing completion
while true; do
  STATUS=$(curl -s "http://localhost:8000/task/$TASK_ID?tenant_id=demo" | jq -r '.status')
  echo "Status: $STATUS"
  if [ "$STATUS" = "completed" ] || [ "$STATUS" = "failed" ]; then
    break
  fi
  sleep 2
done

# 3. Query document content
curl -X POST "http://localhost:8000/query?tenant_id=demo" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the main conclusions of this report?",
    "mode": "hybrid"
  }' | jq '.answer'

🛠️ Deployment

System Requirements

Minimum Configuration:

  • CPU: 2 cores
  • RAM: 4GB
  • Disk: 40GB SSD
  • OS: Ubuntu 20.04+ / Debian 11+ / CentOS 8+

Recommended Configuration (Production):

  • CPU: 4 cores
  • RAM: 8GB
  • Disk: 100GB SSD
  • OS: Ubuntu 22.04 LTS

Server Deployment

Quick Deployment on Aliyun/Tencent Cloud

# SSH login to server
ssh root@your-server-ip

# Clone project
git clone https://github.com/BukeLy/rag-api.git
cd rag-api

# Run one-click deployment script
chmod +x deploy.sh
./deploy.sh

# The script will automatically:
# 1. Install Docker and Docker Compose
# 2. Configure environment variables
# 3. Optimize system parameters
# 4. Start services
# 5. Verify health status

External Storage Configuration

Supports DragonflyDB + Qdrant + Memgraph external storage (enabled by default):

# Configure in .env
USE_EXTERNAL_STORAGE=true

# DragonflyDB configuration (KV Storage)
KV_STORAGE=RedisKVStorage
REDIS_URI=redis://dragonflydb:6379/0

# Qdrant configuration (vector storage)
VECTOR_STORAGE=QdrantVectorDBStorage
QDRANT_URL=http://qdrant:6333

# Memgraph configuration (graph storage)
GRAPH_STORAGE=MemgraphStorage
MEMGRAPH_URI=bolt://memgraph:7687
MEMGRAPH_USERNAME=
MEMGRAPH_PASSWORD=

See External Storage Deployment Documentation
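Before starting the service it can help to confirm that the storage variables above are all present. The check below is a sketch: `missing_storage_settings` is a hypothetical helper, and treating every listed backend and URI variable as required is an assumption drawn from the sample configuration (the service may supply defaults).

```python
from typing import Dict, List

# Backend selector -> connection variable, taken from the sample .env above
REQUIRED_WHEN_EXTERNAL = {
    "KV_STORAGE": "REDIS_URI",
    "VECTOR_STORAGE": "QDRANT_URL",
    "GRAPH_STORAGE": "MEMGRAPH_URI",
}


def missing_storage_settings(env: Dict[str, str]) -> List[str]:
    """Return the names of unset storage variables when external storage is on."""
    if env.get("USE_EXTERNAL_STORAGE", "").lower() != "true":
        return []
    missing = []
    for backend_key, uri_key in REQUIRED_WHEN_EXTERNAL.items():
        for key in (backend_key, uri_key):
            if not env.get(key):
                missing.append(key)
    return missing
```

A typical call is `missing_storage_settings(dict(os.environ))` after loading `.env`; an empty list means nothing obvious is missing.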

Docker Compose Configuration

The project provides two configuration files:

| File | Purpose | Features |
|------|---------|----------|
| docker-compose.yml | Production mode | Code packaged in image, optimal performance |
| docker-compose.dev.yml | Development mode | Code mounted externally, supports hot-reload |

Select configuration file:

# Production mode
docker compose -f docker-compose.yml up -d

# Development mode
docker compose -f docker-compose.dev.yml up -d

Performance Optimization

Tuning Parameters

Configure in .env:

# ⚡ Concurrency Control (Recommended: use auto-calculation)
# LLM_MAX_ASYNC=8                    # [Expert mode] Manually specify LLM concurrency
#                                    # Auto-calculated when unset: min(RPM, TPM/3500) ≈ 11
# EMBEDDING_MAX_ASYNC=32             # [Expert mode] Manually specify Embedding concurrency
#                                    # Auto-calculated when unset: min(RPM, TPM/500) ≈ 800
# RERANK_MAX_ASYNC=16                # [Expert mode] Manually specify Rerank concurrency
#                                    # Auto-calculated when unset: min(RPM, TPM/500) ≈ 800

# Retrieval count (affects query quality and speed)
TOP_K=20  # Entity/relationship retrieval count
CHUNK_TOP_K=10  # Text chunk retrieval count

# Document processing concurrency
DOCUMENT_PROCESSING_CONCURRENCY=10  # Remote mode can be set high, local mode set to 1

🎯 Concurrency Configuration Recommendations:

  • Recommended: Leave *_MAX_ASYNC unset and let the system auto-calculate based on TPM/RPM
  • Expert mode: If you need manual control, set LLM_MAX_ASYNC and the related parameters
  • Advantage: Auto-calculation keeps concurrency within the configured limits, which helps avoid 429 errors (TPM limit reached)

Mode Selection

  • MinerU Remote Mode (Recommended): High concurrency, resource-efficient
  • MinerU Local Mode: Requires GPU, high memory usage
  • Docling Mode: Fast and lightweight, suitable for simple documents

🏢 Multi-tenant Usage

Tenant Isolation

Each tenant has:

  • ✅ Independent LightRAG instance
  • ✅ Isolated data storage space
  • ✅ Independent vector index
  • ✅ Dedicated knowledge graph
  • ✅ Independent service configuration (LLM, Embedding, Rerank, DeepSeek-OCR, MinerU) 🆕

Tenant Configuration Management 🆕

Each tenant can independently configure 5 services with hot-reload support:

# 1️⃣ Configure independent DeepSeek-OCR API key for Tenant A
curl -X PUT "http://localhost:8000/tenants/tenant_a/config" \
  -H "Content-Type: application/json" \
  -d '{
    "ds_ocr_config": {
      "api_key": "sk-tenant-a-ds-ocr-key",
      "base_url": "https://api.siliconflow.cn/v1",
      "model": "deepseek-ai/DeepSeek-OCR",
      "timeout": 90
    }
  }'

# 2️⃣ Configure independent MinerU API token for Tenant B
curl -X PUT "http://localhost:8000/tenants/tenant_b/config" \
  -H "Content-Type: application/json" \
  -d '{
    "mineru_config": {
      "api_token": "tenant-b-mineru-token",
      "base_url": "https://mineru.net",
      "model_version": "vlm"
    }
  }'

# 3️⃣ Configure multiple services simultaneously (LLM + Embedding + DeepSeek-OCR)
curl -X PUT "http://localhost:8000/tenants/tenant_c/config" \
  -H "Content-Type: application/json" \
  -d '{
    "llm_config": {
      "api_key": "sk-tenant-c-llm-key",
      "model": "gpt-4"
    },
    "embedding_config": {
      "api_key": "sk-tenant-c-embedding-key",
      "model": "Qwen/Qwen3-Embedding-0.6B",
      "dim": 1024
    },
    "ds_ocr_config": {
      "api_key": "sk-tenant-c-ds-ocr-key"
    }
  }'

# 4️⃣ Query tenant configuration (API key auto-masked)
curl "http://localhost:8000/tenants/tenant_a/config"

# Response example
{
  "tenant_id": "tenant_a",
  "ds_ocr_config": {
    "api_key": "sk-***-key",  // Auto-masked
    "timeout": 90
  },
  "merged_config": {
    "llm": {...},        // Using Global Config
    "embedding": {...},  // Using Global Config
    "rerank": {...},     // Using Global Config
    "ds_ocr": {...},     // Using tenant config
    "mineru": {...}      // Using Global Config
  }
}

# 5️⃣ Refresh config cache (config hot-reload)
curl -X POST "http://localhost:8000/tenants/tenant_a/config/refresh"

# 6️⃣ Delete tenant config (restore to global config)
curl -X DELETE "http://localhost:8000/tenants/tenant_a/config"
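The masked key in the response example (`sk-***-key`) is consistent with keeping a short prefix and suffix of the original value. The sketch below reproduces that output; the exact rule used by the service is not documented here, so `mask_api_key` and its 3/4 character split are assumptions.

```python
def mask_api_key(key: str, prefix: int = 3, suffix: int = 4) -> str:
    """Hide the middle of an API key, keeping a short prefix and suffix.

    Hypothetical helper: keys too short to mask safely are fully hidden.
    """
    if len(key) <= prefix + suffix:
        return "***"
    return f"{key[:prefix]}***{key[-suffix:]}"


print(mask_api_key("sk-tenant-a-ds-ocr-key"))
```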

Supported Configuration Items:

| Service | Config Field | Description |
|---------|--------------|-------------|
| LLM | llm_config | Model, API key, base_url, etc. |
| Embedding | embedding_config | Model, API key, dimension, etc. |
| Rerank | rerank_config | Model, API key, etc. |
| DeepSeek-OCR | ds_ocr_config | API key, timeout, mode, etc. |
| MinerU | mineru_config | API token, version, timeout, etc. |

Configuration Priority: Tenant config > Global config
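The priority rule can be sketched as a per-service merge in which tenant values override global ones. `merge_config` is a hypothetical function, and merging field by field (rather than replacing a whole service block) is an assumption suggested by the example where a tenant sets only `api_key` for DeepSeek-OCR.

```python
from typing import Any, Dict, Optional

SERVICES = ("llm", "embedding", "rerank", "ds_ocr", "mineru")


def merge_config(
    global_config: Dict[str, Dict[str, Any]],
    tenant_config: Optional[Dict[str, Dict[str, Any]]],
) -> Dict[str, Dict[str, Any]]:
    """Return per-service settings with tenant values taking priority."""
    tenant_config = tenant_config or {}
    merged: Dict[str, Dict[str, Any]] = {}
    for service in SERVICES:
        settings = dict(global_config.get(service, {}))  # copy, never mutate the global config
        settings.update(tenant_config.get(f"{service}_config") or {})
        merged[service] = settings
    return merged


global_cfg = {"llm": {"api_key": "global-llm", "model": "ep-xxx"},
              "ds_ocr": {"api_key": "global-ocr", "timeout": 60}}
tenant_cfg = {"ds_ocr_config": {"api_key": "tenant-ocr"}}
print(merge_config(global_cfg, tenant_cfg)["ds_ocr"])
```

Services the tenant does not configure fall back to the global values unchanged, matching the "Using Global Config" entries in the response example.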

Use Cases:

  • 🔐 Multi-tenant SaaS: Each tenant uses their own API key
  • 💰 Pay-per-use: Track tenant usage through independent API keys
  • 🎯 Differentiated Services: Different tenants use different models (GPT-4 vs GPT-3.5)
  • 🧪 A/B Testing: Compare different models/parameters

Usage

Document, query, and task endpoints require the tenant_id query parameter:

# Tenant A upload document
curl -X POST "http://localhost:8000/insert?tenant_id=tenant_a&doc_id=doc1" \
  -F "file=@doc.pdf"

# Tenant B upload document (fully isolated)
curl -X POST "http://localhost:8000/insert?tenant_id=tenant_b&doc_id=doc1" \
  -F "file=@doc.pdf"

# Tenant A query (can only query own documents)
curl -X POST "http://localhost:8000/query?tenant_id=tenant_a" \
  -H "Content-Type: application/json" \
  -d '{"query": "document content", "mode": "hybrid"}'

Instance Pool Management

  • Capacity: Cache up to 50 tenant instances
  • Strategy: LRU (Least Recently Used) automatic cleanup
  • Config Isolation: Each tenant can use independent LLM, Embedding, parser configuration
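The pool behavior described above (a bounded cache with least-recently-used eviction) can be sketched with `collections.OrderedDict`. This is an illustration of the technique, not the project's implementation; `InstancePool` and its `factory` argument are hypothetical names, and only the capacity of 50 comes from this README.

```python
from collections import OrderedDict
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class InstancePool(Generic[T]):
    """Bounded per-tenant cache that evicts the least recently used entry."""

    def __init__(self, factory: Callable[[str], T], capacity: int = 50) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._factory = factory
        self._capacity = capacity
        self._items: "OrderedDict[str, T]" = OrderedDict()

    def get(self, tenant_id: str) -> T:
        if tenant_id in self._items:
            self._items.move_to_end(tenant_id)  # mark as most recently used
            return self._items[tenant_id]
        instance = self._factory(tenant_id)
        self._items[tenant_id] = instance
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the least recently used tenant
        return instance

    def __contains__(self, tenant_id: str) -> bool:
        return tenant_id in self._items

    def __len__(self) -> int:
        return len(self._items)
```

An evicted tenant is simply rebuilt by the factory on its next request, which is why persistent state such as task records is better kept outside the instance.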

📊 Monitoring & Maintenance

Common Commands

# View service status
docker compose ps

# View real-time logs
docker compose logs -f

# Restart services
docker compose restart

# Stop services
docker compose down

# View resource usage
docker stats

# Clean Docker resources
docker system prune -f

Maintenance Scripts

# Monitor service health
./scripts/monitor.sh

# Backup data
./scripts/backup.sh

# Update services
./scripts/update.sh

# Performance testing
./scripts/test_concurrent_perf.sh

# Performance monitoring
./scripts/monitor_performance.sh

Health Checks

# Complete health check (recommended)
./scripts/health_check.sh
./scripts/health_check.sh --verbose  # verbose output

# API health check
curl http://localhost:8000/

# Tenant statistics
curl "http://localhost:8000/tenants/stats?tenant_id=your_tenant"

# Instance pool status
curl "http://localhost:8000/tenants/pool/stats"

🗂️ Project Structure

rag-api/
├── main.py                 # FastAPI application entry
├── api/                    # API route modules
│   ├── __init__.py         # Route aggregation
│   ├── insert.py           # Document upload (single/batch)
│   ├── query.py            # Intelligent query
│   ├── task.py             # Task status query
│   ├── tenant.py           # Tenant management
│   ├── files.py            # File service
│   ├── models.py           # Pydantic models
│   └── task_store.py       # Task storage
├── src/                    # Core business logic
│   ├── rag.py              # LightRAG lifecycle management
│   ├── multi_tenant.py     # Multi-tenant instance manager
│   ├── tenant_deps.py      # Tenant dependency injection
│   ├── logger.py           # Unified logging
│   ├── metrics.py          # Performance metrics
│   ├── file_url_service.py # Temporary file service
│   ├── mineru_client.py    # MinerU client
│   └── mineru_result_processor.py  # Result processing
├── docs/                   # Documentation
│   ├── ARCHITECTURE.md     # Architecture design documentation
│   ├── USAGE.md            # Detailed usage guide
│   ├── DEPLOY_MODES.md     # Deployment mode description
│   ├── PR_WORKFLOW.md      # PR workflow
│   └── ...
├── scripts/                # Maintenance scripts
│   ├── dev.sh              # Development mode quick start
│   ├── monitor.sh          # Service monitoring
│   ├── backup.sh           # Data backup
│   ├── update.sh           # Service update
│   └── ...
├── deploy.sh               # One-click deployment script
├── docker-compose.yml      # Production mode configuration
├── docker-compose.dev.yml  # Development mode configuration
├── Dockerfile              # Production image
├── Dockerfile.dev          # Development image
├── pyproject.toml          # Project dependencies
├── uv.lock                 # Dependency lock
├── env.example             # Environment variable template
├── CLAUDE.md               # Claude AI guide
└── README.md               # This documentation

🐛 Troubleshooting

Common Issues

Q1: What to do if service fails to start?

# View detailed logs
docker compose logs

# Check port usage
netstat -tulpn | grep 8000

# Check Docker status
docker ps -a

Q2: multimodal_processed error?

Note: This issue has been fixed in LightRAG 1.4.9.4+. If you encounter this error, your version is outdated.

Solution:

# Option 1: Upgrade to latest version (recommended)
# Modify LightRAG version in pyproject.toml
# lightrag = "^1.4.9.4"

# Rebuild image
docker compose down
docker compose up -d --build

# Option 2: Clean old data (temporary solution)
rm -rf ./rag_local_storage
docker compose restart

Q3: File upload returns 400 error?

Check whether:

  • The file format is supported (PDF, DOCX, PNG, JPG, etc.)
  • The file size exceeds 100MB
  • The file is empty

# View supported formats
curl http://localhost:8000/docs

Q3.5: Embedding dimension error?

If you encounter dimension-related errors, clean the data and rebuild:

# Stop services
docker compose down

# Delete all volumes (clear database)
docker volume rm rag-api_dragonflydb_data rag-api_qdrant_data rag-api_memgraph_data

# Modify EMBEDDING_DIM in .env
EMBEDDING_DIM=1024  # or 4096, must match the model

# Restart
docker compose up -d

Q4: Query is very slow (>30 seconds)?

Optimization suggestions:

  1. Use naive or hybrid mode instead of mix
  2. Increase MAX_ASYNC parameter (in .env)
  3. Reduce TOP_K and CHUNK_TOP_K
  4. Enable Reranker

# Modify .env
MAX_ASYNC=8
TOP_K=20
CHUNK_TOP_K=10

Q5: Out of memory (OOM)?

If using local MinerU:

# Switch to remote mode
# Modify in .env
MINERU_MODE=remote
MINERU_API_TOKEN=your_token

# Or limit concurrency
DOCUMENT_PROCESSING_CONCURRENCY=1

Q6: Tasks lost after container restart?

Problem Symptoms:

  • Previous task status cannot be queried after a container restart
  • Tasks disappear after a tenant instance is evicted by the LRU cache

Solution: Enable Redis task storage

# Modify .env
TASK_STORE_STORAGE=redis

# Restart services
docker compose restart

# Verify
docker compose logs api | grep TaskStore
# Should see: ✅ TaskStore: Redis connection successful

Configuration Description:

  • memory mode: In-memory storage, data lost after restart (default, suitable for development)
  • redis mode: Persistent storage, supports container restart and instance rebuild (production recommended)

TTL Strategy (Redis mode auto-cleanup):

  • completed tasks: 24 hours
  • failed tasks: 24 hours
  • pending/processing tasks: 6 hours

Q7: VLM mode processing failed?

Check Items:

  1. vision_model_func not configured

    • Check logs: vision_model_func not found, fallback to off mode
    • Ensure LLM API is configured in .env
  2. Image file does not exist

    • Check logs: Image file not found: xxx
    • The MinerU ZIP may be corrupted, or extraction may have failed
  3. Timeout error

    • full mode may time out on large files
    • Suggestion: Use selective mode first, or increase VLM_TIMEOUT

# Modify .env
VLM_TIMEOUT=300  # Increase to 5 minutes
RAG_VLM_MODE=selective  # downgrade to selective

Debugging Tips:

# View detailed logs
docker compose logs -f | grep VLM

# Test single file
curl -X POST 'http://localhost:8000/insert?tenant_id=test&doc_id=test&vlm_mode=off' \
  -F 'file=@test.pdf'

Performance Tuning Recommendations

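| Scenario | MAX_ASYNC | TOP_K | CHUNK_TOP_K | MINERU_MODE |
|----------|-----------|-------|-------------|-------------|
| Fast response | 8 | 10 | 5 | remote |
| Balanced mode | 8 | 20 | 10 | remote |
| High accuracy | 4 | 60 | 20 | remote |
| Resource limited | 4 | 20 | 10 | remote |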

📖 Documentation


🤝 Contributing

We welcome all forms of contribution!

How to Contribute

  1. Fork the project

git clone https://github.com/BukeLy/rag-api.git
cd rag-api

  2. Create feature branch

git checkout -b feature/your-feature-name

  3. Development and Testing

# Install dependencies
uv sync

# Run tests
uv run pytest

# Code formatting
uv run black .
uv run isort .

  4. Submit code

git add .
git commit -m "feat: Add new feature"
git push origin feature/your-feature-name

  5. Create Pull Request

Create a PR on GitHub with detailed description of your changes.

Commit Conventions

Use semantic commit messages:

  • feat: New feature
  • fix: Bug fix
  • docs: Documentation update
  • style: Code formatting
  • refactor: Code refactoring
  • perf: Performance optimization
  • test: Testing
  • chore: Build/tools

See PR Workflow Documentation


📄 License

This project is licensed under the MIT License. See the LICENSE file for details.


🙏 Acknowledgments

This project is built on the following excellent open source projects:

  • LightRAG - Efficient knowledge graph RAG framework
  • RAG-Anything - Multimodal document parsing
  • MinerU - Powerful PDF parsing tool
  • Docling - Lightweight document parsing
  • FastAPI - Modern Python web framework

Special thanks to all contributors and users for their support! 🎉


📬 Contact Us


⭐ If this project helps you, please give it a Star!

Made with ❤️ by BukeLy

© 2025 RAG API. All rights reserved.
