rag-cli

Ask questions about your documents — entirely on your machine, zero API costs.

rag-cli is a command-line RAG (Retrieval-Augmented Generation) tool that lets you index a PDF, a text file, or an entire folder of documents, then query them in natural language using a local LLM. Everything runs locally via Ollama: no cloud, no keys, no data leaving your machine.

$ python main.py query "What are the main configuration options?"

Q: What are the main configuration options?

A: According to the documentation, the three main configuration options are...

── Sources ──────────────────────────────
  1. docs/manual.pdf (p.12)
     "Configuration is handled through environment variables or a .env file..."
  2. docs/manual.pdf (p.14)
     "Advanced options can be set at runtime via the --model flag..."

Stack

| Layer | Technology |
| --- | --- |
| LLM & Embeddings | Ollama (llama3.2, nomic-embed-text) |
| RAG Framework | LangChain 1.x (LCEL) |
| Vector Store | FAISS — persistent, local, no server needed |
| Document Parsing | pypdf, docx2txt, built-in text loaders |
| TTS | edge-tts — Microsoft Neural voices (multilingual) |
| CLI | Typer + Rich |

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                           rag-cli                               │
│                                                                 │
│  INDEX                              QUERY                       │
│                                                                 │
│  PDF / TXT / MD / DOCX              natural language question   │
│         │                                    │                  │
│         ▼                                    ▼                  │
│   Document Loader               OllamaEmbeddings (nomic)        │
│         │                                    │                  │
│         ▼                                    ▼                  │
│  RecursiveCharacter               FAISS MMR Retriever           │
│    TextSplitter                   (top-k relevant chunks)       │
│         │                                    │                  │
│         ▼                                    ▼                  │
│  OllamaEmbeddings  ──────►   FAISS     ChatPromptTemplate       │
│   (nomic-embed-text)        (on disk)        │                  │
│                                               ▼                 │
│                                       ChatOllama (llama3.2)     │
│                                               │                 │
│                                               ▼                 │
│                                       streamed answer           │
└─────────────────────────────────────────────────────────────────┘

Retrieval strategy: MMR (Maximum Marginal Relevance) — retrieved chunks are ranked by relevance to the query and diversity from each other, reducing redundancy in the context window.
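The greedy selection behind MMR can be illustrated with a short, self-contained sketch. This is a simplified stand-in for what the FAISS retriever does internally; the toy vectors and the lambda_mult value below are illustrative, not the tool's actual data:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mmr_select(query, candidates, k=2, lambda_mult=0.3):
    """Greedy Maximum Marginal Relevance: trade off relevance to the
    query against similarity to chunks already selected."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = dot(query, candidates[i])
            redundancy = max(
                (dot(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Chunks 0 and 1 are near-duplicates; chunk 2 is less relevant but diverse.
query = [1.0, 0.0]
chunks = [[0.9, 0.2], [0.88, 0.25], [0.4, 0.9]]
print(mmr_select(query, chunks, k=2))  # → [0, 2]: skips the duplicate
```

Plain top-k would have returned the two near-duplicate chunks; MMR swaps the second one for the diverse chunk, which is the redundancy reduction described above.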


Requirements

  • Python 3.10+
  • Ollama installed and running

Pull the required models once:

ollama pull nomic-embed-text   # ~274 MB — embedding model
ollama pull llama3.2           # ~2 GB  — default LLM (or swap for mistral, etc.)

Installation

git clone https://github.com/BenoitGaudieri/rag-cli
cd rag-cli

python -m venv venv
source venv/bin/activate        # Windows: venv\Scripts\activate

pip install -r requirements.txt

Usage

Index documents

# Single file
python main.py index ./docs/report.pdf

# Entire folder (PDF, TXT, MD, DOCX — recursive)
python main.py index ./docs/

# Named collection (to keep multiple knowledge bases separate)
python main.py index ./docs/ --collection myproject
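During indexing, each document is split into overlapping character chunks before embedding. A minimal sketch of fixed-size chunking with overlap, using the defaults from the Configuration section (this simplifies LangChain's RecursiveCharacterTextSplitter, which additionally prefers natural break points like paragraphs and sentences):

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into chunks of at most chunk_size characters, each new
    chunk re-reading the last chunk_overlap characters of the previous one."""
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks

doc = "x" * 2500
print([len(c) for c in chunk_text(doc)])  # → [1000, 1000, 900]
```

The overlap means a sentence falling on a chunk boundary still appears whole in at least one chunk, at the cost of indexing some text twice.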

Query

# One-shot question
python main.py query "Summarise the key findings"

# Show the source chunks used to generate the answer
python main.py query "What are the installation steps?" --sources

# Save the answer to the output/ folder (format inferred from extension)
python main.py query "Summarise the CV" --output summary.md
python main.py query "List key skills" --output skills.json

# Interactive REPL — ask multiple questions in a session
python main.py query

# Override the LLM at runtime
python main.py query "Translate chapter 1 to Italian" --model mistral

# Query a named collection
python main.py query "..." --collection myproject

Text-to-speech

Read text, documents, or query answers aloud using Microsoft Neural voices via edge-tts. Requires an internet connection for synthesis; audio playback is handled natively (no extra dependencies on Windows).

# Read a string directly
python main.py speak "Ciao, questo è un test"

# Read a PDF file
python main.py speak ./docs/report.pdf

# Read a saved answer (.json, .txt, .md)
python main.py speak output/summary.json

# Truncate long documents to the first 3000 characters
python main.py speak ./docs/report.pdf --max-chars 3000

# Save the synthesised audio to an MP3 instead of playing it
python main.py speak ./docs/report.pdf --save audio/report.mp3

# Use a different voice
python main.py speak "hello" --voice en-US-AriaNeural

Add --speak / -S to any query call to have the answer read aloud automatically:

# Single question
python main.py query "Riassumi il documento" --speak

# Interactive REPL — every answer is read aloud
python main.py query --speak

# Different voice
python main.py query "..." --speak --voice it-IT-IsabellaNeural

Available Italian voices: it-IT-ElsaNeural (default, female), it-IT-IsabellaNeural (female), it-IT-DiegoNeural (male).


Compare models

Run the same question(s) against multiple models and collect results in a CSV or JSON file.

# Single question, two models
python main.py compare "What are your main skills?" --models "llama3.2,mistral" --output comparison.csv

# Multiple questions from a file (one per line)
python main.py compare questions.txt --models "llama3.2,mistral,phi3" --output comparison.csv

# Save as JSON instead
python main.py compare "Summarise the document" --models "llama3.2,mistral" --output comparison.json

The output CSV has four columns: question, model, answer, latency_s.
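A file with that shape can be post-processed with nothing but the standard library. For example, finding the fastest model per question (the inline sample rows below are illustrative; a real file comes from --output comparison.csv):

```python
import csv
import io

# Illustrative stand-in for a compare run's CSV output.
sample = """question,model,answer,latency_s
What are your main skills?,llama3.2,Python and SQL,3.2
What are your main skills?,mistral,Python and SQL,2.1
"""

fastest = {}
for row in csv.DictReader(io.StringIO(sample)):
    question, latency = row["question"], float(row["latency_s"])
    if question not in fastest or latency < fastest[question][1]:
        fastest[question] = (row["model"], latency)

print(fastest)  # {'What are your main skills?': ('mistral', 2.1)}
```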

Manage collections

python main.py list                          # list all collections + chunk counts
python main.py clear --collection myproject  # delete one collection
python main.py clear                         # delete everything

Configuration

All defaults can be overridden via environment variables (or a .env file):

| Variable | Default | Description |
| --- | --- | --- |
| RAG_LLM_MODEL | llama3.2 | Ollama model for generation |
| RAG_EMBED_MODEL | nomic-embed-text | Ollama model for embeddings |
| RAG_COLLECTION | default | FAISS collection name |
| RAG_INDEX_DIR | ./faiss_db | Vector index directory |
| RAG_OUTPUT_DIR | ./output | Directory for saved answers and compare results |
| RAG_CHUNK_SIZE | 1000 | Characters per text chunk |
| RAG_CHUNK_OVERLAP | 200 | Overlap between consecutive chunks |
| RAG_TOP_K | 5 | Number of chunks retrieved per query |
| RAG_TTS_VOICE | it-IT-ElsaNeural | Default TTS voice |
| RAG_TTS_MAX_CHARS | 0 | Max characters to synthesise (0 = no limit) |

Example:

RAG_LLM_MODEL=mistral RAG_TOP_K=8 python main.py query "..."
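The override pattern is plain environment lookups with fallbacks. A minimal sketch of how rag/config.py could read the table above (the variable names match the table; the helper and its implementation are assumptions, not the project's actual code):

```python
import os

def env(name, default, cast=str):
    """Read an environment variable, falling back to the default
    when it is unset; cast converts the raw string (e.g. int)."""
    raw = os.getenv(name)
    return cast(raw) if raw is not None else default

# Assumed equivalents of the documented defaults.
LLM_MODEL = env("RAG_LLM_MODEL", "llama3.2")
TOP_K = env("RAG_TOP_K", 5, int)
CHUNK_SIZE = env("RAG_CHUNK_SIZE", 1000, int)

print(LLM_MODEL, TOP_K, CHUNK_SIZE)
```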

Project structure

rag-cli/
├── main.py          # CLI entry point — index / query / speak / list / clear / compare
├── rag/
│   ├── config.py    # all parameters, overridable via env vars
│   ├── indexer.py   # document loading, chunking, embedding → FAISS
│   ├── chain.py     # LCEL RAG chain, MMR retriever, streaming output
│   └── tts.py       # TTS synthesis (edge-tts) + text extraction from PDF/JSON/MD
├── requirements.txt
├── faiss_db/        # auto-created on first index  ← gitignored
└── output/          # saved answers and compare CSVs ← gitignored

Supported file types

| Extension | Loader |
| --- | --- |
| .pdf | PyPDFLoader (pypdf) |
| .txt | TextLoader (UTF-8 autodetect) |
| .md | TextLoader |
| .docx | Docx2txtLoader |

License

MIT
