Ask questions about your documents — entirely on your machine, zero API costs.
rag-cli is a command-line RAG (Retrieval-Augmented Generation) tool that lets you index a PDF, a text file, or an entire folder of documents, then query them in natural language using a local LLM. Everything runs locally via Ollama: no cloud, no keys, no data leaving your machine.
$ python main.py query "What are the main configuration options?"
Q: What are the main configuration options?
A: According to the documentation, the three main configuration options are...
── Sources ──────────────────────────────
1. docs/manual.pdf (p.12)
"Configuration is handled through environment variables or a .env file..."
2. docs/manual.pdf (p.14)
"Advanced options can be set at runtime via the --model flag..."
| Layer | Technology |
|---|---|
| LLM & Embeddings | Ollama (llama3.2, nomic-embed-text) |
| RAG Framework | LangChain 1.x (LCEL) |
| Vector Store | FAISS — persistent, local, no server needed |
| Document Parsing | pypdf, docx2txt, built-in text loaders |
| TTS | edge-tts — Microsoft Neural voices (multilingual) |
| CLI | Typer + Rich |
┌─────────────────────────────────────────────────────────────────┐
│ rag-cli │
│ │
│ INDEX QUERY │
│ │
│ PDF / TXT / MD / DOCX natural language question │
│ │ │ │
│ ▼ ▼ │
│ Document Loader OllamaEmbeddings (nomic) │
│ │ │ │
│ ▼ ▼ │
│ RecursiveCharacter FAISS MMR Retriever │
│ TextSplitter (top-k relevant chunks) │
│ │ │ │
│ ▼ ▼ │
│ OllamaEmbeddings ──────► FAISS ChatPromptTemplate │
│ (nomic-embed-text) (on disk) │ │
│ ▼ │
│ ChatOllama (llama3.2) │
│ │ │
│ ▼ │
│ streamed answer │
└─────────────────────────────────────────────────────────────────┘
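The index path in the diagram splits each document into fixed-size, overlapping chunks before embedding. A minimal sketch of that chunking step (plain character windows; the actual tool uses LangChain's RecursiveCharacterTextSplitter, which additionally prefers paragraph and sentence boundaries):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters; each window
    starts `overlap` characters before the previous one ended."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), step)]
    # Drop a trailing fragment that is entirely contained in the previous chunk
    if len(chunks) > 1 and len(chunks[-1]) <= overlap:
        chunks.pop()
    return chunks

chunks = chunk_text("a" * 2500, chunk_size=1000, overlap=200)
# → 3 chunks of lengths 1000, 1000, 900 (windows start at 0, 800, 1600)
```

With the defaults (1000/200), each chunk repeats the last 200 characters of its predecessor, so facts spanning a chunk boundary still appear whole in at least one chunk.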
Retrieval strategy: MMR (Maximal Marginal Relevance) — retrieved chunks are ranked by relevance to the query and diversity from each other, reducing redundancy in the context window.
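The MMR idea can be sketched in plain Python — a toy greedy selector over cosine similarities, not the FAISS-backed retriever itself; `lambda_mult` trades relevance against diversity, mirroring the knob of the same name in LangChain's MMR search:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedily pick k documents maximising
    lambda * sim(query, doc) - (1 - lambda) * max sim(doc, already picked)."""
    selected: list[int] = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Two near-duplicate vectors plus one different but still relevant one:
docs = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]
print(mmr([1.0, 0.2], docs, k=2))  # → [1, 2]: the near-duplicate doc 0 is skipped
```

A pure top-k similarity search would return the two near-duplicates; MMR spends the second slot on the diverse document instead.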
- Python 3.10+
- Ollama installed and running
Pull the required models once:
ollama pull nomic-embed-text # ~274 MB — embedding model
ollama pull llama3.2          # ~2 GB — default LLM (or swap for mistral, etc.)

git clone https://github.com/BenoitGaudieri/rag-cli
cd rag-cli
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Single file
python main.py index ./docs/report.pdf
# Entire folder (PDF, TXT, MD, DOCX — recursive)
python main.py index ./docs/
# Named collection (to keep multiple knowledge bases separate)
python main.py index ./docs/ --collection myproject

# One-shot question
python main.py query "Summarise the key findings"
# Show the source chunks used to generate the answer
python main.py query "What are the installation steps?" --sources
# Save the answer to the output/ folder (format inferred from extension)
python main.py query "Summarise the CV" --output summary.md
python main.py query "List key skills" --output skills.json
# Interactive REPL — ask multiple questions in a session
python main.py query
# Override the LLM at runtime
python main.py query "Translate chapter 1 to Italian" --model mistral
# Query a named collection
python main.py query "..." --collection myproject

Read text, documents, or query answers aloud using Microsoft Neural voices via edge-tts. Requires an internet connection for synthesis; audio playback is handled natively (no extra dependencies on Windows).
# Read a string directly
python main.py speak "Ciao, questo è un test"
# Read a PDF file
python main.py speak ./docs/report.pdf
# Read a saved answer (.json, .txt, .md)
python main.py speak output/summary.json
# Truncate long documents to the first 3,000 characters
python main.py speak ./docs/report.pdf --max-chars 3000
# Save the synthesised audio to an MP3 instead of playing it
python main.py speak ./docs/report.pdf --save audio/report.mp3
# Use a different voice
python main.py speak "hello" --voice en-US-AriaNeural

Add --speak / -S to any query call to have the answer read aloud automatically:
# Single question
python main.py query "Riassumi il documento" --speak
# Interactive REPL — every answer is read aloud
python main.py query --speak
# Different voice
python main.py query "..." --speak --voice it-IT-IsabellaNeural

Available Italian voices: it-IT-ElsaNeural (default, female), it-IT-IsabellaNeural (female), it-IT-DiegoNeural (male).
Run the same question(s) against multiple models and collect results in a CSV or JSON file.
# Single question, two models
python main.py compare "What are your main skills?" --models "llama3.2,mistral" --output comparison.csv
# Multiple questions from a file (one per line)
python main.py compare questions.txt --models "llama3.2,mistral,phi3" --output comparison.csv
# Save as JSON instead
python main.py compare "Summarise the document" --models "llama3.2,mistral" --output comparison.json

The output CSV has four columns: question, model, answer, latency_s.
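The comparison file can be post-processed with the standard library. For instance, a small helper (assuming the four columns named above) that averages latency per model:

```python
import csv
from collections import defaultdict
from statistics import mean

def mean_latency_per_model(csv_path: str) -> dict[str, float]:
    """Group rows of a compare CSV (question, model, answer, latency_s)
    by model and average the latency column."""
    latencies: dict[str, list[float]] = defaultdict(list)
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            latencies[row["model"]].append(float(row["latency_s"]))
    return {model: mean(values) for model, values in latencies.items()}
```

Usage: `mean_latency_per_model("output/comparison.csv")` returns e.g. `{"llama3.2": 2.1, "mistral": 3.4}`.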
python main.py list # list all collections + chunk counts
python main.py clear --collection myproject # delete one collection
python main.py clear                         # delete everything

All defaults can be overridden via environment variables (or a .env file):
| Variable | Default | Description |
|---|---|---|
| `RAG_LLM_MODEL` | `llama3.2` | Ollama model for generation |
| `RAG_EMBED_MODEL` | `nomic-embed-text` | Ollama model for embeddings |
| `RAG_COLLECTION` | `default` | FAISS collection name |
| `RAG_INDEX_DIR` | `./faiss_db` | Vector index directory |
| `RAG_OUTPUT_DIR` | `./output` | Directory for saved answers and compare results |
| `RAG_CHUNK_SIZE` | `1000` | Characters per text chunk |
| `RAG_CHUNK_OVERLAP` | `200` | Overlap between consecutive chunks |
| `RAG_TOP_K` | `5` | Number of chunks retrieved per query |
| `RAG_TTS_VOICE` | `it-IT-ElsaNeural` | Default TTS voice |
| `RAG_TTS_MAX_CHARS` | `0` | Max characters to synthesise (0 = no limit) |
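Reading such settings typically follows the `os.getenv`-with-fallback pattern; a minimal sketch of what a config module like this might look like (variable names and defaults from the table above; the tool's actual config.py may differ):

```python
import os

# Each setting falls back to its documented default when the
# environment variable (or .env entry) is unset.
LLM_MODEL = os.getenv("RAG_LLM_MODEL", "llama3.2")
EMBED_MODEL = os.getenv("RAG_EMBED_MODEL", "nomic-embed-text")
INDEX_DIR = os.getenv("RAG_INDEX_DIR", "./faiss_db")
CHUNK_SIZE = int(os.getenv("RAG_CHUNK_SIZE", "1000"))
CHUNK_OVERLAP = int(os.getenv("RAG_CHUNK_OVERLAP", "200"))
TOP_K = int(os.getenv("RAG_TOP_K", "5"))
TTS_MAX_CHARS = int(os.getenv("RAG_TTS_MAX_CHARS", "0"))  # 0 = no limit
```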
Example:
RAG_LLM_MODEL=mistral RAG_TOP_K=8 python main.py query "..."

rag-cli/
├── main.py # CLI entry point — index / query / speak / list / clear / compare
├── rag/
│ ├── config.py # all parameters, overridable via env vars
│ ├── indexer.py # document loading, chunking, embedding → FAISS
│ ├── chain.py # LCEL RAG chain, MMR retriever, streaming output
│ └── tts.py # TTS synthesis (edge-tts) + text extraction from PDF/JSON/MD
├── requirements.txt
├── faiss_db/ # auto-created on first index ← gitignored
└── output/ # saved answers and compare CSVs ← gitignored
| Extension | Loader |
|---|---|
| `.pdf` | PyPDFLoader (pypdf) |
| `.txt` | TextLoader (UTF-8 autodetect) |
| `.md` | TextLoader |
| `.docx` | Docx2txtLoader |
MIT