MONTASHI-OCR is a self-hosted, GPU-accelerated document understanding pipeline. It turns messy real-world inputs scanned PDFs, native PDFs, DOCX (Original layout preserved) files, and images into clean, structured outputs (Markdown, JSON layout, and reconstructed DOCX) ready for downstream RAG, search, or content workflows.
-
Updated
Jun 6, 2026 - Python