pdfto

Here is 1 public repository matching this topic...

mdmonsurali / MONTASHI-OCR

MONTASHI-OCR is a self-hosted, GPU-accelerated document understanding pipeline. It turns messy real-world inputs (scanned PDFs, native PDFs, DOCX files with the original layout preserved, and images) into clean, structured outputs (Markdown, JSON layout, and reconstructed DOCX) that are ready for downstream RAG, search, or content workflows.

python pdf ocr scan pdftotext pdf-parser rag pdfto pdftoword pdftomarkdown

Updated Jun 6, 2026
Python
