OCR packages

Showing projects tagged as OCR

  • PyMuPDF

    8.5 9.7 Python
    PyMuPDF is a high-performance Python library for data extraction, analysis, conversion, and manipulation of PDF (and other) documents.
  • pytesseract

    8.1 1.0 L5 Python
    A Python wrapper for Google's Tesseract OCR engine.
  • Kreuzberg

    6.6 9.9 HTML
    A polyglot document intelligence framework with a Rust core. Extracts text, metadata, and structured information from PDFs, Office documents, images, and 50+ formats. Available for Rust, Python, Ruby, Go, PHP, Elixir, and TypeScript/Node.js, or usable via CLI, REST API, or MCP server.
  • pdftabextract

    6.4 0.0 L3 Python
    A set of tools for extracting tables from PDF files, intended to help with data mining on (OCR-processed) scanned documents.
  • normcap

    6.0 9.5 Python
    OCR-powered screen-capture tool that captures information instead of images.
  • pyocr

    5.0 0.0 L5 Python
    DISCONTINUED. A wrapper for Tesseract and Cuneiform.