amit-timalsina/document_classification

Document Classification: All in one place

This package classifies documents using several popular methods. It also provides a single interface for OCR across open-source engines such as Tesseract and PaddleOCR and commercial services such as Google OCR.

PYPI: document-classification

Features

  • OCR
    • Tesseract
    • Google OCR
  • Classification
    • Fasttext (train, evaluate, predict)
    • Language Models like BERT (train, evaluate, predict)
    • Language + Layout Models like LayoutLM (train, evaluate, predict)
    • LLM (evaluate, predict)
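
The feature list pairs interchangeable OCR engines with a classifier. As a rough illustration of that pattern only, here is a minimal standard-library sketch. Every name in it (`OCRBackend`, `StubOCR`, `KeywordClassifier`, `classify_document`) is hypothetical and is not this package's API; see the examples directory for the real usage.

```python
from dataclasses import dataclass
from typing import Protocol


class OCRBackend(Protocol):
    """Hypothetical interface: any engine that turns a file path into text."""

    def extract_text(self, path: str) -> str: ...


@dataclass
class StubOCR:
    """Stand-in for a real engine such as Tesseract; returns canned text."""

    canned: dict[str, str]

    def extract_text(self, path: str) -> str:
        # Unknown paths yield an empty string instead of raising.
        return self.canned.get(path, "")


@dataclass
class KeywordClassifier:
    """Toy classifier: picks the label whose keywords occur most often."""

    keywords: dict[str, list[str]]

    def predict(self, text: str) -> str:
        words = text.lower().split()
        scores = {
            label: sum(words.count(k) for k in kws)
            for label, kws in self.keywords.items()
        }
        best = max(scores, key=scores.get)
        # Fall back to "unknown" when no keyword matched at all.
        return best if scores[best] > 0 else "unknown"


def classify_document(path: str, ocr: OCRBackend, clf: KeywordClassifier) -> str:
    """Run OCR, then classify the extracted text."""
    return clf.predict(ocr.extract_text(path))


if __name__ == "__main__":
    ocr = StubOCR({"a.pdf": "Invoice total due amount"})
    clf = KeywordClassifier(
        {"invoice": ["invoice", "total"], "resume": ["experience", "education"]}
    )
    print(classify_document("a.pdf", ocr, clf))
```

Because `classify_document` depends only on the `extract_text` method, swapping one OCR engine for another would not require changes to the classification step.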

Installation

Install with a single command:

pip install -U document-classification

or if you use poetry (like me):

poetry add document-classification
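
To confirm that the install worked, one option is to ask Python for the installed distribution's version. This is a small sketch using only the standard library; the helper name `installed_version` is made up for illustration, and the distribution name is taken from the PyPI name above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("document-classification")
    if found is None:
        print("document-classification is not installed")
    else:
        print(f"document-classification {found} is installed")
```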

Usage

Please check the examples directory for examples on how to use the package.

Contributing

Your contributions are welcome! If you have great examples or find neat patterns, clone the repo and add another example. The goal is to find great patterns and cool examples to highlight.

If you encounter any issues or want to provide feedback, you can create an issue in this repository. You can also reach out to me on Twitter at @amittimalsina14.

Check the todo.md file for the list of features that are coming next with their due dates.

What's coming next?

First, I will add tests and refactor the code to make it more readable, usable, and maintainable. Then I will release documentation and more examples.

About

All-in-one package for document (image, PDF) classification. Unified interface for Google OCR and Tesseract. Train, evaluate, and run inference using fastText, small language models (NER), small vision-language models (LayoutLM), and LLMs.
