Simple Tesseract Python OCR

A simple Tesseract OCR tool written in Python, built as a project for a computer vision course at ASU in 2020.

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.

Prerequisites

Install the requirements using the following:

pip install -r requirements.txt

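Or, if you are using a Python virtual environment (the activation command below is for Windows; on Linux or macOS, use source venv/bin/activate instead):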

python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt

Duplicate .env.example, rename the copy to .env, and fill in tesseract_path.
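
As a rough illustration of what this step configures, the sketch below reads a KEY=VALUE file using only the standard library. The key name `tesseract_path` comes from the step above; the helper `read_env_file` is hypothetical, and the project itself may load the file differently (for example through a dotenv package).

```python
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped.

    Hypothetical helper for illustration, not the project's own loader.
    """
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values
```

If the project relies on pytesseract (an assumption), the value read from `tesseract_path` would typically end up in `pytesseract.pytesseract.tesseract_cmd`, which expects the path to the Tesseract executable.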

Running the code

You can get the list of parameters using the following:

python -m ocr --help

usage: __main__.py [-h] -i IMAGE [-c] [-t TEXT_OUTPUT_FILENAME]
                   [-f IMAGE_OUTPUT_FILENAME] [-v] [--getGrayScaleImage]
                   [--removeNoise] [--applyThresholding]
                   [--applyThresholdingInv] [--getDilatedImage]
                   [--getErodedImage] [--applyOpening] [--applyClosing]
                   [--getCannyResult]

A simple tesseract python script to get text from input image. by default this
list of preprocessing functions is used [getGrayScaleImage, removeNoise,
applyThresholdingInv, getDilatedImage]

optional arguments:
  -h, --help            show this help message and exit
  -i IMAGE, --image IMAGE
                        path to input image
  -c, --show-final-image
                        show the final image with an overlay of the text
                        recognised. (default: False)
  -t TEXT_OUTPUT_FILENAME, --text-output-filename TEXT_OUTPUT_FILENAME
                        file name to put the text output in. (default:
                        output.txt)
  -f IMAGE_OUTPUT_FILENAME, --image-output-filename IMAGE_OUTPUT_FILENAME
                        filename to output the final image in. (default:
                        output.png)
  -v, --verbose         Show intermediate images. (default: False)
  --getGrayScaleImage   (PreProcessing) adds getGrayScaleImage to
                        preprocessing. order is important.
  --removeNoise         (PreProcessing) adds removeNoise to preprocessing.
                        order is important.
  --applyThresholding   (PreProcessing) adds applyThresholding to
                        preprocessing. order is important.
  --applyThresholdingInv
                        (PreProcessing) adds applyThresholdingInv to
                        preprocessing. order is important.
  --getDilatedImage     (PreProcessing) adds getDilatedImage to preprocessing.
                        order is important.
  --getErodedImage      (PreProcessing) adds getErodedImage to preprocessing.
                        order is important.
  --applyOpening        (PreProcessing) adds applyOpening to preprocessing.
                        order is important.
  --applyClosing        (PreProcessing) adds applyClosing to preprocessing.
                        order is important.
  --getCannyResult      (PreProcessing) adds getCannyResult to preprocessing.
                        order is important.
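
The help text stresses that the order of the preprocessing flags matters. One way to preserve that order with argparse is a custom action that appends each flag's name to a shared list as it is parsed. The sketch below is an assumption about how this could be done, not the project's actual code: the class `AppendStepName` and the function `parse_steps` are hypothetical, the required `-i` option is left out for brevity, and the choice to replace the default list when any flag is given is also an assumption.

```python
import argparse

# Flag names and default list as shown in the help output above.
PREPROCESSING_FLAGS = [
    "getGrayScaleImage", "removeNoise", "applyThresholding",
    "applyThresholdingInv", "getDilatedImage", "getErodedImage",
    "applyOpening", "applyClosing", "getCannyResult",
]
DEFAULT_STEPS = [
    "getGrayScaleImage", "removeNoise",
    "applyThresholdingInv", "getDilatedImage",
]


class AppendStepName(argparse.Action):
    """Record each preprocessing flag in the order it appears on the command line."""

    def __init__(self, option_strings, dest, **kwargs):
        # nargs=0 makes the option a plain flag that takes no value.
        super().__init__(option_strings, dest, nargs=0, **kwargs)

    def __call__(self, parser, namespace, values, option_string=None):
        steps = list(getattr(namespace, self.dest, None) or [])
        # Use the declared option name so the result does not depend on what was typed.
        steps.append(self.option_strings[0].lstrip("-"))
        setattr(namespace, self.dest, steps)


def parse_steps(argv):
    """Return the preprocessing step names in command-line order."""
    parser = argparse.ArgumentParser(allow_abbrev=False)
    for name in PREPROCESSING_FLAGS:
        # All flags share one destination list, which keeps their relative order.
        parser.add_argument("--" + name, dest="steps",
                            action=AppendStepName, default=None)
    args = parser.parse_args(argv)
    # Assumption: explicit flags replace the default pipeline rather than extend it.
    return args.steps or list(DEFAULT_STEPS)
```

With this sketch, `parse_steps(["--applyClosing", "--getGrayScaleImage"])` yields the two steps in the order they were typed, while an empty argument list falls back to the default pipeline.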

Source: https://github.com/TheDigitalPhoenixX/Simple-Tesseract-Python-OCR

Example

py -m ocr -i "example input\input.jpg" -v

input.jpg

output.txt

This is SAMPLE TEXT
Text is at different regions

output.png

verbose:

Built With

Visual Studio Code - Code Editor

Contributing

Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.

Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.

Authors

Mohamed Said Sallam - Main Dev - TheDigitalPhoenixX

See also the list of contributors who participated in this project and their work in CONTRIBUTORS.md.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

README.md Template
