search-and-replace

Requires Python 3.10+. Licensed under MIT.

High-performance text correction for OCR output using Hyperscan and SymSpell.

Installation

Requires the Hyperscan system library (Vectorscan on ARM Macs):

# macOS
brew install vectorscan  # ARM
brew install hyperscan   # Intel

# Ubuntu/Debian
apt-get install libhyperscan-dev

Then:

pip install search-and-replace

Quick Start

from search_and_replace import SpellCorrector, OCRCorrector, PatternCorrector

# Fix common OCR confusions (0→O, 1→l, rn→m)
ocr = OCRCorrector()
ocr.correct("He11o W0rld")  # "Hello WOrld"

# Spell correction with bundled dictionary
spell = SpellCorrector()
spell.correct("helo")  # "hello"

# Pattern matching with Hyperscan (fast multi-pattern)
patterns = PatternCorrector([("Network", 1), ("Available", 1)])
patterns.correct("The Netwxrk is Avxilable")  # "The Network is Available"
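
The number paired with each word in the `PatternCorrector` example looks like a maximum edit distance (the API section calls the CSV columns `word,max_errors`). As a rough illustration of what "within 1 error" means, here is a plain-Python Levenshtein distance. This is only a sketch of the concept: `levenshtein` is a hypothetical helper, and the package itself relies on Hyperscan rather than code like this.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # prev[j] is the distance between the already-processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep a match)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("Netwxrk", "Network"))      # one substitution
print(levenshtein("Avxilable", "Available"))  # one substitution
```

Both misspellings from the Quick Start differ from their target by a single substitution, which is why a limit of 1 is enough to match them.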

API

| Class | Description |
| --- | --- |
| SpellCorrector | Levenshtein-based correction (bundled dictionary or custom words) |
| OCRCorrector | Fix common OCR character confusions |
| PatternCorrector | Hyperscan-based multi-pattern matching |
| Replacer | Direct string replacement |

| Function | Description |
| --- | --- |
| process_directory() | Batch process files in parallel |
| load_patterns() | Load word,max_errors CSV |
| load_replacements() | Load search,replace CSV |
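
The table says `load_patterns()` reads a `word,max_errors` CSV but does not show the file layout or the function's signature. A minimal sketch of parsing such a file with the standard library, assuming no header row and one `word,max_errors` pair per line; `read_patterns_csv` is a hypothetical helper, not part of the package:

```python
import csv
import io


def read_patterns_csv(text: str) -> list[tuple[str, int]]:
    """Parse 'word,max_errors' rows into (word, max_errors) tuples.

    Assumption: no header row; blank lines are ignored.
    """
    patterns = []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # csv.reader yields [] for blank lines
            continue
        word, max_errors = row[0].strip(), int(row[1])
        patterns.append((word, max_errors))
    return patterns


sample = "Network,1\nAvailable,1\n"
print(read_patterns_csv(sample))
```

The result has the same shape as the list passed to `PatternCorrector` in the Quick Start.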

CLI

search-and-replace ./input -o ./output --patterns wordlist.csv
search-and-replace ./input --patterns patterns.csv --replacements replacements.csv -v -j 8
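
The README does not explain how the CLI or `process_directory()` spread work across files, and it does not document what `-j` controls. Assuming `-j` is a worker count, batch processing of this kind could be sketched with the standard library as below. `process_tree`, its parameters, and the restriction to `.txt` files are all assumptions made for illustration, not the package's API.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def process_tree(src: Path, dst: Path, fix: Callable[[str], str], jobs: int = 4) -> int:
    """Apply fix() to every .txt file under src and mirror the results under dst.

    Returns the number of files processed.
    """
    # Collect the file list up front so files written to dst are never picked up.
    files = sorted(p for p in src.rglob("*.txt") if p.is_file())

    def handle(path: Path) -> None:
        out = dst / path.relative_to(src)
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(fix(path.read_text(encoding="utf-8")), encoding="utf-8")

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # list() drains the iterator so any worker exception is raised here.
        list(pool.map(handle, files))
    return len(files)
```

Any callable that maps text to text can be passed as `fix`, for example the `correct` method of one of the corrector objects from the Quick Start.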

OCR Corrections

| OCR Error | Fixed |
| --- | --- |
| 0 | O |
| 1, l, I | l |
| rn | m |
| cl | d |
| vv | w |
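
To make the table concrete, here is a deliberately naive sketch that applies a subset of these substitutions with blind string replacement. It is not how `OCRCorrector` is implemented (the README does not say); `naive_ocr_fix` and `CONFUSIONS` are hypothetical names, and the `I` → `l` rule is left out because applying it blindly would rewrite every legitimate capital I.

```python
# Subset of the confusion table above, multi-character patterns first.
CONFUSIONS = [
    ("rn", "m"),
    ("cl", "d"),
    ("vv", "w"),
    ("0", "O"),
    ("1", "l"),
]


def naive_ocr_fix(text: str) -> str:
    """Apply every substitution everywhere, with no dictionary or context check."""
    for wrong, right in CONFUSIONS:
        text = text.replace(wrong, right)
    return text


print(naive_ocr_fix("He11o W0rld"))  # Hello WOrld
print(naive_ocr_fix("modern"))       # modem (a correct word gets damaged)
```

The second call shows why blind replacement is not enough on its own: a usable corrector needs some way to decide when a substitution actually applies, such as a dictionary check.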

License

MIT
