"I" (vowel) symbol recognition #1989

@Tulenien


🚀 The feature

The "I" (vowel) symbol is systematically missing from the recognition model output when using a vertical image layout with the preserve_aspect_ratio option set to True.

Motivation, pitch

I have discovered that the model consistently struggles to recognize the vowel "I" in various positions within a sentence, particularly at the start and end, when the preserve_aspect_ratio parameter in the Pre-Processor is set to True.

Setting preserve_aspect_ratio=False helps mitigate the issue, but only for vertically oriented text.

However, for horizontally oriented text, setting preserve_aspect_ratio=True results in better recognition of "I" occurrences.

I have also tried changing the interpolation method in the image resize preprocessing stage. While this improves the overall recognition quality, it does not affect the "I"s.

Alternatives

I have three suggestions for fixing this issue:

  1. Use a conditional check to set the value of the preserve_aspect_ratio parameter based on the ratio of the image sides: detecting horizontal text would select True, and detecting vertical text would select False.

  2. Allow the preserve_aspect_ratio value to be chosen when calling the model.

  3. Add more horizontal text samples to the datasets and retrain/fine-tune the detection and recognition models.
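
The conditional check from the first suggestion could be sketched roughly as follows. This is only an illustration, not doctr code: the helper name `pick_preserve_aspect_ratio` and its `threshold` parameter are hypothetical, and the rule (images at least as wide as they are tall get True, taller images get False) simply mirrors the behaviour observed above.

```python
def pick_preserve_aspect_ratio(height: int, width: int, threshold: float = 1.0) -> bool:
    """Hypothetical helper: choose a preserve_aspect_ratio value from the image sides.

    Returns True when the image is at least `threshold` times wider than it is
    tall (treated here as a horizontal layout), and False otherwise.
    """
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    return (width / height) >= threshold


# Example with a NumPy-style image shape (height, width, channels).
shape = (1200, 800, 3)  # a portrait page, chosen for illustration
use_par = pick_preserve_aspect_ratio(shape[0], shape[1])
```

One possible way to use this, which also touches on the second suggestion, is to build two predictors once at startup, one with preserve_aspect_ratio=True and one with False, and dispatch each incoming image to the predictor matching the helper's result. The threshold of 1.0 is an assumption and would need tuning on real data.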

Additional context

I tested the issue on the latest version of the doctr library (torch) with an image composed of different sentences containing "I"s.

The picture compares two OCR runs on the same image using different values of the preserve_aspect_ratio parameter.

Meaning of the box colors:

  • blue marks outlier results with preserve_aspect_ratio set to False;
  • red marks outlier results with preserve_aspect_ratio set to True;
  • gray signifies no change between runs;
  • other colors represent partial differences.

The two attached JSON files show the doctr output on the same image. In doctr_ocr_par_False there are 10 "I" occurrences, while in doctr_ocr_par_True there are 5 "I" occurrences.
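
For anyone wanting to reproduce the counts, a small sketch like the one below could tally the "I" words in each exported file. It assumes the JSON follows the nested pages → blocks → lines → words layout, with each word carrying a "value" field, as produced by `result.export()`; the helper names are hypothetical, and counting standalone "I" words (rather than every "I" character) is an assumption about how the numbers above were obtained.

```python
import json
from pathlib import Path


def count_words(export: dict, target: str = "I") -> int:
    """Count words whose recognized value equals `target`.

    Assumes a nested pages -> blocks -> lines -> words layout where each
    word is a dict with a "value" key. Missing keys are treated as empty.
    """
    return sum(
        1
        for page in export.get("pages", [])
        for block in page.get("blocks", [])
        for line in block.get("lines", [])
        for word in line.get("words", [])
        if word.get("value") == target
    )


def count_in_file(path: str, target: str = "I") -> int:
    """Load an exported JSON file and count the matching words."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return count_words(data, target)
```

Calling `count_in_file("doctr_ocr_par_False.json")` and `count_in_file("doctr_ocr_par_True.json")` would then give the two figures to compare, provided the attachments are saved locally under those names.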

Image Image

doctr_ocr_par_False.json

doctr_ocr_par_True.json

The basic script used for tests:

from fastapi import FastAPI, UploadFile, File
import numpy as np
from doctr.models import ocr_predictor
from PIL import Image
import io
import torch
import uvicorn

app = FastAPI()

DETECTION_MODEL = "db_resnet50"
RECOGNITION_MODEL = "crnn_mobilenet_v3_large"
PRESERVE_ASPECT_RATIO = False

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = ocr_predictor(
    pretrained=True,
    det_arch=DETECTION_MODEL,
    reco_arch=RECOGNITION_MODEL,
    assume_straight_pages=True,
    preserve_aspect_ratio=PRESERVE_ASPECT_RATIO,
    symmetric_pad=True,
).to(device=DEVICE)

@app.post("/ocr")
async def ocr(file: UploadFile = File(...)):
    image = await file.read()
    await file.close()

    doc = []
    image_pil = Image.open(io.BytesIO(image)).convert("RGB")
    doc.append(np.asarray(image_pil))

    result = model(doc)
    return result.export()


if __name__ == "__main__":
    uvicorn.run("main:app", host="0.0.0.0", port=44556, reload=False)