Add support for exporting padded images and adjusted bounding boxes for accurate region cropping by muhammad-ali-emumba · Pull Request #154 · bytedance/Dolphin

muhammad-ali-emumba · Oct 27, 2025

PR Description

Problem

Dolphin applies padding to PDF page images during processing, but the dumped bboxes in the recognition JSON correspond to the un-padded images, causing misalignment when visualizing or reusing those bboxes.

This leads to a mismatch when users attempt to:

- Reconstruct or visualize the processed pages using the dumped bboxes.
- Crop specific parts of a page (e.g., code blocks or tables) for downstream tasks such as LLM-based text/code extraction.

As a result, the dumped bboxes from the existing recognition_json/.json do not correctly align with the actual padded images used internally.

Solution

This PR introduces two key enhancements to improve accuracy and reproducibility:

Export Padded Images

Added logic in demo_page.py and utils/util.py to save the padded page images generated during PDF processing.
Padded images are saved in a directory specified by the user via the new CLI argument --processed_images_dir.
If no directory is provided (via CLI or config), Dolphin will automatically create a default folder named processed_images_by_dolphin in the parent directory.
The saved images follow a consistent naming structure:

<processed_images_dir>/<pdf_name>/page-1.png
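
The option handling and naming scheme described above could look roughly like the sketch below. It is not the code from this PR: the helper names (`build_parser`, `resolve_output_dir`, `padded_image_path`) are hypothetical, `base_dir` stands in for the "parent directory" mentioned above, and it assumes that `<pdf_name>` is the PDF file name without its extension and that page numbers start at 1.

```python
import argparse
from pathlib import Path

# Default folder name taken from the PR description.
DEFAULT_DIR_NAME = "processed_images_by_dolphin"


def build_parser() -> argparse.ArgumentParser:
    """Parser exposing only the new option (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Sketch of the new CLI option")
    parser.add_argument(
        "--processed_images_dir",
        type=str,
        default=None,
        help="Directory in which padded page images are saved",
    )
    return parser


def resolve_output_dir(cli_value, base_dir):
    """Use the user-supplied directory, else fall back to the default folder.

    Assumption: base_dir plays the role of the 'parent directory'.
    """
    if cli_value:
        return Path(cli_value)
    return Path(base_dir) / DEFAULT_DIR_NAME


def padded_image_path(output_dir, pdf_path, page_number):
    """Build <processed_images_dir>/<pdf_name>/page-<n>.png.

    Assumption: <pdf_name> is the file stem and pages are numbered from 1.
    """
    pdf_name = Path(pdf_path).stem
    return Path(output_dir) / pdf_name / f"page-{page_number}.png"
```

A caller would create the parent folder (for example with `path.parent.mkdir(parents=True, exist_ok=True)`) before writing each padded page image to the returned path.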

Dump Adjusted (Padded) Bounding Boxes

When Dolphin transforms the original bboxes to match the padded images, these adjusted coordinates are now stored in the output JSON under a new key:

{
  "bboxes": [...],
  "padded_bboxes": [...]
}

This ensures consumers of the output can precisely map visual elements or crop specific regions from the padded images without additional coordinate transformation.
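
As an illustration of how a consumer might use the new key, here is a minimal sketch that reads `padded_bboxes` and crops a region. It rests on several assumptions: each box is `[x1, y1, x2, y2]` in pixel coordinates of the padded image, the keys sit at the level shown in the JSON above (the real nesting in Dolphin's output may differ), and the image is a plain row-major grid of pixels so the example stays dependency-free. The function names are hypothetical.

```python
import json


def load_padded_bboxes(json_text):
    """Return padded_bboxes from recognition JSON text as integer tuples.

    Assumption: each box is [x1, y1, x2, y2] in padded-image pixels.
    """
    record = json.loads(json_text)
    return [tuple(int(round(v)) for v in box) for box in record["padded_bboxes"]]


def clamp_box(box, width, height):
    """Clip a box to the image bounds and keep the corners ordered."""
    x1, y1, x2, y2 = box
    x1, x2 = sorted((max(0, min(x1, width)), max(0, min(x2, width))))
    y1, y2 = sorted((max(0, min(y1, height)), max(0, min(y2, height))))
    return x1, y1, x2, y2


def crop(pixels, box):
    """Crop a row-major 2D pixel grid (a list of rows) to the given box."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    x1, y1, x2, y2 = clamp_box(box, width, height)
    return [row[x1:x2] for row in pixels[y1:y2]]
```

With Pillow, the same clamped tuple can be passed to `Image.crop`, which expects a `(left, upper, right, lower)` box, on the matching `page-N.png` file.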

Impact

Enables accurate cropping of specific PDF regions (e.g., code snippets, figures) for post-processing or LLM-based enrichment.
Improves reproducibility between internal image processing and exported JSON metadata.
Backward compatible: existing users relying on the current bboxes field are not affected.

Files Modified

demo_page.py
utils/util.py

Testing

Validated on multiple PDFs containing code blocks and tables.
Verified that:
- Dumped padded_bboxes align with the corresponding visual regions when overlaid on the padded images.

muhammad-ali-emumba added 2 commits October 27, 2025 15:46

added the functionality to append padded bboxes coordinates in the recognition json files and save the padded images on disk.

bca0863

Merge branch 'master' into padded_images_and_bboxes

35a7354
