Add support for exporting padded images and adjusted bounding boxes for accurate region cropping #154

Open
muhammad-ali-emumba wants to merge 2 commits into bytedance/Dolphin:master from muhammad-ali-emumba/Dolphin:padded_images_and_bboxes

@muhammad-ali-emumba

PR Description

Problem

Dolphin applies padding to PDF page images during processing, but the dumped bboxes in the recognition JSON correspond to the un-padded images, causing misalignment when visualizing or reusing those bboxes.

This leads to a mismatch when users attempt to:

  • Reconstruct or visualize the processed pages using the dumped bboxes.

  • Crop specific parts of a page (e.g., code blocks or tables) for downstream tasks such as LLM-based text/code extraction.

As a result, the dumped bboxes from the existing recognition_json/.json do not correctly align with the actual padded images used internally.

Solution

This PR introduces two key enhancements to improve accuracy and reproducibility:

Export Padded Images

  • Added logic in demo_page.py and utils/util.py to save the padded page images generated during PDF processing.

  • Padded images are saved in a directory specified by the user via the new CLI argument --processed_images_dir.

  • If no directory is provided (via CLI or config), Dolphin will automatically create a default folder named processed_images_by_dolphin in the parent directory.

  • The saved images follow a consistent naming structure:

    <processed_images_dir>/<pdf_name>/page-1.png
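
The naming scheme above could be reproduced roughly as follows. This is a minimal sketch, not the code from this PR: `padded_image_path` is a hypothetical helper, and three details are assumptions: that `<pdf_name>` is the PDF file name without its extension, that pages are numbered from 1, and that the default folder is resolved against a caller-supplied base directory.

```python
from pathlib import Path
from typing import Optional

# Default folder name taken from the PR description.
DEFAULT_DIR_NAME = "processed_images_by_dolphin"


def padded_image_path(pdf_path: str, page_number: int,
                      processed_images_dir: Optional[str] = None,
                      base_dir: str = ".") -> Path:
    """Build <processed_images_dir>/<pdf_name>/page-<n>.png (hypothetical helper)."""
    if page_number < 1:
        raise ValueError("page_number is assumed to be 1-based")
    # Fall back to the default folder when no directory was supplied.
    if processed_images_dir:
        root = Path(processed_images_dir)
    else:
        root = Path(base_dir) / DEFAULT_DIR_NAME
    pdf_name = Path(pdf_path).stem  # assumption: "docs/report.pdf" -> "report"
    return root / pdf_name / f"page-{page_number}.png"
```

The caller would still need to create the parent directory (for example with `path.parent.mkdir(parents=True, exist_ok=True)`) before saving an image there.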

Dump Adjusted (Padded) Bounding Boxes

  • When Dolphin transforms the original bboxes to match the padded images, these adjusted coordinates are now stored in the output JSON under a new key:

    {
      "bboxes": [...],
      "padded_bboxes": [...]
    }

  • This ensures consumers of the output can precisely map visual elements or crop specific regions from the padded images without additional coordinate transformation.
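
To illustrate how the two keys relate, here is a minimal sketch that assumes the padding only adds margins around the page, so a padded box is the original box shifted by the left and top margins. The helper `shift_bbox`, the `[x1, y1, x2, y2]` box layout, and the margin values are illustrative assumptions; the actual transform lives in Dolphin's utilities and may differ.

```python
from typing import List, Sequence


def shift_bbox(bbox: Sequence[int], pad_left: int, pad_top: int) -> List[int]:
    """Translate an [x1, y1, x2, y2] box by the padding margins (hypothetical helper)."""
    x1, y1, x2, y2 = bbox
    return [x1 + pad_left, y1 + pad_top, x2 + pad_left, y2 + pad_top]


# Assumed example: a 600x800 page padded to 800x800 with 100 px added on the
# left and right, and nothing added on top or bottom.
bboxes = [[50, 40, 550, 120]]
padded_bboxes = [shift_bbox(b, pad_left=100, pad_top=0) for b in bboxes]
```

Under these assumptions `padded_bboxes` holds the same regions expressed in the padded image's coordinate frame, which is what the new key is meant to spare consumers from computing themselves.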

Impact

  • Enables accurate cropping of specific PDF regions (e.g., code snippets, figures) for post-processing or LLM-based enrichment.

  • Improves reproducibility between internal image processing and exported JSON metadata.

  • Backward compatible: existing users relying on the current bboxes field are not affected.
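
A consumer could turn the exported boxes into crop rectangles along these lines. This is a sketch under assumptions: the record is taken to have a top-level `padded_bboxes` list of `[x1, y1, x2, y2]` boxes as in the snippet above, and `crop_boxes` is a hypothetical helper that is not part of Dolphin.

```python
import json
from typing import List, Tuple

Box = Tuple[int, int, int, int]


def crop_boxes(record_json: str, image_size: Tuple[int, int]) -> List[Box]:
    """Return crop rectangles clamped to the padded image (hypothetical helper)."""
    width, height = image_size
    record = json.loads(record_json)
    boxes: List[Box] = []
    # Assumption: each entry is [x1, y1, x2, y2] in padded-image pixels.
    for x1, y1, x2, y2 in record["padded_bboxes"]:
        left, right = (max(0, min(int(v), width)) for v in (x1, x2))
        upper, lower = (max(0, min(int(v), height)) for v in (y1, y2))
        # Skip boxes that collapse to zero area after clamping.
        if right > left and lower > upper:
            boxes.append((left, upper, right, lower))
    return boxes
```

Each tuple follows the `(left, upper, right, lower)` order that Pillow's `Image.crop` accepts, so it can be applied directly to the matching padded page image.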

Files Modified

  • demo_page.py

  • utils/util.py

Testing

Validated on multiple PDFs containing code blocks and tables. Verified that:

  • The dumped padded_bboxes align correctly with the visual regions when overlaid on the padded images.
