Closed
Labels
bug: output (Poor markdown/HTML output quality)
Description
📝 Describe the Output Issue
I'm running Surya on a PDF with basically this code:
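import pdf2image
from surya.foundation import FoundationPredictor
from surya.recognition import RecognitionPredictor
from surya.detection import DetectionPredictor
# Render each PDF page to a PIL image, then run detection + recognition.
pages = pdf2image.convert_from_path(pdf_path, thread_count=4, dpi=200)
foundation_predictor = FoundationPredictor()
recognition_predictor = RecognitionPredictor(foundation_predictor)
detection_predictor = DetectionPredictor()
predictions = recognition_predictor(pages, det_predictor=detection_predictor, math_mode=False)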
I'm getting HTML tags in the output. Is there a way to stop Surya from generating HTML tags?
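For illustration, one possible workaround is to strip the tags from the recognized text after the fact. This is only a post-processing sketch, not a Surya option and not confirmed by the maintainers. The helper name `strip_html_tags` and the regular expression are hypothetical, and the sketch assumes the unwanted markup consists of simple inline tags.

```python
import re
from html import unescape

# Hypothetical pattern: matches simple HTML-style tags such as <b>, </i>,
# <br/>, or <math display="inline">. A bare "<" followed by a space or a
# digit (as in "5 < 10") is left alone.
TAG_RE = re.compile(r"</?[A-Za-z][A-Za-z0-9]*(?:\s[^<>]*)?/?>")


def strip_html_tags(text: str) -> str:
    """Remove HTML-style tags, keep their inner text, and decode entities."""
    return unescape(TAG_RE.sub("", text))


if __name__ == "__main__":
    print(strip_html_tags("<b>Total</b> amount: x<sup>2</sup>"))
```

Assuming each prediction exposes its recognized lines as `text_lines` with a `text` attribute, the helper could be applied to every `line.text` before the text is used further. Note that this throws away any formatting information the tags carried.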
⚙️ Environment
- Surya version: 0.17.0
- Python version: 3.12.3
- PyTorch version: 2.8.0
- Transformers version: 4.57.0
- Operating System: Ubuntu 24.04.3