Added support for ASR #1359
base: dev
Modified the README to acknowledge issue data-prep-kit#1042
@shahrokhDaijavad What is the use case for this?
Just to be up-to-date with the latest capabilities of Docling.
Hi, do I need to add a test file, a sample data file, or anything else for this?
@ShiroYasha18 Any change in docling2parquet requires generating updated "expected" files, for the
@shahrokhDaijavad: No need to be up to date. Let's discuss. I need a viable use case before we can proceed.
Hi @touma-I, thanks for the feedback! The idea behind integrating ASR (Automatic Speech Recognition) support is to allow This integration makes the DPK pipeline compatible with speech data workflows, enabling users to extract structured insights from spoken content with minimal setup. It aligns with Docling's existing support and helps bridge that capability into DPK for broader utility. Example real-world use case: Companies often conduct video calls (e.g., via Zoom or Google Meet) with users. These are saved as
@ShiroYasha18 Sure. It is good for DPK to keep up with the latest Docling capabilities, but for us, it only makes sense to add ASR features when there is a specific use case (or client need) in which the processing of sound files is followed by one or more DPK transforms in a real application recipe, in either pre-training or post-training LLM applications. As soon as we can find such a use case, we can come back to this PR.
Why are these changes needed?
Bridges Docling's ASR (Automatic Speech Recognition) support into DPK.
Currently supported models:
WHISPER_TINY
WHISPER_SMALL
WHISPER_MEDIUM
WHISPER_BASE
WHISPER_LARGE
WHISPER_TURBO
These are all the ASR models that Docling supports as of 3/07/2025.
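As a rough illustration of how a transform option could be checked against the model list above, here is a minimal sketch. The names `SUPPORTED_ASR_MODELS` and `resolve_asr_model` are hypothetical and are not taken from this PR, from DPK, or from Docling; only the six model names come from the description.

```python
# Hypothetical helper: illustrative names only, not the PR's actual code.
# The six model names are the ones listed in the PR description.
SUPPORTED_ASR_MODELS = (
    "WHISPER_TINY",
    "WHISPER_SMALL",
    "WHISPER_MEDIUM",
    "WHISPER_BASE",
    "WHISPER_LARGE",
    "WHISPER_TURBO",
)


def resolve_asr_model(name: str) -> str:
    """Normalize a user-supplied model name and check it against the list."""
    # Accept lowercase input and hyphens, e.g. "whisper-tiny".
    normalized = name.strip().upper().replace("-", "_")
    if normalized not in SUPPORTED_ASR_MODELS:
        allowed = ", ".join(SUPPORTED_ASR_MODELS)
        raise ValueError(f"Unsupported ASR model {name!r}; expected one of: {allowed}")
    return normalized
```

Failing early on an unknown name would let a misconfigured run stop before any audio file is processed, which may be preferable to silently falling back to a default model.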
Related issue number (if any).
Related to issue #1346