-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Adapting transducer greedy decoding #2975
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
Adding a decoding loop on each decoding timestep. It is more adapted to RNN-T loss and it improves significantly the results (especially for streaming recipes)
Set buffer_chunk_size to -1 to prevent dropping first chunks. This is the case especially when a small chunk size is used (e.g., 160ms)
Hi @younessdkhissi, Thanks for the PR. Could you please develop a bit more what are your suggested changes, and why do you think they are necessary? Furthermore, can you report the general improvements you've got in a streaming setting? Ideally, we would like to imrpove our transfucer decoding interfaces by being more aligned to the literature, so if you could provide some references it would be great as well. Thanks Youness! |
Thanks @Adel-Moumen for taking time to look at my PR. I have tried to implement the following paper "DUAL-MODE ASR: UNIFY AND IMPROVE STREAMINGASR WITH FULL-CONTEXT MODELING" https://openreview.net/pdf?id=Pz_dcqfcKW8 (that could be a future PR) and these are the results I get on LibriSpeech using the streaming mode:
![]() |
Hi @younessdkhissi thanks for the work. I am really not sure what this PR does. I see that there is a new arbitrary for loop for each time steps. Do you have a paper describing what is being done here formally? This would be important to attach to such a change to greedy decoding. |
Hello @TParcollet |
Hi @younessdkhissi and thank you. I'll take a closer look into this asap. In the meantime, could you please provide a few measurements of how the decoding speed is impacted by this change? Many thanks! |
Hi @TParcollet
If you want more measurements let me know :) |
Adding a decoding loop on each decoding timestep. It is more adapted to RNN-T loss and it improves significantly the results (especially for streaming recipes)
What does this PR do?
Fixes the transducer greedy decoding (this probably fixes the issue #2753)
Before submitting
PR review
Reviewer checklist