Add triangularclassic filterbank window and snap_to_bins #2562

asumagic · Jun 4, 2024

What does this PR do?

Fixes #2030

WIP, introduces:

triangularclassic triangular filters (name is a WIP)
Functionality to allow aligning filterbank central frequencies to the closest lower FFT frequency bin

No breaking changes expected. Should test model training to see if those make a difference.

Before submitting

Did you read the contributor guideline?
Did you make sure your PR does only one thing, instead of bundling different changes together?
Did you make sure to update the documentation with your changes? (if necessary)
Did you write any new necessary tests? (not for typos and docs)
Did you verify new and existing tests pass locally with your changes?
Did you list all the breaking changes introduced by this pull request?
Does your code adhere to project-specific code style and conventions?

PR review

Reviewer checklist

Is this pull request ready for review? (if not, please submit in draft mode)
Check that all items from Before submitting are resolved
Make sure the title is self-explanatory and the description concisely explains the PR
Add labels and milestones (and optionally projects) to the PR so it can be classified
Confirm that the changes adhere to compatibility requirements (e.g., Python version, platform)
Review the self-review checklist to ensure the code is ready for review

asumagic · Jun 5, 2024

Currently training LibriSpeech/ASR/transducer/ with triangularclassic and without to see if it works. I don't expect the resulting WER difference to be any significant past the margin of error, but it is worth checking.

I am not sure if I would want to keep the bin alignment option; kaldi/k2 doesn't seem to be doing it and it doesn't feel like it would be beneficial especially without high n_fft.

TParcollet · Jun 6, 2024

It took us a long time due to many high priority things, and mostly because it does not seem to affect the results, but this is an important fix. Maybe @idruker-cerence wants to have a look.

asumagic · Jun 6, 2024

FWIW I have a stock RNN-T LibriSpeech Conformer being trained again with the triangularclassic option and at epoch ~34 there doesn't seem to be much of a difference yet.
Will report on the results when actually finished.

idruker-cerence · Jun 6, 2024

It took us a long time due to many high priority things, and mostly because it does not seem to affect the results, but this is an important fix. Maybe @idruker-cerence wants to have a look.

Hello

I am not a python expert so it's hard to me to review the code. However, I have a C++ implementation. We can do the following. You calculate the mel-spectrum with your new function and give me the numbers. Then I verify them with my C++ function.

asumagic · Jun 25, 2024

FWIW I have a stock RNN-T LibriSpeech Conformer being trained again with the triangularclassic option and at epoch ~34 there doesn't seem to be much of a difference yet. Will report on the results when actually finished.

Current code, lines are respectively test-clean and test-other:

Epoch loaded: 100 - test loss: 1.80e-02, test CER: 8.81e-01, test WER: 2.57
Epoch loaded: 100 - test loss: 2.16e-02, test CER: 2.50, test WER: 5.88

New "triangularclassic":

Epoch loaded: 100 - test loss: 1.88e-02, test CER: 8.57e-01, test WER: 2.59
Epoch loaded: 100 - test loss: 2.12e-02, test CER: 2.56, test WER: 5.97

That's only one run with 100 epochs at that, but it doesn't seem like a significant accuracy difference (~1% relative difference for only one run).

As for correctness, from my testing, "triangularclassic" matches what torchaudio produces, which itself is adapted from kaldi (which I also checked against, or rather k2, which reuses a good chunk of the same logic). So I think it would be ok to add for correctness.

I am tempted to turn this PR into only adding the "correct" filter as an option, but keep the default the same, and remove the snap_to_bins (which I don't think matters, and that I didn't test extensively).

At least in speech processing, correct me if I'm wrong, it doesn't seem so common to round the center frequencies to the closest bin in definitions of the mel filterbank (which is what I tried to implement as snap_to_bin).
I think you can only get "clean-looking" filter banks as usually shown in blog articles/papers if you really crank up the FFT resolution, because otherwise you're either trading off between obviously distorted central frequency spacing (due to the rounding) or not having the central frequency have a peak of 1.0 for on a specific FFT bin.

asumagic added 3 commits June 4, 2024 12:05

Add triangularclassic filterbank window and snap_to_bins

a42dbdd

Fix leftovers

9be58b4

Fix triangularclassic

31a31a2

asumagic mentioned this pull request Jun 4, 2024

[Bug]: wrong formula of calculation of mel filterbank? #2030

Open

Add filter_shape to conformer RNN-T LS recipe for testing

0f63247

mravanelli assigned asumagic Jun 17, 2024

mravanelli self-requested a review June 17, 2024 16:21

Merge branch 'develop' into fix-fbanks

49c2042

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add triangularclassic filterbank window and snap_to_bins #2562

Add triangularclassic filterbank window and snap_to_bins #2562

Uh oh!

asumagic commented Jun 4, 2024

Uh oh!

asumagic commented Jun 5, 2024

Uh oh!

TParcollet commented Jun 6, 2024

Uh oh!

asumagic commented Jun 6, 2024

Uh oh!

idruker-cerence commented Jun 6, 2024

Uh oh!

asumagic commented Jun 25, 2024 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Search code, repositories, users, issues, pull requests...

Add triangularclassic filterbank window and snap_to_bins #2562

Are you sure you want to change the base?

Add triangularclassic filterbank window and snap_to_bins #2562

Uh oh!

Conversation

asumagic commented Jun 4, 2024

What does this PR do?

PR review

Uh oh!

asumagic commented Jun 5, 2024

Uh oh!

TParcollet commented Jun 6, 2024

Uh oh!

asumagic commented Jun 6, 2024

Uh oh!

idruker-cerence commented Jun 6, 2024

Uh oh!

asumagic commented Jun 25, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

asumagic commented Jun 25, 2024 •

edited

Loading