SGMSE Voicebank Speech Enhancement Recipe #2947

jrochdi · Jul 16, 2025

What does this PR do?

Introduce SpeechBrain recipe recipes/Voicebank/enhance/SGMSE/ for SGMSE Voicebank enhancement
Add train.py (adapted Brain class and training loop)
Add hparams.yaml (hyperparameters for training)
Add enhance.py inference script to generate enhanced audio on demand
Add extra_requirements.txt requirements file to install dependencies required for this recipe

Before submitting

Did you read the contributor guideline?
Did you make sure your PR does only one thing, instead of bundling different changes together?
Did you make sure to update the documentation with your changes? (if necessary)
Did you write any new necessary tests? (not for typos and docs)
Did you verify new and existing tests pass locally with your changes?
Did you list all the breaking changes introduced by this pull request?
Does your code adhere to project-specific code style and conventions?

PR review

Reviewer checklist

Is this pull request ready for review? (if not, please submit in draft mode)
Check that all items from Before submitting are resolved
Make sure the title is self-explanatory and the description concisely explains the PR
Add labels and milestones (and optionally projects) to the PR so it can be classified
Confirm that the changes adhere to compatibility requirements (e.g., Python version, platform)
Review the self-review checklist to ensure the code is ready for review

pplantinga

Everything looks quite good, and I was able to run the code without any issues. I have only a few minor comments about sections that might work slightly better (e.g. a shorter train.py file) if they followed the more SpeechBrain-idiomatic way of doing things, such as using the speechbrain.processing.features.STFT class or something similar instead of handling it all in the train file.

The only remaining pieces are:

Add results to Dropbox and record the link and the scores achieved in the README
It would be very nice if we could host a model on Hugging Face and run inference using SpeechBrain code. Probably the easiest method would be to add an inference class to the sgmse_plus.py file that enhances a waveform when called and can be used with speechbrain.inference.enhancement.EnhanceWaveform

pplantinga · Jul 24, 2025

recipes/Voicebank/enhance/SGMSE/hparams.yaml

@@ -0,0 +1,90 @@
+output_folder: /export/home/1rochdi/speechbrain/results/Voicebank/enhance/SGMSE    # Main directory to store experiment results


This could perhaps just be "results"; this path won't exist on most systems.

pplantinga · Jul 24, 2025

recipes/Voicebank/enhance/SGMSE/hparams.yaml

+#inference_dir: !ref <output_folder>/<run_name>/enhanced_inference        # Directory to store waveforms at inference
+tensorboard_logs: !ref <output_folder>/tensorboard_logs/                 # Directory for TensorBoard logs
+
+data_dir: /data/datasets/noisy-vctk-16k         # Root dir for the dataset


Could be !PLACEHOLDER so people know to change it.

recipes/Voicebank/enhance/SGMSE/README.md

pplantinga · Jul 24, 2025

recipes/Voicebank/enhance/SGMSE/train.py

+        # STFT
+        n_fft = self.hparams.n_fft
+        hop_length = self.hparams.hop_length
+        window_type = self.hparams.window_type
+        self.window = self.get_window(window_type, n_fft).to(self.device)
+        self.stft_kwargs = {
+            "n_fft": n_fft,
+            "hop_length": hop_length,
+            "center": True,
+            "return_complex": True,
+        }


Typically in SpeechBrain we just define the STFT in the hparams.

pplantinga · Jul 24, 2025

recipes/Voicebank/enhance/SGMSE/train.py

+        ema = self.modules["score_model"].ema
+        self.checkpointer.add_recoverable(
+            name="ema",
+            obj=ema,
+            custom_save_hook=lambda obj, path: torch.save(
+                obj.state_dict(), path
+            ),
+            custom_load_hook=lambda obj, path, end: obj.load_state_dict(
+                torch.load(path)
+            ),
+        )


This is fine, but you could also add the save/load code to the model itself with @mark_as_saver and @mark_as_loader:

https://speechbrain.readthedocs.io/en/latest/API/speechbrain.utils.checkpoints.html#speechbrain.utils.checkpoints.register_checkpoint_hooks
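
As a rough illustration of what "save/load code on the model itself" could look like, here is a dependency-free sketch. `TinyEMA` is a hypothetical stand-in for the recipe's EMA helper, not the class used in the PR, and it stores plain floats as JSON instead of tensors. The SpeechBrain decorators are only named in comments so that the snippet runs without SpeechBrain installed; in the actual recipe the class would be decorated with `@register_checkpoint_hooks` and the two methods with `@mark_as_saver` and `@mark_as_loader`.

```python
import json
import os
import tempfile


# In SpeechBrain, the class would carry @register_checkpoint_hooks.
class TinyEMA:
    """Hypothetical stand-in for an EMA helper that owns its save/load logic."""

    def __init__(self, decay=0.999):
        self.decay = decay
        self.shadow = {}  # name -> exponentially averaged value

    def update(self, params):
        """Blend the current parameter values into the running average."""
        for name, value in params.items():
            prev = self.shadow.get(name, value)
            self.shadow[name] = self.decay * prev + (1.0 - self.decay) * value

    # In SpeechBrain, this method would carry @mark_as_saver.
    def save(self, path):
        with open(path, "w") as f:
            json.dump({"decay": self.decay, "shadow": self.shadow}, f)

    # In SpeechBrain, this method would carry @mark_as_loader.
    # The extra argument mirrors the (obj, path, end) hook shape in the diff above.
    def load(self, path, end_of_epoch=False):
        del end_of_epoch  # unused in this sketch
        with open(path) as f:
            state = json.load(f)
        self.decay = state["decay"]
        self.shadow = state["shadow"]


if __name__ == "__main__":
    ema = TinyEMA(decay=0.5)
    ema.update({"w": 2.0})
    ema.update({"w": 4.0})
    with tempfile.TemporaryDirectory() as tmp:
        ckpt = os.path.join(tmp, "ema.json")
        ema.save(ckpt)
        restored = TinyEMA()
        restored.load(ckpt)
    print(restored.shadow)
```

With the hooks living on the class, the `add_recoverable` call would presumably no longer need the two lambdas, since the checkpointer can find the registered methods.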

pplantinga · Jul 24, 2025

recipes/Voicebank/enhance/SGMSE/train.py

+    cli.add_argument(
+        "--resume",
+        type=str,
+        default="",
+        help="Path to an existing run directory to resume.",
+    )
+    resume_args, remaining = cli.parse_known_args()
+
+    hparams_file, run_opts, overrides = sb.parse_arguments(remaining)
+
+    if resume_args.resume:  # Resume
+        run_dir = Path(resume_args.resume).resolve()
+        hparams_file = run_dir / "hyperparams.yaml"
+        overrides = overrides or ""
+    else:  # New
+        run_name = f"run_{datetime.now():%Y-%m-%d_%H-%M-%S}"
+        overrides = (overrides or "") + f"\nrun_name: '{run_name}'"


I like the resume mechanism here, nice work! I did get tripped up once while using it (the hparams in the results dir did not have a reference I expected to be there), but I don't think there's necessarily anything to fix here. A message stating where the hparams are loaded from could be nice.
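
One way the suggested message could be added is sketched below, using only the standard library. The function name `resolve_run` and the `default_hparams` parameter are hypothetical; in the recipe the hparams path for a new run comes from `sb.parse_arguments`, which is replaced here by a plain default so the snippet runs on its own.

```python
import argparse
import logging
from datetime import datetime
from pathlib import Path

logger = logging.getLogger(__name__)


def resolve_run(argv, default_hparams="hparams.yaml"):
    """Return (hparams_file, overrides) and log where the hparams come from.

    `default_hparams` stands in for the path that sb.parse_arguments would
    return in the real recipe (an assumption made for this sketch).
    """
    cli = argparse.ArgumentParser(add_help=False)
    cli.add_argument(
        "--resume",
        type=str,
        default="",
        help="Path to an existing run directory to resume.",
    )
    resume_args, _remaining = cli.parse_known_args(argv)

    if resume_args.resume:  # Resume: reuse the hparams saved with the run
        run_dir = Path(resume_args.resume).resolve()
        hparams_file = run_dir / "hyperparams.yaml"
        overrides = ""
        logger.info("Resuming run: loading hparams from %s", hparams_file)
    else:  # New run: use the recipe hparams and stamp a fresh run name
        run_name = f"run_{datetime.now():%Y-%m-%d_%H-%M-%S}"
        hparams_file = Path(default_hparams)
        overrides = f"run_name: '{run_name}'"
        logger.info(
            "New run %s: loading hparams from %s", run_name, hparams_file
        )
    return hparams_file, overrides


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    resolve_run(["--resume", "results/some_run"])
    resolve_run([])
```

Logging the resolved path in both branches makes it obvious when a resumed run is reading the saved copy in the run directory rather than the recipe's own yaml file.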

pplantinga · Jul 25, 2025

The inference_dir was commented out because it is never used in train.py, which causes the consistency checker to choke (see test: ERROR: variable "inference_dir" not used in recipes/Voicebank/enhance/SGMSE/train.py!). One solution could be to make this a command-line argument instead. Another could be to have a separate yaml for inference that uses an !include:hparams.yaml tag to import the other file.
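
The command-line option could look roughly like the following sketch. The function name `parse_enhance_args`, the positional `run_dir` argument, and the fallback location are assumptions for illustration; the fallback simply mirrors the commented-out `enhanced_inference` entry from the hparams and is not taken from the final enhance.py.

```python
import argparse
from pathlib import Path


def parse_enhance_args(argv):
    """Parse arguments for a hypothetical inference entry point."""
    parser = argparse.ArgumentParser(
        description="Enhance audio with a trained run (sketch)."
    )
    parser.add_argument(
        "run_dir",
        type=Path,
        help="Run directory holding the saved hparams and checkpoints.",
    )
    parser.add_argument(
        "--inference_dir",
        type=Path,
        default=None,
        help="Where to write enhanced waveforms "
        "(default: <run_dir>/enhanced_inference).",
    )
    args = parser.parse_args(argv)
    if args.inference_dir is None:
        # Keep the output next to the run when no directory is given.
        args.inference_dir = args.run_dir / "enhanced_inference"
    return args


if __name__ == "__main__":
    args = parse_enhance_args(["results/run_x"])
    print(args.inference_dir)
```

Because the directory is no longer declared in hparams.yaml, the consistency checker has nothing unused to report for train.py.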

Jonas Rochdi added 2 commits July 16, 2025 15:28

SGMSE Voicebank enhance recipe

52db102

renamed extra_requirements.txt

d64eb92

pplantinga self-requested a review July 17, 2025 16:28

pplantinga added the "recipes" label (Changes to recipes only (add/edit)) Jul 17, 2025

pplantinga and others added 10 commits July 23, 2025 09:40

Merge branch 'develop' into sgmse_vbdmd_enhance

3b9cfab

Fix formatting errors etc.

f8ee67e

added missing docstrings

397b012

Update voicebank download link

0cad5c8

Revert "added missing docstrings"

f6f2370

This reverts commit 397b012.

Re-add docstrings in correct format

d8281a2

Add recipe test for SGMSE

1802849

Make the yaml consistency checker happy

4a90aba

The YAML consistency checker. Again.

bc69704

Move sgmse to "integrations" as it requires an extra dependency

e4cf646

pplantinga reviewed Jul 24, 2025

pplantinga and others added 2 commits July 24, 2025 09:59

Remove unneeded file

3a62763

fix typo

d5bcabf

Jonas Rochdi and others added 9 commits August 12, 2025 11:55

readme

07be1c5

removed inference_dir from hparams

e1b5ffa

removed inference_dir hparams reference from enhance.py

1548de8

Merge branch 'develop' into sgmse_vbdmd_enhance

62621b0

tensorboard logs now in run dir

826dc36

removed tensorboard log from hparams

199fe5d

Merge branch 'develop' into sgmse_vbdmd_enhance

fd2fcec

Fix black formatting error in sgmse recipe

8dae767

Add link to results folder

a99572d

TParcollet added this to the v1.1.0 milestone Oct 9, 2025
