@jrochdi jrochdi commented Jul 16, 2025

What does this PR do?

  • Introduce SpeechBrain recipe recipes/Voicebank/enhance/SGMSE/ for SGMSE Voicebank enhancement
  • Add train.py (adapted Brain class and training loop)
  • Add hparams.yaml (hyperparameters for training)
  • Add enhance.py inference script to generate enhanced audio on demand
  • Add extra_requirements.txt requirements file to install dependencies required for this recipe

Before submitting

  • Did you read the contributor guideline?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Does your code adhere to project-specific code style and conventions?

PR review

Reviewer checklist
  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified
  • Confirm that the changes adhere to compatibility requirements (e.g., Python version, platform)
  • Review the self-review checklist to ensure the code is ready for review

@pplantinga pplantinga self-requested a review July 17, 2025 16:28
@pplantinga pplantinga added the recipes Changes to recipes only (add/edit) label Jul 17, 2025

@pplantinga pplantinga left a comment

Everything looks quite good, and I was able to run the code without any issues. I have only a few minor comments about sections that might work slightly better (e.g. a shorter train.py) if they followed the more SpeechBrain-idiomatic approach, such as using the speechbrain.processing.features.STFT class or something similar instead of handling everything in the training script.

The only remaining pieces are:

  1. Add the results to Dropbox and record the link and the scores achieved in the README.
  2. It would be very nice if we could host a model on Hugging Face and run inference using SpeechBrain code. Probably the easiest method would be to add an inference class to the sgmse_plus.py file that enhances a waveform when called and can be used with speechbrain.inference.enhancement.WaveformEnhancement.

@@ -0,0 +1,90 @@
output_folder: /export/home/1rochdi/speechbrain/results/Voicebank/enhance/SGMSE # Main directory to store experiment results

This could perhaps just be "results", since this path won't exist on most systems.

#inference_dir: !ref <output_folder>/<run_name>/enhanced_inference # Directory to store waveforms at inference
tensorboard_logs: !ref <output_folder>/tensorboard_logs/ # Directory for TensorBoard logs

data_dir: /data/datasets/noisy-vctk-16k # Root dir for the dataset

Could be !PLACEHOLDER so people know to change it.

recipes/Voicebank/enhance/SGMSE/README.md (resolved thread)

Comment on lines +54 to +64
# STFT
n_fft = self.hparams.n_fft
hop_length = self.hparams.hop_length
window_type = self.hparams.window_type
self.window = self.get_window(window_type, n_fft).to(self.device)
self.stft_kwargs = {
    "n_fft": n_fft,
    "hop_length": hop_length,
    "center": True,
    "return_complex": True,
}

Typically in SpeechBrain we just define the STFT in the hparams.

Comment on lines +42 to +52
ema = self.modules["score_model"].ema
self.checkpointer.add_recoverable(
    name="ema",
    obj=ema,
    custom_save_hook=lambda obj, path: torch.save(
        obj.state_dict(), path
    ),
    custom_load_hook=lambda obj, path, end: obj.load_state_dict(
        torch.load(path)
    ),
)

This is fine, but you could also add the save/load code to the model itself with @mark_as_saver and @mark_as_loader

https://speechbrain.readthedocs.io/en/latest/API/speechbrain.utils.checkpoints.html#speechbrain.utils.checkpoints.register_checkpoint_hooks

Comment on lines +809 to +825
cli.add_argument(
    "--resume",
    type=str,
    default="",
    help="Path to an existing run directory to resume.",
)
resume_args, remaining = cli.parse_known_args()

hparams_file, run_opts, overrides = sb.parse_arguments(remaining)

if resume_args.resume:  # Resume
    run_dir = Path(resume_args.resume).resolve()
    hparams_file = run_dir / "hyperparams.yaml"
    overrides = overrides or ""
else:  # New
    run_name = f"run_{datetime.now():%Y-%m-%d_%H-%M-%S}"
    overrides = (overrides or "") + f"\nrun_name: '{run_name}'"

I like the resume mechanism here, nice work! I did get tripped up once while using it (the hparams in the results dir did not have a reference I expected to be there), but I don't think there's necessarily anything to fix here. A message stating where the hparams are loaded from might be nice.
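
Something along these lines could work for the message. This is a simplified sketch of the resume logic that leaves out the sb.parse_arguments call; resolve_run is a hypothetical helper name, and treating the first remaining argument as the hparams file for a new run is an assumption made for the example.

```python
import argparse
import logging
from datetime import datetime
from pathlib import Path

logger = logging.getLogger(__name__)


def resolve_run(argv):
    """Pick the hparams file for a resumed or new run and log its origin."""
    cli = argparse.ArgumentParser(add_help=False)
    cli.add_argument(
        "--resume",
        type=str,
        default="",
        help="Path to an existing run directory to resume.",
    )
    resume_args, remaining = cli.parse_known_args(argv)

    overrides = ""
    if resume_args.resume:  # Resume: reuse the hparams saved in the run dir
        run_dir = Path(resume_args.resume).resolve()
        hparams_file = run_dir / "hyperparams.yaml"
        logger.info("Resuming: loading hparams from %s", hparams_file)
    else:  # New run: assumed here to take the hparams path as first argument
        hparams_file = Path(remaining[0])
        run_name = f"run_{datetime.now():%Y-%m-%d_%H-%M-%S}"
        overrides = f"run_name: '{run_name}'"
        logger.info(
            "New run %s: loading hparams from %s", run_name, hparams_file
        )
    return hparams_file, overrides
```

For example, resolve_run(["--resume", "results/some_run"]) would log the resolved path of results/some_run/hyperparams.yaml, which makes it obvious that the copy in the results directory is being used rather than the recipe's own file.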

@pplantinga

inference_dir was commented out because it is never used in train.py, which causes the consistency checker to choke (see test: ERROR: variable "inference_dir" not used in recipes/Voicebank/enhance/SGMSE/train.py!). One solution could be to make this a command-line argument instead. Another could be to have a separate YAML file for inference that uses an !include:hparams.yaml tag to import the other file.
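
A minimal sketch of the first option, assuming enhance.py peels off its own flag before handing the rest to the usual SpeechBrain argument parsing. The flag name, its default and the helper name split_inference_args are illustrative choices, not part of the PR.

```python
import argparse


def split_inference_args(argv):
    """Extract --inference_dir and return it with the untouched remainder."""
    cli = argparse.ArgumentParser(add_help=False)
    cli.add_argument(
        "--inference_dir",
        type=str,
        default="enhanced_inference",  # hypothetical default
        help="Directory to store waveforms at inference.",
    )
    own_args, remaining = cli.parse_known_args(argv)
    # `remaining` still holds the hparams path and any other options, so it
    # can be passed on to the regular argument parser unchanged.
    return own_args.inference_dir, remaining
```

Because the directory no longer lives in hparams.yaml, the consistency check on train.py would have nothing to flag.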

@TParcollet TParcollet added this to the v1.1.0 milestone Oct 9, 2025