Conversation

primepake

This PR introduces Stage-Guided (STG) support to CosyVoice, inspired by the video diffusion framework from STGuidance. The changes enhance the text-to-speech pipeline by integrating stage-guided techniques, improving [e.g., generation quality, efficiency, or compatibility with diffusion-based workflows].

Changes Made

  • Updated cosyvoice/flow/decoder.py to [e.g., "incorporate stage-guided decoding logic for better alignment with diffusion processes"].

  • Modified cosyvoice/flow/flow_matching.py to [e.g., "adapt flow matching to support STG’s stage-based optimization"].

Motivation

The addition of STG support aims to [e.g., "leverage stage-guided diffusion techniques to enhance the quality and speed of speech synthesis, aligning CosyVoice with advanced video diffusion methodologies"]. This builds on the concepts from junhahyung/STGuidance, adapted for audio generation.

@johnwick123f

Looks interesting, but may I ask, what are the effects of adding STG? Better voice cloning quality or better emotion?

@primepake
Author

Yes, it improves the model quality. For example, flow matching in the flow model sometimes struggles to keep the speaker consistent: the voice identity can change from male to female within the same audio. With STG, this is improved.
