Add STG Support for Video Diffusion in CosyVoice Audio #1391

primepake · Jun 23, 2025

This PR introduces Stage-Guided (STG) support to CosyVoice, inspired by the video diffusion framework from STGuidance. The changes enhance the text-to-speech pipeline by integrating stage-guided techniques, improving [e.g., generation quality, efficiency, or compatibility with diffusion-based workflows].

Changes Made

Updated cosyvoice/flow/decoder.py to [e.g., "incorporate stage-guided decoding logic for better alignment with diffusion processes"].
Modified cosyvoice/flow/flow_matching.py to [e.g., "adapt flow matching to support STG’s stage-based optimization"].

Motivation

The addition of STG support aims to [e.g., "leverage stage-guided diffusion techniques to enhance the quality and speed of speech synthesis, aligning CosyVoice with advanced video diffusion methodologies"]. This builds on the concepts from junhahyung/STGuidance, adapted for audio generation.
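
For readers unfamiliar with the idea, here is a minimal sketch of how skip-style guidance could be applied inside a flow-matching Euler sampler. It assumes STG here follows the layer-skipping scheme of the referenced junhahyung/STGuidance repository: the estimator is run once normally and once with selected blocks bypassed, and the result is pushed away from the weaker, skipped prediction. Everything in it (`make_blocks`, `velocity`, `stg_velocity`, `euler_sample`, `skip_layers`, `stg_scale`, and the toy residual estimator itself) is hypothetical and written in NumPy for illustration; it is not the code in cosyvoice/flow/decoder.py or cosyvoice/flow/flow_matching.py.

```python
import numpy as np


def make_blocks(dim, n_blocks, seed=0):
    """Build random weights for a toy stack of residual blocks (illustrative only)."""
    rng = np.random.default_rng(seed)
    return [rng.normal(scale=0.1, size=(dim, dim)) for _ in range(n_blocks)]


def velocity(x, t, blocks, skip=()):
    """Toy velocity estimator: a stack of residual blocks.

    Blocks whose index is in `skip` are bypassed (identity path), which
    yields the deliberately weakened prediction used for guidance.
    """
    h = x
    for i, w in enumerate(blocks):
        if i in skip:
            continue  # bypass this block entirely
        h = h + np.tanh(h @ w) * (1.0 - t)
    return h - x  # accumulated residual, treated here as the velocity


def stg_velocity(x, t, blocks, skip_layers, stg_scale):
    """Combine full and skipped predictions: v_full + s * (v_full - v_skip)."""
    v_full = velocity(x, t, blocks)
    if stg_scale == 0.0 or not skip_layers:
        return v_full  # guidance disabled: plain estimator output
    v_skip = velocity(x, t, blocks, skip=skip_layers)
    return v_full + stg_scale * (v_full - v_skip)


def euler_sample(x0, blocks, n_steps=10, skip_layers=(1,), stg_scale=1.0):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = x0
    ts = np.linspace(0.0, 1.0, n_steps + 1)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        x = x + (t1 - t0) * stg_velocity(x, t0, blocks, skip_layers, stg_scale)
    return x


blocks = make_blocks(dim=8, n_blocks=4)
x0 = np.random.default_rng(1).normal(size=(2, 8))  # stand-in for initial noise
baseline = euler_sample(x0, blocks, stg_scale=0.0)
guided = euler_sample(x0, blocks, skip_layers=(1,), stg_scale=1.0)
```

With `stg_scale=0.0` the sampler reduces to ordinary Euler integration, so the guidance can be switched off without touching the rest of the loop. In a real model the extra cost is one additional forward pass per step, and which blocks to skip and how strongly to scale are tuning choices this sketch does not settle.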

johnwick123f · Jun 24, 2025

Looks interesting, but may I ask, what are the effects of adding STG? Better voice cloning quality or better emotion?

primepake · Jun 25, 2025

Yes, it will improve the model quality. For example, the flow matching in the flow model sometimes struggles to keep the speaker consistent: the voice identity can change from male to female within the same audio. With STG, this is improved.

primepake added 2 commits June 23, 2025 14:43

adding STG for audio inference

8fdbd7f

add stg

3199349

2 participants
