@kevinhu-nv (Collaborator) commented Nov 19, 2025

What does this PR do?

Merge duplex STT changes to NeMo main.

Collection: speechlm2

Changelog

  • Added training support for nano-9b as the LLM backbone
  • Added prompt token support
  • Added streaming ASR support
  • Refactored code, added unit tests, and made other minor changes

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • [x] Did you write any new necessary tests?
  • [x] Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

Chen Chen and others added 17 commits November 5, 2025 20:48
Signed-off-by: kevinhu <[email protected]>
…-sysprompt

support training and inference for data with system prompt
Signed-off-by: kevinhu <[email protected]>
@github-advanced-security (bot, Contributor) left a comment

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

@zhehuaichen (Collaborator) left a comment

Thanks for getting this started!



@data_type_parser(["s2s_duplex_overlap_as_s2s_duplex"])
def read_s2s_duplex_overlap_as_s2s_duplex(config) -> tuple[CutSet, bool]:
Collaborator

@pzelasko do you think it is OK to keep growing this file, or should we create a separate file for the s2s specifics?

Collaborator

Let's grow it for now and refactor it into smaller files later.

@kevinhu-nv (Collaborator, Author) commented

Resolved the comments and made another pass over the code to clean it up: 1) removed debug code, and 2) added the necessary updates made since the last rebase (e.g. the speech cutoff fix).

Some areas to discuss before I make changes:

  • How to define agent_bos, agent_eos, user_bos, and user_eos
  • I did not incorporate the most recent changes, such as early interruption, since they are still experimental

PTAL @zhehuaichen

@kevinhu-nv kevinhu-nv marked this pull request as ready for review December 19, 2025 18:19
@kevinhu-nv kevinhu-nv changed the title from "Duplex stt rebased" to "Implement Duplex Speech-to-text model and rebase" Dec 19, 2025
@kevinhu-nv (Collaborator, Author) commented

@pzelasko @Edresson Can you start taking a look and maybe leave some high-level comments first?
