@quapham quapham commented Dec 10, 2025

Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do?

This PR adds a new Japanese G2P module (JapaneseKanaAccent) that supports multi-script Japanese input (Kanji/Katakana/Hiragana) and emits pitch accent markers (0 = low, 1 = high).
Collection: TTS, common

Changelog

  • nemo/collections/tts/g2p/models/ja_jp_ipa.py:
    Added class JapaneseKanaAccent, which uses the pyopenjtalk library and a rule-based system to generate Katakana with pitch accent markers (e.g., こんにちは -> 0コ1ン 1ニ1チ1ワ).
    Added logic to preserve pure English/ASCII words while converting other scripts to Katakana.

  • nemo/collections/common/tokenizers/text_to_speech/ipa_lexicon.py:
    Updated ja-JP GRAPHEME_CHARACTER_SETS and Japanese punctuation.

  • tests/collections/common/tokenizers/text_to_speech/test_tts_tokenizers.py:
    Added unit tests verifying homonym disambiguation based on accent.
    Example: 箸 (Chopsticks) 1ハ0シ (High-Low) vs 橋 (Bridge) 0ハ1シ (Low-High).

  • requirements/requirements_tts.txt:
    Added the pyopenjtalk dependency.
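
The 0/1 markers in the examples above are consistent with the common Tokyo-dialect convention in which an accent type n means the pitch drops after the n-th mora, and type 0 means it never drops. The sketch below illustrates only that convention. `mark_pitch` and its arguments are hypothetical names, not the API of JapaneseKanaAccent, and the PR's actual rules (for example, how accents chain across compound words) may differ.

```python
from typing import List


def mark_pitch(moras: List[str], accent_type: int) -> str:
    """Prefix each mora with 0 (low) or 1 (high).

    Assumed convention (not taken from the PR's code):
      accent_type == 0 -> first mora low, all later moras high
      accent_type == 1 -> first mora high, all later moras low
      accent_type >= 2 -> first mora low, high up to the accent_type-th mora, low after
    """
    if not 0 <= accent_type <= len(moras):
        raise ValueError("accent_type must be between 0 and the number of moras")

    marked = []
    for position, mora in enumerate(moras, start=1):
        if accent_type == 1:
            high = position == 1
        elif accent_type == 0:
            high = position > 1
        else:
            high = 1 < position <= accent_type
        marked.append(f"{int(high)}{mora}")
    return "".join(marked)


print(mark_pitch(["ハ", "シ"], 1))  # 箸 (chopsticks): 1ハ0シ
print(mark_pitch(["ハ", "シ"], 2))  # 橋 (bridge): 0ハ1シ
```

Under this convention the two homonyms differ only in the accent type supplied by the dictionary lookup, which is what makes the markers useful for disambiguation.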

Usage

from nemo.collections.tts.g2p.models.ja_jp_ipa import JapaneseKanaAccent
from nemo.collections.common.tokenizers.text_to_speech.tts_tokenizers import JapanesePhonemeTokenizer

g2p = JapaneseKanaAccent()
tokenizer = JapanesePhonemeTokenizer(g2p=g2p, punct=True)

text = "箸 橋"
encoded = tokenizer.encode(text)
decoded = tokenizer.decode(encoded)

print(f"Input: {text}")
print(f"ID: {encoded}")
print(f"Output: {decoded}")

# Input: 箸 橋
# ID: [2, 49, 1, 25, 0, 1, 49, 2, 25]
# Output: 1|ハ|0|シ| |0|ハ|1|シ
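
The decoded string above interleaves pitch markers and Katakana, separated by `|`. If downstream code needs (pitch, kana) pairs per word, one way to recover them is sketched below. The helper name `pair_pitch_with_kana` is hypothetical, and the sketch assumes the decoded format is exactly as printed in the example: a `0` or `1` marker immediately before each kana and a literal space between words.

```python
from typing import List, Optional, Tuple

Pair = Tuple[Optional[int], str]


def pair_pitch_with_kana(decoded: str, sep: str = "|") -> List[List[Pair]]:
    """Group a decoded string such as '1|ハ|0|シ' into per-word (pitch, kana) pairs."""
    words: List[List[Pair]] = [[]]
    pitch: Optional[int] = None
    for token in decoded.split(sep):
        if token == " ":
            # A space token marks a word boundary.
            words.append([])
            pitch = None
        elif token in ("0", "1"):
            # Remember the marker; it applies to the next symbol.
            pitch = int(token)
        else:
            # Any other token (kana, punctuation) takes the pending marker, if any.
            words[-1].append((pitch, token))
            pitch = None
    return [word for word in words if word]


print(pair_pitch_with_kana("1|ハ|0|シ| |0|ハ|1|シ"))
# [[(1, 'ハ'), (0, 'シ')], [(0, 'ハ'), (1, 'シ')]]
```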

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove the label and add it again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?
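
For the import-guard question above, a guard around an optional dependency such as pyopenjtalk could look like the following minimal sketch. `optional_import` and `require_pyopenjtalk` are hypothetical helpers written for illustration. They are not NeMo utilities, and the PR may guard the import differently or not at all.

```python
import importlib
from types import ModuleType
from typing import Optional, Tuple


def optional_import(name: str) -> Tuple[Optional[ModuleType], bool]:
    """Return (module, True) if `name` can be imported, else (None, False)."""
    try:
        return importlib.import_module(name), True
    except ImportError:
        return None, False


# Evaluated once at import time; the module stays importable without pyopenjtalk.
pyopenjtalk, HAVE_PYOPENJTALK = optional_import("pyopenjtalk")


def require_pyopenjtalk() -> ModuleType:
    """Fail with a clear message only when the optional feature is actually used."""
    if not HAVE_PYOPENJTALK:
        raise ImportError(
            "pyopenjtalk is required for Japanese G2P; install the TTS requirements to use it."
        )
    return pyopenjtalk
```

Deferring the error to first use keeps unrelated code paths working on installations that skip the optional package.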

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

Copilot AI left a comment

Pull request overview

This PR adds Japanese G2P (Grapheme-to-Phoneme) support with pitch accent markers for the NeMo TTS collection. The implementation converts multi-script Japanese input (Kanji/Katakana/Hiragana) to Katakana with pitch accent annotations (0=low, 1=high) to enable proper pronunciation and intonation for Japanese TTS models.

Key changes:

  • Introduces JapaneseKanaAccent class using pyopenjtalk for Japanese morphological analysis and pitch accent generation
  • Extends ja-JP grapheme and punctuation character sets to support Katakana output
  • Adds unit tests validating pitch accent disambiguation for Japanese homonyms (e.g., 箸/chopsticks vs 橋/bridge)

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| nemo/collections/tts/g2p/models/ja_jp_ipa.py | Implements the JapaneseKanaAccent class with pitch accent rules (Heiban, Atamadaka, Nakadaka, Odaka), chain processing for compound words, and ASCII preservation |
| nemo/collections/common/tokenizers/text_to_speech/ipa_lexicon.py | Adds comprehensive Katakana grapheme character set and Japanese full-width punctuation marks for ja-JP locale |
| tests/collections/common/tokenizers/text_to_speech/test_tts_tokenizers.py | Adds test case verifying homonym disambiguation through pitch accent patterns |
| requirements/requirements_tts.txt | Adds pyopenjtalk dependency for Japanese text processing |
| examples/tts/conf/magpietts/magpietts_multilingual_v2_lhotse.yaml | Adds configuration for Japanese phoneme tokenizer with JapaneseKanaAccent G2P in Magpie TTS multilingual setup |
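
The "ASCII preservation" behaviour mentioned in the summary can be pictured as splitting the input into ASCII word runs and everything else, then converting only the non-ASCII runs. The sketch below is an illustration under stated assumptions: `convert_preserving_ascii` is a hypothetical helper, the `to_katakana` callable stands in for whatever conversion the class performs, and the regular expression is only one guess at what counts as a "pure English/ASCII word".

```python
import re
from typing import Callable

# Assumption: an ASCII word starts with a letter and may contain digits, ' and -.
_ASCII_WORD = re.compile(r"[A-Za-z][A-Za-z0-9'\-]*")


def convert_preserving_ascii(text: str, to_katakana: Callable[[str], str]) -> str:
    """Apply `to_katakana` to non-ASCII spans and copy ASCII words through unchanged."""
    pieces = []
    cursor = 0
    for match in _ASCII_WORD.finditer(text):
        if match.start() > cursor:
            # Text between ASCII words goes through the converter.
            pieces.append(to_katakana(text[cursor:match.start()]))
        pieces.append(match.group())  # the ASCII word itself is preserved
        cursor = match.end()
    if cursor < len(text):
        pieces.append(to_katakana(text[cursor:]))
    return "".join(pieces)


# Stub converter that just brackets its input, to make the split visible.
print(convert_preserving_ascii("NVIDIAの橋", lambda span: f"<{span}>"))
# NVIDIA<の橋>
```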

@XuesongYang force-pushed the ja_g2p_katakana_acc_clean branch from 35d675c to 7ce8a40 on December 10, 2025 18:37
XuesongYang previously approved these changes Dec 17, 2025
Signed-off-by: Jason <[email protected]>
@XuesongYang XuesongYang merged commit 661af02 into NVIDIA-NeMo:main Dec 19, 2025
775 of 781 checks passed
4 participants