Feature Request: Georgian G2P for NeMo TTS #15557

@NMikaa

Description

There is currently no G2P (grapheme-to-phoneme) support for Georgian in NeMo's TTS pipeline. This makes it impossible to train or run inference on Georgian text using NeMo's TTS models without building a custom G2P solution outside the framework.

Describe the solution you'd like
Add a "ka-GE" locale to the existing G2P infrastructure:

  1. Register Georgian grapheme and IPA character sets in ipa_lexicon.py
  2. Add a GeorgianG2p class with a rule-based character-to-IPA mapping table
  3. Use a simple mapping table as the core G2P logic: Georgian orthography is highly phonemic, with every letter mapping to exactly one phoneme, so no further rules are needed
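
A minimal sketch of what such a mapping table could look like, independent of NeMo's classes. The names `GEORGIAN_LETTERS`, `IPA_SYMBOLS`, `KA_TO_IPA`, and `georgian_to_ipa` are hypothetical, and the IPA symbols are an assumed transcription (aspirated and ejective stops marked with ʰ and ʼ) that would need checking against a reference before going into a PR:

```python
# Hypothetical sketch: these names are illustrative and not part of NeMo.
# Assumption: the 33 modern Mkhedruli letters occupy U+10D0..U+10F0
# in alphabetical order.
GEORGIAN_LETTERS = [chr(cp) for cp in range(0x10D0, 0x10F1)]

# One IPA symbol per letter, in the same alphabetical order (assumed
# transcription; 'g' is plain ASCII to match the example in this issue).
IPA_SYMBOLS = [
    "ɑ", "b", "g", "d", "ɛ", "v", "z", "tʰ", "i", "kʼ", "l",
    "m", "n", "ɔ", "pʼ", "ʒ", "r", "s", "tʼ", "u", "pʰ", "kʰ",
    "ɣ", "qʼ", "ʃ", "tʃʰ", "tsʰ", "dz", "tsʼ", "tʃʼ", "x", "dʒ", "h",
]

KA_TO_IPA = dict(zip(GEORGIAN_LETTERS, IPA_SYMBOLS))


def georgian_to_ipa(text):
    """Map each Georgian letter to its IPA symbol; pass other characters through."""
    return [KA_TO_IPA.get(ch, ch) for ch in text]
```

Under these assumptions, `georgian_to_ipa("გამარჯობა")` yields the same nine-symbol list as the usage example in this issue. Punctuation and non-Georgian characters are passed through unchanged here; how a real implementation should treat them is left open.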

Usage would follow the existing IpaG2p pattern:

# Proposed API: this module and class do not exist in NeMo yet.
from nemo.collections.tts.g2p.models.ka_ge_ipa import GeorgianG2p

g2p = GeorgianG2p(locale="ka-GE")
phonemes = g2p("გამარჯობა")  # ['g', 'ɑ', 'm', 'ɑ', 'r', 'dʒ', 'ɔ', 'b', 'ɑ']

Describe alternatives you've considered

A dictionary-based lookup, as other locales use, was considered, but Georgian's phonemic script makes it unnecessary overhead. A dictionary could still be added later for edge cases if needed.
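
If a dictionary were added later, one possible shape is a thin override layer that is consulted before the rule-based mapping. This is only a sketch: `make_g2p`, its parameters, and the lexicon entry are hypothetical and do not reflect how NeMo's existing locales load their dictionaries.

```python
def make_g2p(rule_based, lexicon=None):
    """Return a word-level G2P callable that checks `lexicon` before the rules."""
    overrides = dict(lexicon or {})

    def g2p(word):
        # Exact-match override wins; otherwise fall back to the rule-based mapper.
        if word in overrides:
            return list(overrides[word])
        return rule_based(word)

    return g2p


# Placeholder fallback that just splits a word into characters; a real setup
# would pass the letter-to-IPA mapping function here instead.
# The lexicon entry below is made up purely for illustration.
g2p = make_g2p(rule_based=list, lexicon={"NeMo": ["n", "i", "m", "oʊ"]})
```

Keeping the override separate from the mapping table means the table stays the single source of truth for regular words, and the dictionary only grows when a real exception turns up.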

Additional context

  • Georgian uses the Mkhedruli script (33 letters in modern use, Unicode U+10D0–U+10F0, within the Georgian block U+10A0–U+10FF)
  • No uppercase/lowercase distinction in modern usage
  • ~3.7M native speakers, official language of Georgia
  • IPA phoneme inventory is well-documented in linguistic literature
  • eSpeak NG already supports Georgian, confirming IPA mappings are standardized
  • Similar in scope to #15445 (Add Urdu (ur) G2P support to TTS pipeline)

I have prior experience building Georgian TTS pipelines with NeMo's Magpie TTS and FastPitch models and am willing to contribute a PR for this.
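
The script facts listed above also suggest a cheap input check for training manifests: flag any character that falls outside the modern alphabet. The sketch below assumes the 33 modern letters are the contiguous range U+10D0 to U+10F0; the function names are hypothetical.

```python
# Assumed code point range of the 33 modern Mkhedruli letters.
MKHEDRULI_FIRST, MKHEDRULI_LAST = 0x10D0, 0x10F0


def is_modern_georgian_letter(ch):
    """True if `ch` is one of the 33 modern Mkhedruli letters."""
    return MKHEDRULI_FIRST <= ord(ch) <= MKHEDRULI_LAST


def unsupported_chars(text):
    """Return the sorted non-whitespace characters outside the modern alphabet."""
    return sorted(
        {ch for ch in text if not ch.isspace() and not is_modern_georgian_letter(ch)}
    )
```

For example, `unsupported_chars("გამარჯობა, NeMo!")` reports the comma, the exclamation mark, and the Latin letters, which are exactly the characters a pure letter-to-IPA table would not cover.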
