@tarushi2k2 tarushi2k2 commented Jan 9, 2025

What does this PR do?

This PR adds the planned extensions to the Date semiotic class. The changes are:

  1. Addition of year ranges
  2. Implementation of centuries (AD and BC)
  3. Support for commas in the DMY format
  4. Inclusion of several date-specific cases

This PR also adds the Telephone semiotic class, which supports:

  1. Two-digit country code with a 10-digit number in both Hindi and English
  2. Two-, three-, and four-digit STD landline codes for all tier-1, tier-2, and tier-3 cities in India, along with a 7-digit landline number, in both Hindi and English
  3. 6-digit PIN codes in both Hindi and English
  4. Last digits of a credit card in Hindi and English.
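
The number shapes listed above can be illustrated with a small pure-Python sketch. This is not the PR's implementation, which is written as pynini grammars; every name here (`DIGIT_WORDS`, `words_to_digits`, `classify`) is hypothetical. The sketch also assumes that the trunk prefix 0 has already been removed, that "last digits of a credit card" means four digits, and that ASCII digits are acceptable output.

```python
# Hypothetical illustration only: these names and rules are assumptions,
# not the grammar added in this PR.
HINDI_DIGITS = {
    "शून्य": "0", "एक": "1", "दो": "2", "तीन": "3", "चार": "4",
    "पाँच": "5", "छह": "6", "सात": "7", "आठ": "8", "नौ": "9",
}
ENGLISH_DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}
DIGIT_WORDS = {**HINDI_DIGITS, **ENGLISH_DIGITS}


def words_to_digits(spoken):
    """Turn space-separated digit words (Hindi or English) into a digit string."""
    return "".join(DIGIT_WORDS[word] for word in spoken.lower().split())


def classify(digits):
    """Label a digit string by length, following the shapes in the PR description."""
    length = len(digits)
    if length == 12:
        return "country_code_plus_mobile"  # 2-digit country code + 10 digits
    if length in (9, 10, 11):
        return "std_code_plus_landline"    # 2-, 3-, or 4-digit STD code + 7 digits
    if length == 6:
        return "pincode"
    if length == 4:
        return "card_last_digits"          # assumed: "last digits" means four
    return "unknown"


pin = words_to_digits("एक एक शून्य शून्य शून्य एक")
print(pin, classify(pin))
```

Classifying by length alone only works because of the assumptions stated above; a real grammar would also need the list of valid STD codes to tell, for example, a 3-digit code from a 4-digit one.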

Before your PR is "Ready for review"

Pre checks:

  • Have you signed your commits? Use git commit -s to sign.
  • Do all unittests finish successfully before sending PR?
    1. pytest or (if your machine does not have GPU) pytest --cpu from the root folder (given you marked your test cases accordingly @pytest.mark.run_only_on('CPU')).
    2. Sparrowhawk tests bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...
  • If you are adding a new feature: Have you added test cases for both pytest and Sparrowhawk?
  • Have you added __init__.py for every folder and subfolder, including data folder which has .TSV files?
  • Have you followed the CodeQL results and removed unused variables and imports (the report is at the bottom of the PR in the GitHub review box)?
  • Have you added the correct license header Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. to all newly added Python files?
  • If you copied nemo_text_processing/text_normalization/en/graph_utils.py your header's second line should be Copyright 2015 and onwards Google, Inc.. See an example here.
  • Remove import guards (try import: ... except: ...) if not already done.
  • If you added a new language or a new feature please update the NeMo documentation (lives in different repo).
  • Have you added your language support to tools/text_processing_deployment/pynini_export.py?

PR Type:

  • New Feature
  • Bugfix
  • Documentation
  • Test

If you haven't finished some of the above items, you can still open a "Draft" PR.

tarushi2k2 and others added 24 commits November 27, 2024 15:17
Signed-off-by: Tarushi V <[email protected]>
@tarushi2k2 tarushi2k2 changed the base branch from main to staging January 27, 2025 11:37
Comment on lines 18 to 24
from nemo_text_processing.inverse_text_normalization.hi.graph_utils import (
    NEMO_HI_DIGIT,
    GraphFst,
    delete_extra_space,
    delete_space,
    insert_space,
)

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'delete_extra_space' is not used.
Import of 'insert_space' is not used.
Import of 'NEMO_HI_DIGIT' is not used.
    delete_space,
    insert_space,
)
from nemo_text_processing.inverse_text_normalization.hi.utils import apply_fst, get_abs_path

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'apply_fst' is not used.
graph_landline_with_three_digit_extension = (
    delete_zero + delete_space + self.city_three_digit_extension + delete_space + self.landline_three
)
graph_landline_with_four_digit_extension = (

Check notice

Code scanning / CodeQL

Unused local variable Note

Variable graph_landline_with_four_digit_extension is not used.
import pynini
from pynini.lib import pynutil

from nemo_text_processing.inverse_text_normalization.hi.utils import apply_fst

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'apply_fst' is not used.
from parameterized import parameterized

from nemo_text_processing.inverse_text_normalization.inverse_normalize import InverseNormalizer
from nemo_text_processing.text_normalization.normalize import Normalizer

Check notice

Code scanning / CodeQL

Unused import Note test

Import of 'Normalizer' is not used.
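
The unused-import notices above can be approximated locally before pushing. The following is a minimal sketch using Python's standard `ast` module. It is not CodeQL's actual analysis, and the `unused_imports` helper and the `TelephoneFst` class body in the sample source are hypothetical, included only to give the imports something to be used by.

```python
import ast


def unused_imports(source):
    """Return the sorted names that are imported but never referenced as a name."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the top-level name "a"
                imported.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
    # Every bare identifier in the module, including attribute roots such as "a" in "a.b"
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(imported - used)


# The import block flagged in the first notice, followed by a hypothetical usage.
SAMPLE = '''
from nemo_text_processing.inverse_text_normalization.hi.graph_utils import (
    NEMO_HI_DIGIT,
    GraphFst,
    delete_extra_space,
    delete_space,
    insert_space,
)

class TelephoneFst(GraphFst):
    graph = delete_space
'''

print(unused_imports(SAMPLE))
```

Because the source is only parsed, never executed, the sketch runs without NeMo installed. It ignores cases a real analyzer handles, such as names re-exported through `__all__` or used only inside string annotations.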
@github-actions

This PR is stale because it has been open for 14 days with no activity. Remove the stale label, comment, or update the PR, or it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Feb 18, 2025
@github-actions

This PR was closed because it has been inactive for 7 days since being marked as stale.

@github-actions github-actions bot closed this Feb 25, 2025
@mgrafu mgrafu reopened this Apr 1, 2025
@github-actions github-actions bot removed the Stale label Apr 2, 2025
@mgrafu mgrafu merged commit b6c907e into NVIDIA:staging Apr 2, 2025
3 checks passed