
Commit 96ba6a2

shreeshd-tn, mgrafu, ngachchi, pre-commit-ci[bot], and github-advanced-security[bot] authored
Hindi TN: Main to staging Fix + Cardinals (leading zero update) (#348)
* Staging hi tn (#271)
  * Future Implementations for classes - Measure, Money, and Date (#258)
    * Future Implementations for classes - Measure, Money, and Date
      Signed-off-by: Namrata Gachchi <[email protected]>
    * Resolved the conflicts with mm_yyyy and date ranges and added the previously removed failing test cases.
      Signed-off-by: Namrata Gachchi <[email protected]>
    * [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
    * removed the unused empty string implementation
      Signed-off-by: Namrata Gachchi <[email protected]>
    * [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
    * minor fixes for the tagger files
      Signed-off-by: Namrata Gachchi <[email protected]>
    * [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
    * reformatted decimal final graph
      Signed-off-by: Namrata Gachchi <[email protected]>
    * incorporated the suggestion for decimal graph
      Signed-off-by: Namrata Gachchi <[email protected]>
    * [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
    * Century implementations
      Signed-off-by: Namrata Gachchi <[email protected]>
    * Working on the yyyy format for the date class
      Signed-off-by: Namrata Gachchi <[email protected]>
    * reverted yyyy code
      Signed-off-by: Namrata Gachchi <[email protected]>
    * [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
    * working on future implementations
      Signed-off-by: Namrata Gachchi <[email protected]>
    * working on improving the date class accuracy
      Signed-off-by: Namrata Gachchi <[email protected]>
    * [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
    * added year prefix for the date class
      Signed-off-by: Namrata Gachchi <[email protected]>
    * [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
    * working on the comma cases for date class
      Signed-off-by: Namrata Gachchi <[email protected]>
    * minor fixes
      Signed-off-by: Namrata Gachchi <[email protected]>
    * [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
    * implemented mixed fractions
      Signed-off-by: Namrata Gachchi <[email protected]>
    * rectified the test case
      Signed-off-by: Namrata Gachchi <[email protected]>
    * [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
    * working on quarterly measurements
      Signed-off-by: Namrata Gachchi <[email protected]>
    * reformatted the prefixes and suffixes for date tagger class
      Signed-off-by: Namrata Gachchi <[email protected]>
    * [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
    * replaced text tag with era tag for the date class
      Signed-off-by: Namrata Gachchi <[email protected]>
    * Removed the text tag reference from date class verbalizer
      Signed-off-by: Namrata Gachchi <[email protected]>
    ---------
    Signed-off-by: Namrata Gachchi <[email protected]>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  * update jenkins cache
    Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
  * Potential fix for code scanning alert no. 821: Unused local variable
    Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
    Signed-off-by: Mariana <[email protected]>
  ---------
  Signed-off-by: Namrata Gachchi <[email protected]>
  Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
  Signed-off-by: Mariana <[email protected]>
  Co-authored-by: Namrata Gachchi <[email protected]>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* Fix Jenkinsfile for CI (#325)
  * Fix Jenkinsfile for CI
    Signed-off-by: Anand Joseph <[email protected]>
  * Fix requirements for test
    Signed-off-by: Anand Joseph <[email protected]>
  * Update paths and docker
    Signed-off-by: Anand Joseph <[email protected]>
  * Fix docker name
    Signed-off-by: Anand Joseph <[email protected]>
  * Fix click version
    Signed-off-by: Anand Joseph <[email protected]>
  * Change path of grammars for sparrowhawk tests
    Signed-off-by: Anand Joseph <[email protected]>
  * Update paths in sh_test.sh
    Signed-off-by: Anand Joseph <[email protected]>
  * Update paths
    Signed-off-by: Anand Joseph <[email protected]>
  * Revert paths
    Signed-off-by: Anand Joseph <[email protected]>
  ---------
  Signed-off-by: Anand Joseph <[email protected]>
* Comma bugfix for En electronics (#332)
  * fix bug with commas and electronics
    Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
    Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
  * update jenkins
    Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
  ---------
  Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Update Jenkinsfile (#341)
  Only mount TestData from path
  Signed-off-by: anand-nv <[email protected]>
* [pre-commit.ci] pre-commit suggestions (#335)
  updates:
  - [github.com/pre-commit/pre-commit-hooks: v5.0.0 → v6.0.0](pre-commit/pre-commit-hooks@v5.0.0...v6.0.0)
  - [github.com/PyCQA/flake8: 7.2.0 → 7.3.0](PyCQA/flake8@7.2.0...7.3.0)
  - [github.com/PyCQA/isort: 6.0.1 → 6.1.0](PyCQA/isort@6.0.1...6.1.0)
  - https://github.com/psf/black → https://github.com/psf/black-pre-commit-mirror
  - [github.com/psf/black-pre-commit-mirror: 25.1.0 → 25.9.0](psf/black-pre-commit-mirror@25.1.0...25.9.0)
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Cardinal: Leading zero changes
  Signed-off-by: shreeshd-tn <[email protected]>
---------
Signed-off-by: Namrata Gachchi <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana <[email protected]>
Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: anand-nv <[email protected]>
Signed-off-by: shreeshd-tn <[email protected]>
Co-authored-by: Mariana <[email protected]>
Co-authored-by: Namrata Gachchi <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <[email protected]>
1 parent dd0b8b7 commit 96ba6a2


5 files changed (+21 additions, -10 deletions)


.pre-commit-config.yaml

Lines changed: 5 additions & 5 deletions
@@ -22,30 +22,30 @@ ci:
 
 repos:
   - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v5.0.0
+    rev: v6.0.0
     hooks:
       - id: check-yaml
       - id: check-case-conflict
       - id: detect-private-key
       - id: requirements-txt-fixer
 
   - repo: https://github.com/PyCQA/flake8
-    rev: 7.2.0
+    rev: 7.3.0
     hooks:
       - id: flake8
         args:
           - --select=W605
 
   - repo: https://github.com/PyCQA/isort
-    rev: 6.0.1
+    rev: 6.1.0
     hooks:
       - id: isort
         name: Format imports
         args: [ --multi-line=3, --trailing-comma, --force-grid-wrap=0, --use-parentheses, --line-width=119, -rc, -ws ]
         exclude: docs/
 
-  - repo: https://github.com/psf/black
-    rev: 25.1.0
+  - repo: https://github.com/psf/black-pre-commit-mirror
+    rev: 25.9.0
     hooks:
       - id: black
         name: Format code

nemo_text_processing/text_normalization/en/taggers/electronic.py

Lines changed: 4 additions & 3 deletions
@@ -127,14 +127,15 @@ def __init__(self, cardinal: GraphFst, deterministic: bool = True):
 
         full_stop_accep = pynini.accep(".")
         dollar_accep = pynini.accep("$") # Include for the correct transduction of the money graph
-        excluded_symbols = full_stop_accep | dollar_accep
+        excluded_symbols = full_stop_accep | dollar_accep | pynini.accep(",")
         filtered_symbols = pynini.difference(accepted_symbols, excluded_symbols)
         accepted_characters = NEMO_ALPHA | NEMO_DIGIT | filtered_symbols
         domain_component = full_stop_accep + pynini.closure(accepted_characters, 2)
-        graph_domain = (
+        graph_domain = pynutil.add_weight(
             pynutil.insert('domain: "')
             + (pynini.closure(accepted_characters, 1) + pynini.closure(domain_component, 1))
-            + pynutil.insert('"')
+            + pynutil.insert('"'),
+            0.1,
         ).optimize()
 
         graph |= pynutil.add_weight(graph_domain, MIN_NEG_WEIGHT)
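The effect of adding "," to `excluded_symbols` can be approximated with a plain-Python regular expression. This is a sketch, not the tagger itself: `ACCEPTED_SYMBOLS` is a hypothetical stand-in for `accepted_symbols` (whose real contents are defined outside this diff), and the new 0.1 weight on `graph_domain` is not modeled. Because `domain_component` requires at least two accepted characters after each ".", a token such as "5.4," could be read as a domain while "," was accepted, and no longer can once it is excluded.

```python
import re

# Hypothetical stand-in for `accepted_symbols`; the real set is defined
# elsewhere in the electronic tagger and is not shown in this diff.
ACCEPTED_SYMBOLS = frozenset(".$,-_~/:")


def domain_pattern(excluded):
    """Regex analogue of graph_domain: one or more accepted characters,
    followed by one or more ('.' + at least two accepted characters)."""
    filtered = "".join(sorted(ACCEPTED_SYMBOLS - set(excluded)))
    chars = "[A-Za-z0-9" + re.escape(filtered) + "]"
    return re.compile(chars + r"+(?:\." + chars + r"{2,})+")


before_fix = domain_pattern({".", "$"})       # old excluded_symbols
after_fix = domain_pattern({".", "$", ","})   # new excluded_symbols

for token in ["5.4,", "5.5", "nvidia.com"]:
    print(token, bool(before_fix.fullmatch(token)), bool(after_fix.fullmatch(token)))
```

With the old exclusion set "5.4," fully matches the domain pattern; with the new one it does not, while an ordinary domain like "nvidia.com" still matches. That is consistent with the new test case expecting "five point four, or five point five".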

nemo_text_processing/text_normalization/hi/taggers/cardinal.py

Lines changed: 8 additions & 1 deletion
@@ -15,7 +15,7 @@
 import pynini
 from pynini.lib import pynutil
 
-from nemo_text_processing.text_normalization.hi.graph_utils import GraphFst
+from nemo_text_processing.text_normalization.hi.graph_utils import GraphFst, insert_space
 from nemo_text_processing.text_normalization.hi.utils import get_abs_path
 
 
@@ -298,6 +298,12 @@ def create_larger_number_graph(digit_graph, suffix, zeros_counts, sub_graph):
         graph_ten_shankhs |= create_larger_number_graph(teens_and_ties, suffix_shankhs, 0, graph_ten_padmas)
         graph_ten_shankhs.optimize()
 
+        # Only match exactly 2 digits to avoid interfering with telephone numbers, decimals, etc.
+        # e.g., "०५" -> "शून्य पाँच"
+        single_digit = digit | zero
+        graph_leading_zero = zero + insert_space + single_digit
+        graph_leading_zero = pynutil.add_weight(graph_leading_zero, 0.5)
+
         final_graph = (
             digit
             | zero
@@ -319,6 +325,7 @@ def create_larger_number_graph(digit_graph, suffix, zeros_counts, sub_graph):
             | graph_ten_padmas
             | graph_shankhs
             | graph_ten_shankhs
+            | graph_leading_zero
         )
 
         optional_minus_graph = pynini.closure(pynutil.insert("negative: ") + pynini.cross("-", "\"true\" "), 0, 1)
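The new `graph_leading_zero` branch accepts exactly two digits, a "०" followed by any single digit, and verbalizes them digit by digit with a space in between. The plain-Python function below mirrors that rule as a sketch. `verbalize_leading_zero` and `HINDI_DIGITS` are hypothetical names; the digit words other than शून्य, एक, दो and पाँच (the ones confirmed by the test data) are standard Hindi forms assumed to match the grammar's digit data. The sketch does not model the 0.5 weight or how this branch competes with the other cardinal graphs.

```python
from typing import Optional

# Devanagari digits and their spoken forms. Only शून्य, एक, दो and पाँच appear
# in this commit's test data; the others are assumed standard Hindi forms.
HINDI_DIGITS = {
    "०": "शून्य",
    "१": "एक",
    "२": "दो",
    "३": "तीन",
    "४": "चार",
    "५": "पाँच",
    "६": "छह",
    "७": "सात",
    "८": "आठ",
    "९": "नौ",
}


def verbalize_leading_zero(token: str) -> Optional[str]:
    """Mimic `zero + insert_space + single_digit`: exactly two digits with a
    leading zero. Any other input is left to other graphs (returns None)."""
    if len(token) != 2 or token[0] != "०" or token[1] not in HINDI_DIGITS:
        return None
    return HINDI_DIGITS[token[0]] + " " + HINDI_DIGITS[token[1]]


print(verbalize_leading_zero("०५"))
```

Because `single_digit = digit | zero`, "००" is also covered and comes out as "शून्य शून्य", while a three-digit input such as "००५" falls outside this branch, which is the restriction the code comment in the diff describes.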

tests/nemo_text_processing/en/data_text_normalization/test_cases_electronic.txt

Lines changed: 2 additions & 1 deletion
@@ -41,4 +41,5 @@ https://www.nvidia.com/dgx-basepod/~HTTPS colon slash slash WWW dot NVIDIA dot c
 i can use your card ending in 8876~i can use your card ending in eight eight seven six
 upgrade/update~upgrade slash update
 upgrade / update~upgrade slash update
-upgrade/update/downgrade~upgrade slash update slash downgrade
+upgrade/update/downgrade~upgrade slash update slash downgrade
+5.4, or 5.5~five point four, or five point five

tests/nemo_text_processing/hi/data_text_normalization/test_cases_cardinal.txt

Lines changed: 2 additions & 0 deletions
@@ -143,3 +143,5 @@
 ११०२२३४५५६७~ग्यारह अरब दो करोड़ तेईस लाख पैंतालीस हज़ार पाँच सौ सड़सठ
 ५१०२२३४५५६७~इक्यावन अरब दो करोड़ तेईस लाख पैंतालीस हज़ार पाँच सौ सड़सठ
 २ पॉइंट्स १२ गोल~दो पॉइंट्स बारह गोल
+०५~शून्य पाँच
+०१~शून्य एक
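Both test files store one case per line, with the written input and the expected spoken output separated by "~". A minimal reader for that format might look like the sketch below. `parse_test_line` and `parse_test_file` are hypothetical helpers, not necessarily what the test suite uses; splitting at the last "~" assumes the spoken output never contains "~", so inputs such as URLs may still contain one.

```python
from typing import List, Tuple


def parse_test_line(line: str) -> Tuple[str, str]:
    """Split 'written~spoken' at the last '~' (assumes the spoken side has no '~')."""
    written, spoken = line.rstrip("\n").rsplit("~", 1)
    return written, spoken


def parse_test_file(lines: List[str]) -> List[Tuple[str, str]]:
    """Parse every non-empty line into a (written, spoken) pair."""
    return [parse_test_line(line) for line in lines if line.strip()]


# Cases taken from the test files changed in this commit.
cases = parse_test_file([
    "5.4, or 5.5~five point four, or five point five\n",
    "०५~शून्य पाँच\n",
    "०१~शून्य एक",
])
for written, spoken in cases:
    print(f"{written!r} -> {spoken!r}")
```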
