Closed
688 commits
bab1fcc
Disables Hindi ITN L0 checks
zoobereq Nov 13, 2024
a152461
Reapplies ITN CI Checks
zoobereq Nov 13, 2024
b8c592f
Adds missing inits
zoobereq Nov 13, 2024
ec94af1
resolved the failing sparrowhawk test cases failed
ngachchi Nov 14, 2024
2ea146f
added new graph for symbols
ngachchi Nov 18, 2024
2a9e3d2
Hindi TN Support for Cardinal, Decimal, Fraction, Date, Time, Money a…
ngachchi Dec 6, 2024
d93bf4b
added into(x) symbol dependency for measure class
ngachchi Nov 25, 2024
f33e847
working on measure class
ngachchi Nov 26, 2024
5310a01
Hindi ITN - Addition of Whitelist and Word (#248)
tarushi2k2 Dec 2, 2024
866a7c0
Hindi TN changes
ngachchi Oct 30, 2024
e67a6d1
Updated date for Hindi TN cache
ngachchi Oct 30, 2024
0760796
Combined Hindi TN and ITN seperate blocks into single
ngachchi Nov 5, 2024
3fad604
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 5, 2024
1df58a7
Added init.py files and removed unused commented lines
ngachchi Nov 5, 2024
34e96c6
Whitelist and Word class changes
ngachchi Nov 7, 2024
4d04ad4
post processor changes with minor fixes
ngachchi Nov 8, 2024
9414172
removed unused imports and statements
ngachchi Nov 12, 2024
c651d42
Hindi ITN - Addition of Whitelist and Word (#248)
ngachchi Dec 6, 2024
3692ad6
refactoring minor currency instead of direct implementation of paise
ngachchi Dec 3, 2024
bf8aa47
Implements support for minor currency denominations
zoobereq Dec 5, 2024
c169881
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 5, 2024
4664fa2
added unit test cases and minor fixes
ngachchi Dec 5, 2024
32efeec
added missing units to improve accuracy for measure class
ngachchi Dec 5, 2024
27aab2c
Updates the cache
ngachchi Dec 6, 2024
66e7b0e
fixed the sparrowhawk to trim extra space
ngachchi Dec 6, 2024
1c02443
removed unused english whitelist files
ngachchi Dec 6, 2024
d88a361
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 6, 2024
f9bcfc9
reverted to previous logic
ngachchi Dec 6, 2024
9fd1d72
Jp tn 20241017 (#240)
ngachchi Dec 17, 2024
cf76fa6
Hindi TN changes
ngachchi Oct 30, 2024
1fa34b1
Updated date for Hindi TN cache
ngachchi Oct 30, 2024
a5bc674
additional whitelist class .tsv files and unused imports removed
ngachchi Oct 30, 2024
6739784
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 30, 2024
8f85b3f
incorporated suggestions for unused statements and another for closin…
ngachchi Oct 30, 2024
8a97026
Hindi ITN Support for Cardinal, Decimal, Ordinal, Fraction, Date, Tim…
tarushi2k2 Oct 30, 2024
48c5233
Combined Hindi TN and ITN seperate blocks into single
ngachchi Nov 5, 2024
bec0590
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 5, 2024
4802ab1
Added init.py files and removed unused commented lines
ngachchi Nov 5, 2024
84e7fe5
commented irrevelant references and unused snippets from whitelist an…
ngachchi Nov 5, 2024
eb6b66d
Whitelist and Word class changes
ngachchi Nov 7, 2024
00ab03f
post processor changes with minor fixes
ngachchi Nov 8, 2024
34e3535
remove space before punctuation for sparrowhawk file
ngachchi Nov 11, 2024
e265b02
minor fixes for measure class
ngachchi Nov 11, 2024
6abf375
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 12, 2024
1118d83
Updated Jenkinsfile
ngachchi Nov 12, 2024
5952b47
removed unused imports and statements
ngachchi Nov 12, 2024
e17b6d2
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 12, 2024
98d8090
updated date stamp for HI cache and commented ITN grammars
ngachchi Nov 12, 2024
45c95ec
Updates the cache
zoobereq Nov 13, 2024
4b66164
Disables Hindi ITN L0 checks
zoobereq Nov 13, 2024
f09bf2a
Reapplies ITN CI Checks
zoobereq Nov 13, 2024
14a8a70
resolved the failing sparrowhawk test cases failed
ngachchi Nov 14, 2024
bcd2cb7
added new graph for symbols
ngachchi Nov 18, 2024
8be056d
Hindi TN Support for Cardinal, Decimal, Fraction, Date, Time, Money a…
ngachchi Nov 18, 2024
ab3c797
added into(x) symbol dependency for measure class
ngachchi Nov 25, 2024
55c03da
working on measure class
ngachchi Nov 26, 2024
52015ab
Hindi ITN - Addition of Whitelist and Word (#248)
tarushi2k2 Dec 2, 2024
3972f76
Hindi TN changes
ngachchi Oct 30, 2024
59bd066
Updated date for Hindi TN cache
ngachchi Oct 30, 2024
965031f
Combined Hindi TN and ITN seperate blocks into single
ngachchi Nov 5, 2024
a7ce4ce
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 5, 2024
7f2f71c
Added init.py files and removed unused commented lines
ngachchi Nov 5, 2024
5dcce8a
Whitelist and Word class changes
ngachchi Nov 7, 2024
7b790af
post processor changes with minor fixes
ngachchi Nov 8, 2024
2c79083
removed unused imports and statements
ngachchi Nov 12, 2024
003718b
Hindi ITN - Addition of Whitelist and Word (#248)
tarushi2k2 Dec 2, 2024
443cfc3
refactoring minor currency instead of direct implementation of paise
ngachchi Dec 3, 2024
451b4c0
Implements support for minor currency denominations
zoobereq Dec 5, 2024
fc838c6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 5, 2024
db3f56c
added unit test cases and minor fixes
ngachchi Dec 5, 2024
49b90a6
added missing units to improve accuracy for measure class
ngachchi Dec 5, 2024
2890815
Updates the cache
zoobereq Dec 5, 2024
8276843
fixed the sparrowhawk to trim extra space
ngachchi Dec 6, 2024
cb0207b
removed unused english whitelist files
ngachchi Dec 6, 2024
2b2b16e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 6, 2024
adf1bbc
reverted to previous logic
ngachchi Dec 6, 2024
88e400f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 6, 2024
1018d06
Updates the cache
zoobereq Dec 6, 2024
2337f57
Updates the cache again
zoobereq Dec 6, 2024
866c6ab
dedh and dhai implementation approach
ngachchi Dec 16, 2024
14e65fd
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 16, 2024
200717e
Fix space issue with ZH ITN (#244)
zoobereq Dec 10, 2024
18d1293
contributing update (#251)
tbartley94 Dec 11, 2024
d086d0b
fix bug #111 (ar currencies) (#117)
mgrafu Oct 23, 2023
68a6e4a
Logging clean up + IT TN fix (#118)
ekmb Oct 24, 2023
2613972
Time_IT_TN (#105)
GiacomoLeoneMaria Oct 25, 2023
301e1ac
IT TN improvement on tests (#120)
mgrafu Oct 26, 2023
01e4d71
add single letter exception for roman numerals (#121)
mgrafu Oct 27, 2023
382ec01
Increase weights for serial (en TN) (#128)
anand-nv Nov 21, 2023
2e218a0
add measures file for FR TN (#131)
mgrafu Dec 8, 2023
1def7ed
Sh jenkins (#127)
anand-nv Jan 19, 2024
abce5aa
update isort - fix precommit (#138)
ekmb Feb 14, 2024
b0b527d
Armenian itn (#136)
davidks13 Feb 15, 2024
6d3f40b
Fix CI (#142)
ekmb Feb 29, 2024
692cbbf
Armenian TN (#137)
davidks13 Mar 13, 2024
55c3004
Marathi ITN (#134)
ChinmayPatil11 Mar 13, 2024
a9c30c0
jenkins fix (#150)
tbartley94 Mar 13, 2024
8587845
r0.3.0 release (#151)
ekmb Mar 13, 2024
b7f923b
remove unused function from ar tn decimals (#165)
mgrafu Apr 25, 2024
9ec2dd3
ZH sentence-level TN (#112)
BuyuanCui Apr 30, 2024
6eefc3e
preparing release, updating change log (#168)
tbartley94 May 3, 2024
fbaf7d2
hotfix (#169)
ekmb May 3, 2024
119dc1b
hotfix (#170)
tbartley94 May 3, 2024
5eba76c
DE TN Fixes (#177)
zoobereq Jun 6, 2024
b9c5049
Tts en tech terms (#167)
mgrafu Jun 7, 2024
836d229
FR TN Fixes (#181)
zoobereq Jun 7, 2024
1079175
EN TN fixes for Issue #166 (#185)
zoobereq Jul 17, 2024
39a0d3d
IT TN Fixes for #166 (#183)
zoobereq Jul 17, 2024
11804f1
HU TN Fixes issue #166 (#184)
zoobereq Jul 18, 2024
449a2f4
Jp itn 20240221 (#141)
BuyuanCui Jul 19, 2024
52a356b
update en tn folder to see if CI tests run - DO NOT MERGE (#199)
anand-nv Jul 24, 2024
e80c174
Reverts EN TN fixes for Issue #166 (#202)
zoobereq Aug 13, 2024
7b0f5f7
es and es_en changes for unified models (#143)
mgrafu Aug 14, 2024
973417a
ES TN Fixes for Issue #166 (#206)
zoobereq Aug 15, 2024
579f90a
Zh tn bug 240712 (#187)
BuyuanCui Aug 16, 2024
e6cf450
EN TN Fixes for Issue 166 (#207)
zoobereq Aug 19, 2024
fda11ca
Fix for nv-bug 4786175 (#213)
zoobereq Aug 21, 2024
40a8871
Release commit r1.1.0 (#217)
tbartley94 Aug 21, 2024
3cd1d04
EN TN Fixes for nv-bug 4786225 (#218)
zoobereq Aug 22, 2024
f123cf3
Applies fixes for nv-bug 4786263 (#220)
zoobereq Aug 22, 2024
728eb6d
Fix invalid escape sequences (#219)
TheKevJames Aug 23, 2024
24dc2d9
IT TN Fixes for Issue #166 (#221)
zoobereq Aug 26, 2024
30c61c8
ES TN Fix for Issue #166 (#224)
zoobereq Sep 3, 2024
a413124
Expands per/unit mappings and updates the cache (#227)
zoobereq Sep 11, 2024
0eb5ab1
Cardinals up to a hundred trillions, timeFST and transliteration (#209)
kurt0cougar Sep 17, 2024
b9b4702
Fix for issue #211 (#232)
mgrafu Sep 27, 2024
ac6bb08
Jp itn update 240805 (#208)
BuyuanCui Oct 1, 2024
00511fe
DE TN Fix for Issue #228 (#237)
zoobereq Oct 17, 2024
212103a
Jp tn 20241017 (#240)
BuyuanCui Oct 18, 2024
115b1b9
Fixes issue 228 (#234)
zoobereq Oct 23, 2024
95d73a8
Hindi TN changes
ngachchi Oct 30, 2024
ef5db91
Updated date for Hindi TN cache
ngachchi Oct 30, 2024
f639e83
additional whitelist class .tsv files and unused imports removed
ngachchi Oct 30, 2024
9c647bd
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 30, 2024
050e752
incorporated suggestions for unused statements and another for closin…
ngachchi Oct 30, 2024
78ef757
Hindi ITN Support for Cardinal, Decimal, Ordinal, Fraction, Date, Tim…
ngachchi Dec 6, 2024
5c1a612
Combined Hindi TN and ITN seperate blocks into single
ngachchi Nov 5, 2024
b47777b
[pre-commit.ci] auto fixes from pre-commit.com hooks
ngachchi Dec 6, 2024
dd7a2b5
Added init.py files and removed unused commented lines
ngachchi Nov 5, 2024
81f410e
commented irrevelant references and unused snippets from whitelist an…
ngachchi Nov 5, 2024
55e59eb
Whitelist and Word class changes
ngachchi Nov 7, 2024
926c393
post processor changes with minor fixes
ngachchi Nov 8, 2024
05d7cde
remove space before punctuation for sparrowhawk file
ngachchi Nov 11, 2024
7005d73
minor fixes for measure class
ngachchi Nov 11, 2024
33464f6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 12, 2024
9f9c95b
Updated Jenkinsfile
ngachchi Nov 12, 2024
274b091
removed unused imports and statements
ngachchi Nov 12, 2024
23969cc
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 12, 2024
c4a14e9
updated date stamp for HI cache and commented ITN grammars
ngachchi Nov 12, 2024
93eff1c
Updates the cache
zoobereq Nov 13, 2024
b488d07
Disables Hindi ITN L0 checks
zoobereq Nov 13, 2024
62573fb
Reapplies ITN CI Checks
zoobereq Nov 13, 2024
e5bc86a
resolved the failing sparrowhawk test cases failed
ngachchi Nov 14, 2024
5c399be
added new graph for symbols
ngachchi Nov 18, 2024
8d6c805
Hindi TN Support for Cardinal, Decimal, Fraction, Date, Time, Money a…
ngachchi Dec 6, 2024
56b33c5
added into(x) symbol dependency for measure class
ngachchi Nov 25, 2024
ef60c0a
working on measure class
ngachchi Nov 26, 2024
3cd0ff9
Hindi ITN - Addition of Whitelist and Word (#248)
tarushi2k2 Dec 2, 2024
cdac50c
Hindi TN changes
ngachchi Oct 30, 2024
8554f19
Updated date for Hindi TN cache
ngachchi Oct 30, 2024
dec902c
Combined Hindi TN and ITN seperate blocks into single
ngachchi Nov 5, 2024
9fe3cbd
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 5, 2024
3ab01cb
Added init.py files and removed unused commented lines
ngachchi Nov 5, 2024
844b017
Whitelist and Word class changes
ngachchi Nov 7, 2024
9a85a11
post processor changes with minor fixes
ngachchi Nov 8, 2024
700aed1
removed unused imports and statements
ngachchi Nov 12, 2024
66cedf0
Hindi ITN - Addition of Whitelist and Word (#248)
ngachchi Dec 6, 2024
47f9cf2
refactoring minor currency instead of direct implementation of paise
ngachchi Dec 3, 2024
a140a42
Implements support for minor currency denominations
zoobereq Dec 5, 2024
0274037
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 5, 2024
b075f54
added unit test cases and minor fixes
ngachchi Dec 5, 2024
e879ffc
added missing units to improve accuracy for measure class
ngachchi Dec 5, 2024
cd98a2f
Updates the cache
ngachchi Dec 6, 2024
6c3eedf
fixed the sparrowhawk to trim extra space
ngachchi Dec 6, 2024
47a1011
removed unused english whitelist files
ngachchi Dec 6, 2024
4e2bf6f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 6, 2024
5c4a6a9
reverted to previous logic
ngachchi Dec 6, 2024
a6c4583
Jp tn 20241017 (#240)
BuyuanCui Oct 18, 2024
17df864
Hindi TN changes
ngachchi Oct 30, 2024
a82ff77
Updated date for Hindi TN cache
ngachchi Oct 30, 2024
bd9e52d
additional whitelist class .tsv files and unused imports removed
ngachchi Oct 30, 2024
dc81c3c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 30, 2024
90c88df
incorporated suggestions for unused statements and another for closin…
ngachchi Oct 30, 2024
06e5506
Hindi ITN Support for Cardinal, Decimal, Ordinal, Fraction, Date, Tim…
tarushi2k2 Oct 30, 2024
35509e9
Combined Hindi TN and ITN seperate blocks into single
ngachchi Nov 5, 2024
f525ee5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 5, 2024
0bd332c
Added init.py files and removed unused commented lines
ngachchi Nov 5, 2024
15a0cd2
commented irrevelant references and unused snippets from whitelist an…
ngachchi Nov 5, 2024
f53dc2a
Whitelist and Word class changes
ngachchi Nov 7, 2024
a9ac512
post processor changes with minor fixes
ngachchi Nov 8, 2024
9d7c9ef
remove space before punctuation for sparrowhawk file
ngachchi Nov 11, 2024
015aabe
minor fixes for measure class
ngachchi Nov 11, 2024
13456b4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 12, 2024
b7962fc
Updated Jenkinsfile
ngachchi Nov 12, 2024
00c33e6
removed unused imports and statements
ngachchi Nov 12, 2024
8f2b671
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 12, 2024
33587e4
updated date stamp for HI cache and commented ITN grammars
ngachchi Nov 12, 2024
52cbdb5
Updates the cache
zoobereq Nov 13, 2024
ba6f768
Disables Hindi ITN L0 checks
zoobereq Nov 13, 2024
f9da1ef
Reapplies ITN CI Checks
zoobereq Nov 13, 2024
e738dda
resolved the failing sparrowhawk test cases failed
ngachchi Nov 14, 2024
88a6640
added new graph for symbols
ngachchi Nov 18, 2024
e4f03f1
Hindi TN Support for Cardinal, Decimal, Fraction, Date, Time, Money a…
ngachchi Nov 18, 2024
f452d1a
added into(x) symbol dependency for measure class
ngachchi Nov 25, 2024
bbfb927
working on measure class
ngachchi Nov 26, 2024
9853d9f
Hindi ITN - Addition of Whitelist and Word (#248)
tarushi2k2 Dec 2, 2024
8e71b39
Hindi TN changes
ngachchi Oct 30, 2024
a125b2c
Updated date for Hindi TN cache
ngachchi Oct 30, 2024
d148cf5
Combined Hindi TN and ITN seperate blocks into single
ngachchi Nov 5, 2024
4320ec2
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 5, 2024
a58bd70
Added init.py files and removed unused commented lines
ngachchi Nov 5, 2024
8eb9d34
Whitelist and Word class changes
ngachchi Nov 7, 2024
f28fea6
post processor changes with minor fixes
ngachchi Nov 8, 2024
b59fc5a
removed unused imports and statements
ngachchi Nov 12, 2024
e0e109d
Hindi ITN - Addition of Whitelist and Word (#248)
tarushi2k2 Dec 2, 2024
a12f533
refactoring minor currency instead of direct implementation of paise
ngachchi Dec 3, 2024
0153cd7
Implements support for minor currency denominations
zoobereq Dec 5, 2024
3095fb1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 5, 2024
7cba72e
added unit test cases and minor fixes
ngachchi Dec 5, 2024
082041d
added missing units to improve accuracy for measure class
ngachchi Dec 5, 2024
3e9fe1f
Updates the cache
zoobereq Dec 5, 2024
dc61b1d
fixed the sparrowhawk to trim extra space
ngachchi Dec 6, 2024
01755c5
removed unused english whitelist files
ngachchi Dec 6, 2024
b491ba9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 6, 2024
1681992
reverted to previous logic
ngachchi Dec 6, 2024
2af7575
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 6, 2024
172a314
Updates the cache
zoobereq Dec 6, 2024
6e60b0a
Updates the cache again
zoobereq Dec 6, 2024
4b33dc2
dedh and dhai implementation approach
ngachchi Dec 16, 2024
cb94776
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 16, 2024
ee38e85
Fix space issue with ZH ITN (#244)
zoobereq Dec 10, 2024
2bedc07
reverted code and added zero to the hour tsv file
ngachchi Dec 18, 2024
6a82f32
reverted to previous logic
ngachchi Dec 6, 2024
ff8ed1b
Hindi ITN - Addition of Whitelist and Word (#248)
tarushi2k2 Dec 2, 2024
8736a9e
Hindi ITN - Addition of Whitelist and Word (#248)
tarushi2k2 Dec 2, 2024
ecd4a2c
Hindi ITN - Addition of Whitelist and Word (#248)
tarushi2k2 Dec 2, 2024
175a6ec
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 19, 2024
f57aa18
Date further implementation (BC, B.C.) added
ngachchi Dec 19, 2024
5e6df1e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 19, 2024
08ce71d
added date range implementation
ngachchi Dec 23, 2024
20d7520
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 23, 2024
ec48901
working unit test cases
ngachchi Jan 9, 2025
6c9b6d9
removed the conflicted test case for the instance
ngachchi Jan 13, 2025
66712bc
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 9, 2025
2b5bfe5
Update Dockerfile (#254)
anand-nv Jan 9, 2025
f9013a4
updated Jenkins file
ngachchi Jan 15, 2025
c8715da
Merge branch 'hi_tn' of https://github.com/ngachchi/NeMo-text-process…
ngachchi Jan 21, 2025
2e7c2a0
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 21, 2025
9bf29e1
minor fixes
ngachchi Jan 23, 2025
55f2ee8
reformatting changes
ngachchi Jan 23, 2025
2 changes: 1 addition & 1 deletion Jenkinsfile
@@ -27,7 +27,7 @@ pipeline {
 HY_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/03-12-24-0'
 MR_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/03-12-24-1'
 JA_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/10-17-24-1'
-HI_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/11-29-24-1'
+HI_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/01-15-25-0'
 DEFAULT_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/06-08-23-0'
 }
 stages {
@@ -22,7 +22,6 @@ def get_abs_path(rel_path):

Args:
rel_path: relative path to this file

Returns absolute path
"""
return os.path.dirname(os.path.abspath(__file__)) + '/' + rel_path
@@ -22,7 +22,6 @@ def get_abs_path(rel_path):

Args:
rel_path: relative path to this file

Returns absolute path
"""
return os.path.dirname(os.path.abspath(__file__)) + '/' + rel_path
@@ -67,4 +67,4 @@
/shift per shift
/project per project
/class per class
/session per session
/session per session
@@ -10,4 +10,4 @@ apr. J.-C. après jésus-christ
av. J.-C. avant Jésus-Christ
le hon. l’honorable
le très hon. le très hononrable
% pour cent
% pour cent
@@ -0,0 +1,3 @@
ई. पू. ईसा पूर्व
ई. ईसवी
तक तक
@@ -141,14 +141,16 @@ month महीना
months महीने
ct कैरेट
pH पीएच
km/h किलोमीटर प्रति घंटा
km/hr किलोमीटर प्रति घंटा
km/min किलोमीटर प्रति मिनट
m/h मीटर प्रति घंटा
m/hr मीटर प्रति घंटा
mi/s मील प्रति सेकंड
mi/h मील प्रति घंटा
mi/hr मील प्रति घंटा
mi/min मील प्रति मिनट
₹/ac रुपए प्रति एकड़
x बाई
X बाई
* बाई
- से
@@ -1,10 +1,9 @@
₹ रुपए
P पैसे
£ पाउंड
₩ वॉन
$ डॉलर
₺ लीरा
৳ टका
¥ येन
₦ नाइरा
€ यूरो
€ यूरो
@@ -0,0 +1,9 @@
रुपए पैसे
पाउंड पेंस
वॉन जिओन
डॉलर सेंट
लीरा कुरस
टका पैसे
येन सेन
नाइरा कोबो
यूरो सेंट
@@ -1,3 +1,4 @@
० शून्य
१ एक
२ दो
३ तीन
63 changes: 63 additions & 0 deletions nemo_text_processing/text_normalization/hi/graph_utils.py
@@ -21,6 +21,7 @@

import pynini
from pynini import Far
from pynini.examples import plurals
from pynini.export import export
from pynini.lib import byte, pynutil, utf8

@@ -99,6 +100,30 @@ def generator_main(file_name: str, graphs: Dict[str, 'pynini.FstLike']):
logging.info(f'Created {file_name}')


def get_plurals(fst):
"""
Given singular returns plurals

Args:
fst: Fst

Returns plurals to given singular forms
"""
return SINGULAR_TO_PLURAL @ fst


def get_singulars(fst):
"""
Given plural returns singulars

Args:
fst: Fst

Returns singulars to given plural forms
"""
return PLURAL_TO_SINGULAR @ fst


def convert_space(fst) -> 'pynini.FstLike':
"""
Converts space to nonbreaking space.
@@ -113,6 +138,44 @@ def convert_space(fst) -> 'pynini.FstLike':
return fst @ pynini.cdrewrite(pynini.cross(NEMO_SPACE, NEMO_NON_BREAKING_SPACE), "", "", NEMO_SIGMA)


def string_map_cased(input_file: str, input_case: str = INPUT_LOWER_CASED):
labels = load_labels(input_file)

if input_case == INPUT_CASED:
additional_labels = []
for written, spoken, *weight in labels:
written_capitalized = written[0].upper() + written[1:]
additional_labels.extend(
[
[written_capitalized, spoken.capitalize()], # first letter capitalized
[
written_capitalized,
spoken.upper().replace(" AND ", " and "),
], # # add pairs with the all letters capitalized
]
)

spoken_no_space = spoken.replace(" ", "")
# add abbreviations without spaces (both lower and upper case), i.e. "BMW" not "B M W"
if len(spoken) == (2 * len(spoken_no_space) - 1):
logging.debug(f"This is weight {weight}")
if len(weight) == 0:
additional_labels.extend(
[[written, spoken_no_space], [written_capitalized, spoken_no_space.upper()]]
)
else:
additional_labels.extend(
[
[written, spoken_no_space, weight[0]],
[written_capitalized, spoken_no_space.upper(), weight[0]],
]
)
labels += additional_labels

whitelist = pynini.string_map(labels).invert().optimize()
return whitelist


class GraphFst:
"""
Base class for all grammar fsts.
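
The case-expansion loop added in `string_map_cased` can be tried without pynini. The following is a minimal sketch, assuming the labels are already loaded as `[written, spoken]` or `[written, spoken, weight]` lists. `expand_cased_labels` is a hypothetical name that does not appear in the PR, and it mirrors only the `INPUT_CASED` branch, stopping before `pynini.string_map` is called.

```python
from typing import List


def expand_cased_labels(labels: List[List[str]]) -> List[List[str]]:
    """Hypothetical helper: mirrors the INPUT_CASED branch of string_map_cased."""
    additional = []
    for written, spoken, *weight in labels:
        written_capitalized = written[0].upper() + written[1:]
        # Capitalized written form paired with capitalized and all-caps spoken forms.
        additional.append([written_capitalized, spoken.capitalize()])
        additional.append([written_capitalized, spoken.upper().replace(" AND ", " and ")])

        spoken_no_space = spoken.replace(" ", "")
        # Holds when the spoken form has exactly one fewer space than
        # non-space characters, as in "b m w".
        if len(spoken) == 2 * len(spoken_no_space) - 1:
            additional.append([written, spoken_no_space, *weight[:1]])
            additional.append([written_capitalized, spoken_no_space.upper(), *weight[:1]])
    return labels + additional


expanded = expand_cased_labels([["bmw", "b m w"]])
```

For entries written purely in Devanagari, which has no letter case, `upper()` and `capitalize()` return the string unchanged, so the first two added pairs simply repeat the original pair.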
30 changes: 26 additions & 4 deletions nemo_text_processing/text_normalization/hi/taggers/date.py
@@ -14,7 +14,6 @@

import pynini
from pynini.lib import pynutil

from nemo_text_processing.text_normalization.hi.graph_utils import (
NEMO_HI_DIGIT,
NEMO_HI_NON_ZERO,
@@ -26,6 +25,7 @@

days = pynini.string_file(get_abs_path("data/date/days.tsv"))
months = pynini.string_file(get_abs_path("data/date/months.tsv"))
year_suffix = pynini.string_file(get_abs_path("data/date/year_suffix.tsv"))


class DateFst(GraphFst):
@@ -62,12 +62,17 @@ def __init__(self, cardinal: GraphFst):

years_graph = pynutil.insert("year: \"") + graph_year + pynutil.insert("\"") + insert_space

graph_dd_mm = days_graph + delete_dash + months_graph
graph_dd_mm = days_graph + (delete_dash | pynini.accep("")) + months_graph

graph_mm_dd = months_graph + delete_dash + days_graph
graph_mm_dd = months_graph + (delete_dash | pynini.accep("")) + days_graph

graph_mm_dd += pynutil.insert(" preserve_order: true ")

# Graph for era
era_graph = pynutil.insert("era: \"") + year_suffix + pynutil.insert("\"") + insert_space

range_graph = pynini.cross("-", "से")

graph_dd_mm_yyyy = (
days_graph + (delete_dash | delete_slash) + months_graph + (delete_dash | delete_slash) + years_graph
)
@@ -78,7 +83,22 @@ def __init__(self, cardinal: GraphFst):

graph_mm_dd_yyyy += pynutil.insert(" preserve_order: true ")

graph_mm_yyyy = months_graph + delete_dash + years_graph
graph_mm_yyyy = (
months_graph + (delete_dash | pynini.accep("")) + years_graph + pynutil.insert(" preserve_order: true ")
)

graph_year_suffix = era_graph

graph_range = (
pynutil.insert("text: \"")
+ (cardinal.final_graph | graph_year)
+ insert_space
+ range_graph
+ insert_space
+ (cardinal.final_graph | graph_year)
+ pynutil.insert("\"")
+ pynutil.insert(" preserve_order: true ")
)

# default assume dd_mm_yyyy

@@ -88,6 +108,8 @@ def __init__(self, cardinal: GraphFst):
| pynutil.add_weight(graph_dd_mm_yyyy, -0.001)
| graph_mm_dd_yyyy
| graph_mm_yyyy
| pynutil.add_weight(graph_year_suffix, -0.001)
| pynutil.add_weight(graph_range, -0.005)
)

self.final_graph = final_graph.optimize()
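
The negative weights passed to `pynutil.add_weight` act as priorities: in pynini's default tropical semiring, weights are summed along a path and shortest-path selection prefers the lowest total. Below is a minimal plain-Python sketch of that selection rule, assuming each competing analysis is reduced to a `(label, weight)` pair. The `pick_best` helper is hypothetical, and an unweighted union branch is modelled as costing 0.0.

```python
from typing import List, Tuple


def pick_best(analyses: List[Tuple[str, float]]) -> str:
    """Return the label with the lowest weight, as tropical shortest-path would."""
    return min(analyses, key=lambda pair: pair[1])[0]


# Weights taken from the union in the diff; an unweighted branch costs 0.0.
candidates = [
    ("graph_mm_yyyy", 0.0),
    ("graph_year_suffix", -0.001),
    ("graph_range", -0.005),
]
best = pick_best(candidates)
```

In the real grammar the path total also includes any weights inside the sub-graphs, so this only illustrates the direction in which ties are broken.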
30 changes: 27 additions & 3 deletions nemo_text_processing/text_normalization/hi/taggers/measure.py
@@ -44,16 +44,20 @@ def __init__(self, cardinal: GraphFst, decimal: GraphFst):
)

# Define the unit handling
self.unit = pynutil.insert("units: \"") + unit_graph + pynutil.insert("\" ")
unit = pynutil.insert("units: \"") + unit_graph + pynutil.insert("\" ")

# Handling symbols like x, X, *
symbol_graph = pynini.string_map([("x", "बाई"), ("X", "बाई"), ("*", "बाई"),])

graph_measurements = (
pynutil.insert("decimal { ")
+ optional_graph_negative
+ decimal_graph
+ pynutil.insert(" }")
+ delete_space
+ self.unit
+ unit
)

graph_measurements |= (
pynutil.insert("cardinal { ")
+ optional_graph_negative
@@ -62,7 +66,27 @@ def __init__(self, cardinal: GraphFst, decimal: GraphFst):
+ pynutil.insert("\"")
+ pynutil.insert(" }")
+ delete_space
+ self.unit
+ unit
)

# Handling cardinal clubbed with symbol as single token
graph_measurements |= (
pynutil.insert("cardinal { ")
+ optional_graph_negative
+ pynutil.insert("integer: \"")
+ cardinal_graph
+ pynutil.insert("\"")
+ pynutil.insert(" }")
+ pynutil.insert(" units: \"")
+ symbol_graph
+ pynutil.insert("\" ")
+ pynutil.insert("} }")
+ insert_space
+ pynutil.insert("tokens { cardinal { ")
+ optional_graph_negative
+ pynutil.insert("integer: \"")
+ cardinal_graph
+ pynutil.insert("\"")
)

graph = graph_measurements
40 changes: 18 additions & 22 deletions nemo_text_processing/text_normalization/hi/taggers/money.py
@@ -24,39 +24,35 @@
class MoneyFst(GraphFst):
"""
Finite state transducer for classifying money, suppletive aware, e.g.
₹1 -> money { currency: "रुपए" integer_part: "एक" }
₹1.2 -> money { currency: "रुपए" integer_part: "एक" fractional_part: "दो" }

₹५० -> money { currency_maj: "रुपए" integer_part: "पचास" }
₹५०.५० -> money { currency_maj: "रुपए" integer_part: "पचास" fractional_part: "पचास" currency_min: "centiles" }
₹०.५० -> money { currency_maj: "रुपए" integer_part: "शून्य" fractional_part: "पचास" currency_min: "centiles" }
Note that the 'centiles' string is a placeholder that the verbalizer replaces with the corresponding minor currency denomination

Args:
cardinal: CardinalFst
decimal: DecimalFst
deterministic: if True will provide a single transduction option,
for False multiple transduction are generated (used for audio-based normalization)
"""

def __init__(self, cardinal: GraphFst, decimal: GraphFst):
def __init__(self, cardinal: GraphFst):
super().__init__(name="money", kind="classify")

cardinal_graph = cardinal.final_graph

optional_graph_negative = pynini.closure(
pynutil.insert("negative: ") + pynini.cross("-", "\"true\"") + insert_space, 0, 1,
)
self.currency = pynutil.insert("currency: \"") + currency_graph + pynutil.insert("\" ")
self.interger = pynutil.insert("integer_part: \"") + cardinal_graph + pynutil.insert("\" ")
self.fraction = pynutil.insert("fractional_part: \"") + cardinal_graph + pynutil.insert("\" ")

graph_currencies = optional_graph_negative + self.currency + insert_space + self.interger
graph_currencies |= (
optional_graph_negative
+ self.currency
+ insert_space
+ self.interger
+ pynutil.delete(".")
+ insert_space
+ self.fraction
currency_major = pynutil.insert('currency_maj: "') + currency_graph + pynutil.insert('"')
integer = pynutil.insert('integer_part: "') + cardinal_graph + pynutil.insert('"')
fraction = pynutil.insert('fractional_part: "') + cardinal_graph + pynutil.insert('"')
currency_minor = pynutil.insert('currency_min: "') + pynutil.insert("centiles") + pynutil.insert('"')

graph_major_only = currency_major + insert_space + integer
graph_major_and_minor = (
currency_major + insert_space + integer + pynini.cross(".", " ") + fraction + insert_space + currency_minor
)
graph = graph_currencies
self.graph = graph.optimize()

graph_currencies = graph_major_only | graph_major_and_minor

graph = graph_currencies.optimize()
final_graph = self.add_tokens(graph)
self.fst = final_graph
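
The docstring describes `centiles` as a placeholder that the verbalizer later swaps for the minor denomination. The verbalizer is not shown in this hunk, so the following is only a sketch of that substitution in plain Python, assuming the two-column currency file added in this PR maps each major unit to its minor unit. `MAJOR_TO_MINOR` and `verbalize_money` are hypothetical names, and the output word order is an assumption.

```python
from typing import Optional

# Subset of the major -> minor pairs listed in the PR's new currency TSV.
MAJOR_TO_MINOR = {
    "रुपए": "पैसे",
    "डॉलर": "सेंट",
    "पाउंड": "पेंस",
    "यूरो": "सेंट",
}


def verbalize_money(currency_maj: str, integer_part: str, fractional_part: Optional[str] = None) -> str:
    """Hypothetical verbalizer step: replace the 'centiles' placeholder by lookup."""
    spoken = f"{integer_part} {currency_maj}"
    if fractional_part is not None:
        # The tagger emits currency_min: "centiles"; resolve it from the major unit.
        spoken += f" {fractional_part} {MAJOR_TO_MINOR[currency_maj]}"
    return spoken


major_only = verbalize_money("रुपए", "पचास")
with_minor = verbalize_money("रुपए", "पचास", "पचास")
```

Keeping the tagger output generic and resolving the minor unit by lookup means a new currency needs only a new row in the mapping rather than a new grammar branch.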
@@ -43,7 +43,7 @@ class ClassifyFst(GraphFst):
Final class that composes all other classification grammars. This class can process an entire sentence including punctuation.
For deployment, this grammar will be compiled and exported to OpenFst Finite State Archive (FAR) File.
More details to deployment at NeMo/tools/text_processing_deployment.

Args:
input_case: accepting either "lower_cased" or "cased" input.
deterministic: if True will provide a single transduction option,
@@ -68,11 +68,11 @@ def __init__(
os.makedirs(cache_dir, exist_ok=True)
whitelist_file = os.path.basename(whitelist) if whitelist else ""
far_file = os.path.join(
cache_dir, f"hi_tn_{deterministic}_deterministic_{input_case}_{whitelist_file}_tokenize.far"
cache_dir, f"hi_tn_{deterministic}_deterministic_{input_case}_{whitelist_file}_tokenize.far",
)
if not overwrite_cache and far_file and os.path.exists(far_file):
self.fst = pynini.Far(far_file, mode="r")["tokenize_and_classify"]
logging.info(f'ClassifyFst.fst was restored from {far_file}.')
logging.info(f"ClassifyFst.fst was restored from {far_file}.")
else:
logging.info(f"Creating ClassifyFst grammars.")

@@ -107,7 +107,7 @@ def __init__(
logging.debug(f"measure: {time.time() - start_time: .2f}s -- {measure_graph.num_states()} nodes")

start_time = time.time()
money = MoneyFst(cardinal=cardinal, decimal=decimal)
money = MoneyFst(cardinal=cardinal)
money_graph = money.fst
logging.debug(f"money: {time.time() - start_time: .2f}s -- {money_graph.num_states()} nodes")

1 change: 0 additions & 1 deletion nemo_text_processing/text_normalization/hi/taggers/word.py
@@ -43,7 +43,6 @@ def __init__(self, punctuation: PunctuationFst, deterministic: bool = True):
*[chr(i) for i in range(ord("ऀ"), ord("ः") + 1)], # Hindi vowels and consonants
*[chr(i) for i in range(ord("अ"), ord("ह") + 1)], # More Hindi characters
*[chr(i) for i in range(ord("ा"), ord("्") + 1)], # Hindi diacritics
*[chr(i) for i in range(ord("०"), ord("९") + 1)], # Hindi digits
).optimize()

# Include punctuation in the graph
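
Dropping the digit range means the word tagger's character set no longer covers Devanagari digits, presumably so that digit strings are left to the number grammars (that motivation is an assumption; the diff does not state it). Here is a quick plain-Python check of the three ranges that remain, using a set in place of the FST union. `HINDI_WORD_CHARS` and `HINDI_DIGITS` are hypothetical names.

```python
# Ranges kept in word.py after this change, rebuilt as a plain set.
HINDI_WORD_CHARS = {
    *[chr(i) for i in range(ord("ऀ"), ord("ः") + 1)],
    *[chr(i) for i in range(ord("अ"), ord("ह") + 1)],
    *[chr(i) for i in range(ord("ा"), ord("्") + 1)],
}

# The removed range: Devanagari digits ० through ९.
HINDI_DIGITS = {chr(i) for i in range(ord("०"), ord("९") + 1)}

# Empty when no digit is accepted as a word character.
overlap = HINDI_WORD_CHARS & HINDI_DIGITS
```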