Commit 2c0e38f

fix(tn): full-width digits (#157)
1 parent 3d3cb80 commit 2c0e38f

5 files changed: +24 -1 lines changed

tn/chinese/data/number/digit.tsv

Lines changed: 9 additions & 0 deletions
@@ -7,3 +7,12 @@
 7
 8
 9
+
+
+
+
+
+
+
+
+

tn/chinese/data/number/teen.tsv

Lines changed: 9 additions & 0 deletions
@@ -7,3 +7,12 @@
 7
 8
 9
+
+
+
+
+
+
+
+
+

tn/chinese/data/number/zero.tsv

Lines changed: 1 addition & 0 deletions
@@ -1 +1,2 @@
 0
+

tn/chinese/rules/cardinal.py

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ def build_tagger(self):
         sign = string_file('tn/chinese/data/number/sign.tsv')
         dot = string_file('tn/chinese/data/number/dot.tsv')
 
-        rmzero = delete('0')
+        rmzero = delete('0') | delete('0')
         rmpunct = delete(',').ques
         digits = zero | digit
         self.digits = digits

tn/chinese/test/data/normalizer.txt

Lines changed: 4 additions & 0 deletions
@@ -42,3 +42,7 @@ B2B => B to B
 当场票数≥100万 => 当场票数大于等于一百万
 独得300w张 => 独得三百万张
 面积是10km² => 面积是十平方千米
+仅仅是2015年 => 仅仅是二零一五年
+包含3000余件 => 包含三千余件
+查处450余名 => 查处四百五十余名
+查处450余名 => 查处四百五十余名
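
For context, the full-width digits this commit targets occupy U+FF10 through U+FF19, each at a fixed offset from its ASCII counterpart. The commit itself handles them inside the grammar (extra rows in the TSV files and an additional delete alternative in cardinal.py); the snippet below is only a minimal sketch of the code-point relationship, not the repository's method. The names FULLWIDTH_TO_ASCII and fold_fullwidth_digits are hypothetical.

```python
# Hypothetical helper, not part of the commit: fold full-width digits
# (U+FF10..U+FF19) to their ASCII equivalents before further processing.
FULLWIDTH_TO_ASCII = {0xFF10 + i: ord("0") + i for i in range(10)}


def fold_fullwidth_digits(text: str) -> str:
    """Return text with full-width digits replaced by ASCII digits.

    All other characters, including Chinese text, are left untouched.
    """
    return text.translate(FULLWIDTH_TO_ASCII)


if __name__ == "__main__":
    # The input uses escapes so the full-width characters are unambiguous.
    print(fold_fullwidth_digits("\uff12\uff10\uff11\uff15\u5e74"))
```

Folding the input up front would be an alternative to teaching every grammar rule about both digit forms; the commit takes the grammar-side route instead.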

0 commit comments
