Commit 3d3cb80

fix(tn): 300w张 50000票 (#156)
1 parent e91e6f8 commit 3d3cb80

3 files changed: +6 -2 lines changed


tn/chinese/data/measure/units_zh.tsv

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@
 平方
 立方
 公里
+
 
 
 

tn/chinese/rules/measure.py

Lines changed: 3 additions & 2 deletions
@@ -16,7 +16,7 @@
 from tn.processor import Processor
 
 from pynini import accep, cross, string_file
-from pynini.lib.pynutil import delete, insert
+from pynini.lib.pynutil import delete, insert, add_weight
 
 
 class Measure(Processor):
@@ -29,7 +29,8 @@ def __init__(self):
     def build_tagger(self):
         units_en = string_file('tn/chinese/data/measure/units_en.tsv')
         units_zh = string_file('tn/chinese/data/measure/units_zh.tsv')
-        units = units_en | units_zh
+        units = add_weight((cross("k", "千") | cross("w", "万")), 0.1).ques + \
+            (units_en | units_zh)
         rmspace = delete(' ').ques
         to = cross('-', '到') | cross('~', '到') | accep('到')
 
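The new `units` expression accepts an optional magnitude prefix (`k` rewritten to 千, `w` rewritten to 万) before a regular unit, and puts a small weight of 0.1 on the prefixed path so that an unprefixed match is preferred when both apply. The sketch below models that behaviour in plain Python without pynini. The `UNITS` table and the `rewrite_unit` helper are hypothetical stand-ins for illustration and are not part of the repository.

```python
# Hypothetical stand-ins for units_en.tsv / units_zh.tsv; the real tables are not shown here.
UNITS = {"km": "千米", "kg": "千克", "张": "张", "票": "票"}
# The two magnitude prefixes introduced by the commit.
MAGNITUDES = {"k": "千", "w": "万"}


def rewrite_unit(token: str) -> str:
    """Rewrite a unit token, allowing one optional k/w magnitude prefix."""
    # A plain unit match is tried first, which loosely mirrors the 0.1 penalty
    # that add_weight puts on the prefixed path in the real grammar.
    if token in UNITS:
        return UNITS[token]
    head, tail = token[:1], token[1:]
    if head in MAGNITUDES and tail in UNITS:
        return MAGNITUDES[head] + UNITS[tail]
    raise ValueError(f"unsupported unit: {token!r}")


print(rewrite_unit("w张"))  # magnitude prefix followed by a unit
print(rewrite_unit("km"))   # matched as a whole unit
```

In the actual grammar the choice between the two paths is made by shortest-path search over weighted transducers rather than by an explicit `if`, so this only approximates the tie-breaking.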

tn/chinese/test/data/normalizer.txt

Lines changed: 2 additions & 0 deletions
@@ -40,3 +40,5 @@ B2B => B to B
 给12315打个电话 => 给幺二三幺五打个电话
 人均200以内 => 人均两百以内
 当场票数≥100万 => 当场票数大于等于一百万
+独得300w张 => 独得三百万张
+面积是10km² => 面积是十平方千米
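Each line of normalizer.txt pairs a written form with its expected spoken form, separated by ` => `. A minimal sketch of reading such lines into test cases follows. The `parse_cases` helper is hypothetical, and splitting on the first ` => ` is an assumption, since the project's own test loader is not shown in this commit.

```python
def parse_cases(text: str) -> list[tuple[str, str]]:
    """Split 'written => spoken' lines into (written, spoken) pairs."""
    cases = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumption: the first ' => ' separates input from expected output.
        written, sep, spoken = line.partition(" => ")
        if not sep:
            raise ValueError(f"missing ' => ' separator: {line!r}")
        cases.append((written, spoken))
    return cases


sample = "独得300w张 => 独得三百万张\n面积是10km² => 面积是十平方千米\n"
for written, spoken in parse_cases(sample):
    print(written, "->", spoken)
```

Each pair could then be fed to the normalizer and compared against the expected string.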
