Handy functions for NLP tasks in Taiwanese Hokkien.

The `to_traditional` function converts the input to the Traditional Chinese characters used in the dataset. It also accounts for different variants of Traditional Chinese characters.

The `to_simplified` function converts the input to Simplified Chinese characters.

The `is_cjk` function checks whether the input string consists entirely of Chinese characters.

```python
# Convert to Traditional
to_traditional(input)

# Convert to Simplified
to_simplified(input)

# Check whether the string consists entirely of Chinese characters
is_cjk(input)
```
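
To make the `is_cjk` check more concrete, here is a minimal sketch of one way such a check could work. It is not the library's implementation. It assumes that "Chinese characters" means code points in a few Unicode CJK ideograph blocks and that an empty string does not count. It uses the hypothetical name `is_cjk_sketch` so it is not confused with the real `is_cjk`. Punctuation, including full-width punctuation, is treated as non-CJK here, and the library may handle it differently.

```python
# Hypothetical sketch, not the library's is_cjk: a character counts as
# "Chinese" if its code point falls in one of these Unicode CJK ideograph
# blocks. This list is an assumption and is not exhaustive.
CJK_RANGES = (
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
)


def is_cjk_sketch(text: str) -> bool:
    """Return True if text is non-empty and every character is a CJK ideograph."""
    if not text:
        return False  # assumption: an empty string is not "fully Chinese"
    return all(
        any(lo <= ord(ch) <= hi for lo, hi in CJK_RANGES)
        for ch in text
    )


print(is_cjk_sketch("台灣"))       # True
print(is_cjk_sketch("我食pháng"))  # False: contains Latin letters
print(is_cjk_sketch("你好!"))     # False: full-width "!" is not an ideograph
```

Adding the later CJK extension blocks to `CJK_RANGES` would cover rarer characters. Whether the dataset needs them is an assumption worth checking against the real `is_cjk`.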
## Sandhi
c = Converter() # for Tailo, sandhi none by default
c.get("這是你的茶桌仔無")
>> Tse sī lí ê tê-toh-á bô

c = Converter(sandhi='auto')
c.get("這是你的茶桌仔無")
>> Tse sì li ē tē-to-á bô

c = Converter(sandhi='exc_last')
c.get("這是你的茶桌仔無")
>> Tsē sì li ē tē-tó-a bô

c = Converter(sandhi='incl_last')
c.get("這是你的茶桌仔無")
>> Tsē sì li ē tē-tó-a bō

## Punctuation
c = Converter() # format punctuation default
# ...
>> thài-khong pîng-iú,lín-hó!lín tsia̍h-pá buē?

## Convert non-CJK
c = Converter(system='Zhuyin') # convert_non_cjk=False by default
c.get("我食pháng")
>> ㆣㄨㄚˋ ㄐㄧㄚㆷ˙ pháng

c = Converter(system='Zhuyin', convert_non_cjk=True)