
Commit 2428a4f

Tamil flagged word integration
1 parent: ea6a811

2 files changed: 14 additions, 14 deletions

ac_dc/flagged_words.py

Lines changed: 13 additions & 13 deletions
@@ -1512,6 +1512,19 @@
         "xxx",
         "ânus",
     ],
+    "ta": english_flagged_words
+    + [
+        "ஓதா",
+        "ஒத்தா",
+        "புண்டை",
+        "ஒம்மாளே",
+        "பக்கி",
+        "கூமுட்டை",
+        "கருமம்",
+        "சனியன்",
+        "கஸ்மாலம்",
+        "சூத்து",
+    ],
     "te": english_flagged_words
     + [
         "గర్భస్రావం",
@@ -2094,17 +2107,4 @@
         "龟儿子",
         "龟头",
     ],
-    "tam": english_flagged_words
-    + [
-        "ஓதா",
-        "ஒத்தா",
-        "புண்டை",
-        "ஒம்மாளே",
-        "பக்கி",
-        "கூமுட்டை",
-        "கருமம்",
-        "சனியன்",
-        "கஸ்மாலம்",
-        "சூத்து",
-    ],
 }
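In this commit the Tamil list moves from the key "tam" to "ta" (now placed just before "te"), and the Tamil entry in ac_dc/languages_id.py points at it with "flagged_words_id": "ta". The list is only usable if the two ids match exactly. The sketch below is a minimal consistency check under stated assumptions: the names `languages` and `flagged_words`, the list-of-dicts shape of the language table, and the abbreviated word lists are hypothetical placeholders, not the repository's actual variables.

```python
# Hypothetical consistency check between a language table (shaped like the
# entries in ac_dc/languages_id.py) and a flagged-words dict (shaped like the
# one in ac_dc/flagged_words.py). All names and data here are placeholders.

english_flagged_words = ["xxx"]  # stand-in for the real English list

flagged_words = {
    "ta": english_flagged_words + ["சனியன்"],
    "te": english_flagged_words + ["గర్భస్రావం"],
}

languages = [
    {"lang": "Tamil", "dataset_id": "ta", "flagged_words_id": "ta"},
    {"lang": "Telugu", "dataset_id": "te", "flagged_words_id": "te"},
]


def missing_flagged_word_lists(languages, flagged_words):
    """Return the languages whose flagged_words_id has no key in flagged_words."""
    return [
        entry["lang"]
        for entry in languages
        if entry["flagged_words_id"] is not None
        and entry["flagged_words_id"] not in flagged_words
    ]


print(missing_flagged_word_lists(languages, flagged_words))  # []

# Had the id been set to "ta" while the list stayed under "tam",
# the lookup would fail and Tamil would be reported:
stale = {"tam": flagged_words["ta"], "te": flagged_words["te"]}
print(missing_flagged_word_lists(languages, stale))  # ['Tamil']
```

A check like this catches key mismatches at import time rather than as a `KeyError` deep inside a filtering run.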

ac_dc/languages_id.py

Lines changed: 1 addition & 1 deletion
@@ -177,7 +177,7 @@
         "lang": "Tamil",
         "dataset_id": "ta",
         "stopwords_id": None,
-        "flagged_words_id": None,
+        "flagged_words_id": "ta",
         "fasttext_id": "ta",
         "sentencepiece_id": "ta",
         "kenlm_id": "ta",
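Before this change Tamil had "flagged_words_id": None, so no flagged-word list applied to it. The sketch below shows, under assumptions, how a filter could use the new id to compute the share of flagged words in a document. It is not the ac_dc filter itself: the function name `flagged_words_ratio`, the plain whitespace tokenization, and the sample data are hypothetical simplifications.

```python
# Hypothetical sketch: compute the fraction of words in a document that appear
# in the flagged-word list selected by a language entry's "flagged_words_id".
# Whitespace splitting is a simplification; a real pipeline may tokenize
# differently.

flagged_words = {"ta": ["xxx", "சனியன்", "கருமம்"]}  # placeholder subset

tamil_entry = {"lang": "Tamil", "dataset_id": "ta", "flagged_words_id": "ta"}


def flagged_words_ratio(text, lang_entry, flagged_words):
    """Fraction of whitespace-separated words found in the language's list.

    Returns None when the language has no list configured
    (flagged_words_id is None), as was the case for Tamil before this commit.
    """
    list_id = lang_entry["flagged_words_id"]
    if list_id is None:
        return None
    vocabulary = set(flagged_words[list_id])
    words = text.split()
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in vocabulary)
    return hits / len(words)


print(flagged_words_ratio("இது ஒரு சனியன் வேலை", tamil_entry, flagged_words))  # 0.25

old_entry = {**tamil_entry, "flagged_words_id": None}
print(flagged_words_ratio("இது ஒரு சனியன் வேலை", old_entry, flagged_words))  # None
```

Converting the list to a set keeps each membership test constant-time, which matters when the English list is concatenated into every language's list.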
