Commit f25f034

Improve lang validity check (#275)
* Improve lang validity check

  The list returned by getISOLanguages does not include deprecated language codes that are still accepted to create locales.

* Use a one-liner
1 parent 56158de commit f25f034

2 files changed: +5 -8 lines changed

bindings/python/test/test.py

Lines changed: 4 additions & 0 deletions
@@ -56,6 +56,10 @@ def test_invalid_lang():
         pyonmttok.Tokenizer("conservative", lang="xxx")
 
 
+def test_deprecated_lang():
+    pyonmttok.Tokenizer("conservative", lang="tl")
+
+
 def test_invalid_sentencepiece_model():
     with pytest.raises(ValueError):
         pyonmttok.Tokenizer("none", sp_model_path="xxx")

src/unicode/Unicode.cc

Lines changed: 1 addition & 8 deletions
@@ -235,14 +235,7 @@ namespace onmt
 
     bool is_valid_language(const char* language)
     {
-      for (const char* const* available_languages = icu::Locale::getISOLanguages();
-           *available_languages;
-           ++available_languages)
-      {
-        if (strcmp(*available_languages, language) == 0)
-          return true;
-      }
-      return false;
+      return icu::Locale(language).getISO3Language()[0] != '\0';
     }
 
     // The functions below are made backward compatible with the Kangxi and Kanbun script names
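
Note: the reasoning in the commit message can be illustrated with a small, self-contained Python model. It does not call ICU. The two tables are tiny hypothetical stand-ins rather than ICU's actual data, and the names `is_valid_by_listing`, `iso3_language` and `is_valid_by_resolution` are invented for this sketch. It contrasts a membership test against a list of current codes (the shape of the removed loop) with a check that resolves the code and tests for a non-empty three-letter result (the shape of the new one-liner), using "tl", the code exercised by the new test.

```python
# Toy model of the two validity checks. The tables below are illustrative
# assumptions, not ICU's real data.

# Stand-in for a list of current language codes (old check).
CURRENT_CODES = ("en", "fr")

# Stand-in for code -> three-letter code resolution. In this model it also
# knows "tl", which is absent from CURRENT_CODES above.
ISO3_BY_CODE = {"en": "eng", "fr": "fra", "tl": "tgl"}


def is_valid_by_listing(language: str) -> bool:
    """Old approach: accept only codes present in the current list."""
    return language in CURRENT_CODES


def iso3_language(language: str) -> str:
    """Return the three-letter code, or an empty string if unknown."""
    return ISO3_BY_CODE.get(language, "")


def is_valid_by_resolution(language: str) -> bool:
    """New approach: accept any code that resolves to a non-empty result."""
    return iso3_language(language) != ""


if __name__ == "__main__":
    for code in ("en", "tl", "xxx"):
        print(code, is_valid_by_listing(code), is_valid_by_resolution(code))
```

In this model "tl" is rejected by the listing check and accepted by the resolution check, while "xxx" is rejected by both, which matches the behaviour the two Python tests expect.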
