Add more tests for Intl.Locale.prototype.getTextInfo #5023
Add coverage for:
- Language subtag has more than three letters.
- Script meta data's RTL field is `UNKNOWN`.
- Don't default to `"ltr"` when script meta data has no entry for script.
- Missing script is added through add-likely-subtags algorithm.
- Script subtag is present and script's general ordering of characters is known.
- Script subtag doesn't refer to a valid registered script.
There are various test failures in JSC and V8, because neither implementation can currently report an unknown text direction.

Fixed ICU4C implementation for JSC and V8:

```cpp
#include <cstring>
#include <iterator>

#include <unicode/uchar.h>
#include <unicode/uloc.h>
#include <unicode/uscript.h>

enum class ScriptDirection {
  Unknown,
  LeftToRight,
  RightToLeft,
};

// Input: Canonicalized locale with alias mappings already replaced.
//
// Preferably |locale| contains only language-script-region subtags,
// so ICU4C doesn't reject too long locales.
static ScriptDirection GetScriptDirection(const char* locale) {
  UErrorCode status = U_ZERO_ERROR;

  // Get the script subtag.
  char script[ULOC_SCRIPT_CAPACITY] = {};
  int32_t scriptLength = uloc_getScript(locale, script, std::size(script), &status);
  if (U_FAILURE(status)) {
    return ScriptDirection::Unknown;
  }

  // If no script subtag present, add likely subtags.
  if (scriptLength == 0) {
    char maximal[ULOC_FULLNAME_CAPACITY] = {};
    uloc_addLikelySubtags(locale, maximal, std::size(maximal), &status);
    // Also reject a result that filled the buffer without a terminating NUL.
    if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING) {
      return ScriptDirection::Unknown;
    }

    scriptLength = uloc_getScript(maximal, script, std::size(script), &status);
    if (U_FAILURE(status)) {
      return ScriptDirection::Unknown;
    }

    // If still no script subtag present, return Unknown.
    if (scriptLength == 0) {
      return ScriptDirection::Unknown;
    }
  }

  // Get script code from script.
  UScriptCode scriptCode =
      static_cast<UScriptCode>(u_getPropertyValueEnum(UCHAR_SCRIPT, script));
  if (scriptCode == USCRIPT_INVALID_CODE) {
    return ScriptDirection::Unknown;
  }
  if (const char* shortName = uscript_getShortName(scriptCode)) {
    // Ignore Unicode aliases from PropertyValueAliases.txt, because they don't
    // apply here.
    if (std::strcmp(script, shortName) != 0) {
      return ScriptDirection::Unknown;
    }
  }

  switch (scriptCode) {
    // Marked as UNKNOWN in scriptMetadata.txt.
    //
    // ICU4C doesn't allow querying all possible "RTL" field values (YES, NO,
    // UNKNOWN), so the four scripts with UNKNOWN are hard-coded below.
    case USCRIPT_COMMON:     // Zyyy
    case USCRIPT_INHERITED:  // Zinh
    case USCRIPT_UNKNOWN:    // Zzzz
    case USCRIPT_BRAILLE:    // Brai
      return ScriptDirection::Unknown;

    // Not in scriptMetadata.txt.
    //
    // It is up to the implementation how to handle these cases: either return
    // UNKNOWN or the correct script direction. But don't return an obviously
    // wrong answer, for example don't return left-to-right for "Aran".
    case USCRIPT_AFAKA:                         // Afak (LTR)
    case USCRIPT_ARABIC_NASTALIQ:               // Aran (RTL)
    case USCRIPT_BLISSYMBOLS:                   // Blis (varies)
    case USCRIPT_CIRTH:                         // Cirt (varies)
    case USCRIPT_OLD_CHURCH_SLAVONIC_CYRILLIC:  // Cyrs (LTR)
    case USCRIPT_DEMOTIC_EGYPTIAN:              // Egyd (mixed)
    case USCRIPT_HIERATIC_EGYPTIAN:             // Egyh (mixed)
    case USCRIPT_KHUTSURI:                      // Geok (LTR)
    case USCRIPT_TRADITIONAL_HAN_WITH_LATIN:    // Hntl (Hant+Latn)
    case USCRIPT_KATAKANA_OR_HIRAGANA:          // Hrkt (LTR)
    case USCRIPT_HARAPPAN_INDUS:                // Inds (RTL)
    case USCRIPT_KPELLE:                        // Kpel (LTR)
    case USCRIPT_LATIN_FRAKTUR:                 // Latf (LTR)
    case USCRIPT_LATIN_GAELIC:                  // Latg (LTR)
    case USCRIPT_LOMA:                          // Loma (LTR)
    case USCRIPT_MAYAN_HIEROGLYPHS:             // Maya (mixed)
    case USCRIPT_MOON:                          // Moon (mixed)
    case USCRIPT_NAKHI_GEBA:                    // Nkgb (LTR)
    case USCRIPT_BOOK_PAHLAVI:                  // Phlv (mixed)
    case USCRIPT_RONGORONGO:                    // Roro (mixed)
    case USCRIPT_SARATI:                        // Sara (mixed)
    case USCRIPT_ESTRANGELO_SYRIAC:             // Syre (RTL)
    case USCRIPT_WESTERN_SYRIAC:                // Syrj (RTL)
    case USCRIPT_EASTERN_SYRIAC:                // Syrn (RTL)
    case USCRIPT_TENGWAR:                       // Teng (LTR)
    case USCRIPT_VISIBLE_SPEECH:                // Visp (LTR)
    case USCRIPT_WOLEAI:                        // Wole (LTR)
    case USCRIPT_MATHEMATICAL_NOTATION:         // Zmth (UNKNOWN)
    case USCRIPT_SYMBOLS_EMOJI:                 // Zsye (UNKNOWN)
    case USCRIPT_SYMBOLS:                       // Zsym (UNKNOWN)
    case USCRIPT_UNWRITTEN_LANGUAGES:           // Zxxx (UNKNOWN)
      return ScriptDirection::Unknown;

    default:
      break;
  }

  if (uscript_isRightToLeft(scriptCode)) {
    return ScriptDirection::RightToLeft;
  }
  return ScriptDirection::LeftToRight;
}
```
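
As a rough illustration of how a caller might consume the result, the sketch below maps `ScriptDirection` onto a direction string. The helper name `DirectionToString` is hypothetical, and representing the unknown case as an empty `std::optional` is an assumption made for illustration only; it is not taken from JSC, V8, or ICU4C. The enum is repeated so the snippet compiles on its own.

```cpp
#include <optional>
#include <string_view>

// Repeated from the implementation above so this snippet is self-contained.
enum class ScriptDirection {
  Unknown,
  LeftToRight,
  RightToLeft,
};

// Hypothetical helper: converts the three-state result into a direction
// string. An empty optional stands for "direction unknown" (an assumption of
// this sketch); it deliberately does not fall back to "ltr".
static std::optional<std::string_view> DirectionToString(ScriptDirection direction) {
  switch (direction) {
    case ScriptDirection::LeftToRight:
      return std::string_view("ltr");
    case ScriptDirection::RightToLeft:
      return std::string_view("rtl");
    case ScriptDirection::Unknown:
      break;
  }
  return std::nullopt;
}
```

Keeping the unknown case as a distinct state, rather than folding it into `"ltr"`, is what lets the caller avoid the default that the new tests are meant to catch.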