Commit b2c788c

Fix space omission when translating into CJK languages (#956)
This PR fixes an issue where the target language tag was compared for exact equality against the CJK language codes (`ja`, `ko`, `zh`), when it should instead be checked for whether it starts with one of those codes. Before this PR, a tag such as `zh-Hans` would not match. After this PR, it matches.
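The difference between the two checks can be illustrated with a small standalone sketch. It assumes C++17 and `std::string_view`. `startsWith` here is a `constexpr` copy of the helper from this commit so it can be checked at compile time, and `equalsCode` is a hypothetical stand-in for the old equality comparison, not a function in the codebase.

```cpp
#include <string_view>

// Constexpr copy of the helper from this commit (prefix first, then the string).
constexpr bool startsWith(std::string_view prefix, std::string_view str) {
  return str.size() >= prefix.size() && prefix == str.substr(0, prefix.size());
}

// Hypothetical stand-in for the old check: exact equality with the bare code.
constexpr bool equalsCode(std::string_view code, std::string_view tag) {
  return tag == code;
}

// Bare codes match under both checks.
static_assert(equalsCode("zh", "zh"));
static_assert(startsWith("zh", "zh"));

// Tags with a script or region suffix match only the prefix check.
static_assert(!equalsCode("zh", "zh-Hans"));
static_assert(startsWith("zh", "zh-Hans"));
static_assert(!equalsCode("ja", "ja-JP"));
static_assert(startsWith("ja", "ja-JP"));

// Non-CJK tags and the empty string do not match.
static_assert(!startsWith("ko", "en"));
static_assert(!startsWith("ko", ""));
```

If the file compiles, every assertion holds, which mirrors the before/after behavior described above.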
1 parent 0f2268f commit b2c788c

1 file changed: +10 -3 lines changed

inference/src/translator/annotation.cpp

Lines changed: 10 additions & 3 deletions
@@ -34,6 +34,13 @@ void AnnotatedText::appendSentence(string_view prefix, std::vector<string_view>:
   annotation.token_begin_.push_back(offset);
 }
 
+/// A simple helper function to check if a string starts with a prefix.
+/// The std::string object only has a starts_with() method in C++20, which
+/// is not what we are currently compiling with.
+bool startsWith(string_view prefix, string_view str) {
+  return str.size() >= prefix.size() && prefix == str.substr(0, prefix.size());
+}
+
 bool AnnotatedText::shouldOmitSpaceBetweenSentences() const {
   if (targetLanguage_.empty()) {
     // The target language is not specified, so we should not make assumptions about
@@ -45,11 +52,11 @@ bool AnnotatedText::shouldOmitSpaceBetweenSentences() const {
   // More robustly handle which language tags should omit whitespace between sentences.
   return (
     // Japanese does not use space between sentences.
-    targetLanguage_ == "ja" ||
+    startsWith("ja", targetLanguage_) ||
     // Korean does not use space between sentences.
-    targetLanguage_ == "ko" ||
+    startsWith("ko", targetLanguage_) ||
     // Chinese does not use space between sentences.
-    targetLanguage_ == "zh"
+    startsWith("zh", targetLanguage_)
   );
 }
 
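One property of the plain prefix test in the diff above is that it accepts any tag beginning with the same two letters, including unrelated codes such as `kok` (the ISO 639 code for Konkani). The sketch below shows one possible stricter variant that only accepts the bare code or the code followed by a `-` subtag separator. `hasPrimarySubtag` is a hypothetical name and is not part of this commit. Like the commit's helper, it compares case-sensitively and assumes C++17.

```cpp
#include <string_view>

// Hypothetical helper, not part of this commit: true when `tag` is exactly
// `lang`, or `lang` followed by a '-' that begins the next subtag.
constexpr bool hasPrimarySubtag(std::string_view lang, std::string_view tag) {
  // substr clamps to the end of `tag`, so a shorter tag simply fails to match.
  if (tag.substr(0, lang.size()) != lang) return false;
  // Here tag.size() >= lang.size(), so the index below is in range.
  return tag.size() == lang.size() || tag[lang.size()] == '-';
}

// The bare code and suffixed tags are accepted.
static_assert(hasPrimarySubtag("ko", "ko"));
static_assert(hasPrimarySubtag("ko", "ko-KR"));
static_assert(hasPrimarySubtag("zh", "zh-Hans"));

// A longer code that merely shares the first two letters is rejected.
static_assert(!hasPrimarySubtag("ko", "kok"));

// Shorter or empty tags are rejected.
static_assert(!hasPrimarySubtag("ko", "k"));
static_assert(!hasPrimarySubtag("ko", ""));
```

Whether the extra strictness matters depends on which language tags can actually reach this code path, which the commit does not state.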
