Commit 064429e

Fix false positive Chinese detection
The commit-msg hook was incorrectly flagging Unicode symbols such as ✓ and ✗ as "Chinese characters" because of improper locale handling with LC_ALL=C.

This change uses python3 with a proper Unicode regex '[\u4e00-\u9fff]' for accurate detection, and adds fallback logic for systems without python3 that discards matches containing known Unicode symbols.

Change-Id: I9b6b446b8f6df7a4b081f51c53204d3c97274607
1 parent b52c8ee commit 064429e
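
The false positive described in the commit message can be reproduced outside the hook. The sketch below is a model, not the hook's code: it assumes that under LC_ALL=C grep reads the pattern "[一-龥]" byte by byte, so the bracket expression becomes a set of single bytes (including the range 0x80-0xE9) rather than a range of code points. The names `old_check_matches` and `new_check_matches` are hypothetical.

```python
import re

# Model of the old check (assumption: in the C locale the pattern is read as
# raw bytes). "一-龥" encodes to E4 B8 80 '-' E9 BE A5, so the bracket
# expression contains the byte range 0x80-0xE9 plus a few single bytes.
BYTE_PATTERN = re.compile(b"[" + "一-龥".encode("utf-8") + b"]")

# The new check: a code-point range over the CJK Unified Ideographs block.
CODEPOINT_PATTERN = re.compile("[\u4e00-\u9fff]")


def old_check_matches(text: str) -> bool:
    """Byte-wise match, mimicking the old LC_ALL=C grep call."""
    return BYTE_PATTERN.search(text.encode("utf-8")) is not None


def new_check_matches(text: str) -> bool:
    """Code-point match, as in the python3 branch of the fix."""
    return CODEPOINT_PATTERN.search(text) is not None


if __name__ == "__main__":
    for sample in ["Fix typo", "All tests pass ✓", "修复错误"]:
        print(f"{sample!r}: old={old_check_matches(sample)} "
              f"new={new_check_matches(sample)}")
```

In this model ✓ (U+2713) encodes to the bytes E2 9C 93, all of which fall inside 0x80-0xE9, which is why the byte-wise check fires on it while the code-point check does not.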

1 file changed, 23 additions and 4 deletions


scripts/commit-msg.hook

Lines changed: 23 additions & 4 deletions
@@ -489,10 +489,29 @@ validate_commit_message() {
   # ------------------------------------------------------------------------------

   # Alert if the commit message appears to be written in Chinese.
-  # This pattern matches any Chinese character (common CJK Unified Ideographs).
-  MISSPELLED_WORDS=$(echo "$FULL_COMMIT_MSG" | LC_ALL=C grep "[一-龥]")
-  if [ -n "$MISSPELLED_WORDS" ]; then
-    add_warning 1 "Commit message appears to be written in Chinese: $MISSPELLED_WORDS"
+  # This pattern matches Chinese CJK Unified Ideographs range.
+  # Use proper UTF-8 handling to avoid false positives with Unicode symbols like ✓ and ✗.
+  if command -v python3 >/dev/null 2>&1; then
+    CHINESE_TEXT=$(echo "$FULL_COMMIT_MSG" | python3 -c "
+import sys, re
+text = sys.stdin.read()
+chinese_chars = re.findall(r'[\u4e00-\u9fff]', text)
+if chinese_chars:
+    print(''.join(chinese_chars))
+")
+  else
+    # Fallback: Only detect actual Chinese character patterns if python3 is not available
+    CHINESE_TEXT=$(echo "$FULL_COMMIT_MSG" | grep -o "[一-龥]" 2>/dev/null || echo "")
+    # Filter out false positives by checking if the detected text contains actual Chinese words
+    if [ -n "$CHINESE_TEXT" ] && ! echo "$CHINESE_TEXT" | grep -q "[✓✗]"; then
+      CHINESE_TEXT="$CHINESE_TEXT"
+    else
+      CHINESE_TEXT=""
+    fi
+  fi
+
+  if [ -n "$CHINESE_TEXT" ]; then
+    add_warning 1 "Commit message appears to be written in Chinese: $CHINESE_TEXT"
   fi

   MSG_FOR_SPELLCHECK_LINE_FINDING=$(echo "$FULL_COMMIT_MSG_WITH_SPACE" | sed -E \
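
The Python embedded in the python3 branch of this hunk can also be exercised on its own, which makes the detection rule easy to test without running the hook. This is a minimal sketch under that assumption; `find_cjk_ideographs` is a hypothetical name that does not appear in the hook, and the range is the same U+4E00-U+9FFF block used in the diff (CJK extension blocks are not covered).

```python
import re

# Same code-point range as the hook: the basic CJK Unified Ideographs block.
CJK_IDEOGRAPHS = re.compile(r"[\u4e00-\u9fff]")


def find_cjk_ideographs(text: str) -> str:
    """Return every matched ideograph, concatenated in order of appearance."""
    return "".join(CJK_IDEOGRAPHS.findall(text))


if __name__ == "__main__":
    # Symbols such as ✓ and ✗ lie outside the range and yield an empty result.
    print(repr(find_cjk_ideographs("Fix parser ✓ ✗")))
    # Ideographs mixed into an English message are collected in order.
    print(repr(find_cjk_ideographs("Fix 解析器 bug")))
```

An empty return value corresponds to the hook's `CHINESE_TEXT` being empty, in which case no warning is added.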
