Commit 2b6c52c

add threshold for check_chinese_space_spam to avoid false positive
1 parent 1e4844b commit 2b6c52c

2 files changed: +12 -2 lines changed

app/services/rule_based_classifier.rb

Lines changed: 2 additions & 1 deletion
@@ -17,6 +17,7 @@ def classify
   private
 
   def check_chinese_spacing_spam
+    min_chinese_chars = 5
     # This pattern specifically looks for a Chinese character, followed by a space,
     # and then another Chinese character like this
     # 跟 单 像 捡 钱 ! 再 不 进 群 是 傻 狗 !
@@ -31,7 +32,7 @@ def check_chinese_spacing_spam
     threshold = Rails.application.config.chinese_space_spam_threshold
     ratio = chinese_chars > 0 ? spaced_chinese_words_count.to_f / chinese_chars : 0.0
 
-    if ratio > threshold
+    if ratio > threshold && chinese_chars >= min_chinese_chars
       Rails.logger.info "Classified as spam due to high Chinese character spacing ratio: #{ratio}"
       return Shared::ClassificationResult.new(is_spam: true, target: "message_content")
     end
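
The effect of the new guard can be reproduced outside Rails with a small stand-alone sketch. Everything in it is an assumption made for illustration: the method name `chinese_spacing_spam?`, the two regexes, and the default threshold of 0.5 are hypothetical. The real counting logic sits in lines this diff does not show, and the real threshold comes from `Rails.application.config.chinese_space_spam_threshold`.

```ruby
# Hypothetical stand-alone sketch of the check; the regexes and the default
# threshold are assumptions, not the repository's actual implementation.
HAN = /\p{Han}/                          # any single Han character
SPACED_HAN = /\p{Han}(?=\s+\p{Han})/     # a Han character followed by whitespace and another Han character

def chinese_spacing_spam?(text, threshold: 0.5, min_chinese_chars: 5)
  chinese_chars = text.scan(HAN).size
  spaced_chinese_words_count = text.scan(SPACED_HAN).size
  ratio = chinese_chars > 0 ? spaced_chinese_words_count.to_f / chinese_chars : 0.0

  # Same condition as the commit: a high ratio only counts once the message
  # contains enough Chinese characters for the ratio to be meaningful.
  ratio > threshold && chinese_chars >= min_chinese_chars
end

spam  = chinese_spacing_spam?("跟 单 像 捡 钱 ! 再 不 进 群 是 傻 狗 !")
short = chinese_spacing_spam?("好 的 吗")
```

Under these assumptions, a short reply such as `好 的 吗` has a high spacing ratio but only three Chinese characters, so it is no longer classified as spam, while the long spaced message from the code comment still is.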

vendor/dictionaries/user.dict.utf8

Lines changed: 10 additions & 1 deletion
@@ -299,4 +299,13 @@ K 线
 冲鸭
 不良林
 约啪
-舔狗
+舔狗
+氪金号
+淘汰榜
+港澳通行证
+黑花
+川普
+学生党
+分享群
+冷静期
+大魔王
