Commit 620d774

FIX: Skip images and quotes when sending for language detection (#195)
We're seeing some bugs where, if a post starts with an image or a quote, the translator provider detects the post as English or as the language of the quoted words. This PR strips lightboxes and quotes from the text sent for language detection.
1 parent 346d47c commit 620d774

2 files changed: +12 -3 lines changed

app/services/discourse_translator/base.rb

Lines changed: 1 addition & 2 deletions
@@ -74,8 +74,7 @@ def self.language_supported?(detected_lang)

   def self.strip_tags_for_detection(detection_text)
     html_doc = Nokogiri::HTML::DocumentFragment.parse(detection_text)
-    html_doc.css("img").remove
-    html_doc.css("a.mention,a.lightbox").remove
+    html_doc.css("img", "aside.quote", "div.lightbox-wrapper", "a.mention,a.lightbox").remove
     html_doc.to_html
   end


spec/services/base_spec.rb

Lines changed: 11 additions & 1 deletion
@@ -51,13 +51,23 @@ class EmptyTranslator < DiscourseTranslator::Base
     expect(DiscourseTranslator::Base.text_for_detection(post)).to eq("")
   end

+  it "strips lightboxes" do
+    post.cooked = "<div class='lightbox-wrapper' />"
+    expect(DiscourseTranslator::Base.text_for_detection(post)).to eq("")
+  end
+
+  it "strips quotes" do
+    post.cooked = "<aside class='quote'>多言語トピック</aside>"
+    expect(DiscourseTranslator::Base.text_for_detection(post)).to eq("")
+  end
+
   it "leaves other anchor tags alone" do
     cooked = <<~HTML
       <p>
         <a href="http://cat.com/image.png"></a>
         <a class="derp" href="http://cat.com/image.png"></a>
       </p>
-    HTML
+    HTML
     post.cooked = cooked
     expect(DiscourseTranslator::Base.text_for_detection(post)).to eq(cooked)
   end
