@nattsw nattsw commented Jan 17, 2025

Besides removing images, we also need to make sure the text is truncated after the images are removed; otherwise the text sent for detection can end up empty.

For example, a cooked post that looks like this:

<p></p><div class=\"lightbox-wrapper\"><a class=\"lightbox\" href=\"https://asd.cloudfront.net/original/4X/c/d/d/asd.jpeg\" data-download-href=\"/uploads/short-url/asd.jpeg?dl=1\" title=\"IMG_20928\"><img src=\"https://asd.asd.net/optimized/4X/c/d/d/asd.jpeg\" alt=\"IMG_2029\" data-base62-sha1=\"asd\" width=\"666\" height=\"500\" srcset=\"https://asd.cloudfront.net/optimized/4X/c/d/d/asd.jpeg, https://asd.cloudfront.net/optimized/4X/c/d/d/asd.jpeg 1.5x, https://asd.cloudfront.net/optimized/4X/c/d/d/asd.jpeg 2x\" data-dominant-color=\"767065\"><div class=\"meta\">\n<svg class=\"fa d-icon d-icon-far-image svg-icon\" aria-hidden=\"true\"><use href=\"#far-image\"></use></svg><span class=\"filename\">IMG_2029</span><span class=\"informations\">1920×1440 742 KB</span><svg class=\"fa d-icon d-icon-discourse-expand svg-icon\" aria-hidden=\"true\"><use href=\"#discourse-expand\"></use></svg>\n</div></a></div>\n<p>L’església romànica de Santa Margarida.</p>

should have the div.lightbox stripped and send <p>L’església romànica de Santa Margarida.</p>, but it currently sends <p></p> because the truncation runs before the image is stripped.
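
To illustrate why the order matters, here is a minimal Python sketch, not the code changed in this PR. The `LightboxStripper` class, the `strip_lightboxes` and `truncate` helpers, the shortened sample post, and the 100-character limit are all assumptions made for illustration; the real truncation may well be token-based rather than character-based.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link", "source", "wbr"}


class LightboxStripper(HTMLParser):
    """Hypothetical helper: re-emits HTML, dropping div.lightbox-wrapper subtrees."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a lightbox wrapper

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "lightbox-wrapper" in classes:
            self.skip_depth = 1
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <img/> never change the nesting depth.
        if not self.skip_depth:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")


def strip_lightboxes(html: str) -> str:
    parser = LightboxStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


def truncate(html: str, limit: int) -> str:
    # Assumption: a plain character cut stands in for the real truncation.
    return html[:limit]


# Shortened version of the cooked post from the description.
cooked = (
    '<p></p><div class="lightbox-wrapper"><a class="lightbox" '
    'href="https://asd.cloudfront.net/original/4X/c/d/d/asd.jpeg">'
    '<img src="https://asd.asd.net/optimized/4X/c/d/d/asd.jpeg" alt="IMG_2029">'
    '<div class="meta"><span class="filename">IMG_2029</span></div></a></div>\n'
    "<p>L’església romànica de Santa Margarida.</p>"
)

LIMIT = 100  # hypothetical budget, smaller than the lightbox markup

# Old order: the budget is spent on lightbox markup that is then thrown away.
truncate_then_strip = strip_lightboxes(truncate(cooked, LIMIT))

# New order: the markup is removed first, so the budget covers the real text.
strip_then_truncate = truncate(strip_lightboxes(cooked), LIMIT)

print(repr(truncate_then_strip))
print(repr(strip_then_truncate))
```

In this sketch the first order leaves only the empty leading paragraph, while the second keeps the sentence, which is the behaviour the description above asks for.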

@Drenmi Drenmi left a comment

🚀

@nattsw nattsw merged commit 97edd7d into main Jan 17, 2025
3 checks passed
@nattsw nattsw deleted the strip-before-truncate branch January 17, 2025 08:47
nattsw added a commit that referenced this pull request Jan 22, 2025