1.2.5 Image/Captions Cleanup

@fmacpro fmacpro released this 27 Sep 01:05
· 3 commits to master since this release

Image/Captions Cleanup

  • Added stripImagesForRawText in controllers/textProcessing.js:425 to remove <figure>, <picture>, and standalone <img> elements, along with their associated captions, before the HTML-to-text pass. Raw text now omits image alt text and captions, while sentence-boundary handling stays intact.
  • Reused the shared URL helpers (containsUrlLike, stripUrlsFromText, stripDataUrlsFromText) so both raw and formatted outputs stay URL‑clean without repeating regex logic. getFormattedText also clears data URIs but keeps real HTML links.
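
To illustrate the kind of pre-pass described above, here is a minimal sketch. It is not the actual stripImagesForRawText implementation: the function name stripImagesSketch and the regex-based approach are assumptions made for illustration only, and a regex cannot handle nested figures or a ">" inside an attribute value the way a real HTML parser would.

```javascript
// Hypothetical sketch; NOT the library's actual stripImagesForRawText.
function stripImagesSketch(html) {
  return html
    // Drop whole <figure> blocks, including any <figcaption> inside them.
    .replace(/<figure\b[^>]*>[\s\S]*?<\/figure>/gi, ' ')
    // Drop <picture> wrappers together with their <source> and <img> children.
    .replace(/<picture\b[^>]*>[\s\S]*?<\/picture>/gi, ' ')
    // Drop any remaining standalone <img> tags, and with them their alt text.
    .replace(/<img\b[^>]*>/gi, ' ');
}

const html =
  '<p>First sentence.</p>' +
  '<figure><img src="a.png" alt="A cat"><figcaption>A cat.</figcaption></figure>' +
  '<p>Second sentence.</p>';

const cleaned = stripImagesSketch(html);
console.log(cleaned);
```

In this sketch each removed element is replaced with a single space rather than an empty string, so the text on either side of an image does not run together before the HTML-to-text pass.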

Spell Check Alignment

  • controllers/spellCheck.js:1 now imports maskUrlsInText and isUrlLikeToken. URL tokens are masked with spaces so offsets and line numbers remain accurate, and URL-like tokens (including data URIs) are filtered out of the reported misspellings.
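
The reason for masking with spaces rather than deleting can be shown with a small sketch. This is not the actual maskUrlsInText: the name maskUrlsSketch and the URL_PATTERN regex below are assumptions for illustration, and the real helper may recognise URLs differently.

```javascript
// Hypothetical sketch; NOT the library's actual maskUrlsInText.
// Assumed pattern: http(s) URLs and data URIs, up to the next whitespace or quote.
const URL_PATTERN = /\b(?:https?:\/\/|data:)[^\s<>"']+/gi;

function maskUrlsSketch(text) {
  // Replace each match with the same number of spaces, so the string length
  // and the position of every newline are unchanged.
  return text.replace(URL_PATTERN, (match) => ' '.repeat(match.length));
}

const input = 'See https://example.com/teh-docs\nfor teh details.';
const masked = maskUrlsSketch(input);

// The "teh" inside the URL is gone, so only the real typo on line 2 remains,
// and it sits at the same offset as in the original text.
console.log(masked.length === input.length);
console.log(masked.indexOf('teh') === input.lastIndexOf('teh'));
```

Because the masked string has the same length and the same newlines as the input, any offset or line number a spell checker reports against the masked text can be applied to the original text directly.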