Strip TOC elements from article summaries#3512
russellballestrini wants to merge 1 commit into getpelican:main
Automatically remove table of contents divs and toc-backref anchor links from article summaries when displayed outside full article context (e.g., on homepage, in RSS feeds).
ReStructuredText automatically generates anchor links in section headings when a table of contents directive is present. These anchors work perfectly on full article pages, but become broken links when article summaries appear on homepage or in feeds - the anchor targets don't exist in that context.
This change adds a strip_toc_elements_from_html() function in pelican/utils.py that uses regex to remove:
- TOC div blocks (<div class="contents">...</div>) containing broken navigation
- toc-backref anchor links from headings while preserving heading text
Both removals are necessary since TOC anchor targets don't exist in summary context.
The function is called automatically in Content.get_summary() so all summaries are cleaned without requiring configuration or template changes.
Includes comprehensive unit tests covering various TOC formats, edge cases, and case-insensitive matching.
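For readers who want a concrete picture of the two removals described above, here is a minimal sketch. It is not the code from this PR: the regex patterns, the module-level constant names, and the sample markup are assumptions chosen for illustration, and only the function name strip_toc_elements_from_html() comes from the description.

```python
import re

# Assumed pattern: a <div> whose class attribute starts with "contents"
# (docutils typically emits something like class="contents topic").
# The non-greedy match stops at the first </div>, so it relies on the
# TOC block containing no nested <div> elements.
TOC_DIV_RE = re.compile(
    r'<div\b[^>]*\bclass="contents(?:\s[^"]*)?"[^>]*>.*?</div>',
    re.IGNORECASE | re.DOTALL,
)

# Assumed pattern: an <a class="toc-backref" ...> wrapper around heading
# text. The captured group is the heading text we want to keep.
TOC_BACKREF_RE = re.compile(
    r'<a\b[^>]*\bclass="toc-backref"[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def strip_toc_elements_from_html(html: str) -> str:
    """Remove TOC divs and unwrap toc-backref anchors (illustrative sketch)."""
    html = TOC_DIV_RE.sub("", html)           # drop the whole TOC block
    return TOC_BACKREF_RE.sub(r"\1", html)    # keep only the heading text


# Hypothetical summary HTML, shaped like docutils output.
sample = (
    '<div class="contents topic" id="contents">'
    '<p class="topic-title">Contents</p>'
    '<ul class="simple"><li>'
    '<a class="reference internal" href="#intro" id="id1">Intro</a>'
    '</li></ul>'
    '</div>'
    '<h2><a class="toc-backref" href="#id1">Intro</a></h2>'
    '<p>Body text.</p>'
)

print(strip_toc_elements_from_html(sample))
# prints: <h2>Intro</h2><p>Body text.</p>
```

The sketch leans on the same assumption a regex approach needs in general: the generated markup is regular enough (no nested divs inside the TOC block, a predictable class attribute) that pattern matching is safe.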
a609cec to 3ccc6b2
Any @getpelican/reviewers have a moment to review this PR? Would be greatly appreciated 😊 (Apologies for the delay in reviewing, Russell.)
Seems like it's fixing a real problem (broken links); that's good. It smells a little funny to parse HTML with regexes instead of a parser. Still, I don't see a bug, because the output of docutils is fairly restrictive (no nested elements, classes in a deterministic order, etc.). I think there's no unit test for the get_summary change. Claude proposes the one below.
https://russell.ballestrini.net/pelican-theme-upgrade-right-sidebar-toc/
Pull Request Checklist