new extension: rich metadata for SEO #1182
…tegration

Add new Sphinx extension that injects SEO-optimized metadata into HTML head from frontmatter:

Core Features:
- Extract frontmatter (description, tags, personas, difficulty, content_type, modality)
- Generate standard meta tags (description, keywords, audience)
- Generate Open Graph tags for social sharing (Facebook, LinkedIn)
- Generate Twitter Card tags for enhanced previews
- Generate JSON-LD structured data (schema.org) for search engines
- Support product versioning via cascade.product fields

Components:
- rich_metadata/__init__.py: Main extension with config-inited and html-page-context hooks
- rich_metadata/templates/layout.html: Template override for metadata injection
- rich_metadata/README.md: Technical overview and features
- rich_metadata/USAGE.md: Complete usage guide with examples
- rich_metadata/SUMMARY.md: Quick reference guide
- rich_metadata/IMPLEMENTATION.md: Architecture and implementation details
- rich_metadata/verify_metadata.py: Automated verification script

Template handling follows search_assets pattern - templates live within extension folder and are automatically added to Sphinx template search path via config-inited hook. Extension enabled in conf.py and ready for use with existing frontmatter.
… and enhanced titles
- Extract frontmatter from markdown files and inject SEO metadata
- Support standard meta tags (description, keywords)
- Support Open Graph tags (og:description, og:title, og:type, og:url)
- Support Twitter Card tags (twitter:description, twitter:title, twitter:card)
- Support custom content metadata (audience, difficulty, modality, content_type)
- Generate JSON-LD structured data (schema.org Article/TechArticle)
- Organize metadata with HTML comments for readability
- Enhanced page titles: 'Page: Section - Site | NVIDIA'
- Template override for clean title rendering
- Suppress warnings for generated pages (genindex, search, etc.)
- Add frontmatter to homepage (index.md)
- Fix invalid Jinja2 meta tag in search.html template
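The tag families listed in this commit could be rendered roughly as follows. This is a hedged sketch rather than the PR's `build_meta_tags`: the function names `render_meta_tags` and `enhanced_title`, the `og:type` value, and the `summary` card type are assumptions made for illustration.

```python
from html import escape


def enhanced_title(page: str, section: str, site: str) -> str:
    """Format a title in the 'Page: Section - Site | NVIDIA' pattern."""
    return f"{page}: {section} - {site} | NVIDIA"


def _meta(attr: str, key: str, value: str) -> str:
    """Render one <meta> element with an HTML-escaped content value."""
    return f'<meta {attr}="{key}" content="{escape(value)}">'


def render_meta_tags(frontmatter: dict, title: str, url: str) -> list[str]:
    """Build standard, Open Graph, and Twitter Card tags from frontmatter."""
    tags = []
    description = frontmatter.get("description")
    keywords = frontmatter.get("tags") or []

    # Standard tags use the name= attribute.
    if description:
        tags.append(_meta("name", "description", description))
    if keywords:
        tags.append(_meta("name", "keywords", ", ".join(keywords)))

    # Open Graph tags use property= rather than name=.
    tags.append(_meta("property", "og:title", title))
    tags.append(_meta("property", "og:type", "article"))  # assumed value
    tags.append(_meta("property", "og:url", url))
    if description:
        tags.append(_meta("property", "og:description", description))

    # Twitter Card tags; "summary" is an assumed card type.
    tags.append(_meta("name", "twitter:card", "summary"))
    tags.append(_meta("name", "twitter:title", title))
    if description:
        tags.append(_meta("name", "twitter:description", description))
    return tags
```

Escaping the `content` value matters because descriptions written in frontmatter can contain quotes or angle brackets.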
…l sharing
- Remove duplicate metatags rendering (parent theme already renders it)
- Use enhanced structured title format for og:title and twitter:title
- All titles now consistent: 'Page: Section - Site | NVIDIA'
- Improves social sharing context on Facebook, Twitter, LinkedIn
Signed-off-by: Lawrence Lane <[email protected]>
Signed-off-by: Lawrence Lane <[email protected]>
Pull Request Overview
This PR adds a new Sphinx extension (rich_metadata) that programmatically enriches HTML meta tags across all documentation pages by extracting metadata from frontmatter. The extension generates SEO-optimized tags including Open Graph, Twitter Cards, JSON-LD structured data, and custom content metadata.
Key Changes:
- Implements a comprehensive SEO metadata injection system that reads YAML frontmatter from documentation pages
- Adds frontmatter to the documentation home page (docs/index.md) with metadata fields for description, tags, personas, difficulty, content type, and modality
- Creates a verification utility to validate metadata injection in built HTML files
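As a rough illustration of the frontmatter-reading step, the sketch below splits a leading `---` block off a Markdown page and parses only flat `key: value` and `key: [a, b]` lines. It is deliberately not a full YAML parser, the function names are hypothetical, and the sample field values are made up; the extension in this PR would more plausibly hand the block to a real YAML library.

```python
def split_frontmatter(text: str) -> tuple[str, str]:
    """Return (frontmatter_block, body); the block is '' when absent."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return "", text  # unterminated block: treat the page as having none


def parse_flat_frontmatter(block: str) -> dict:
    """Parse flat `key: value` and `key: [a, b]` lines (not full YAML)."""
    data = {}
    for line in block.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or ":" not in line:
            continue
        key, _, raw = line.partition(":")
        raw = raw.strip()
        if raw.startswith("[") and raw.endswith("]"):
            # Inline list: split on commas and drop surrounding quotes.
            items = [v.strip().strip("\"'") for v in raw[1:-1].split(",")]
            data[key.strip()] = [v for v in items if v]
        else:
            data[key.strip()] = raw.strip("\"'")
    return data


# Hypothetical home-page frontmatter using the field names from this PR.
sample = (
    "---\n"
    'description: "Curate data at scale"\n'
    "tags: [curation, llm]\n"
    "difficulty: beginner\n"
    "---\n"
    "# Home\n"
)
block, body = split_frontmatter(sample)
metadata = parse_flat_frontmatter(block)
```

Pages without a leading `---` line come back with an empty block, which lets generated pages such as the index or search page be skipped quietly.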
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| docs/index.md | Adds frontmatter with comprehensive metadata (description, tags, personas, difficulty, content type, modality) to enable SEO features on the home page |
| docs/conf.py | Enables the new rich_metadata extension in Sphinx configuration |
| docs/_extensions/search_assets/templates/search.html | Reformats HTML markup (code style changes only, no functional changes) |
| docs/_extensions/rich_metadata/verify_metadata.py | Implements verification script to validate metadata injection in built HTML files |
| docs/_extensions/rich_metadata/templates/layout.html | Provides Jinja2 template override to inject enhanced page titles and metadata into HTML head |
| docs/_extensions/rich_metadata/__init__.py | Core extension implementation that extracts frontmatter, builds meta tags (standard, Open Graph, Twitter, custom), generates JSON-LD structured data, and injects into page context |
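For the JSON-LD part of the core extension, a minimal sketch might look like the following. The publisher values match the snippet quoted in the review below; the function names, the choice of `TechArticle`, and the `headline`/`keywords` mapping are assumptions, not the PR's actual code.

```python
import json


def build_json_ld(frontmatter: dict, title: str, url: str) -> dict:
    """Assemble a schema.org TechArticle record from page frontmatter."""
    data = {
        "@context": "https://schema.org",
        "@type": "TechArticle",
        "headline": title,
        "url": url,
    }
    if frontmatter.get("description"):
        data["description"] = frontmatter["description"]
    if frontmatter.get("tags"):
        data["keywords"] = list(frontmatter["tags"])
    data["publisher"] = {
        "@type": "Organization",
        "name": "NVIDIA Corporation",
        "url": "https://www.nvidia.com",
    }
    return data


def render_json_ld(data: dict) -> str:
    """Serialize to a <script> element that is safe to place in <head>."""
    # "<\/" is a valid JSON escape for "</" and prevents a string value
    # from closing the surrounding script element early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

Parsing the emitted payload with `json.loads` returns the original dictionary, so the escaping does not change what search engines read.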
Co-authored-by: Copilot <[email protected]> Signed-off-by: L.B. <[email protected]>
…elper functions _add_basic_fields, _add_opengraph_fields, _add_twitter_fields, _add_custom_fields
- Replace aliased errors with OSError in frontmatter extraction
- Refactor _add_custom_fields into smaller functions to reduce complexity
- Remove unnecessary variable assignment before return in build_meta_tags
- Break down verify_html_file into helper functions (_display_meta_tags, _display_json_ld, _display_no_metadata_help)
- Add return type annotation to main() function
- Remove trailing whitespace from blank lines throughout verify_metadata.py
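A verification pass over built HTML, in the spirit of `verify_metadata.py`, can be sketched with the standard library alone. The class and function names here are hypothetical, and the default list of required tags is an assumption about what a check might look for, not what the PR's script enforces.

```python
import json
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta> name/property pairs and JSON-LD script bodies."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict = {}
        self.json_ld: list = []
        self._in_json_ld = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            key = attrs.get("name") or attrs.get("property")
            if key and "content" in attrs:
                self.meta[key] = attrs["content"]
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_json_ld = True
            self.json_ld.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_json_ld = False

    def handle_data(self, data):
        if self._in_json_ld:
            self.json_ld[-1] += data


def collect(html: str) -> tuple[dict, list]:
    """Return (meta tags, parsed JSON-LD objects) found in an HTML page."""
    parser = MetaCollector()
    parser.feed(html)
    return parser.meta, [json.loads(raw) for raw in parser.json_ld]


def missing_tags(html: str,
                 required=("description", "og:title", "twitter:card")) -> list:
    """List required meta keys that the page does not define."""
    meta, _ = collect(html)
    return [key for key in required if key not in meta]
```

Running `missing_tags` over each file in the build output directory gives a quick pass/fail signal without opening pages in a browser.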
sarahyurick
left a comment
Cool concept, PR looks nice to me from a high level overview. Happy to help unblock.
structured_data["publisher"] = {
    "@type": "Organization",
    "name": "NVIDIA Corporation",
    "url": "https://www.nvidia.com",
This could link to https://www.nvidia.com/en-us/ai-data-science/products/nemo/ or something more NVIDIA NeMo specific, but ultimately I don't have a strong preference.
This is great feedback. I'll look into making this configurable from conf.py in the next version so it isn't hardcoded in the extension.
* docs(extensions): add rich metadata SEO extension with frontmatter integration

  Add new Sphinx extension that injects SEO-optimized metadata into HTML head from frontmatter:

  Core Features:
  - Extract frontmatter (description, tags, personas, difficulty, content_type, modality)
  - Generate standard meta tags (description, keywords, audience)
  - Generate Open Graph tags for social sharing (Facebook, LinkedIn)
  - Generate Twitter Card tags for enhanced previews
  - Generate JSON-LD structured data (schema.org) for search engines
  - Support product versioning via cascade.product fields

  Components:
  - rich_metadata/__init__.py: Main extension with config-inited and html-page-context hooks
  - rich_metadata/templates/layout.html: Template override for metadata injection
  - rich_metadata/README.md: Technical overview and features
  - rich_metadata/USAGE.md: Complete usage guide with examples
  - rich_metadata/SUMMARY.md: Quick reference guide
  - rich_metadata/IMPLEMENTATION.md: Architecture and implementation details
  - rich_metadata/verify_metadata.py: Automated verification script

  Template handling follows search_assets pattern - templates live within extension folder and are automatically added to Sphinx template search path via config-inited hook. Extension enabled in conf.py and ready for use with existing frontmatter.

* docs(rich_metadata): add SEO metadata extension with organized output and enhanced titles
  - Extract frontmatter from markdown files and inject SEO metadata
  - Support standard meta tags (description, keywords)
  - Support Open Graph tags (og:description, og:title, og:type, og:url)
  - Support Twitter Card tags (twitter:description, twitter:title, twitter:card)
  - Support custom content metadata (audience, difficulty, modality, content_type)
  - Generate JSON-LD structured data (schema.org Article/TechArticle)
  - Organize metadata with HTML comments for readability
  - Enhanced page titles: 'Page: Section - Site | NVIDIA'
  - Template override for clean title rendering
  - Suppress warnings for generated pages (genindex, search, etc.)
  - Add frontmatter to homepage (index.md)
  - Fix invalid Jinja2 meta tag in search.html template

* docs(rich_metadata): fix duplicates and use enhanced titles for social sharing
  - Remove duplicate metatags rendering (parent theme already renders it)
  - Use enhanced structured title format for og:title and twitter:title
  - All titles now consistent: 'Page: Section - Site | NVIDIA'
  - Improves social sharing context on Facebook, Twitter, LinkedIn

* rich metadata

  Signed-off-by: Lawrence Lane <[email protected]>

* docs: rich metadata for seo

  Signed-off-by: Lawrence Lane <[email protected]>

* Update docs/_extensions/rich_metadata/__init__.py

  Co-authored-by: Copilot <[email protected]>
  Signed-off-by: L.B. <[email protected]>

* docs(extensions): refactor build_meta_tags to reduce complexity via helper functions _add_basic_fields, _add_opengraph_fields, _add_twitter_fields, _add_custom_fields

* docs(rich_metadata): fix linter errors in extension code
  - Replace aliased errors with OSError in frontmatter extraction
  - Refactor _add_custom_fields into smaller functions to reduce complexity
  - Remove unnecessary variable assignment before return in build_meta_tags
  - Break down verify_html_file into helper functions (_display_meta_tags, _display_json_ld, _display_no_metadata_help)
  - Add return type annotation to main() function
  - Remove trailing whitespace from blank lines throughout verify_metadata.py

* docs(rich_metadata): fix quote style to use double quotes per Ruff Q000

---------

Signed-off-by: Lawrence Lane <[email protected]>
Signed-off-by: L.B. <[email protected]>
Co-authored-by: Copilot <[email protected]>
Signed-off-by: Lawrence Lane <[email protected]>
Programmatically extends and enriches the meta tags across all docs pages by reading the frontmatter we installed during the docs refactor. This change will arguably make NeMo Curator the first truly SEO-optimized Sphinx-based docs site that I know of at NVIDIA. This will also enable other cross-cutting docs initiatives by providing an early working example of what should be "baked in" to default Sphinx builds for docs.
Before
After