Commit 9b240f2

fix: remove empty anchor tags from GHSA vulnerability details

Fixes issue where empty anchor tags like <a name="executive-summary"></a> were appearing in GHSA vulnerability details fields. These anchor tags are used for navigation in the original GHSA advisories but create empty links when displayed in OSV records. This affects 51 GHSA records, including NuGet packages.

The fix is implemented at two layers for defense in depth:

1. Data layer (osv/sources.py):
   - Add _sanitize_anchor_tags() function to remove empty anchor tags with name attributes using regex pattern matching
   - Apply sanitization in parse_vulnerability_from_dict() to clean the details field during vulnerability parsing
   - Ensures anchor tags are removed when GHSA JSON files are imported

2. Display layer (gcp/website/frontend_handlers.py):
   - Add _ANCHOR_TAG_REPLACER regex pattern for anchor tag removal
   - Apply sanitization in markdown() template filter during rendering
   - Provides fallback protection if any anchor tags slip through

3. Emulator update (gcp/website/frontend_emulator.py):
   - Update to use parse_vulnerability_from_dict() instead of direct json_format.ParseDict() to ensure sanitization is applied during local testing

Testing:
- Verified fix removes all 7 anchor tags from GHSA-hh2w-p6rv-4g7w test case
- Tested with various anchor tag formats (empty, self-closing, with attributes)
- Confirmed regular links and anchor tags with content are preserved
- Local testing performed using direct function tests and file parsing (gcloud emulator setup unavailable due to permission issues with ~/.config/gcloud directory ownership)

Signed-off-by: Vasu <[email protected]>
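
The tag formats listed under Testing can be checked in isolation. The following is a minimal sketch, not the project's test suite: the pattern is copied from the diff, while the helper name `strip_empty_anchors` and the sample strings are assumptions made for illustration.

```python
import re

# Pattern copied from the commit: an <a> tag with a name attribute that is
# either empty (<a name="x"></a>) or self-closing (<a name="x"/>).
ANCHOR_PATTERN = re.compile(
    r'<a\s+[^>]*name=["\'][^"\']*["\'][^>]*>\s*</a>'
    r'|<a\s+[^>]*name=["\'][^"\']*["\'][^>]*/>',
    re.IGNORECASE)


def strip_empty_anchors(text: str) -> str:
  """Hypothetical helper: drop empty named anchors, leave everything else."""
  return ANCHOR_PATTERN.sub('', text)


# Illustrative inputs (not taken from any real advisory).
should_be_removed = [
    '<a name="executive-summary"></a>',         # empty
    '<a name="executive-summary"/>',            # self-closing
    '<a id="s1" name="summary" class="x"></a>', # extra attributes
    "<A NAME='summary' ></A>",                  # upper case, single quotes
]
should_be_kept = [
    '<a name="summary">Summary</a>',            # anchor with content
    '<a href="https://example.com">link</a>',   # regular link
    '[link](https://example.com)',              # Markdown link
]

for sample in should_be_removed + should_be_kept:
  print(repr(sample), '->', repr(strip_empty_anchors(sample)))
```

Note that the pattern only targets anchors that carry a `name` attribute, so an empty `<a id="x"></a>` without `name` would pass through unchanged.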
1 parent 393b8fb commit 9b240f2

3 files changed: +19 -2 lines changed

gcp/website/frontend_emulator.py

Lines changed: 2 additions & 2 deletions
@@ -90,9 +90,9 @@ def _dict_to_vuln(data: object,
   if not vuln_id:
     return None

-  vulnerability = vulnerability_pb2.Vulnerability()
   try:
-    json_format.ParseDict(data, vulnerability, ignore_unknown_fields=True)
+    vulnerability = sources.parse_vulnerability_from_dict(
+        data, strict=False)
   except Exception as error:
     print(f'[emulator] Failed to convert entry in {path}: {error}')
     return None

gcp/website/frontend_handlers.py

Lines changed: 4 additions & 0 deletions
@@ -834,6 +834,9 @@ def sort_versions(versions: list[str], ecosystem: str) -> list[str]:
 # with
 # <a href="https://chromium.googlesource.com/v8/v8.git/+/refs/heads/beta">
 _URL_MARKDOWN_REPLACER = re.compile(r'(<a href=\".*?)(/ /)(.*?\">)')
+_ANCHOR_TAG_REPLACER = re.compile(
+    r'<a\s+[^>]*name=["\'][^"\']*["\'][^>]*>\s*</a>|<a\s+[^>]*name=["\'][^"\']*["\'][^>]*/>',
+    re.IGNORECASE)


 @blueprint.app_template_filter('markdown')
@@ -852,6 +855,7 @@ def markdown(text):
   # space rather than %2B
   # See: https://github.com/trentm/python-markdown2/issues/621
   md = _URL_MARKDOWN_REPLACER.sub(r'\1/+/\3', md)
+  md = _ANCHOR_TAG_REPLACER.sub('', md)

   return md
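
Because the same pattern now lives in two files, it may help to confirm that the display-layer pass is a no-op on text the data layer has already cleaned, and that it still catches tags the data layer never saw. This is a rough sketch under assumptions: `sanitize_at_import` and `sanitize_at_render` are hypothetical stand-ins for the two call sites, and the sample string is invented.

```python
import re

# Same pattern string as in the diff, shared here so both stand-ins agree.
ANCHOR_PATTERN = (
    r'<a\s+[^>]*name=["\'][^"\']*["\'][^>]*>\s*</a>'
    r'|<a\s+[^>]*name=["\'][^"\']*["\'][^>]*/>')

# Display layer keeps a precompiled pattern at module level.
_COMPILED = re.compile(ANCHOR_PATTERN, re.IGNORECASE)


def sanitize_at_import(details: str) -> str:
  """Stand-in for the data-layer call: re.sub with flags on each call."""
  return re.sub(ANCHOR_PATTERN, '', details, flags=re.IGNORECASE)


def sanitize_at_render(html: str) -> str:
  """Stand-in for the display-layer call: precompiled pattern."""
  return _COMPILED.sub('', html)


raw = 'Intro <a name="impact"></a>Impact text'  # invented sample

cleaned_once = sanitize_at_import(raw)
cleaned_twice = sanitize_at_render(cleaned_once)

print(cleaned_once == cleaned_twice)           # second pass changes nothing
print(sanitize_at_render(raw) == cleaned_once) # fallback alone gives same result
```

Keeping the pattern in a single shared constant, as this sketch does, would avoid the two copies drifting apart; whether that fits the project's module layout is a judgment call for the maintainers.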

osv/sources.py

Lines changed: 13 additions & 0 deletions
@@ -17,6 +17,7 @@
 import hashlib
 import logging
 import os
+import re

 import jsonschema
 import pygit2
@@ -162,9 +163,21 @@ def _get_nested_vulnerability(data, key_path=None):
   return data


+def _sanitize_anchor_tags(text):
+  if not text or not isinstance(text, str):
+    return text
+  pattern = r'<a\s+[^>]*name=["\'][^"\']*["\'][^>]*>\s*</a>|<a\s+[^>]*name=["\'][^"\']*["\'][^>]*/>'
+  return re.sub(pattern, '', text, flags=re.IGNORECASE)
+
+
 def parse_vulnerability_from_dict(data, key_path=None, strict=False):
   """Parse vulnerability from dict."""
   data = _get_nested_vulnerability(data, key_path)
+
+  # Sanitize anchor tags from details field if present
+  if isinstance(data, dict) and 'details' in data and data['details']:
+    data['details'] = _sanitize_anchor_tags(data['details'])
+
   try:
     jsonschema.validate(data, load_schema())
   except jsonschema.exceptions.ValidationError as e:
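
The guard added to parse_vulnerability_from_dict() can be exercised without the rest of the module. The sketch below reproduces the helper and the guard from the diff; the wrapper name `sanitize_details` and the sample records (including the `EXAMPLE-0001` id) are hypothetical. It illustrates two properties worth knowing when calling the parser: the input dict is modified in place, and a missing, empty, or non-string details value is left as it is.

```python
import re

_PATTERN = (
    r'<a\s+[^>]*name=["\'][^"\']*["\'][^>]*>\s*</a>'
    r'|<a\s+[^>]*name=["\'][^"\']*["\'][^>]*/>')


def sanitize_anchor_tags(text):
  """Mirror of the helper in the diff: non-strings and empty values pass through."""
  if not text or not isinstance(text, str):
    return text
  return re.sub(_PATTERN, '', text, flags=re.IGNORECASE)


def sanitize_details(record):
  """Hypothetical wrapper around the guard from parse_vulnerability_from_dict()."""
  if isinstance(record, dict) and 'details' in record and record['details']:
    # Assigning back into the dict mutates the caller's object.
    record['details'] = sanitize_anchor_tags(record['details'])
  return record


# Invented records for illustration only.
advisory = {'id': 'EXAMPLE-0001', 'details': '<a name="summary"></a>Summary text'}
no_details = {'id': 'EXAMPLE-0002'}
odd_details = {'id': 'EXAMPLE-0003', 'details': ['not', 'a', 'string']}

result = sanitize_details(advisory)
print(result is advisory, advisory['details'])
print(sanitize_details(no_details))
print(sanitize_details(odd_details))
```

Callers that need the original text afterwards would have to copy the dict before parsing, since the guard overwrites `details` on the object it is given.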
