Skip to content

Conversation

@Vasuk12
Copy link

@Vasuk12 Vasuk12 commented Nov 29, 2025

Fixes issue #4237
where empty anchor tags like were appearing in GHSA vulnerability details fields.

  • Add _sanitize_anchor_tags() function in osv/sources.py
  • Apply sanitization in parse_vulnerability_from_dict()
  • Add anchor tag removal in markdown() template filter
  • Update frontend_emulator.py to use parse_vulnerability_from_dict()

Fixes issue where empty anchor tags like <a name="executive-summary"></a>
were appearing in GHSA vulnerability details fields. These anchor tags
are used for navigation in the original GHSA advisories but create empty
links when displayed in OSV records. This affects 51 GHSA records,
including NuGet packages.

The fix is implemented at two layers for defense in depth:

1. Data layer (osv/sources.py):
   - Add _sanitize_anchor_tags() function to remove empty anchor tags
     with name attributes using regex pattern matching
   - Apply sanitization in parse_vulnerability_from_dict() to clean
     the details field during vulnerability parsing
   - Ensures anchor tags are removed when GHSA JSON files are imported

2. Display layer (gcp/website/frontend_handlers.py):
   - Add _ANCHOR_TAG_REPLACER regex pattern for anchor tag removal
   - Apply sanitization in markdown() template filter during rendering
   - Provides fallback protection if any anchor tags slip through

3. Emulator update (gcp/website/frontend_emulator.py):
   - Update to use parse_vulnerability_from_dict() instead of direct
     json_format.ParseDict() to ensure sanitization is applied during
     local testing

Testing:
- Verified fix removes all 7 anchor tags from GHSA-hh2w-p6rv-4g7w test case
- Tested with various anchor tag formats (empty, self-closing, with attributes)
- Confirmed regular links and anchor tags with content are preserved
- Local testing performed using direct function tests and file parsing
  (gcloud emulator setup unavailable due to permission issues with
  ~/.config/gcloud directory ownership)

Signed-off-by: Vasu <[email protected]>
Copy link
Contributor

@another-rex another-rex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of issues here:

  • This should only be updating the frontend, we shouldn't touch the records we are importing.
  • I was hoping that we could keep the anchor tags if possible, but I'm not sure that's possible without editing the markdown library itself, so happy to stick with the regex replacement if it's not possible.

@Vasuk12
Copy link
Author

Vasuk12 commented Dec 1, 2025

Couple of issues here:

  • This should only be updating the frontend, we shouldn't touch the records we are importing.
  • I was hoping that we could keep the anchor tags if possible, but I'm not sure that's possible without editing the markdown library itself, so happy to stick with the regex replacement if it's not possible.
  • The latest commit resolves the issue of mutation of imported records
  • The regex element removal is working as intended but if you would like to keep the anchor tags we could add CSS rules to hide them visually while keeping them? I am not sure about this but would love to hear what you think about this!

@another-rex
Copy link
Contributor

/gcbrun

Copy link
Contributor

@another-rex another-rex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure having it deleted before rendering vs adding a css rule would change much here. Let's stick with this for now.

What is the frontend emulator change for?

@Vasuk12 Vasuk12 force-pushed the fix/remove-empty-anchor-tags branch from c38f285 to 9037093 Compare December 2, 2025 04:23
@Vasuk12
Copy link
Author

Vasuk12 commented Dec 2, 2025

I'm not sure having it deleted before rendering vs adding a css rule would change much here. Let's stick with this for now.

What is the frontend emulator change for?

Okay.
Oh thats not related to Anchor tag.The frontend_emulator.py change was originally to include anchor tag sanitization via parse_vulnerability_from_dict(). Since we removed sanitization from that function, the emulator change is not needed anymore. Sorry about that I have reverted it back. I thought we could keep it as a code improvement as this includes some schema validation as well.

@Vasuk12 Vasuk12 force-pushed the fix/remove-empty-anchor-tags branch 2 times, most recently from 015c0d6 to 38997f7 Compare December 2, 2025 04:30
@Vasuk12 Vasuk12 force-pushed the fix/remove-empty-anchor-tags branch from 38997f7 to e33a424 Compare December 2, 2025 04:33
another-rex
another-rex previously approved these changes Dec 2, 2025
@another-rex
Copy link
Contributor

I see, I don't mind keeping it, just wanted to know what the functional difference is. Feel free to add it back in or not.

@another-rex
Copy link
Contributor

another-rex commented Dec 2, 2025

/gcbrun

@another-rex
Copy link
Contributor

another-rex commented Dec 2, 2025

Seems like there's a linting error for too long line. I think it's fine to add an exception for that, since we want the regex on one line:

# pylint: disable=line-too-long

You should be able to run make lint to check locally as well.

@Vasuk12
Copy link
Author

Vasuk12 commented Dec 2, 2025

I see, I don't mind keeping it, just wanted to know what the functional difference is. Feel free to add it back in or not.

oh i will add it back then :). Umm i saw that for the function parse_vulnerability_from_dict() there are no unit tests although it is tested indirectly through integration tests. I could implement a unit test for this particular function or is not needed?

@Vasuk12
Copy link
Author

Vasuk12 commented Dec 2, 2025

Seems like there's a linting error for too long line. I think it's fine to add an exception for that, since we want the regex on one line:

# pylint: disable=line-too-long

You should be able to run make lint to check locally as well.

I ran 'make lint' and pylint disable comment is working but there is another error where there is a unused import in frontend_emulator.py at line 18 'from google.protobuf import json_format'. Should I remove this?

@another-rex
Copy link
Contributor

Feel free to add the unit tests, that would be great!

I ran 'make lint' and pylint disable comment is working but there is another error where there is a unused import in frontend_emulator.py at line 18 'from google.protobuf import json_format'. Should I remove this?

Huh, I'm surprised it doesn't show up in the CI pylint. I would not change that for now, and let's see if CI passes.

Added inline pylint disable comment for line-too-long warning on the
anchor tag regex pattern in frontend_handlers.py.

Signed-off-by: Vasu Khare <[email protected]>
@Vasuk12
Copy link
Author

Vasuk12 commented Dec 2, 2025

Feel free to add the unit tests, that would be great!

I ran 'make lint' and pylint disable comment is working but there is another error where there is a unused import in frontend_emulator.py at line 18 'from google.protobuf import json_format'. Should I remove this?

Huh, I'm surprised it doesn't show up in the CI pylint. I would not change that for now, and let's see if CI passes.

Added the pylint comment in this commit. Let me know if lint check is working now. Should I implement the unit tests in this PR only or would you like me to create another PR? Also are there any specific functions you have in mind other than 'parse_vulnerability_from_dict()'.

@another-rex
Copy link
Contributor

Please do so in another PR :)

/gcbrun

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants