fix: remove empty anchor tags from GHSA vulnerability details #4431
base: master
Fixes an issue where empty anchor tags like <a name="executive-summary"></a>
were appearing in GHSA vulnerability details fields. These anchor tags
are used for navigation in the original GHSA advisories but create empty
links when displayed in OSV records. This affects 51 GHSA records,
including NuGet packages.
The fix is implemented at two layers for defense in depth, plus a matching emulator update:
1. Data layer (osv/sources.py):
- Add _sanitize_anchor_tags() function to remove empty anchor tags with name attributes using regex pattern matching
- Apply sanitization in parse_vulnerability_from_dict() to clean the details field during vulnerability parsing
- Ensures anchor tags are removed when GHSA JSON files are imported
2. Display layer (gcp/website/frontend_handlers.py):
- Add _ANCHOR_TAG_REPLACER regex pattern for anchor tag removal
- Apply sanitization in markdown() template filter during rendering
- Provides fallback protection if any anchor tags slip through
3. Emulator update (gcp/website/frontend_emulator.py):
- Update to use parse_vulnerability_from_dict() instead of direct json_format.ParseDict() to ensure sanitization is applied during local testing
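For reference, the regex-based removal described in the list above could look roughly like the sketch below. The pattern and the function name sanitize_anchor_tags are illustrative assumptions, not the actual code in osv/sources.py or frontend_handlers.py; the PR's real regex is not reproduced in this thread.

```python
import re

# Assumed pattern, for illustration only: matches an <a> tag that carries a
# name attribute and has no content, either <a name="x"></a> or <a name="x"/>.
_EMPTY_ANCHOR = re.compile(
    r'<a\s+[^>]*\bname\s*=\s*["\'][^"\']*["\'][^>]*?(?:/>|>\s*</a>)',
    re.IGNORECASE)


def sanitize_anchor_tags(text):
    """Removes empty named anchors; leaves links and anchors with content."""
    if not text:
        return text
    return _EMPTY_ANCHOR.sub('', text)
```

Because the pattern never crosses a ">" before reaching the closing "</a>", an anchor that wraps visible text does not match and is left untouched.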
Testing:
- Verified fix removes all 7 anchor tags from GHSA-hh2w-p6rv-4g7w test case
- Tested with various anchor tag formats (empty, self-closing, with attributes)
- Confirmed regular links and anchor tags with content are preserved
- Local testing performed using direct function tests and file parsing (gcloud emulator setup unavailable due to permission issues with ~/.config/gcloud directory ownership)
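A table-driven check along these lines would cover the formats listed under Testing. This is only a sketch: EMPTY_ANCHOR is a hypothetical stand-in for the PR's regex, and the sample inputs are made up for illustration rather than taken from GHSA-hh2w-p6rv-4g7w.

```python
import re

# Hypothetical stand-in for the PR's regex, which is not shown in this thread.
EMPTY_ANCHOR = re.compile(
    r'<a\s+[^>]*\bname\s*=\s*["\'][^"\']*["\'][^>]*?(?:/>|>\s*</a>)',
    re.IGNORECASE)

# (input, expected output) pairs mirroring the formats listed above.
CASES = [
    # Empty anchor: removed.
    ('<a name="executive-summary"></a>', ''),
    # Self-closing anchor: removed.
    ('<a name="summary"/>', ''),
    # Empty anchor with extra attributes: removed.
    ('<a id="s1" name="summary" class="x"></a>', ''),
    # Regular link: preserved.
    ('<a href="https://osv.dev">OSV</a>', '<a href="https://osv.dev">OSV</a>'),
    # Named anchor with content: preserved.
    ('<a name="summary">Summary</a>', '<a name="summary">Summary</a>'),
]


def failing_cases():
    """Returns (input, expected, actual) for every case that does not match."""
    failures = []
    for source, expected in CASES:
        actual = EMPTY_ANCHOR.sub('', source)
        if actual != expected:
            failures.append((source, expected, actual))
    return failures
```

An empty list from failing_cases() means every listed format behaves as described.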
Signed-off-by: Vasu <[email protected]>
another-rex
left a comment
Couple of issues here:
- This should only update the frontend; we shouldn't touch the records we are importing.
- I was hoping that we could keep the anchor tags if possible, but I'm not sure that's possible without editing the markdown library itself, so happy to stick with the regex replacement if it's not possible.
Signed-off-by: Vasu <[email protected]>
/gcbrun
another-rex
left a comment
I'm not sure having it deleted before rendering vs adding a css rule would change much here. Let's stick with this for now.
What is the frontend emulator change for?
c38f285 to 9037093
Okay.
015c0d6 to 38997f7
Fixes google#4237 Signed-off-by: Vasu <[email protected]>
38997f7 to e33a424
Signed-off-by: Vasu <[email protected]>
I see, I don't mind keeping it, just wanted to know what the functional difference is. Feel free to add it back in or not.
/gcbrun
Seems like there's a linting error for a too-long line. I think it's fine to add an exception for that, since we want the regex on one line: You should be able to run
Oh, I will add it back then :). I saw that parse_vulnerability_from_dict() has no unit tests, although it is tested indirectly through integration tests. Should I implement a unit test for this function, or is that not needed?
I ran 'make lint' and the pylint disable comment is working, but there is another error: an unused import in frontend_emulator.py at line 18, 'from google.protobuf import json_format'. Should I remove this?
Feel free to add the unit tests, that would be great!
Huh, I'm surprised it doesn't show up in the CI pylint. I would not change that for now, and let's see if CI passes.
Added inline pylint disable comment for line-too-long warning on the anchor tag regex pattern in frontend_handlers.py. Signed-off-by: Vasu Khare <[email protected]>
Added the pylint comment in this commit. Let me know if the lint check is working now. Should I implement the unit tests in this PR, or would you like me to create another PR? Also, are there any specific functions you have in mind other than 'parse_vulnerability_from_dict()'?
Please do so in another PR :)
/gcbrun
Fixes issue #4237, where empty anchor tags like <a name="executive-summary"></a> were appearing in GHSA vulnerability details fields.