Add HTML sanitizer for translated message resources #7
base: feature-html-sanitizer-baseline
Closes #37428
Signed-off-by: Alexander Schwartz <[email protected]>
Greptile Summary
Adds HTML sanitization validation to the theme verifier using OWASP Java HTML Sanitizer to prevent XSS attacks in translated message resources. The implementation compares translated strings against English source strings and only allows HTML tags (`br`, `p`, `strong`, `b`) when the English source string also contains HTML.
Key changes:
Critical issues found:
Confidence Score: 2/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Verifier as VerifyMessageProperties
participant Bundle as PropertyResourceBundle
participant Sanitizer as OWASP HtmlPolicyBuilder
participant Validator as santizeAnchors()
Verifier->>Verifier: verify()
Verifier->>Verifier: verifyNoDuplicateKeys()
Verifier->>Verifier: verifySafeHtml()
Verifier->>Bundle: Load translation file
Bundle-->>Verifier: PropertyResourceBundle
Verifier->>Bundle: Load English reference file
Bundle-->>Verifier: English PropertyResourceBundle
loop For each key in translation
Verifier->>Bundle: getString(key)
Bundle-->>Verifier: translatedValue
Verifier->>Verifier: normalizeValue(key, translatedValue)
Note over Verifier: Remove special cases,<br/>unescape HTML entities
Verifier->>Bundle: getEnglishValue(key)
Bundle-->>Verifier: englishValue
Verifier->>Verifier: normalizeValue(key, englishValue)
Verifier->>Validator: santizeAnchors(key, value, englishValue)
Note over Validator: Validate anchor tags match<br/>Remove matching anchors
Validator-->>Verifier: value without anchors
Verifier->>Verifier: containsHtml(englishValue)
alt English contains HTML
Verifier->>Sanitizer: POLICY_SOME_HTML.sanitize()
Note over Sanitizer: Allow br, p, strong, b
else English has no HTML
Verifier->>Sanitizer: POLICY_NO_HTML.sanitize()
Note over Sanitizer: Strip all HTML
end
Sanitizer-->>Verifier: sanitizedValue
Verifier->>Verifier: Unescape HTML entities
Verifier->>Verifier: Remove HTML comments
alt sanitized != original
Verifier->>Verifier: Add "Illegal HTML" error message
end
end
Verifier-->>Verifier: Return list of error messages
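The core of the flow in the diagram (pick a policy based on the English string, sanitize the translation, flag it if sanitizing changed anything) can be sketched without the OWASP dependency. This is only an illustration under stated assumptions: the class `SanitizeAndCompareSketch`, the methods `sanitize` and `isLegal`, and the regex-based allow-list are hypothetical stand-ins for `POLICY_SOME_HTML` and `POLICY_NO_HTML`, and the normalization, anchor handling, and entity-unescaping steps are left out. A regex allow-list like this is not a substitute for a real HTML sanitizer.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SanitizeAndCompareSketch {
    // Hypothetical tag matcher: group 1 is the tag name, group 2 is everything after it.
    static final Pattern TAG = Pattern.compile("</?([a-zA-Z][a-zA-Z0-9]*)([^>]*)>");

    // Toy stand-ins for the two policies named in the diagram.
    static final Set<String> SOME_HTML = Set.of("br", "p", "strong", "b");
    static final Set<String> NO_HTML = Set.of();

    // Drops every tag that is not on the allow-list or that carries attributes.
    static String sanitize(String value, Set<String> allowed) {
        Matcher m = TAG.matcher(value);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(value, last, m.start());
            String name = m.group(1).toLowerCase(Locale.ROOT);
            String rest = m.group(2).trim();
            boolean bare = rest.isEmpty() || rest.equals("/");
            if (allowed.contains(name) && bare) {
                out.append(m.group());
            }
            last = m.end();
        }
        out.append(value, last, value.length());
        return out.toString();
    }

    // A translation is accepted only if sanitizing it leaves it unchanged.
    static boolean isLegal(String translated, String english) {
        Set<String> policy = TAG.matcher(english).find() ? SOME_HTML : NO_HTML;
        return sanitize(translated, policy).equals(translated);
    }

    public static void main(String[] args) {
        System.out.println(isLegal("<b>Hallo</b>", "<b>Hello</b>"));
        System.out.println(isLegal("<script>x</script>Hallo", "<b>Hello</b>"));
    }
}
```

Comparing the sanitized output with the input, rather than trusting the sanitized output, fits a verifier: the goal is to reject a suspicious translation at build time, not to repair it.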
45 files reviewed, 2 comments
    private String santizeAnchors(String key, String value, String englishValue) {
        Matcher matcher = ANCHOR_PATTERN.matcher(value);
        Matcher englishMatcher = ANCHOR_PATTERN.matcher(englishValue);
        while (matcher.find()) {
            if (englishMatcher.find() && Objects.equals(matcher.group(), englishMatcher.group())) {
                value = value.replaceFirst(Pattern.quote(englishMatcher.group()), "");
            } else {
                messages.add("Didn't find anchor tag " + matcher.group() + " in original string");
                break;
            }
        }
        return value;
    }
logic: `value` is reassigned inside the loop while `matcher` keeps iterating over the string it was created with (line 151). After the replacement on line 155, the match positions reported by `matcher` no longer correspond to the current `value`, so the removal depends on `replaceFirst` locating the same anchor text again rather than on the matched position. Tracking the match positions explicitly avoids that mismatch.
Path: misc/theme-verifier/src/main/java/org/keycloak/themeverifier/VerifyMessageProperties.java
Line: 150:162
```suggestion
    private String santizeAnchors(String key, String value, String englishValue) {
        Matcher matcher = ANCHOR_PATTERN.matcher(value);
        Matcher englishMatcher = ANCHOR_PATTERN.matcher(englishValue);
        StringBuilder result = new StringBuilder();
        int lastEnd = 0;
        while (matcher.find()) {
            if (englishMatcher.find() && Objects.equals(matcher.group(), englishMatcher.group())) {
                result.append(value, lastEnd, matcher.start());
                lastEnd = matcher.end();
            } else {
                messages.add("Didn't find anchor tag " + matcher.group() + " in original string");
                return value;
            }
        }
        result.append(value, lastEnd, value.length());
        return result.toString();
    }
```
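To try the position-based approach from the suggestion outside the verifier, a self-contained sketch could look like the following. The class name `AnchorStripSketch`, the method name `stripMatchingAnchors`, and in particular the regex assigned to `ANCHOR_PATTERN` are assumptions made for illustration; the real pattern is not shown in this review. Since Java strings are immutable, a `Matcher` always scans the string it was created with, and this version makes that explicit by never reassigning `value` inside the loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorStripSketch {
    // Assumption: illustrative pattern only, matching opening and closing <a> tags.
    static final Pattern ANCHOR_PATTERN = Pattern.compile("</?a\\b[^>]*>");

    final List<String> messages = new ArrayList<>();

    // Removes anchor tags from value when they match the English anchors in order.
    // On the first mismatch it records a message and returns value unchanged.
    String stripMatchingAnchors(String value, String englishValue) {
        Matcher matcher = ANCHOR_PATTERN.matcher(value);
        Matcher englishMatcher = ANCHOR_PATTERN.matcher(englishValue);
        StringBuilder result = new StringBuilder();
        int lastEnd = 0;
        while (matcher.find()) {
            if (englishMatcher.find() && Objects.equals(matcher.group(), englishMatcher.group())) {
                // Copy the text between the previous anchor and this one, skipping the anchor itself.
                result.append(value, lastEnd, matcher.start());
                lastEnd = matcher.end();
            } else {
                messages.add("Didn't find anchor tag " + matcher.group() + " in original string");
                return value;
            }
        }
        result.append(value, lastEnd, value.length());
        return result.toString();
    }

    public static void main(String[] args) {
        AnchorStripSketch sketch = new AnchorStripSketch();
        System.out.println(sketch.stripMatchingAnchors(
                "Klicken Sie <a href=\"x\">hier</a>.",
                "Click <a href=\"x\">here</a>."));
    }
}
```

One behavioral difference from the original loop is worth noting: on a mismatch this version returns the untouched `value`, whereas the original `break` returned a string with the earlier matching anchors already removed.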
        return value;
    }

    Pattern HTML_TAGS = Pattern.compile("<[a-z]+[^>]*>");
logic: The pattern only matches lowercase tag names such as `<br>` and misses uppercase or mixed-case variants such as `<BR>` or `<Strong>`. Use case-insensitive matching.
Path: misc/theme-verifier/src/main/java/org/keycloak/themeverifier/VerifyMessageProperties.java
Line: 139:139
```suggestion
Pattern HTML_TAGS = Pattern.compile("<[a-zA-Z]+[^>]*>", Pattern.CASE_INSENSITIVE);
```
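The difference between the two patterns can be checked with a small standalone sketch. The class `HtmlTagPatternSketch` and the helper `hasTag` are hypothetical names used only here. In Java, `Pattern.CASE_INSENSITIVE` already makes `[a-z]` match ASCII uppercase letters, so widening the class to `[a-zA-Z]` and adding the flag are each sufficient on their own; the suggestion applies both.

```java
import java.util.regex.Pattern;

class HtmlTagPatternSketch {
    // The pattern as it appears in the reviewed code.
    static final Pattern LOWERCASE_ONLY = Pattern.compile("<[a-z]+[^>]*>");

    // Same expression with the flag; [a-z] now also matches A-Z.
    static final Pattern CASE_INSENSITIVE =
            Pattern.compile("<[a-z]+[^>]*>", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: true if the pattern finds a tag anywhere in the text.
    static boolean hasTag(Pattern pattern, String text) {
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        for (String sample : new String[] {"<br>", "<BR>", "<Strong>", "plain text"}) {
            System.out.println(sample
                    + " lowercase-only=" + hasTag(LOWERCASE_ONLY, sample)
                    + " case-insensitive=" + hasTag(CASE_INSENSITIVE, sample));
        }
    }
}
```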
Test 7
Replicated from ai-code-review-evaluation/keycloak-greptile#7