-
Notifications
You must be signed in to change notification settings - Fork 0
Add HTML sanitizer for translated message resources #4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: feature-html-sanitizer-baseline
Are you sure you want to change the base?
Add HTML sanitizer for translated message resources #4
Conversation
Closes #37428 Signed-off-by: Alexander Schwartz <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
Greptile OverviewConfidence Score: 3/5
Important Files ChangedFile Analysis
Sequence DiagramsequenceDiagram
participant Maven as Maven Plugin
participant VMP as VerifyMessageProperties
participant OWASP as OWASP Sanitizer
participant Bundle as PropertyResourceBundle
Maven->>VMP: verify(file)
VMP->>VMP: verifyNoDuplicateKeys()
VMP->>VMP: verifySafeHtml()
VMP->>Bundle: Load target locale file
VMP->>Bundle: Load English reference file
loop For each property key
VMP->>Bundle: getString(key)
VMP->>VMP: normalizeValue() - Handle edge cases
VMP->>VMP: sanitizeAnchors() - Validate anchor tags
VMP->>VMP: containsHtml(englishValue)
alt English has HTML
VMP->>OWASP: POLICY_SOME_HTML.sanitize()
else English has no HTML
VMP->>OWASP: POLICY_NO_HTML.sanitize()
end
OWASP-->>VMP: sanitized value
VMP->>VMP: unescapeHtml4() - Normalize entities
VMP->>VMP: remove HTML comments
alt sanitized != original
VMP->>VMP: Add error message with diff
end
end
VMP-->>Maven: List of validation errors
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
45 files reviewed, 5 comments
| private String santizeAnchors(String key, String value, String englishValue) { | ||
| Matcher matcher = ANCHOR_PATTERN.matcher(value); | ||
| Matcher englishMatcher = ANCHOR_PATTERN.matcher(englishValue); | ||
| while (matcher.find()) { | ||
| if (englishMatcher.find() && Objects.equals(matcher.group(), englishMatcher.group())) { | ||
| value = value.replaceFirst(Pattern.quote(englishMatcher.group()), ""); | ||
| } else { | ||
| messages.add("Didn't find anchor tag " + matcher.group() + " in original string"); | ||
| break; | ||
| } | ||
| } | ||
| return value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Typo in method name and logical bug in anchor validation
The method name has a typo (santizeAnchors should be sanitizeAnchors), but more critically, the anchor matching logic is flawed. When anchors match, the code removes them from value using replaceFirst, but this modifies the string being validated, not the pattern matcher's view. Subsequent matcher.find() calls continue scanning the original unmodified string, potentially causing mismatches.
| private String santizeAnchors(String key, String value, String englishValue) { | |
| Matcher matcher = ANCHOR_PATTERN.matcher(value); | |
| Matcher englishMatcher = ANCHOR_PATTERN.matcher(englishValue); | |
| while (matcher.find()) { | |
| if (englishMatcher.find() && Objects.equals(matcher.group(), englishMatcher.group())) { | |
| value = value.replaceFirst(Pattern.quote(englishMatcher.group()), ""); | |
| } else { | |
| messages.add("Didn't find anchor tag " + matcher.group() + " in original string"); | |
| break; | |
| } | |
| } | |
| return value; | |
| private String sanitizeAnchors(String key, String value, String englishValue) { | |
| List<String> valueAnchors = new ArrayList<>(); | |
| List<String> englishAnchors = new ArrayList<>(); | |
| Matcher matcher = ANCHOR_PATTERN.matcher(value); | |
| while (matcher.find()) { | |
| valueAnchors.add(matcher.group()); | |
| } | |
| Matcher englishMatcher = ANCHOR_PATTERN.matcher(englishValue); | |
| while (englishMatcher.find()) { | |
| englishAnchors.add(englishMatcher.group()); | |
| } | |
| if (!valueAnchors.equals(englishAnchors)) { | |
| messages.add("Anchor tags don't match original string in key " + key); | |
| } | |
| // Remove all anchor tags for sanitization | |
| return ANCHOR_PATTERN.matcher(value).replaceAll(""); | |
| } |
Prompt To Fix With AI
This is a comment left during a code review.
Path: misc/theme-verifier/src/main/java/org/keycloak/themeverifier/VerifyMessageProperties.java
Line: 150:161
Comment:
**logic:** Typo in method name and logical bug in anchor validation
The method name has a typo (`santizeAnchors` should be `sanitizeAnchors`), but more critically, the anchor matching logic is flawed. When anchors match, the code removes them from `value` using `replaceFirst`, but this modifies the string being validated, not the pattern matcher's view. Subsequent `matcher.find()` calls continue scanning the original unmodified string, potentially causing mismatches.
```suggestion
private String sanitizeAnchors(String key, String value, String englishValue) {
List<String> valueAnchors = new ArrayList<>();
List<String> englishAnchors = new ArrayList<>();
Matcher matcher = ANCHOR_PATTERN.matcher(value);
while (matcher.find()) {
valueAnchors.add(matcher.group());
}
Matcher englishMatcher = ANCHOR_PATTERN.matcher(englishValue);
while (englishMatcher.find()) {
englishAnchors.add(englishMatcher.group());
}
if (!valueAnchors.equals(englishAnchors)) {
messages.add("Anchor tags don't match original string in key " + key);
}
// Remove all anchor tags for sanitization
return ANCHOR_PATTERN.matcher(value).replaceAll("");
}
```
How can I resolve this? If you propose a fix, please make it concise.| PolicyFactory POLICY_SOME_HTML = new org.owasp.html.HtmlPolicyBuilder() | ||
| .allowElements( | ||
| "br", "p", "strong", "b" | ||
| ).toFactory(); | ||
|
|
||
| PolicyFactory POLICY_NO_HTML = new org.owasp.html.HtmlPolicyBuilder().toFactory(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Policy factories should be static final constants
Instance-level policy factories are recreated for each VerifyMessageProperties instance, which is inefficient. These should be static final constants.
| PolicyFactory POLICY_SOME_HTML = new org.owasp.html.HtmlPolicyBuilder() | |
| .allowElements( | |
| "br", "p", "strong", "b" | |
| ).toFactory(); | |
| PolicyFactory POLICY_NO_HTML = new org.owasp.html.HtmlPolicyBuilder().toFactory(); | |
| static final PolicyFactory POLICY_SOME_HTML = new org.owasp.html.HtmlPolicyBuilder() | |
| .allowElements( | |
| "br", "p", "strong", "b" | |
| ).toFactory(); | |
| static final PolicyFactory POLICY_NO_HTML = new org.owasp.html.HtmlPolicyBuilder().toFactory(); |
Prompt To Fix With AI
This is a comment left during a code review.
Path: misc/theme-verifier/src/main/java/org/keycloak/themeverifier/VerifyMessageProperties.java
Line: 58:63
Comment:
**style:** Policy factories should be static final constants
Instance-level policy factories are recreated for each `VerifyMessageProperties` instance, which is inefficient. These should be `static final` constants.
```suggestion
static final PolicyFactory POLICY_SOME_HTML = new org.owasp.html.HtmlPolicyBuilder()
.allowElements(
"br", "p", "strong", "b"
).toFactory();
static final PolicyFactory POLICY_NO_HTML = new org.owasp.html.HtmlPolicyBuilder().toFactory();
```
How can I resolve this? If you propose a fix, please make it concise.| return value; | ||
| } | ||
|
|
||
| Pattern HTML_TAGS = Pattern.compile("<[a-z]+[^>]*>"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: HTML pattern should be static final
Pattern compilation is expensive and this regex is used repeatedly.
| Pattern HTML_TAGS = Pattern.compile("<[a-z]+[^>]*>"); | |
| static final Pattern HTML_TAGS = Pattern.compile("<[a-z]+[^>]*>"); |
Prompt To Fix With AI
This is a comment left during a code review.
Path: misc/theme-verifier/src/main/java/org/keycloak/themeverifier/VerifyMessageProperties.java
Line: 139:139
Comment:
**style:** HTML pattern should be static final
Pattern compilation is expensive and this regex is used repeatedly.
```suggestion
static final Pattern HTML_TAGS = Pattern.compile("<[a-z]+[^>]*>");
```
How can I resolve this? If you propose a fix, please make it concise.| String englishFile = file.getAbsolutePath().replaceAll("resources-community", "resources") | ||
| .replaceAll("_[a-zA-Z-_]*\\.properties", "_en.properties"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Regex will fail to find English file for files already ending in _en.properties
If the file is already messages_en.properties, the regex _[a-zA-Z-_]*\\.properties won't match and englishFile will equal the original file path. The code should handle this case or use a more robust approach.
| String englishFile = file.getAbsolutePath().replaceAll("resources-community", "resources") | |
| .replaceAll("_[a-zA-Z-_]*\\.properties", "_en.properties"); | |
| String englishFile = file.getAbsolutePath().replaceAll("resources-community", "resources"); | |
| if (!englishFile.endsWith("_en.properties")) { | |
| englishFile = englishFile.replaceAll("_[a-zA-Z-_]+\\.properties$", "_en.properties"); | |
| } |
Prompt To Fix With AI
This is a comment left during a code review.
Path: misc/theme-verifier/src/main/java/org/keycloak/themeverifier/VerifyMessageProperties.java
Line: 74:75
Comment:
**logic:** Regex will fail to find English file for files already ending in `_en.properties`
If the file is already `messages_en.properties`, the regex `_[a-zA-Z-_]*\\.properties` won't match and `englishFile` will equal the original file path. The code should handle this case or use a more robust approach.
```suggestion
String englishFile = file.getAbsolutePath().replaceAll("resources-community", "resources");
if (!englishFile.endsWith("_en.properties")) {
englishFile = englishFile.replaceAll("_[a-zA-Z-_]+\\.properties$", "_en.properties");
}
```
How can I resolve this? If you propose a fix, please make it concise.| int start = 0; | ||
| while (start < sanitized.length() && start < value.length() && value.charAt(start) == sanitized.charAt(start)) { | ||
| start++; | ||
| } | ||
| int end = 0; | ||
| while (end < sanitized.length() && end < value.length() && value.charAt(value.length() - end - 1) == sanitized.charAt(sanitized.length() - end - 1)) { | ||
| end++; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Potential StringIndexOutOfBoundsException when calculating diff
When start > value.length() - end, the substring operations will throw exceptions. This happens when the common prefix and suffix overlap or when strings have very different lengths.
| int start = 0; | |
| while (start < sanitized.length() && start < value.length() && value.charAt(start) == sanitized.charAt(start)) { | |
| start++; | |
| } | |
| int end = 0; | |
| while (end < sanitized.length() && end < value.length() && value.charAt(value.length() - end - 1) == sanitized.charAt(sanitized.length() - end - 1)) { | |
| end++; | |
| } | |
| // Strip identical characters from the beginning and the end to show where the difference is | |
| int start = 0; | |
| while (start < sanitized.length() && start < value.length() && value.charAt(start) == sanitized.charAt(start)) { | |
| start++; | |
| } | |
| int end = 0; | |
| while (end < sanitized.length() - start && end < value.length() - start && value.charAt(value.length() - end - 1) == sanitized.charAt(sanitized.length() - end - 1)) { | |
| end++; | |
| } |
Prompt To Fix With AI
This is a comment left during a code review.
Path: misc/theme-verifier/src/main/java/org/keycloak/themeverifier/VerifyMessageProperties.java
Line: 102:109
Comment:
**logic:** Potential `StringIndexOutOfBoundsException` when calculating diff
When `start > value.length() - end`, the substring operations will throw exceptions. This happens when the common prefix and suffix overlap or when strings have very different lengths.
```suggestion
// Strip identical characters from the beginning and the end to show where the difference is
int start = 0;
while (start < sanitized.length() && start < value.length() && value.charAt(start) == sanitized.charAt(start)) {
start++;
}
int end = 0;
while (end < sanitized.length() - start && end < value.length() - start && value.charAt(value.length() - end - 1) == sanitized.charAt(sanitized.length() - end - 1)) {
end++;
}
```
How can I resolve this? If you propose a fix, please make it concise.
Test 7
Replicated from ai-code-review-evaluation/keycloak-greptile#7