Skip to content

Conversation

@ShashankFC
Copy link
Collaborator

Closes #37428

Signed-off-by: Alexander Schwartz <[email protected]>
@greptile-apps
Copy link

greptile-apps bot commented Jan 5, 2026

Greptile Summary

Adds HTML sanitization validation to the theme verifier using OWASP Java HTML Sanitizer to prevent XSS attacks in translated message resources. The implementation compares translated strings against English source strings and only allows HTML tags (br, p, strong, b) if they exist in the original English text. Also validates that anchor tags in translations match those in the source.

Key changes:

  • Implemented verifySafeHtml() method that sanitizes and validates HTML in property files
  • Added special handling for specific keys like templateHelp, optimizeLookupHelp, and choice format patterns
  • Created anchor tag validation to prevent malicious URL substitution in translations
  • Updated 46+ message property files to fix <br/> spacing to <br /> for consistency
  • Added comprehensive test coverage for illegal HTML detection

Critical issues found:

  • The santizeAnchors method modifies the string during matcher iteration, which invalidates matcher position indices and can cause incorrect validation
  • HTML tag detection regex only matches lowercase tags, potentially missing uppercase variants

Confidence Score: 2/5

  • This PR has critical logic bugs that will cause incorrect validation behavior
  • While the security approach is sound (using OWASP sanitizer to prevent XSS), the implementation contains a critical bug in the santizeAnchors method that modifies a string while a regex matcher is iterating over it, breaking the matcher's internal state. This bug will cause unpredictable validation failures when messages contain multiple anchor tags. Additionally, the HTML detection regex only matches lowercase tags, which could miss security issues.
  • Pay close attention to misc/theme-verifier/src/main/java/org/keycloak/themeverifier/VerifyMessageProperties.java - the santizeAnchors method needs to be fixed before merge

Important Files Changed

Filename Overview
misc/theme-verifier/src/main/java/org/keycloak/themeverifier/VerifyMessageProperties.java Added HTML sanitization validation using OWASP Java HTML Sanitizer, but contains a critical bug in the santizeAnchors method that modifies the string being matched during iteration
misc/theme-verifier/pom.xml Added dependencies for OWASP HTML Sanitizer (20240325.1) and Apache Commons Text (1.13.0)
misc/theme-verifier/src/test/java/org/keycloak/themeverifier/VerifyMessagePropertiesTest.java Added comprehensive tests for HTML sanitization including illegal tags, disallowed HTML in translations, and anchor tag validation

Sequence Diagram

sequenceDiagram
    participant Verifier as VerifyMessageProperties
    participant Bundle as PropertyResourceBundle
    participant Sanitizer as OWASP HtmlPolicyBuilder
    participant Validator as santizeAnchors()

    Verifier->>Verifier: verify()
    Verifier->>Verifier: verifyNoDuplicateKeys()
    Verifier->>Verifier: verifySafeHtml()
    
    Verifier->>Bundle: Load translation file
    Bundle-->>Verifier: PropertyResourceBundle
    
    Verifier->>Bundle: Load English reference file
    Bundle-->>Verifier: English PropertyResourceBundle
    
    loop For each key in translation
        Verifier->>Bundle: getString(key)
        Bundle-->>Verifier: translatedValue
        
        Verifier->>Verifier: normalizeValue(key, translatedValue)
        Note over Verifier: Remove special cases,<br/>unescape HTML entities
        
        Verifier->>Bundle: getEnglishValue(key)
        Bundle-->>Verifier: englishValue
        
        Verifier->>Verifier: normalizeValue(key, englishValue)
        
        Verifier->>Validator: santizeAnchors(key, value, englishValue)
        Note over Validator: Validate anchor tags match<br/>Remove matching anchors
        Validator-->>Verifier: value without anchors
        
        Verifier->>Verifier: containsHtml(englishValue)
        alt English contains HTML
            Verifier->>Sanitizer: POLICY_SOME_HTML.sanitize()
            Note over Sanitizer: Allow br, p, strong, b
        else English has no HTML
            Verifier->>Sanitizer: POLICY_NO_HTML.sanitize()
            Note over Sanitizer: Strip all HTML
        end
        
        Sanitizer-->>Verifier: sanitizedValue
        
        Verifier->>Verifier: Unescape HTML entities
        Verifier->>Verifier: Remove HTML comments
        
        alt sanitized != original
            Verifier->>Verifier: Add "Illegal HTML" error message
        end
    end
    
    Verifier-->>Verifier: Return list of error messages
Loading

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

45 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +150 to +162
private String santizeAnchors(String key, String value, String englishValue) {
Matcher matcher = ANCHOR_PATTERN.matcher(value);
Matcher englishMatcher = ANCHOR_PATTERN.matcher(englishValue);
while (matcher.find()) {
if (englishMatcher.find() && Objects.equals(matcher.group(), englishMatcher.group())) {
value = value.replaceFirst(Pattern.quote(englishMatcher.group()), "");
} else {
messages.add("Didn't find anchor tag " + matcher.group() + " in original string");
break;
}
}
return value;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: modifying value during iteration breaks the matcher that's iterating over it. the matcher was created with the original string (line 151) and holds position indices into that string. when you modify value on line 155, those indices become invalid, potentially causing the matcher to skip anchors or fail

Suggested change
private String santizeAnchors(String key, String value, String englishValue) {
Matcher matcher = ANCHOR_PATTERN.matcher(value);
Matcher englishMatcher = ANCHOR_PATTERN.matcher(englishValue);
while (matcher.find()) {
if (englishMatcher.find() && Objects.equals(matcher.group(), englishMatcher.group())) {
value = value.replaceFirst(Pattern.quote(englishMatcher.group()), "");
} else {
messages.add("Didn't find anchor tag " + matcher.group() + " in original string");
break;
}
}
return value;
}
private String santizeAnchors(String key, String value, String englishValue) {
Matcher matcher = ANCHOR_PATTERN.matcher(value);
Matcher englishMatcher = ANCHOR_PATTERN.matcher(englishValue);
StringBuilder result = new StringBuilder();
int lastEnd = 0;
while (matcher.find()) {
if (englishMatcher.find() && Objects.equals(matcher.group(), englishMatcher.group())) {
result.append(value, lastEnd, matcher.start());
lastEnd = matcher.end();
} else {
messages.add("Didn't find anchor tag " + matcher.group() + " in original string");
return value;
}
}
result.append(value, lastEnd, value.length());
return result.toString();
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: misc/theme-verifier/src/main/java/org/keycloak/themeverifier/VerifyMessageProperties.java
Line: 150:162

Comment:
**logic:** modifying `value` during iteration breaks the `matcher` that's iterating over it. the `matcher` was created with the original string (line 151) and holds position indices into that string. when you modify `value` on line 155, those indices become invalid, potentially causing the matcher to skip anchors or fail

```suggestion
    private String santizeAnchors(String key, String value, String englishValue) {
        Matcher matcher = ANCHOR_PATTERN.matcher(value);
        Matcher englishMatcher = ANCHOR_PATTERN.matcher(englishValue);
        StringBuilder result = new StringBuilder();
        int lastEnd = 0;
        
        while (matcher.find()) {
            if (englishMatcher.find() && Objects.equals(matcher.group(), englishMatcher.group())) {
                result.append(value, lastEnd, matcher.start());
                lastEnd = matcher.end();
            } else {
                messages.add("Didn't find anchor tag " + matcher.group() + " in original string");
                return value;
            }
        }
        result.append(value, lastEnd, value.length());
        return result.toString();
    }
```

How can I resolve this? If you propose a fix, please make it concise.

return value;
}

Pattern HTML_TAGS = Pattern.compile("<[a-z]+[^>]*>");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: pattern only matches lowercase tag names like <br> but will miss uppercase variants like <BR> or <Strong>. use case-insensitive matching

Suggested change
Pattern HTML_TAGS = Pattern.compile("<[a-z]+[^>]*>");
Pattern HTML_TAGS = Pattern.compile("<[a-zA-Z]+[^>]*>", Pattern.CASE_INSENSITIVE);
Prompt To Fix With AI
This is a comment left during a code review.
Path: misc/theme-verifier/src/main/java/org/keycloak/themeverifier/VerifyMessageProperties.java
Line: 139:139

Comment:
**logic:** pattern only matches lowercase tag names like `<br>` but will miss uppercase variants like `<BR>` or `<Strong>`. use case-insensitive matching

```suggestion
    Pattern HTML_TAGS = Pattern.compile("<[a-zA-Z]+[^>]*>", Pattern.CASE_INSENSITIVE);
```

How can I resolve this? If you propose a fix, please make it concise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants