Skip to content

Conversation

@akshayutture-augment
Copy link

Closes #37428

Signed-off-by: Alexander Schwartz <[email protected]>
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@akshayutture-augment
Copy link
Author

@greptileai

@greptile-apps
Copy link

greptile-apps bot commented Nov 14, 2025

Greptile Overview

Confidence Score: 3/5

  • This PR improves security by adding HTML sanitization, but has critical bugs in the anchor validation logic that need fixing before merge
  • The PR adds important security controls using a well-established OWASP library, but contains three critical logical bugs: (1) the sanitizeAnchors method has flawed string manipulation that breaks validation, (2) English file path resolution fails for _en.properties files, and (3) the diff calculation can throw StringIndexOutOfBoundsException. These bugs could cause false positives/negatives in validation or runtime crashes. Once fixed, the implementation would be solid.
  • Pay close attention to misc/theme-verifier/src/main/java/org/keycloak/themeverifier/VerifyMessageProperties.java - contains critical bugs in anchor validation and string handling

Important Files Changed

File Analysis

Filename Score Overview
misc/theme-verifier/src/main/java/org/keycloak/themeverifier/VerifyMessageProperties.java 3/5 Adds HTML sanitization logic with OWASP sanitizer, includes anchor tag validation and normalization logic with several edge cases
misc/theme-verifier/pom.xml 5/5 Adds OWASP Java HTML Sanitizer and Apache Commons Text dependencies for HTML validation
misc/theme-verifier/src/test/java/org/keycloak/themeverifier/VerifyMessagePropertiesTest.java 5/5 Adds test coverage for illegal HTML tags, changed anchors, and HTML in non-HTML contexts
themes/src/main/resources-community/theme/base/email/messages/messages_fa.properties 5/5 Fixes malformed HTML tag < p> to <p> in Persian email template

Sequence Diagram

sequenceDiagram
    participant Maven as Maven Plugin
    participant VMP as VerifyMessageProperties
    participant OWASP as OWASP Sanitizer
    participant Bundle as PropertyResourceBundle
    
    Maven->>VMP: verify(file)
    VMP->>VMP: verifyNoDuplicateKeys()
    VMP->>VMP: verifySafeHtml()
    
    VMP->>Bundle: Load target locale file
    VMP->>Bundle: Load English reference file
    
    loop For each property key
        VMP->>Bundle: getString(key)
        VMP->>VMP: normalizeValue() - Handle edge cases
        VMP->>VMP: sanitizeAnchors() - Validate anchor tags
        VMP->>VMP: containsHtml(englishValue)
        
        alt English has HTML
            VMP->>OWASP: POLICY_SOME_HTML.sanitize()
        else English has no HTML
            VMP->>OWASP: POLICY_NO_HTML.sanitize()
        end
        
        OWASP-->>VMP: sanitized value
        VMP->>VMP: unescapeHtml4() - Normalize entities
        VMP->>VMP: remove HTML comments
        
        alt sanitized != original
            VMP->>VMP: Add error message with diff
        end
    end
    
    VMP-->>Maven: List of validation errors
Loading

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

45 files reviewed, 5 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +150 to +161
private String santizeAnchors(String key, String value, String englishValue) {
Matcher matcher = ANCHOR_PATTERN.matcher(value);
Matcher englishMatcher = ANCHOR_PATTERN.matcher(englishValue);
while (matcher.find()) {
if (englishMatcher.find() && Objects.equals(matcher.group(), englishMatcher.group())) {
value = value.replaceFirst(Pattern.quote(englishMatcher.group()), "");
} else {
messages.add("Didn't find anchor tag " + matcher.group() + " in original string");
break;
}
}
return value;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Typo in method name and logical bug in anchor validation

The method name has a typo (santizeAnchors should be sanitizeAnchors), but more critically, the anchor matching logic is flawed. When anchors match, the code removes them from value using replaceFirst, but this modifies the string being validated, not the pattern matcher's view. Subsequent matcher.find() calls continue scanning the original unmodified string, potentially causing mismatches.

Suggested change
private String santizeAnchors(String key, String value, String englishValue) {
Matcher matcher = ANCHOR_PATTERN.matcher(value);
Matcher englishMatcher = ANCHOR_PATTERN.matcher(englishValue);
while (matcher.find()) {
if (englishMatcher.find() && Objects.equals(matcher.group(), englishMatcher.group())) {
value = value.replaceFirst(Pattern.quote(englishMatcher.group()), "");
} else {
messages.add("Didn't find anchor tag " + matcher.group() + " in original string");
break;
}
}
return value;
private String sanitizeAnchors(String key, String value, String englishValue) {
List<String> valueAnchors = new ArrayList<>();
List<String> englishAnchors = new ArrayList<>();
Matcher matcher = ANCHOR_PATTERN.matcher(value);
while (matcher.find()) {
valueAnchors.add(matcher.group());
}
Matcher englishMatcher = ANCHOR_PATTERN.matcher(englishValue);
while (englishMatcher.find()) {
englishAnchors.add(englishMatcher.group());
}
if (!valueAnchors.equals(englishAnchors)) {
messages.add("Anchor tags don't match original string in key " + key);
}
// Remove all anchor tags for sanitization
return ANCHOR_PATTERN.matcher(value).replaceAll("");
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: misc/theme-verifier/src/main/java/org/keycloak/themeverifier/VerifyMessageProperties.java
Line: 150:161

Comment:
**logic:** Typo in method name and logical bug in anchor validation

The method name has a typo (`santizeAnchors` should be `sanitizeAnchors`), but more critically, the anchor matching logic is flawed. When anchors match, the code removes them from `value` using `replaceFirst`, but this modifies the string being validated, not the pattern matcher's view. Subsequent `matcher.find()` calls continue scanning the original unmodified string, potentially causing mismatches.

```suggestion
    private String sanitizeAnchors(String key, String value, String englishValue) {
        List<String> valueAnchors = new ArrayList<>();
        List<String> englishAnchors = new ArrayList<>();
        
        Matcher matcher = ANCHOR_PATTERN.matcher(value);
        while (matcher.find()) {
            valueAnchors.add(matcher.group());
        }
        
        Matcher englishMatcher = ANCHOR_PATTERN.matcher(englishValue);
        while (englishMatcher.find()) {
            englishAnchors.add(englishMatcher.group());
        }
        
        if (!valueAnchors.equals(englishAnchors)) {
            messages.add("Anchor tags don't match original string in key " + key);
        }
        
        // Remove all anchor tags for sanitization
        return ANCHOR_PATTERN.matcher(value).replaceAll("");
    }
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +58 to +63
PolicyFactory POLICY_SOME_HTML = new org.owasp.html.HtmlPolicyBuilder()
.allowElements(
"br", "p", "strong", "b"
).toFactory();

PolicyFactory POLICY_NO_HTML = new org.owasp.html.HtmlPolicyBuilder().toFactory();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Policy factories should be static final constants

Instance-level policy factories are recreated for each VerifyMessageProperties instance, which is inefficient. These should be static final constants.

Suggested change
PolicyFactory POLICY_SOME_HTML = new org.owasp.html.HtmlPolicyBuilder()
.allowElements(
"br", "p", "strong", "b"
).toFactory();
PolicyFactory POLICY_NO_HTML = new org.owasp.html.HtmlPolicyBuilder().toFactory();
static final PolicyFactory POLICY_SOME_HTML = new org.owasp.html.HtmlPolicyBuilder()
.allowElements(
"br", "p", "strong", "b"
).toFactory();
static final PolicyFactory POLICY_NO_HTML = new org.owasp.html.HtmlPolicyBuilder().toFactory();
Prompt To Fix With AI
This is a comment left during a code review.
Path: misc/theme-verifier/src/main/java/org/keycloak/themeverifier/VerifyMessageProperties.java
Line: 58:63

Comment:
**style:** Policy factories should be static final constants

Instance-level policy factories are recreated for each `VerifyMessageProperties` instance, which is inefficient. These should be `static final` constants.

```suggestion
    static final PolicyFactory POLICY_SOME_HTML = new org.owasp.html.HtmlPolicyBuilder()
            .allowElements(
                    "br", "p", "strong", "b"
            ).toFactory();

    static final PolicyFactory POLICY_NO_HTML = new org.owasp.html.HtmlPolicyBuilder().toFactory();
```

How can I resolve this? If you propose a fix, please make it concise.

return value;
}

Pattern HTML_TAGS = Pattern.compile("<[a-z]+[^>]*>");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: HTML pattern should be static final

Pattern compilation is expensive and this regex is used repeatedly.

Suggested change
Pattern HTML_TAGS = Pattern.compile("<[a-z]+[^>]*>");
static final Pattern HTML_TAGS = Pattern.compile("<[a-z]+[^>]*>");
Prompt To Fix With AI
This is a comment left during a code review.
Path: misc/theme-verifier/src/main/java/org/keycloak/themeverifier/VerifyMessageProperties.java
Line: 139:139

Comment:
**style:** HTML pattern should be static final

Pattern compilation is expensive and this regex is used repeatedly.

```suggestion
    static final Pattern HTML_TAGS = Pattern.compile("<[a-z]+[^>]*>");
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +74 to +75
String englishFile = file.getAbsolutePath().replaceAll("resources-community", "resources")
.replaceAll("_[a-zA-Z-_]*\\.properties", "_en.properties");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Regex will fail to find English file for files already ending in _en.properties

If the file is already messages_en.properties, the regex _[a-zA-Z-_]*\\.properties won't match and englishFile will equal the original file path. The code should handle this case or use a more robust approach.

Suggested change
String englishFile = file.getAbsolutePath().replaceAll("resources-community", "resources")
.replaceAll("_[a-zA-Z-_]*\\.properties", "_en.properties");
String englishFile = file.getAbsolutePath().replaceAll("resources-community", "resources");
if (!englishFile.endsWith("_en.properties")) {
englishFile = englishFile.replaceAll("_[a-zA-Z-_]+\\.properties$", "_en.properties");
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: misc/theme-verifier/src/main/java/org/keycloak/themeverifier/VerifyMessageProperties.java
Line: 74:75

Comment:
**logic:** Regex will fail to find English file for files already ending in `_en.properties`

If the file is already `messages_en.properties`, the regex `_[a-zA-Z-_]*\\.properties` won't match and `englishFile` will equal the original file path. The code should handle this case or use a more robust approach.

```suggestion
        String englishFile = file.getAbsolutePath().replaceAll("resources-community", "resources");
        if (!englishFile.endsWith("_en.properties")) {
            englishFile = englishFile.replaceAll("_[a-zA-Z-_]+\\.properties$", "_en.properties");
        }
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +102 to +109
int start = 0;
while (start < sanitized.length() && start < value.length() && value.charAt(start) == sanitized.charAt(start)) {
start++;
}
int end = 0;
while (end < sanitized.length() && end < value.length() && value.charAt(value.length() - end - 1) == sanitized.charAt(sanitized.length() - end - 1)) {
end++;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Potential StringIndexOutOfBoundsException when calculating diff

When start > value.length() - end, the substring operations will throw exceptions. This happens when the common prefix and suffix overlap or when strings have very different lengths.

Suggested change
int start = 0;
while (start < sanitized.length() && start < value.length() && value.charAt(start) == sanitized.charAt(start)) {
start++;
}
int end = 0;
while (end < sanitized.length() && end < value.length() && value.charAt(value.length() - end - 1) == sanitized.charAt(sanitized.length() - end - 1)) {
end++;
}
// Strip identical characters from the beginning and the end to show where the difference is
int start = 0;
while (start < sanitized.length() && start < value.length() && value.charAt(start) == sanitized.charAt(start)) {
start++;
}
int end = 0;
while (end < sanitized.length() - start && end < value.length() - start && value.charAt(value.length() - end - 1) == sanitized.charAt(sanitized.length() - end - 1)) {
end++;
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: misc/theme-verifier/src/main/java/org/keycloak/themeverifier/VerifyMessageProperties.java
Line: 102:109

Comment:
**logic:** Potential `StringIndexOutOfBoundsException` when calculating diff

When `start > value.length() - end`, the substring operations will throw exceptions. This happens when the common prefix and suffix overlap or when strings have very different lengths.

```suggestion
                // Strip identical characters from the beginning and the end to show where the difference is
                int start = 0;
                while (start < sanitized.length() && start < value.length() && value.charAt(start) == sanitized.charAt(start)) {
                    start++;
                }
                int end = 0;
                while (end < sanitized.length() - start && end < value.length() - start && value.charAt(value.length() - end - 1) == sanitized.charAt(sanitized.length() - end - 1)) {
                    end++;
                }
```

How can I resolve this? If you propose a fix, please make it concise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants