Skip to content

Conversation

@akshayutture-augment
Copy link
Contributor

Closes #37428

Signed-off-by: Alexander Schwartz <[email protected]>
@akshayutture-augment
Copy link
Contributor Author

@claude review this PR

1 similar comment
@akshayutture-augment
Copy link
Contributor Author

@claude review this PR

@claude
Copy link

claude bot commented Nov 20, 2025

Claude finished @akshayutture-augment's task —— View job


PR Review Complete ✓

I've reviewed this HTML sanitizer implementation for translated message resources. This is a security-focused feature to prevent XSS vulnerabilities in internationalized content. Here's my analysis:


🔒 Security Implementation

Overall Approach: The implementation uses OWASP Java HTML Sanitizer to validate translated message properties against their English source, which is a solid defensive strategy.

Strengths:

  • ✅ Uses industry-standard OWASP sanitizer library
  • ✅ Whitelist approach for allowed HTML tags (br, p, strong, b)
  • ✅ Only allows HTML in translations if present in English source
  • ✅ Special handling for anchor tags to prevent URL manipulation

🐛 Critical Issues

1. Typo in Method Name (misc/theme-verifier/src/main/java/org/keycloak/themeverifier/VerifyMessageProperties.java:150)

private String santizeAnchors(String key, String value, String englishValue) {

Should be sanitizeAnchors (missing 'i'). This typo is propagated to the method call on line 88.

2. Logic Error in Anchor Validation (VerifyMessageProperties.java:150-162)

The anchor sanitization logic has a fundamental flaw:

private String santizeAnchors(String key, String value, String englishValue) {
    Matcher matcher = ANCHOR_PATTERN.matcher(value);
    Matcher englishMatcher = ANCHOR_PATTERN.matcher(englishValue);
    while (matcher.find()) {
        if (englishMatcher.find() && Objects.equals(matcher.group(), englishMatcher.group())) {
            value = value.replaceFirst(Pattern.quote(englishMatcher.group()), "");
        } else {
            messages.add("Didn't find anchor tag " + matcher.group() + " in original string");
            break;
        }
    }
    return value;
}

Problems:

  • Removes matching anchors from the value string: This modifies the actual translation content that gets validated later, which could lead to false negatives
  • Unclear purpose: The method name suggests sanitization, but it actually validates and strips anchors
  • Side effect on validation: By removing anchors from value, subsequent HTML validation sees modified content
  • Test doesn't catch this: The test file changedAnchor_de.properties has a different URL than English, which triggers the error message, but doesn't validate that matching anchors are preserved correctly

Expected behavior: Should either:

  1. Validate anchors match without modifying the value, OR
  2. Return a sanitized copy without side effects on the original value

3. Incomplete HTML Tag Pattern (VerifyMessageProperties.java:139)

Pattern HTML_TAGS = Pattern.compile("<[a-z]+[^>]*>");

This pattern only matches lowercase HTML tags, but HTML is case-insensitive. While most/all message files likely use lowercase, this could miss uppercase or mixed-case tags like <BR> or <Strong>.

Recommendation: Use case-insensitive pattern: Pattern.compile("<[a-zA-Z]+[^>]*>", Pattern.CASE_INSENSITIVE)


⚠️ Moderate Issues

4. Hardcoded File Path Manipulation (VerifyMessageProperties.java:74-75)

String englishFile = file.getAbsolutePath().replaceAll("resources-community", "resources")
        .replaceAll("_[a-zA-Z-_]*\\.properties", "_en.properties");

Issues:

  • Fragile string replacement that assumes specific directory structure
  • Regex _[a-zA-Z-_]*\\.properties could incorrectly match filenames like some_test_file.properties
  • No error handling if English file doesn't exist at computed path (will throw RuntimeException on line 79)

Recommendation:

  • Make path resolution more robust
  • Add validation that English file exists before attempting to read
  • Consider making English file location configurable

5. Missing Test for Self-Closing Tags (Test coverage gap)

The allowed tags include br which is often written as <br/> or <br />, but there's no test verifying self-closing tag handling. The sanitizer should handle these correctly, but explicit test coverage would be valuable.

6. Unclear Normalization Logic (VerifyMessageProperties.java:117-137)

The normalizeValue method has several special cases:

  • templateHelp key allows CLAIM.<NAME> pattern
  • optimizeLookupHelp allows <Extensions>
  • Choice format handling for specific keys
  • Email-specific RTL style stripping (with TODO comment)

Issues:

  • These special cases are not documented in comments
  • The TODO on line 133 suggests incomplete work: // TODO: move the RTL information for emails
  • Hard to understand why these specific keys need special treatment
  • No tests explicitly validating these edge cases

Recommendation: Add documentation explaining why each special case exists.


📝 Minor Issues / Suggestions

7. Inconsistent Exception Handling

  • Line 53: Throws MojoExecutionException
  • Line 70, 79: Throws RuntimeException

Consider consistent exception handling strategy.

8. Magic String Comparison (VerifyMessageProperties.java:97)

sanitized = sanitized.replace("", "");

This comment about "double curly braces" on line 96 isn't immediately clear. Consider adding an example in the comment.

9. Dependency Version

pom.xml uses owasp-java-html-sanitizer version 20240325.1 which appears to be from March 2024. Consider checking if a newer version is available with security fixes.

10. Test File Organization

Test resources were renamed (duplicateKeys.propertiesduplicateKeys_en.properties) but there's no test file for the German/translated case for the changedAnchor test to verify the error is caught correctly.


✅ Positive Aspects

  1. Good test coverage structure - Tests for duplicate keys, illegal HTML, no HTML allowed, and changed anchors
  2. Whitelist approach - Only allowing specific safe tags is the right security approach
  3. Source comparison - Validating translations against English source prevents translators from injecting HTML
  4. HTML entity handling - Properly unescapes HTML entities for fair comparison
  5. Clear error messages - Shows exactly what content failed validation with before/after comparison

🎯 Recommendations

Must Fix:

  1. Fix the typo: santizeAnchorssanitizeAnchors
  2. Fix the anchor validation logic to not modify the value being validated
  3. Add validation that English reference file exists

Should Fix:
4. Make HTML tag pattern case-insensitive
5. Make file path resolution more robust
6. Document the special cases in normalizeValue

Nice to Have:
7. Add test for self-closing tags
8. Consistent exception handling
9. Address the TODO comment for RTL email styles
10. Check for newer OWASP sanitizer version


Message Property Changes

The changes to the 48 message property files appear to be corrections found by running this new validator:

  • Standardizing <br/> to <br /> (space before slash)
  • Fixing the error-invalid-multivalued-size message to use proper choice format

These changes look appropriate and are exactly what you'd expect from running a new sanitizer.


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants