Skip to content

fix: specify UTF-8 charset in getBytes() calls on security-sensitive paths#26946

Open
HarishRock0 wants to merge 2 commits intowso2:masterfrom
HarishRock0:Fix--getBytes()-Without-Charset-in-Security-Sensitive-Paths
Open

fix: specify UTF-8 charset in getBytes() calls on security-sensitive paths#26946
HarishRock0 wants to merge 2 commits intowso2:masterfrom
HarishRock0:Fix--getBytes()-Without-Charset-in-Security-Sensitive-Paths

Conversation

@HarishRock0
Copy link
Copy Markdown

@HarishRock0 HarishRock0 commented Mar 8, 2026

fix: specify UTF-8 charset in getBytes() calls on security-sensitive paths

String.getBytes() without an explicit charset uses the JVM's default platform encoding, which varies by OS and locale. This causes non-deterministic byte output in security-sensitive code paths.

In SAML2SSOTestBase.java, the redirect-binding branch already used UTF-8 explicitly, but the POST-binding and fallback branches did not. This inconsistency could produce garbled SAML assertions or signature verification failures on non-UTF-8 systems. The getDocument() helper also parsed SAML XML using platform-default bytes.

In ExternalLDAPServer.java, three getBytes() calls in the password hashing path were platform-dependent. Since MessageDigest operates on raw bytes, a charset mismatch between the system storing the hash and the system verifying it could cause authentication failures.Changes made:
SAML2SSOTestBase.java: replaced getBytes() with getBytes(StandardCharsets.UTF_8) in the POST-binding branch, fallback branch, and getDocument().
ExternalLDAPServer.java: replaced getBytes() with getBytes(StandardCharsets.UTF_8) in all three password-handling calls and added the missing import for StandardCharsets.

No behavioral change on systems already using UTF-8 as default. Fixes incorrect encoding on Windows or other non-UTF-8 locales. StandardCharsets.UTF_8 is always available per the Java spec so no new checked exceptions are introduced.

Summary by CodeRabbit

  • Bug Fixes
    • Standardized UTF-8 encoding for LDAP password handling to ensure correct processing and storage of credentials with non-ASCII characters.
    • Standardized UTF-8 encoding for SAML message processing (including POST and fallback binding paths) to ensure consistent, reliable handling of SAML payloads across bindings.

Copilot AI review requested due to automatic review settings March 8, 2026 17:44
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 8, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 59327237-7463-41c8-8ebd-82d2b66aac3c

📥 Commits

Reviewing files that changed from the base of the PR and between 67d0c93 and d985091.

📒 Files selected for processing (1)
  • modules/integration/tests-integration/tests-backend/src/test/java/org/wso2/identity/integration/test/utils/ExternalLDAPServer.java
🚧 Files skipped from review as they are similar to previous changes (1)
  • modules/integration/tests-integration/tests-backend/src/test/java/org/wso2/identity/integration/test/utils/ExternalLDAPServer.java

Walkthrough

This PR adds explicit UTF-8 encoding (StandardCharsets.UTF_8) to password handling in an LDAP test utility and to SAML message encoding/parsing in SAML2 SSO test helpers, replacing implicit platform-default byte conversions.

Changes

Cohort / File(s) Summary
LDAP test utility
modules/integration/tests-integration/tests-backend/src/test/java/org/wso2/identity/integration/test/utils/ExternalLDAPServer.java
Use StandardCharsets.UTF_8 for password.getBytes(...) calls when hashing and storing passwords; added import for StandardCharsets.
SAML2 SSO test base
product-scenarios/scenarios-commons/src/main/java/org/wso2/identity/scenarios/commons/SAML2SSOTestBase.java
Use StandardCharsets.UTF_8 for converting SAML strings to bytes before Base64 encoding and when creating ByteArrayInputStream from SAML strings; aligns POST and fallback paths with Redirect path.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 I hopped through bytes both near and far,
UTF-8 tucked in each little char,
No more surprises from platforms sly,
Tests hum steady, bytes comply.
Cheers — a rabbit's stamp of approval! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: specifying UTF-8 charset in getBytes() calls on security-sensitive paths. It is concise, specific, and clearly describes the primary focus of the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@HarishRock0
Copy link
Copy Markdown
Author

@splinter @gnudeep @hevayo @chaturadilan

Hi team,

I've opened a PR to fix a medium-severity bug where String.getBytes() was called without an explicit charset in two security-sensitive files. This is a straightforward but important fix — the missing charset causes platform-dependent byte encoding, which can lead to inconsistent SAML assertion encoding and non-deterministic password hashing across different OS environments.
All instances have been updated to use StandardCharsets.UTF_8 explicitly. The change has no impact on systems already running with UTF-8 as the default charset (Linux, CI), and fixes incorrect behavior on Windows or other non-UTF-8 locales. No new exceptions are introduced.

Would appreciate a review, particularly on the ExternalLDAPServer password hashing path to confirm the UTF-8 encoding aligns with how passwords are expected to be stored and verified in the LDAP integration tests

Thanks

regards,
Hariksan Thulasithas

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR makes byte conversions deterministic in security-sensitive SAML and LDAP password-handling code paths by replacing platform-default String.getBytes() calls with explicit UTF-8 encoding.

Changes:

  • Use getBytes(StandardCharsets.UTF_8) for SAML POST-binding and fallback Base64 encoding.
  • Parse SAML XML using UTF-8 bytes in getDocument(...).
  • Use UTF-8 for password hashing/storage byte conversions in ExternalLDAPServer.changeConnectionUserPassword(...) and add the missing StandardCharsets import.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
product-scenarios/scenarios-commons/src/main/java/org/wso2/identity/scenarios/commons/SAML2SSOTestBase.java Ensures SAML request encoding and SAML XML parsing use explicit UTF-8 rather than platform-default encoding.
modules/integration/tests-integration/tests-backend/src/test/java/org/wso2/identity/integration/test/utils/ExternalLDAPServer.java Ensures LDAP password digest/update and password attribute bytes use explicit UTF-8 and adds required import.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +197 to 205
messageDigest.update(password.getBytes(StandardCharsets.UTF_8));
byte[] bytes = messageDigest.digest();
String hash = Base64.encode(bytes);
passwordToStore = passwordToStore + hash;
adminPrincipal.setUserPassword(passwordToStore.getBytes());
adminPrincipal.setUserPassword(passwordToStore.getBytes(StandardCharsets.UTF_8));
Attribute passwordAttribute = new DefaultAttribute(getAttributeType("userPassword"));
try {
passwordAttribute.add(passwordToStore.getBytes());
passwordAttribute.add(passwordToStore.getBytes(StandardCharsets.UTF_8));
} catch (LdapInvalidAttributeValueException e) {
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ExternalLDAPServer still has platform-default getBytes() calls in addAdminPassword(...) (e.g., md.update(password.getBytes()) and passwordToStore.getBytes()), so password hashing/storage can remain non-deterministic on non-UTF-8 platforms. Consider updating those remaining calls to use StandardCharsets.UTF_8 as well for consistency with the changes here.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@modules/integration/tests-integration/tests-backend/src/test/java/org/wso2/identity/integration/test/utils/ExternalLDAPServer.java`:
- Around line 197-204: The admin password bootstrap path still uses
platform-default charset; update addAdminPassword(...) in
ExternalLDAPServer.java so all uses of getBytes() are explicit about UTF-8: when
hashing (MessageDigest.update / digest inputs) and when storing/adding the
password (calls that pass byte[] to adminPrincipal.setUserPassword and
passwordAttribute.add), replace getBytes() with getBytes(StandardCharsets.UTF_8)
for both occurrences so the generated digest and LDAP-stored values are
deterministic across JVM locales.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b75fc77c-68d3-4afd-977a-022ea8e12ac0

📥 Commits

Reviewing files that changed from the base of the PR and between cadb385 and 67d0c93.

📒 Files selected for processing (2)
  • modules/integration/tests-integration/tests-backend/src/test/java/org/wso2/identity/integration/test/utils/ExternalLDAPServer.java
  • product-scenarios/scenarios-commons/src/main/java/org/wso2/identity/scenarios/commons/SAML2SSOTestBase.java

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud bot commented Mar 8, 2026

@HarishRock0
Copy link
Copy Markdown
Author

@splinter @gnudeep @hevayo @chaturadilan

Hi team,

I've opened a PR to fix a medium-severity bug where String.getBytes() was called without an explicit charset in two security-sensitive files. This is a straightforward but important fix — the missing charset causes platform-dependent byte encoding, which can lead to inconsistent SAML assertion encoding and non-deterministic password hashing across different OS environments.
All instances have been updated to use StandardCharsets.UTF_8 explicitly. The change has no impact on systems already running with UTF-8 as the default charset (Linux, CI), and fixes incorrect behavior on Windows or other non-UTF-8 locales. No new exceptions are introduced.

Would appreciate a review, particularly on the ExternalLDAPServer password hashing path to confirm the UTF-8 encoding aligns with how passwords are expected to be stored and verified in the LDAP integration tests

Thanks

regards,
Hariksan Thulasithas

@HarishRock0
Copy link
Copy Markdown
Author

@NipunaMadhushan
Hi ,

I've opened a PR to fix a medium-severity bug where String.getBytes() was called without an explicit charset in two security-sensitive files. This is a straightforward but important fix — the missing charset causes platform-dependent byte encoding, which can lead to inconsistent SAML assertion encoding and non-deterministic password hashing across different OS environments.
All instances have been updated to use StandardCharsets.UTF_8 explicitly. The change has no impact on systems already running with UTF-8 as the default charset (Linux, CI), and fixes incorrect behavior on Windows or other non-UTF-8 locales. No new exceptions are introduced.

Would appreciate a review, particularly on the ExternalLDAPServer password hashing path to confirm the UTF-8 encoding aligns with how passwords are expected to be stored and verified in the LDAP integration tests

Thanks

regards,
Hariksan Thulasithas

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants