fix(sql): enforce UTF-8 when loading keyword resources #1260

Closed
renechoi wants to merge 1 commit into OpenFeign:master from renechoi:Fix/enforce-UTF-8-when-loading-keyword-resources
@renechoi
Contributor

📝 Pull-Request Description

What & Why

Keywords.readLines loaded SQL keyword lists with the JVM’s default charset.
On environments configured for non-UTF-8 encodings (e.g. Windows CP-1252) this silently corrupted any keyword containing non-ASCII characters, leading to parsing errors in templates that rely on those lists.

This patch forces UTF-8 decoding for every /keywords/* resource, guaranteeing identical behaviour on all platforms.
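The failure mode and the fix described above can be illustrated with a minimal sketch. The class `KeywordCharsetSketch` and its `readLines(InputStream, Charset)` helper are hypothetical and are not the actual `Keywords.readLines` code from the PR. ISO-8859-1 stands in for CP-1252 here because it is guaranteed to be available on every JVM; both are single-byte encodings that split each UTF-8 multi-byte sequence into several wrong characters. Note that since JDK 18 (JEP 400) the default charset is UTF-8 unless overridden, so the mismatch mainly affects older runtimes.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

public class KeywordCharsetSketch {

    // Hypothetical helper (not the real Keywords.readLines): decodes a stream
    // line by line using an explicitly supplied charset.
    static List<String> readLines(InputStream in, Charset charset) {
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            return reader.lines().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "SELECT" plus the umlauts AE/OE/UE, written as Unicode escapes so the
        // example does not depend on the encoding of this source file.
        byte[] utf8 = "SELECT\n\u00C4\u00D6\u00DC\n".getBytes(StandardCharsets.UTF_8);

        // Explicit UTF-8: the three umlauts survive as three characters.
        List<String> correct =
            readLines(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);

        // Single-byte charset (stand-in for a non-UTF-8 platform default):
        // each two-byte UTF-8 sequence becomes two unrelated characters.
        List<String> wrong =
            readLines(new ByteArrayInputStream(utf8), StandardCharsets.ISO_8859_1);

        System.out.println(correct.get(1).length()); // 3
        System.out.println(wrong.get(1).length());   // 6
        System.out.println(wrong.get(0));            // SELECT (ASCII is unaffected)
    }
}
```

Pure-ASCII keywords decode identically under both charsets, which is why a problem of this kind can go unnoticed until a list contains a non-ASCII entry.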


Changes in this PR

| Type | Module / File | Summary |
|------|---------------|---------|
| 🛠 Bug-fix | querydsl-sql/src/main/java/com/querydsl/sql/Keywords.java | Passes StandardCharsets.UTF_8 to InputStreamReader, replacing reliance on the default charset. |
| ✅ Test | querydsl-sql/src/test/java/com/querydsl/sql/KeywordsEncodingTest.java | New regression test that loads a UTF-8 resource (encoding-test) and asserts the content is preserved. |
| 📦 Resource | querydsl-sql/src/test/resources/keywords/encoding-test | Minimal UTF-8 test asset (SELECT + ÄÖÜ) used by the new unit test. |
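
For readers without the PR diff at hand, the shape of such a regression check might look roughly like the following. This is a sketch under assumptions, not the contents of `KeywordsEncodingTest`: it uses plain Java instead of JUnit, a temporary file instead of the classpath resource, and assumes the asset holds `SELECT` and `ÄÖÜ` on two separate lines. The names `EncodingRegressionSketch`, `loadUtf8`, and `roundTripPreserved` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class EncodingRegressionSketch {

    // Hypothetical loader: reads every line of a file, always decoding as UTF-8.
    static List<String> loadUtf8(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (InputStream in = Files.newInputStream(file);
             BufferedReader reader = new BufferedReader(
                 new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Writes a small UTF-8 asset, loads it back, and reports whether the
    // non-ASCII line came through unchanged.
    static boolean roundTripPreserved() {
        try {
            Path file = Files.createTempFile("encoding-test", ".txt");
            try {
                // Assumed layout of the asset: one ASCII line, one umlaut line.
                Files.write(file,
                    "SELECT\n\u00C4\u00D6\u00DC\n".getBytes(StandardCharsets.UTF_8));
                List<String> lines = loadUtf8(file);
                return lines.size() == 2
                    && lines.get(0).equals("SELECT")
                    && lines.get(1).equals("\u00C4\u00D6\u00DC");
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripPreserved()
            ? "content preserved" : "content corrupted");
    }
}
```

Because the loader never consults the platform default, the check should give the same result regardless of how `file.encoding` is set on the machine running it.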

Compatibility

  • Non-breaking – internal implementation detail only; public API unchanged.

  • Applies uniformly to all dialects that depend on Keywords.


Tests & CI

  • New JUnit test verifies UTF-8 decoding.

  • All existing tests continue to pass locally.

  • Current CI hiccup around easy-jacoco-maven-plugin resolution is unrelated; if desired I can follow up with a version pin or mirror configuration.


Related

  • Inspired by common cross-platform issues with default charset usage (no open upstream ticket).


🤝 Thanks for reviewing!

Keywords.readLines previously relied on the JVM default charset,
which could mis-parse the word list on non-UTF-8 systems (e.g. CP1252).

Changes:
  • Pass StandardCharsets.UTF_8 to InputStreamReader
  • Add KeywordsEncodingTest to guard against regressions

Cross-platform behaviour is now deterministic.
@velo
Member

velo commented Jul 24, 2025

Build is red, feel free to reopen when you get it working.

Thanks for the support.

@velo closed this on Jul 24, 2025