perf(runtime): rewrite case-change in C++ to skip UTF-8 round-trip #26773
Open
cirospaciari wants to merge 2 commits into main
…5087) Add Bun.camelCase, pascalCase, snakeCase, kebabCase, constantCase, dotCase, capitalCase, trainCase, pathCase, sentenceCase, and noCase matching the change-case npm package. Uses ICU for full Unicode support and bun.strings.UnsignedCodepointIterator for codepoint iteration. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…-trip

Rewrites the 11 case-changing utility methods (camelCase, pascalCase, snakeCase, etc.) from Zig to C++, eliminating two unnecessary allocations and transcoding steps.

The Zig implementation converted every JS string to UTF-8 via bunstr.toUTF8(allocator), processed codepoints, then converted back via bun.String.cloneUTF8(result_bytes): two unnecessary allocations plus transcoding for every call.

Move to C++ and work directly with the JSC string's native encoding (Latin1 or UTF-16) using StringView, StringBuilder, and ICU, the same pattern as stripANSI.cpp.

- New: CaseChange.cpp + CaseChange.h with the case-change algorithm templated on Latin1Character/UChar
- Wiring: 11 functions registered directly in bunObjectTable as C++ host functions
- Cleanup: Deleted string_case.zig and all Zig/C++ bridge wiring (icu_toUpper/icu_toLower wrappers)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Walkthrough
Adds eleven string case-transformation functions (camelCase, pascalCase, snakeCase, kebabCase, constantCase, dotCase, capitalCase, trainCase, pathCase, sentenceCase, noCase) to Bun. Includes TypeScript declarations, native C++ implementations with character classification and separator logic, registration in the Bun object, and a comprehensive test suite validating compatibility with the change-case npm package.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/bun.js/bindings/CaseChange.cpp`:
- Around line 236-266: The per-codepoint mapping using u_toupper/u_tolower
inside the loop (see getTransform, WordTransform, u_toupper, u_tolower,
builder.append, CharType/Latin1Character) must be replaced with ICU full-string
case mappings to handle multi-codepoint expansions and context/locale rules;
extract the word slice from input, call u_strToUpper/u_strToLower or use
UCaseMap to transform the whole word (or transform first character + remainder
for Capitalize) into a temporary UTF-16/UTF-32 buffer, then append that
transformed string to builder (handling length changes and encoding conversion)
instead of appending per-codepoint results. Ensure Capitalize semantics use
full-mapping for the first grapheme/character and full-mapping lowercase for the
rest, and thread locale/flags through the UCaseMap calls if applicable.
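The difference the comment points at can be shown without ICU. The sketch below contrasts a 1:1 mapping, which must return a single code unit and therefore leaves U+00DF (ß) and U+FB01 (ﬁ) unchanged, with a full mapping that may emit several code units. Everything here is illustrative: the function names are hypothetical, the 1:1 mapper covers ASCII only, and the two-entry switch stands in for the Unicode special-casing data that ICU's full-string functions such as u_strToUpper consult.

```cpp
#include <string>

// 1:1 uppercase sketch: ASCII letters only, everything else unchanged.
// A per-code-point API has the same shape, so it cannot expand ß to "SS".
char16_t simpleUpperSketch(char16_t c)
{
    return (c >= u'a' && c <= u'z') ? static_cast<char16_t>(c - 32) : c;
}

// Full uppercase sketch: may append more than one code unit.
// Assumption: only two special-casing entries are modelled here.
void appendFullUpperSketch(std::u16string& out, char16_t c)
{
    switch (c) {
    case 0x00DF: // LATIN SMALL LETTER SHARP S uppercases to "SS"
        out += u"SS";
        return;
    case 0xFB01: // LATIN SMALL LIGATURE FI uppercases to "FI"
        out += u"FI";
        return;
    default:
        out += simpleUpperSketch(c);
        return;
    }
}

// Uppercases one code unit at a time; output length always equals input length.
std::u16string upperPerCodeUnit(const std::u16string& in)
{
    std::u16string out;
    out.reserve(in.size());
    for (char16_t c : in)
        out += simpleUpperSketch(c);
    return out;
}

// Uppercases with expansion; output may be longer than the input.
std::u16string upperFull(const std::u16string& in)
{
    std::u16string out;
    out.reserve(in.size() + 2);
    for (char16_t c : in)
        appendFullUpperSketch(out, c);
    return out;
}
```

For the input "straße", the per-code-unit path keeps the ß and returns six code units, while the full path returns the seven-unit "STRASSE". Any builder that sizes its output from the input length has to account for that growth.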
Summary
Follow-up to #26772, which introduced the 11 case-changing utility methods in Zig.
- The Zig implementation converted every JS string to UTF-8 via bunstr.toUTF8(allocator), processed codepoints, then converted back via bun.String.cloneUTF8(result_bytes).
- The new C++ implementation works directly with JSC's native string encoding (Latin1 or UTF-16) using StringView, StringBuilder, and ICU, the same pattern as stripANSI.cpp.
- CaseChange.cpp + CaseChange.h with the algorithm templated on Latin1Character/UChar; 11 functions registered directly in bunObjectTable as C++ host functions; deleted string_case.zig and removed the icu_toUpper/icu_toLower C-extern bridge wrappers.
Test plan
- bun bd test test/js/bun/util/case-change.test.ts
Changelog