-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Update McpToolUtils.java #4192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update McpToolUtils.java #4192
Conversation
fix McpToolUtils.prefixedToolName formatted in order to support Chinese characters in toolName Signed-off-by: shishuiwuhen2009 <[email protected]>
fix McpToolUtils.prefixedToolName formatted in order to support Chinese characters in toolName. Signed-off-by: shishuiwuhen2009 <[email protected]>
Replace hard-coded Unicode range with comprehensive Unicode property approach
to fix incomplete Han character coverage in MCP tool name formatting.
Changes:
- Replace \u4e00-\u9fa5 range with union of Unicode Script and Block properties
- Use \p{IsHan} + \p{InCJK_Unified_Ideographs} + \p{InCJK_Compatibility_Ideographs}
- Fix boundary case where \u9fff was incorrectly excluded by script-only approach
- Add comprehensive test coverage for all Han character blocks and edge cases
Technical details:
- Addresses Unicode Script vs Block classification differences across JDK versions
- \u9fff (鿿) is in CJK Unified Ideographs block but not Han script in some JDKs
- Union approach ensures complete coverage while maintaining exclusion of other scripts
- Future-proof solution that automatically includes new Han characters in Unicode updates
Test coverage added:
- CJK Unified Ideographs boundary cases (\u4e00, \u9fff)
- CJK Extension A characters (\u3400)
- CJK Compatibility Ideographs (\uf900)
- Mixed character block scenarios
- Proper exclusion verification for non-Han scripts (Hiragana, Emoji, etc.)
Fixes incomplete Chinese character support while maintaining backward compatibility
and minimal risk profile of the original change.
Signed-off-by: shishuiwuhen2009
Signed-off-by: Mark Pollack <[email protected]>
Fixes #4192
(cherry picked from commit 35486e9)
|
Thanks!!! had some help from ai! let me know what you think of the final changes @shishuiwuhen2009 . it is also backported. |
|
Thank you so much for your detailed modification suggestions! This Unicode property-based approach for Chinese character support is much more standardized and comprehensive than my original hard-coded range method. I’ve learned a lot about robust Unicode handling—especially the nuances of Script vs. Block classification across JDK versions and how to ensure full Han character coverage. It’s really helpful for improving the quality of my code. Great work on pointing out these key details! 👍 |
Replace hard-coded Unicode range with comprehensive Unicode property approach
to fix incomplete Han character coverage in MCP tool name formatting.
Changes:
- Replace \u4e00-\u9fa5 range with union of Unicode Script and Block properties
- Use \p{IsHan} + \p{InCJK_Unified_Ideographs} + \p{InCJK_Compatibility_Ideographs}
- Fix boundary case where \u9fff was incorrectly excluded by script-only approach
- Add comprehensive test coverage for all Han character blocks and edge cases
Technical details:
- Addresses Unicode Script vs Block classification differences across JDK versions
- \u9fff (鿿) is in CJK Unified Ideographs block but not Han script in some JDKs
- Union approach ensures complete coverage while maintaining exclusion of other scripts
- Future-proof solution that automatically includes new Han characters in Unicode updates
Test coverage added:
- CJK Unified Ideographs boundary cases (\u4e00, \u9fff)
- CJK Extension A characters (\u3400)
- CJK Compatibility Ideographs (\uf900)
- Mixed character block scenarios
- Proper exclusion verification for non-Han scripts (Hiragana, Emoji, etc.)
Fixes incomplete Chinese character support while maintaining backward compatibility
and minimal risk profile of the original change.
Signed-off-by: shishuiwuhen2009
Signed-off-by: Mark Pollack <[email protected]>
Auto-cherry-pick to 1.0.x
Fixes spring-projects#4192
Replace hard-coded Unicode range with comprehensive Unicode property approach
to fix incomplete Han character coverage in MCP tool name formatting.
Changes:
- Replace \u4e00-\u9fa5 range with union of Unicode Script and Block properties
- Use \p{IsHan} + \p{InCJK_Unified_Ideographs} + \p{InCJK_Compatibility_Ideographs}
- Fix boundary case where \u9fff was incorrectly excluded by script-only approach
- Add comprehensive test coverage for all Han character blocks and edge cases
Technical details:
- Addresses Unicode Script vs Block classification differences across JDK versions
- \u9fff (鿿) is in CJK Unified Ideographs block but not Han script in some JDKs
- Union approach ensures complete coverage while maintaining exclusion of other scripts
- Future-proof solution that automatically includes new Han characters in Unicode updates
Test coverage added:
- CJK Unified Ideographs boundary cases (\u4e00, \u9fff)
- CJK Extension A characters (\u3400)
- CJK Compatibility Ideographs (\uf900)
- Mixed character block scenarios
- Proper exclusion verification for non-Han scripts (Hiragana, Emoji, etc.)
Fixes incomplete Chinese character support while maintaining backward compatibility
and minimal risk profile of the original change.
Signed-off-by: shishuiwuhen2009
Signed-off-by: Mark Pollack <[email protected]>
Auto-cherry-pick to 1.0.x
Fixes spring-projects#4192
Replace hard-coded Unicode range with comprehensive Unicode property approach
to fix incomplete Han character coverage in MCP tool name formatting.
Changes:
- Replace \u4e00-\u9fa5 range with union of Unicode Script and Block properties
- Use \p{IsHan} + \p{InCJK_Unified_Ideographs} + \p{InCJK_Compatibility_Ideographs}
- Fix boundary case where \u9fff was incorrectly excluded by script-only approach
- Add comprehensive test coverage for all Han character blocks and edge cases
Technical details:
- Addresses Unicode Script vs Block classification differences across JDK versions
- \u9fff (鿿) is in CJK Unified Ideographs block but not Han script in some JDKs
- Union approach ensures complete coverage while maintaining exclusion of other scripts
- Future-proof solution that automatically includes new Han characters in Unicode updates
Test coverage added:
- CJK Unified Ideographs boundary cases (\u4e00, \u9fff)
- CJK Extension A characters (\u3400)
- CJK Compatibility Ideographs (\uf900)
- Mixed character block scenarios
- Proper exclusion verification for non-Han scripts (Hiragana, Emoji, etc.)
Fixes incomplete Chinese character support while maintaining backward compatibility
and minimal risk profile of the original change.
Signed-off-by: shishuiwuhen2009
Signed-off-by: Mark Pollack <[email protected]>
Auto-cherry-pick to 1.0.x
Fixes spring-projects#4192
Signed-off-by: 家娃 <[email protected]>
Replace hard-coded Unicode range with comprehensive Unicode property approach
to fix incomplete Han character coverage in MCP tool name formatting.
Changes:
- Replace \u4e00-\u9fa5 range with union of Unicode Script and Block properties
- Use \p{IsHan} + \p{InCJK_Unified_Ideographs} + \p{InCJK_Compatibility_Ideographs}
- Fix boundary case where \u9fff was incorrectly excluded by script-only approach
- Add comprehensive test coverage for all Han character blocks and edge cases
Technical details:
- Addresses Unicode Script vs Block classification differences across JDK versions
- \u9fff (鿿) is in CJK Unified Ideographs block but not Han script in some JDKs
- Union approach ensures complete coverage while maintaining exclusion of other scripts
- Future-proof solution that automatically includes new Han characters in Unicode updates
Test coverage added:
- CJK Unified Ideographs boundary cases (\u4e00, \u9fff)
- CJK Extension A characters (\u3400)
- CJK Compatibility Ideographs (\uf900)
- Mixed character block scenarios
- Proper exclusion verification for non-Han scripts (Hiragana, Emoji, etc.)
Fixes incomplete Chinese character support while maintaining backward compatibility
and minimal risk profile of the original change.
Signed-off-by: shishuiwuhen2009
Signed-off-by: Mark Pollack <[email protected]>
Auto-cherry-pick to 1.0.x
Fixes spring-projects#4192
Signed-off-by: logan-mac <[email protected]>
fix McpToolUtils.prefixedToolName formatted in order to support Chinese characters in toolName