Commit d933c2d

Copilot and MrHinsh committed
Update documentation for Unicode support in StringManipulatorTool
Co-authored-by: MrHinsh <[email protected]>
1 parent 6464ef6 commit d933c2d

1 file changed (+15, -3)

docs/content/docs/reference/tools/stringmanipulatortool/index.md

Lines changed: 15 additions & 3 deletions
@@ -35,7 +35,7 @@ The tool is automatically invoked by migration processors and applies transforma
 
 Common scenarios where the String Manipulator Tool is essential:
 
-- **Data Cleanup**: Removing invalid Unicode characters, control characters, or formatting artifacts
+- **Data Cleanup**: Removing control characters or formatting artifacts while preserving Unicode content
 - **Format Standardization**: Converting text patterns to consistent formats
 - **Length Compliance**: Ensuring field values don't exceed target system limits
 - **Character Encoding**: Fixing encoding issues from legacy systems
@@ -99,12 +99,24 @@ Each manipulator supports these properties:
 
 ### Removing Invalid Characters
 
-Remove non-printable characters that may cause issues in the target system:
+Remove control characters that may cause issues while preserving Unicode content:
 
 ```json
 {
 "$type": "RegexStringManipulator",
-"Description": "Remove invalid characters from the end of the string",
+"Description": "Remove control characters but preserve Unicode letters and symbols",
+"Enabled": true,
+"Pattern": "[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F\\x7F-\\x9F]+",
+"Replacement": ""
+}
+```
+
+For legacy ASCII-only environments, you can use the more restrictive pattern:
+
+```json
+{
+"$type": "RegexStringManipulator",
+"Description": "Remove all non-ASCII characters (legacy behavior)",
 "Enabled": true,
 "Pattern": "[^( -~)\n\r\t]+",
 "Replacement": ""
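
To see how the two patterns in this hunk differ in practice, the following is a minimal sketch that applies both to the same sample string. It uses Python's `re` module rather than the tool's own regex engine, so treat it as an approximation of the behavior, not as the tool's implementation. The function names `strip_control_chars` and `strip_non_ascii` and the sample text are hypothetical and exist only for this illustration.

```python
import re

# Pattern from the updated docs (after JSON unescaping): removes C0 control
# characters except tab, LF and CR, plus DEL and the C1 control range.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]+")

# Legacy pattern (after JSON unescaping): removes everything that is not
# printable ASCII, newline, carriage return or tab. The \n\r\t here are
# literal characters inside the character class, as they would be once the
# JSON string is decoded.
LEGACY_ASCII_ONLY = re.compile("[^( -~)\n\r\t]+")


def strip_control_chars(text: str) -> str:
    """Hypothetical helper: drop control characters, keep Unicode content."""
    return CONTROL_CHARS.sub("", text)


def strip_non_ascii(text: str) -> str:
    """Hypothetical helper: drop every character outside printable ASCII."""
    return LEGACY_ASCII_ONLY.sub("", text)


if __name__ == "__main__":
    # Sample mixes accented Latin, CJK, control characters and a tab.
    sample = "Caf\u00e9 \u65e5\u672c\u8a9e\x00\x07 ok\tdone\x9f"
    print(repr(strip_control_chars(sample)))
    print(repr(strip_non_ascii(sample)))
```

Under these assumptions, the new pattern leaves the accented and CJK characters untouched and removes only the control characters, while the legacy pattern also strips every non-ASCII letter. Both keep tabs and line breaks.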
