@@ -35,7 +35,7 @@ The tool is automatically invoked by migration processors and applies transforma

 Common scenarios where the String Manipulator Tool is essential:

-- **Data Cleanup**: Removing invalid Unicode characters, control characters, or formatting artifacts
+- **Data Cleanup**: Removing control characters or formatting artifacts while preserving Unicode content
 - **Format Standardization**: Converting text patterns to consistent formats
 - **Length Compliance**: Ensuring field values don't exceed target system limits
 - **Character Encoding**: Fixing encoding issues from legacy systems
@@ -99,12 +99,24 @@ Each manipulator supports these properties:

 ### Removing Invalid Characters

-Remove non-printable characters that may cause issues in the target system:
+Remove control characters that may cause issues while preserving Unicode content:

 ```json
 {
   "$type": "RegexStringManipulator",
-  "Description": "Remove invalid characters from the end of the string",
+  "Description": "Remove control characters but preserve Unicode letters and symbols",
+  "Enabled": true,
+  "Pattern": "[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F\\x7F-\\x9F]+",
+  "Replacement": ""
+}
+```
+
+For legacy ASCII-only environments, you can use the more restrictive pattern:
+
+```json
+{
+  "$type": "RegexStringManipulator",
+  "Description": "Remove all non-ASCII characters (legacy behavior)",
   "Enabled": true,
   "Pattern": "[^( -~)\n\r\t]+",
   "Replacement": ""
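To sanity-check the two patterns in this change, here is a minimal Python sketch that applies each one to a sample string. It is an illustration only: Python's `re` module stands in for whatever regex engine the tool actually uses, and the function names and sample text are hypothetical. The patterns are written as the engine would see them after JSON unescaping (`\\x00` in JSON becomes the regex escape `\x00`, while `\n\r\t` in JSON become literal newline, carriage return, and tab characters).

```python
import re

# Assumption: these mirror the two "Pattern" values from the diff, after JSON unescaping.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]+")
LEGACY_NON_ASCII = re.compile(r"[^( -~)\n\r\t]+")


def strip_control_chars(text: str) -> str:
    """Remove C0/C1 control characters but keep tab, LF, CR, and all other Unicode."""
    return CONTROL_CHARS.sub("", text)


def strip_non_ascii(text: str) -> str:
    """Legacy behavior: keep only printable ASCII plus LF, CR, and tab."""
    return LEGACY_NON_ASCII.sub("", text)


# Hypothetical field value: accented Latin, a NUL, CJK text, a BEL, then tab and newline.
sample = "Caf\u00e9\x00 \u65e5\u672c\x07\tok\n"

if __name__ == "__main__":
    print(repr(strip_control_chars(sample)))
    print(repr(strip_non_ascii(sample)))
```

On this sample the first pattern drops only the NUL and BEL characters, while the legacy pattern also drops the accented and CJK characters, which is the behavioral difference the change describes.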