README.md: 17 additions & 15 deletions
@@ -88,6 +88,23 @@ profanity.exists('Arsenic is poisonous but not profane');
 // true (matched on arse)
 ```
 
+### unicodeWordBoundaries
+
+Controls whether word boundaries are Unicode-aware. By default this is set to `false` due to the performance impact.
+
+- When `false` (default), whole-word matching uses ASCII-style boundaries (similar to `\b`) plus underscore `_` as a separator. This is fastest and ideal for ASCII inputs.
+- When `true`, whole-word matching uses Unicode-aware boundaries so words with diacritics (e.g., `vehículo`, `horário`) and compound separators are handled correctly.
+
+```JavaScript
+const profanity = new Profanity({
+  unicodeWordBoundaries: true,
+  wholeWord: true, // must be true for boundaries to work
+});
+
+profanity.exists('vehículo horario');
+// false (does not match on "culo" inside "vehículo")
+```
+
 #### Compound Words
 Profanity detection works on parts of compound words, rather than treating hyphenated or underscore-separated words as indivisible.
 
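The difference between the two boundary modes described in the hunk above can be sketched with plain regular expressions. This is only an illustration under stated assumptions, not the library's implementation: `makeWholeWordRegex` is a hypothetical helper, and the choice of lookarounds and Unicode property escapes (`\p{L}`, `\p{M}`, `\p{N}`) is one plausible way to get the described behaviour.

```javascript
// Hypothetical helper (not part of the library): builds a whole-word matcher for `word`.
function makeWholeWordRegex(word, unicodeAware) {
  // Escape regex syntax characters so the word is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  if (unicodeAware) {
    // Unicode-aware: any letter, combining mark, or digit counts as part of a word.
    return new RegExp(`(?<![\\p{L}\\p{M}\\p{N}])${escaped}(?![\\p{L}\\p{M}\\p{N}])`, 'iu');
  }
  // ASCII-style: only [A-Za-z0-9] count as word characters, so "_" acts as a separator.
  return new RegExp(`(?<![A-Za-z0-9])${escaped}(?![A-Za-z0-9])`, 'i');
}

const ascii = makeWholeWordRegex('culo', false);
const unicode = makeWholeWordRegex('culo', true);

console.log(ascii.test('vehículo'));   // true: "í" is not an ASCII word character
console.log(unicode.test('vehículo')); // false: "í" is a letter, so "culo" is not a whole word
console.log(unicode.test('un culo'));  // true: preceded by a space
```

The ASCII variant sees a boundary before `culo` because `í` falls outside `[A-Za-z0-9]`, which is the false positive the option is meant to avoid. The Unicode variant has to classify each neighbouring character, which gives some intuition for the performance cost mentioned above.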
@@ -135,21 +152,6 @@ profanity.censor('I like big butts and I cannot lie', CensorType.AllVowels);
 // I like big b$tts and I cannot lie
 ```
 
-### unicodeWordBoundaries
-
-Controls whether word boundaries are Unicode-aware. By default this is set to `false` due to the performance impact.
-
-- When `false` (default), whole-word matching uses ASCII-style boundaries (similar to `\b`) plus underscore `_` as a separator. This is fastest and ideal for ASCII inputs.
-- When `true`, whole-word matching uses Unicode-aware boundaries so words with diacritics (e.g., `vehículo`, `horário`) and compound separators are handled correctly.
-
-```JavaScript
-// Enable Unicode-aware boundaries when processing non-ASCII input