Commit 7190213
fix: detect partial spacing profanity obfuscation (#44)
* fix: detect partial spacing profanity obfuscation
Profanity obfuscation using partial spacing was not being detected:
- "s hit" not detected as "shit"
- "f uck" not detected as "fuck"
- "t wat" not detected as "twat"
The isSpanningWordBoundary() method had overly strict logic that
rejected legitimate partial spacing patterns.
This fix modifies the method to check surrounding context instead
of relying on heuristics about single-character parts:
- If alphanumeric char immediately before match → embedded in word → reject
- If alphanumeric char immediately after match → embedded in word → reject
- Otherwise → standalone text, likely intentional obfuscation → allow
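The three rules above can be sketched as a small context check. This is a minimal illustration, not the actual body of isSpanningWordBoundary(): the helper name isStandaloneMatch() and its parameters are hypothetical, offsets are assumed to be character offsets into a UTF-8 string, and "alphanumeric" is assumed to mean any Unicode letter or digit.

```php
<?php
// Hypothetical helper, not the library's real method.
// $start and $length are CHARACTER offsets into a UTF-8 string.
function isStandaloneMatch(string $text, int $start, int $length): bool
{
    // Assumption: "alphanumeric" means any Unicode letter or digit.
    $isAlnum = static fn (string $ch): bool =>
        $ch !== '' && preg_match('/^[\p{L}\p{N}]$/u', $ch) === 1;

    // Character immediately before the match ('' at the start of the text).
    $before = $start > 0 ? mb_substr($text, $start - 1, 1, 'UTF-8') : '';
    // Character immediately after the match ('' at the end of the text).
    $after = mb_substr($text, $start + $length, 1, 'UTF-8');

    // Alphanumeric neighbour on either side: embedded in a word, so reject.
    // Otherwise: standalone text, so allow.
    return !$isAlnum($before) && !$isAlnum($after);
}
```

Under these assumptions, "s hit" in "you s hit me" (start 4, length 5) is allowed, while the same span in "this hit" (start 3, length 5) is rejected because "i" immediately precedes it.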
Added 6 new test cases for partial spacing detection.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix: convert byte offset to character offset for multibyte support
preg_match_all returns byte offsets, but mb_substr/mb_strlen expect
character offsets. This fix converts the byte offset to a character
offset before performing boundary checks, ensuring correct behavior
with multibyte characters (accented letters, etc.).
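One common way to do this conversion is to count the characters in the byte prefix that precedes the match. The sketch below assumes UTF-8 input; byteToCharOffset() is a hypothetical helper name and the sample pattern is illustrative only, not taken from the changed files.

```php
<?php
// Hypothetical helper: convert a byte offset into a character offset
// by measuring the character length of the preceding byte prefix.
function byteToCharOffset(string $text, int $byteOffset): int
{
    return mb_strlen(substr($text, 0, $byteOffset), 'UTF-8');
}

// "é" occupies two bytes in UTF-8, so byte and character offsets diverge after it.
$text = "caf\u{00E9} s hit";

// PREG_OFFSET_CAPTURE reports byte offsets, even with the /u modifier.
preg_match_all('/s\s+hit/u', $text, $matches, PREG_OFFSET_CAPTURE);
[$matched, $byteOffset] = $matches[0][0];

$charOffset = byteToCharOffset($text, $byteOffset);
// $byteOffset is 6 and $charOffset is 5, so only $charOffset is safe
// to pass to mb_substr() when inspecting the neighbouring characters.
```

Without the conversion, a boundary check built on mb_substr() would inspect the wrong neighbouring characters whenever a multibyte character precedes the match.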
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 5e1e0fc commit 7190213
2 files changed, 114 additions and 29 deletions.