Commit f056f5f

deemonic and claude authored
fix: detect spaced profanity like "f u c k i n g" (#36) (#37)
* fix: detect spaced profanity like "f u c k i n g" (#36)

  The isSpanningWordBoundary() method was incorrectly rejecting spaced-out profanity because it checked if the first or last part was a single character. This caused intentional obfuscation like "f u c k i n g" to be missed.

  The fix distinguishes between:
  - Intentional obfuscation: ALL parts are single chars → allow detection
  - Cross-word accidents: only SOME parts are single chars → reject

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: improve docstring for isSpanningWordBoundary method

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
1 parent c2b8d31 commit f056f5f
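
As a rough illustration of the distinction the commit message draws, the standalone sketch below mirrors the decision logic outside the class. The function name `spansWordBoundary` and the `$isSingleLetter` closure are hypothetical and are not part of BlaspService; the real method is private and receives matches produced elsewhere in the service, so this is only an approximation under those assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical standalone version of the check, for illustration only.
 * Returns true when the match should be rejected as a cross-word accident.
 */
function spansWordBoundary(string $matchedText): bool
{
    // Split on whitespace runs; a match without whitespace yields one part.
    $parts = preg_split('/\s+/', $matchedText);
    if ($parts === false || count($parts) < 2) {
        return false;
    }

    // A part counts as "single" only if it is exactly one ASCII letter.
    $isSingleLetter = static fn (string $p): bool => preg_match('/^[a-z]$/i', $p) === 1;

    // Every part is a single letter: treat as intentional obfuscation.
    if (count(array_filter($parts, $isSingleLetter)) === count($parts)) {
        return false;
    }

    // A lone letter at either edge suggests it belongs to a neighbouring word.
    return $isSingleLetter($parts[0]) || $isSingleLetter($parts[count($parts) - 1]);
}

var_dump(spansWordBoundary('f u c k i n g')); // bool(false): all parts are single letters
var_dump(spansWordBoundary('t êt'));          // bool(true): lone "t" at the leading edge
var_dump(spansWordBoundary('ab cd'));         // bool(false): no single-letter edge
```

Note that in this sketch a digit such as `1` does not count as a single letter, which matches the `/[a-z]/i` test used in the diff.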

2 files changed: +43 -14 lines changed

src/BlaspService.php

Lines changed: 27 additions & 14 deletions
@@ -352,34 +352,47 @@ private function handle(): self
     }

     /**
-     * Check if a match inappropriately spans across word boundaries.
-     *
-     * @param string $matchedText The text that was matched by the regex
-     * @return bool
+     * Determine whether a matched substring inappropriately spans word boundaries (and should be treated as a cross-word match).
+     *
+     * @param string $matchedText The substring captured by the detector, possibly containing internal whitespace or obfuscation.
+     * @return bool `true` if the match spans word boundaries and should be rejected, `false` otherwise.
      */
     private function isSpanningWordBoundary(string $matchedText): bool
     {
         // If the match contains spaces, it might be spanning word boundaries
         if (preg_match('/\s+/', $matchedText)) {
-            // Split by spaces to check the word structure
             $parts = preg_split('/\s+/', $matchedText);
-
-            // If we have multiple parts and the last part is just a single character,
-            // it's likely the beginning of the next word
+
             if (count($parts) > 1) {
+                // Count how many parts are single characters
+                $singleCharCount = 0;
+                foreach ($parts as $part) {
+                    if (strlen($part) === 1 && preg_match('/[a-z]/i', $part)) {
+                        $singleCharCount++;
+                    }
+                }
+
+                // If ALL parts are single characters, this is intentional obfuscation
+                // (e.g., "f u c k i n g") - allow it
+                if ($singleCharCount === count($parts)) {
+                    return false;
+                }
+
+                // If SOME parts are single characters at edges, this is likely
+                // a cross-word match (e.g., "t êt" from "pourrait être") - reject it
+                $firstPart = $parts[0];
                 $lastPart = end($parts);
+
                 if (strlen($lastPart) === 1 && preg_match('/[a-z]/i', $lastPart)) {
-                    return true; // Last part is single char - likely from next word
+                    return true;
                 }
-
-                // Also check if first part is single char (less common but possible)
-                $firstPart = $parts[0];
+
                 if (strlen($firstPart) === 1 && preg_match('/[a-z]/i', $firstPart)) {
-                    return true; // First part is single char - likely from previous word
+                    return true;
                 }
             }
         }
-
+
         return false;
     }

tests/BlaspCheckTest.php

Lines changed: 16 additions & 0 deletions
@@ -290,4 +290,20 @@ public function test_multiple_profanities_with_spaces()
         $this->assertCount(2, $result->uniqueProfanitiesFound);
         $this->assertSame('This is a ******* **** sentence', $result->cleanString);
     }
+
+    public function test_spaced_profanity_with_substitution()
+    {
+        // Issue #36 - README example should be detected
+        $result = $this->blaspService->check('This is f u c k 1 n g awesome!');
+
+        $this->assertTrue($result->hasProfanity);
+        $this->assertStringContainsString('*', $result->cleanString);
+    }
+
+    public function test_spaced_profanity_without_substitution()
+    {
+        $result = $this->blaspService->check('f u c k i n g');
+
+        $this->assertTrue($result->hasProfanity);
+    }
 }
