Skip to content

Bug: Format-preserving printer adds extra backslash when reprinting a modified string node with \\\ inside in pSingleQuotedString #1144

@maks-rafalko

Description

@maks-rafalko

Summary

When printFormatPreserving reprints a Scalar\String_ node whose value has changed, it falls back to pSingleQuotedString. That method's regex contains a third alternative (?<=\\)\\ that escapes a backslash simply because it follows another backslash, even when no escaping is required for correctness.

This turns valid non-canonical source like '\\\|' (3 backslashes) into '\\\\|' (4 backslashes), producing a different but semantically equivalent string.

See

protected function pSingleQuotedString(string $string): string {
// It is idiomatic to only escape backslashes when necessary, i.e. when followed by ', \ or
// the end of the string ('Foo\Bar' instead of 'Foo\\Bar'). However, we also don't want to
// produce an odd number of backslashes, so '\\\\a' should not get rendered as '\\\a', even
// though that would be legal.
$regex = '/\'|\\\\(?=[\'\\\\]|$)|(?<=\\\\)\\\\/';
return '\'' . preg_replace($regex, '\\\\$0', $string) . '\'';
}

Expected behaviour

The reprinted string preserves the original three-backslash encoding '\\\|' when format-preserving pretty printer is used.

Actual behaviour

The reprinted string uses four backslashes '\\\\|', adding one backslash that was not
present in the original source.

Root cause

Standard::pSingleQuotedString uses this regex:

$regex = '/\'|\\\\(?=[\'\\\\]|$)|(?<=\\\\)\\\\/';
//                                 ^^^^^^^^^^^^^^ third alternative

Possible fix

For format-preserving pretty printing, we can probably remove the part where extra \ is added:

// before 
$regex = '/\'|\\\\(?=[\'\\\\]|$)|(?<=\\\\)\\\\/';

// after 
$regex = '/\'|\\\\(?=[\'\\\\]|$)/';

How to reproduce? I will add the full script that demonstrates a bug, but the crucial thing is that we modify the string (regex) before pretty-printing

Reproducer
<?php

require_once __DIR__ . '/vendor/autoload.php';

use PhpParser\NodeFinder;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitor\CloningVisitor;
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter\Standard;

// ---------------------------------------------------------------------------
// Original PHP source containing the non-canonical regex.
// '\\\|' is 3 backslashes in source: '\\' (→ \) + '\|' (→ \| literally),
// so the string VALUE has 2 backslashes + pipe — valid but non-canonical.
// ---------------------------------------------------------------------------
$originalCode = <<<'PHP'
<?php
preg_match('/(?:.*controller\\\|.*controllers\\\)([\w\\\]+)controller$/iU', '', $m);
PHP;

$parser = (new ParserFactory())->createForHostVersion();
$originalStatements = $parser->parse($originalCode);
$originalTokens     = $parser->getTokens();

// Clone the AST (same as Infection's CloningVisitor step in the mutation pipeline).
$clonedStatements = (new NodeTraverser(new CloningVisitor()))->traverse($originalStatements);

// Find the String_ node in the clone and change its value to simulate
// PregMatchRemoveFlags removing the 'U' flag: '/…/iU' → '/…/i'.
$stringNode = (new NodeFinder())->findFirstInstanceOf(
    $clonedStatements,
    PhpParser\Node\Scalar\String_::class,
);

$originalValue      = $stringNode->value;           // current decoded value  (ends with /iU)
$stringNode->value  = rtrim($originalValue, 'U');   // mutated decoded value  (ends with /i)

echo "Original decoded value : " . json_encode($originalValue)     . "\n";
echo "Mutated  decoded value : " . json_encode($stringNode->value) . "\n\n";

// ---------------------------------------------------------------------------
// Print with the DEFAULT Standard printer (exhibits the bug).
// ---------------------------------------------------------------------------
$standardPrinter = new Standard();
$standardOutput  = $standardPrinter->printFormatPreserving(
    $clonedStatements,
    $originalStatements,
    $originalTokens,
);

// ---------------------------------------------------------------------------
// Print with the fixed printer (drops the (?<=\\)\\ alternative that causes
// the extra backslash).
// ---------------------------------------------------------------------------
$infectionPrinter = new class extends Standard {
    protected function pSingleQuotedString(string $string): string
    {
        return '\'' . preg_replace('/\'|\\\\(?=[\'\\\\]|$)/', '\\\\$0', $string) . '\'';
    }
};
$infectionOutput  = $infectionPrinter->printFormatPreserving(
    $clonedStatements,
    $originalStatements,
    $originalTokens,
);

echo "=== Standard printer (BUG: \\\\\\\\| instead of \\\\\\|) ===\n";
echo $standardOutput . "\n\n";

echo "=== InfectionPrettyPrinter (FIXED: \\\\\\| preserved) ===\n";
echo $infectionOutput . "\n\n";

// ---------------------------------------------------------------------------
// Explicit assertion so the script exits non-zero on failure.
// ---------------------------------------------------------------------------
$threeBackslashes = str_repeat('\\', 3);
$fourBackslashes  = str_repeat('\\', 4);

assert(
    str_contains($standardOutput, $fourBackslashes . '|'),
    'Expected Standard printer to produce 4 backslashes before |',
);
assert(
    str_contains($infectionOutput, $threeBackslashes . '|'),
    'Expected InfectionPrettyPrinter to produce 3 backslashes before |',
);
assert(
    !str_contains($infectionOutput, $fourBackslashes . '|'),
    'Expected InfectionPrettyPrinter NOT to produce 4 backslashes before |',
);

echo "All assertions passed.\n";

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions