Skip to content

Commit 09e7697

Browse files
committed
PHP 8.3 | Tokenizer/PHP: support "yield from" with comments
As discussed in and discovered via issue 529: * Prior to PHP 8.3, only whitespace was allowed between the `yield` and `from` keywords. A comment between the `yield` and `from` keywords in a `yield from` expression would result in a parse error. * As of PHP 8.3, this is no longer a parse error and both whitespace as well as comments are allowed between the `yield` and `from` and the complete expression is tokenized in PHP itself as one `T_YIELD_FROM` token. See: https://3v4l.org/2SI2Q#veol In the context of PHPCS this is problematic as comments should always have their own token to allow sniffs to examine them. Additionally, such comments may contain PHPCS ignore annotations, which, when not tokenized as a separate token, will not be respected. This commit adds support for this change in PHP 8.3 to PHP_CodeSniffer. It does contain an, albeit small, BC-break, due to the BC-break created by PHP. Previously in PHPCS: * A single line `yield from` expression would always tokenize as `T_YIELD_FROM`, independently of the type and amount of whitespace between the keywords. * A multi-line `yield` [new line]+ `from` expression would tokenize as multiple `T_YIELD_FROM` tokens, one for each line. * A `yield from` expression with a comment between the keywords was not supported. In PHP < 8.3, this meant that this would tokenize as `T_YIELD`, [`T_WHITESPACE`|T_COMMENT`]+, `T_STRING` (`from`). As of PHP 8.3, this was tokenized as one or more `T_YIELD_FROM` tokens (depending on single/multi-line) with the comment being tokenized as `T_YIELD_FROM` as well. This commit changes this as follows: * Single line `yield from` expression with only whitespace between the keywords: **no change**, this will still tokenize as a single `T_YIELD_FROM` token. * Multi-line `yield` [new line]+ `from` expressions and `yield from` expressions with a comment (both single line as well as multi-line) will now consistently be tokenized as `T_YIELD_FROM` (`yield`), [`T_WHITESPACE`|T_COMMENT`]+, `T_YIELD_FROM` (`from`). In practice, this means that: * Whitespace and comments between the keywords can now be examined and handled by relevant sniffs, which are likely to give more accurate results (fewer false negatives, like for tab indentation of a `from` keyword on a separate line). * The tokenization used by PHPCS is now consistent again for all supported PHP versions. * The PHP 8.3 change is now supported. It does mean that sniffs which explicitly handle multi-token `yield from` expressions, will need to be updated. In my opinion, adding this change in a minor is justified as: 1. The PHP 8.3 change can not be supported otherwise. 2. The impact is expected to be minimal anyhow as there are not many sniffs which specifically look for and handle `T_YIELD_FROM` tokens and those sniffs within PHPCS itself will be updated/adjusted in the same release. Also, the (negative) impact on _end-users_ of this BC-break is also expected to be minimal as a scan of the top 2000 projects listed on Packagist shows that in those project no multi-line/multi-token `yield from` expressions are used in the source code, which means that even when sniff code is not updated (yet) for the change in tokenization, the chances of an end-user getting incorrect results because of this are very slim as the code affected is just not written as multi-line/with comment that often. Includes tests. Fixes 529 Refs: * squizlabs/PHP_CodeSniffer 1524 (original polyfill code) * php/php-src 10125 * php/php-src 14926 * https://externals.io/message/124462 --- Information for standards maintainers The "yield from" _keyword_ could previously already consist of multiple T_YIELD_FROM tokens if the "keyword" was spread over multiple lines. Now, the tokens between the actual keywords will be tokenized as `T_WHITESPACE` and comment tokens. To find the last token for a `T_YIELD_FROM` "keyword", change old code like this: ```php $yieldFromEnd = $stackPtr; if (preg_match('`yield\s+from`', $tokens[$stackPtr]['content']) !== 1) { for ($yieldFromEnd = ($stackPtr + 1); $tokens[$yieldFromEnd]['code'] === T_YIELD_FROM; $yieldFromEnd++); --$yieldFromEnd; } ``` to ```php $yieldFromEnd = $stackPtr; if (strtolower(trim($tokens[$stackPtr]['content'])) === 'yield') { for ($i = ($stackPtr + 1); $i < $phpcsFile->numTokens; $i++) { if ($tokens[$i]['code'] === T_YIELD_FROM && strtolower(trim($tokens[$i]['content'])) === 'from') { $yieldFromEnd = $i; break; } if (isset(Tokens::$emptyTokens[$tokens[$i]['code']]) === false && $tokens[$i]['code'] !== T_YIELD_FROM) { // Shouldn't be possible. Just to be on the safe side. break; } } } ``` The above presumes that `$stackPtr` is set to a `T_YIELD_FROM` token. Also note that the second code snippet is largely cross-version compatible. It will work with older PHPCS versions with code compatible with PHP < 8.3 and will work on PHPCS 3.11.0+ for code compatible with all supported PHP versions.
1 parent aafc72f commit 09e7697

File tree

3 files changed

+328
-21
lines changed

3 files changed

+328
-21
lines changed

src/Tokenizers/PHP.php

Lines changed: 113 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -1514,35 +1514,130 @@ protected function tokenize($string)
15141514
}//end if
15151515

15161516
/*
1517-
Before PHP 7.0, the "yield from" was tokenized as
1517+
Before PHP 7.0, "yield from" was tokenized as
15181518
T_YIELD, T_WHITESPACE and T_STRING. So look for
15191519
and change this token in earlier versions.
1520+
1521+
Before PHP 8.3, if there was a comment between the "yield" and "from" keywords,
1522+
it was tokenized as T_YIELD, T_WHITESPACE, T_COMMENT... and T_STRING.
1523+
We want to keep the tokenization of the tokens between, but need to change the
1524+
`T_YIELD` and `T_STRING` (from) keywords to `T_YIELD_FROM.
15201525
*/
15211526

1522-
if (PHP_VERSION_ID < 70000
1527+
if (PHP_VERSION_ID < 80300
15231528
&& $tokenIsArray === true
1524-
&& $token[0] === T_YIELD
1525-
&& isset($tokens[($stackPtr + 1)]) === true
1526-
&& isset($tokens[($stackPtr + 2)]) === true
1527-
&& $tokens[($stackPtr + 1)][0] === T_WHITESPACE
1528-
&& $tokens[($stackPtr + 2)][0] === T_STRING
1529-
&& strtolower($tokens[($stackPtr + 2)][1]) === 'from'
1529+
&& $token[0] === T_STRING
1530+
&& strtolower($token[1]) === 'from'
1531+
&& $finalTokens[$lastNotEmptyToken]['code'] === T_YIELD
15301532
) {
1531-
// Could be multi-line, so adjust the token stack.
1532-
$token[0] = T_YIELD_FROM;
1533-
$token[1] .= $tokens[($stackPtr + 1)][1].$tokens[($stackPtr + 2)][1];
1533+
$finalTokens[$lastNotEmptyToken]['code'] = T_YIELD_FROM;
1534+
$finalTokens[$lastNotEmptyToken]['type'] = 'T_YIELD_FROM';
1535+
1536+
$finalTokens[$newStackPtr] = [
1537+
'code' => T_YIELD_FROM,
1538+
'type' => 'T_YIELD_FROM',
1539+
'content' => $token[1],
1540+
];
1541+
$newStackPtr++;
15341542

15351543
if (PHP_CODESNIFFER_VERBOSITY > 1) {
1536-
for ($i = ($stackPtr + 1); $i <= ($stackPtr + 2); $i++) {
1537-
$type = Tokens::tokenName($tokens[$i][0]);
1538-
$content = Common::prepareForOutput($tokens[$i][1]);
1539-
echo "\t\t* token $i merged into T_YIELD_FROM; was: $type => $content".PHP_EOL;
1544+
echo "\t\t* token $lastNotEmptyToken (new stack) changed into T_YIELD_FROM; was: T_YIELD".PHP_EOL;
1545+
echo "\t\t* token $stackPtr changed into T_YIELD_FROM; was: T_STRING".PHP_EOL;
1546+
}
1547+
1548+
continue;
1549+
} else if (PHP_VERSION_ID >= 70000
1550+
&& $tokenIsArray === true
1551+
&& $token[0] === T_YIELD_FROM
1552+
&& strpos($token[1], $this->eolChar) !== false
1553+
&& preg_match('`^yield\s+from$`i', $token[1]) === 1
1554+
) {
1555+
/*
1556+
In PHP 7.0+, a multi-line "yield from" (without comment) tokenizes as a single
1557+
T_YIELD_FROM token, but we want to split it and tokenize the whitespace
1558+
separately for consistency.
1559+
*/
1560+
1561+
$finalTokens[$newStackPtr] = [
1562+
'code' => T_YIELD_FROM,
1563+
'type' => 'T_YIELD_FROM',
1564+
'content' => substr($token[1], 0, 5),
1565+
];
1566+
$newStackPtr++;
1567+
1568+
$tokenLines = explode($this->eolChar, substr($token[1], 5, -4));
1569+
$numLines = count($tokenLines);
1570+
$newToken = [
1571+
'type' => 'T_WHITESPACE',
1572+
'code' => T_WHITESPACE,
1573+
'content' => '',
1574+
];
1575+
1576+
foreach ($tokenLines as $i => $line) {
1577+
$newToken['content'] = $line;
1578+
if ($i === ($numLines - 1)) {
1579+
if ($line === '') {
1580+
break;
1581+
}
1582+
} else {
1583+
$newToken['content'] .= $this->eolChar;
15401584
}
1585+
1586+
$finalTokens[$newStackPtr] = $newToken;
1587+
$newStackPtr++;
15411588
}
15421589

1543-
$tokens[($stackPtr + 1)] = null;
1544-
$tokens[($stackPtr + 2)] = null;
1545-
}
1590+
$finalTokens[$newStackPtr] = [
1591+
'code' => T_YIELD_FROM,
1592+
'type' => 'T_YIELD_FROM',
1593+
'content' => substr($token[1], -4),
1594+
];
1595+
$newStackPtr++;
1596+
1597+
if (PHP_CODESNIFFER_VERBOSITY > 1) {
1598+
echo "\t\t* token $stackPtr split into 'yield', one or more whitespace tokens and 'from'".PHP_EOL;
1599+
}
1600+
1601+
continue;
1602+
} else if (PHP_VERSION_ID >= 80300
1603+
&& $tokenIsArray === true
1604+
&& $token[0] === T_YIELD_FROM
1605+
&& preg_match('`^yield[ \t]+from$`i', $token[1]) !== 1
1606+
&& stripos($token[1], 'yield') === 0
1607+
) {
1608+
/*
1609+
Since PHP 8.3, "yield from" allows for comments and will
1610+
swallow the comment in the `T_YIELD_FROM` token.
1611+
We need to split this up to allow for sniffs handling comments.
1612+
*/
1613+
1614+
$finalTokens[$newStackPtr] = [
1615+
'code' => T_YIELD_FROM,
1616+
'type' => 'T_YIELD_FROM',
1617+
'content' => substr($token[1], 0, 5),
1618+
];
1619+
$newStackPtr++;
1620+
1621+
$yieldFromSubtokens = @token_get_all("<?php\n".substr($token[1], 5, -4));
1622+
// Remove the PHP open tag token.
1623+
array_shift($yieldFromSubtokens);
1624+
// Add the "from" keyword.
1625+
$yieldFromSubtokens[] = [
1626+
0 => T_YIELD_FROM,
1627+
1 => substr($token[1], -4),
1628+
];
1629+
1630+
// Inject the new tokens into the token stack.
1631+
array_splice($tokens, ($stackPtr + 1), 0, $yieldFromSubtokens);
1632+
$numTokens = count($tokens);
1633+
1634+
if (PHP_CODESNIFFER_VERBOSITY > 1) {
1635+
echo "\t\t* token $stackPtr split into parts (yield from with comment)".PHP_EOL;
1636+
}
1637+
1638+
unset($yieldFromSubtokens);
1639+
continue;
1640+
}//end if
15461641

15471642
/*
15481643
Before PHP 5.6, the ... operator was tokenized as three

tests/Core/Tokenizers/PHP/YieldTest.inc

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,31 @@ function generator()
2222

2323
FROM
2424
gen2();
25+
26+
/* testYieldFromSplitByComment */
27+
yield /* comment */ from gen2();
28+
29+
/* testYieldFromWithTrailingComment */
30+
yield // comment
31+
from gen2();
32+
33+
/* testYieldFromWithTrailingAnnotation */
34+
yield // phpcs:ignore Stnd.Cat.Sniff -- for reasons.
35+
from gen2();
36+
37+
/* testYieldFromSplitByNewLineAndComments */
38+
yield
39+
/* comment line 1
40+
line 2 */
41+
// another comment
42+
from
43+
gen2();
44+
45+
/* testYieldFromSplitByNewLineAndAnnotation */
46+
YIELD
47+
// @phpcs:disable Stnd.Cat.Sniff -- for reasons.
48+
From
49+
gen2();
2550
}
2651

2752
/* testYieldUsedAsClassName */

tests/Core/Tokenizers/PHP/YieldTest.php

Lines changed: 190 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -165,22 +165,209 @@ public function testYieldFromKeywordMultiToken($testMarker, $expectedTokens)
165165
public static function dataYieldFromKeywordMultiToken()
166166
{
167167
return [
168-
'yield from with new line' => [
168+
'yield from with new line' => [
169169
'testMarker' => '/* testYieldFromSplitByNewLines */',
170170
'expectedTokens' => [
171171
[
172172
'type' => 'T_YIELD_FROM',
173-
'content' => 'yield
173+
'content' => 'yield',
174+
],
175+
[
176+
'type' => 'T_WHITESPACE',
177+
'content' => '
178+
',
179+
],
180+
[
181+
'type' => 'T_WHITESPACE',
182+
'content' => '
183+
',
184+
],
185+
[
186+
'type' => 'T_WHITESPACE',
187+
'content' => ' ',
188+
],
189+
[
190+
'type' => 'T_YIELD_FROM',
191+
'content' => 'FROM',
192+
],
193+
[
194+
'type' => 'T_WHITESPACE',
195+
'content' => '
196+
',
197+
],
198+
],
199+
],
200+
'yield from with comment' => [
201+
'testMarker' => '/* testYieldFromSplitByComment */',
202+
'expectedTokens' => [
203+
[
204+
'type' => 'T_YIELD_FROM',
205+
'content' => 'yield',
206+
],
207+
[
208+
'type' => 'T_WHITESPACE',
209+
'content' => ' ',
210+
],
211+
[
212+
'type' => 'T_COMMENT',
213+
'content' => '/* comment */',
214+
],
215+
[
216+
'type' => 'T_WHITESPACE',
217+
'content' => ' ',
218+
],
219+
[
220+
'type' => 'T_YIELD_FROM',
221+
'content' => 'from',
222+
],
223+
[
224+
'type' => 'T_WHITESPACE',
225+
'content' => ' ',
226+
],
227+
],
228+
],
229+
'yield from with trailing comment' => [
230+
'testMarker' => '/* testYieldFromWithTrailingComment */',
231+
'expectedTokens' => [
232+
[
233+
'type' => 'T_YIELD_FROM',
234+
'content' => 'yield',
235+
],
236+
[
237+
'type' => 'T_WHITESPACE',
238+
'content' => ' ',
239+
],
240+
[
241+
'type' => 'T_COMMENT',
242+
'content' => '// comment
243+
',
244+
],
245+
[
246+
'type' => 'T_WHITESPACE',
247+
'content' => ' ',
248+
],
249+
[
250+
'type' => 'T_YIELD_FROM',
251+
'content' => 'from',
252+
],
253+
[
254+
'type' => 'T_WHITESPACE',
255+
'content' => ' ',
256+
],
257+
],
258+
],
259+
'yield from with trailing annotation' => [
260+
'testMarker' => '/* testYieldFromWithTrailingAnnotation */',
261+
'expectedTokens' => [
262+
[
263+
'type' => 'T_YIELD_FROM',
264+
'content' => 'yield',
265+
],
266+
[
267+
'type' => 'T_WHITESPACE',
268+
'content' => ' ',
269+
],
270+
[
271+
'type' => 'T_PHPCS_IGNORE',
272+
'content' => '// phpcs:ignore Stnd.Cat.Sniff -- for reasons.
174273
',
175274
],
275+
[
276+
'type' => 'T_WHITESPACE',
277+
'content' => ' ',
278+
],
279+
[
280+
'type' => 'T_YIELD_FROM',
281+
'content' => 'from',
282+
],
283+
[
284+
'type' => 'T_WHITESPACE',
285+
'content' => ' ',
286+
],
287+
],
288+
],
289+
'yield from with new line and comment' => [
290+
'testMarker' => '/* testYieldFromSplitByNewLineAndComments */',
291+
'expectedTokens' => [
176292
[
177293
'type' => 'T_YIELD_FROM',
294+
'content' => 'yield',
295+
],
296+
[
297+
'type' => 'T_WHITESPACE',
178298
'content' => '
179299
',
180300
],
301+
[
302+
'type' => 'T_WHITESPACE',
303+
'content' => ' ',
304+
],
305+
[
306+
'type' => 'T_COMMENT',
307+
'content' => '/* comment line 1
308+
',
309+
],
310+
[
311+
'type' => 'T_COMMENT',
312+
'content' => ' line 2 */',
313+
],
314+
[
315+
'type' => 'T_WHITESPACE',
316+
'content' => '
317+
',
318+
],
319+
[
320+
'type' => 'T_WHITESPACE',
321+
'content' => ' ',
322+
],
323+
[
324+
'type' => 'T_COMMENT',
325+
'content' => '// another comment
326+
',
327+
],
328+
[
329+
'type' => 'T_WHITESPACE',
330+
'content' => ' ',
331+
],
332+
[
333+
'type' => 'T_YIELD_FROM',
334+
'content' => 'from',
335+
],
336+
[
337+
'type' => 'T_WHITESPACE',
338+
'content' => '
339+
',
340+
],
341+
],
342+
],
343+
'yield from with new line and annotation' => [
344+
'testMarker' => '/* testYieldFromSplitByNewLineAndAnnotation */',
345+
'expectedTokens' => [
346+
[
347+
'type' => 'T_YIELD_FROM',
348+
'content' => 'YIELD',
349+
],
350+
[
351+
'type' => 'T_WHITESPACE',
352+
'content' => '
353+
',
354+
],
355+
[
356+
'type' => 'T_WHITESPACE',
357+
'content' => ' ',
358+
],
359+
[
360+
'type' => 'T_PHPCS_DISABLE',
361+
'content' => '// @phpcs:disable Stnd.Cat.Sniff -- for reasons.
362+
',
363+
],
364+
[
365+
'type' => 'T_WHITESPACE',
366+
'content' => ' ',
367+
],
181368
[
182369
'type' => 'T_YIELD_FROM',
183-
'content' => ' FROM',
370+
'content' => 'From',
184371
],
185372
[
186373
'type' => 'T_WHITESPACE',

0 commit comments

Comments
 (0)