Commit 2dad945

utf8.c: White-space only
Outdent and reflow some comments and code in preparation for them to be moved out of the loop
1 parent b12a54f commit 2dad945

1 file changed, 41 additions and 44 deletions

utf8.c

Lines changed: 41 additions & 44 deletions
@@ -2534,55 +2534,52 @@ Perl_utf8_to_bytes(pTHX_ U8 *s, STRLEN *lenp)
             continue;
         }
 
-        /* Here, it is malformed. This shouldn't happen on EBCDIC, and on
-         * ASCII platforms, we know that the only start bytes in the text
-         * are C2 and C3, and the code above has made sure that it doesn't
-         * end with a start byte. That means the only malformations that
-         * are possible are a start byte without a continuation (either
-         * followed by another start byte or an invariant) or an unexpected
-         * continuation.
-         *
-         * We have to undo all we've done before, back down to the first
-         * UTF-8 variant. Note that each 2-byte variant we've done so far
-         * (converted to single byte) slides things to the left one byte,
-         * and so we have bytes that haven't been written over.
-         *
-         * Here, 'd' points to the next position to overwrite, and 's'
-         * points to the first invalid byte. That means 'd's contents
-         * haven't been changed yet, nor has anything else beyond it in the
-         * string. In restoring to the original contents, we don't need to
-         * do anything past (d-1).
-         *
-         * In particular, the bytes from 'd' to 's' have not been changed.
-         * This loop uses a new variable 's1' (to avoid confusing 'source'
-         * and 'destination') set to 'd', and moves 's' and 's1' in lock
-         * step back so that afterwards, 's1' points to the first changed
-         * byte that will be the source for the first byte (or bytes) at
-         * 's' that need to be changed back. Note that s1 can expand to
-         * two bytes */
-        U8 * s1 = d;
-        while (s >= d) {
+    /* Here, it is malformed. This shouldn't happen on EBCDIC, and on ASCII
+     * platforms, we know that the only start bytes in the text are C2 and C3,
+     * and the code above has made sure that it doesn't end with a start byte.
+     * That means the only malformations that are possible are a start byte
+     * without a continuation (either followed by another start byte or an
+     * invariant) or an unexpected continuation.
+     *
+     * We have to undo all we've done before, back down to the first UTF-8
+     * variant. Note that each 2-byte variant we've done so far (converted to
+     * single byte) slides things to the left one byte, and so we have bytes
+     * that haven't been written over.
+     *
+     * Here, 'd' points to the next position to overwrite, and 's' points to
+     * the first invalid byte. That means 'd's contents haven't been changed
+     * yet, nor has anything else beyond it in the string. In restoring to the
+     * original contents, we don't need to do anything past (d-1).
+     *
+     * In particular, the bytes from 'd' to 's' have not been changed. This
+     * loop uses a new variable 's1' (to avoid confusing 'source' and
+     * 'destination') set to 'd', and moves 's' and 's1' in lock step back so
+     * that afterwards, 's1' points to the first changed byte that will be the
+     * source for the first byte (or bytes) at 's' that need to be changed
+     * back. Note that s1 can expand to two bytes */
+    U8 * s1 = d;
+    while (s >= d) {
+        s--;
+        if (! UVCHR_IS_INVARIANT(*s1)) {
             s--;
-            if (! UVCHR_IS_INVARIANT(*s1)) {
-                s--;
-            }
-            s1--;
         }
+        s1--;
+    }
 
-        /* Do the changing back */
-        while (s1 >= first_variant) {
-            if (UVCHR_IS_INVARIANT(*s1)) {
-                *s-- = *s1--;
-            }
-            else {
-                *s-- = UTF8_EIGHT_BIT_LO(*s1);
-                *s-- = UTF8_EIGHT_BIT_HI(*s1);
-                s1--;
-            }
+    /* Do the changing back */
+    while (s1 >= first_variant) {
+        if (UVCHR_IS_INVARIANT(*s1)) {
+            *s-- = *s1--;
         }
+        else {
+            *s-- = UTF8_EIGHT_BIT_LO(*s1);
+            *s-- = UTF8_EIGHT_BIT_HI(*s1);
+            s1--;
+        }
+    }
 
-        *lenp = ((STRLEN) -1);
-        return NULL;
+    *lenp = ((STRLEN) -1);
+    return NULL;
     }
 
     /* Success! */
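
The undo step described in the comment above can be illustrated outside of Perl with a minimal, standalone sketch. It is not the code from utf8.c: the names downgrade_in_place, is_invariant, and is_continuation are hypothetical stand-ins for Perl's function and macros, it assumes an ASCII platform, and it uses indices rather than the 's', 'd', and 's1' pointers. It also takes a simpler route than the commit's code: instead of first stepping back past the unchanged bytes, it re-expands everything already written, from right to left, relying on the fact that the unchanged bytes between 'd' and 's' get rewritten with the values they already hold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for Perl's macros; ASCII platforms only. */
static int is_invariant(uint8_t b)    { return b < 0x80; }
static int is_continuation(uint8_t b) { return (b & 0xC0) == 0x80; }

/* Convert the UTF-8 in buf[0..len) to single bytes, in place.
 * Returns the new length, or (size_t) -1 after restoring buf to its
 * original contents if the input is malformed or holds a code point
 * above 0xFF. */
static size_t downgrade_in_place(uint8_t *buf, size_t len)
{
    size_t s = 0;   /* index of the next byte to read  */
    size_t d = 0;   /* index of the next byte to write */

    assert(buf != NULL || len == 0);

    while (s < len) {
        if (is_invariant(buf[s])) {
            buf[d++] = buf[s++];
            continue;
        }

        /* Only C2 and C3 can start a code point in 0x80..0xFF */
        if ((buf[s] == 0xC2 || buf[s] == 0xC3)
            && s + 1 < len && is_continuation(buf[s + 1]))
        {
            buf[d++] = (uint8_t) (((buf[s] & 0x03) << 6)
                                  | (buf[s + 1] & 0x3F));
            s += 2;
            continue;
        }

        /* Malformed.  buf[0..d) holds converted bytes whose expansion is
         * exactly the original buf[0..s).  Expand from the right so that
         * no byte is overwritten before it has been read. */
        while (d > 0) {
            uint8_t c = buf[--d];
            if (is_invariant(c)) {
                buf[--s] = c;
            }
            else {
                buf[--s] = (uint8_t) (0x80 | (c & 0x3F)); /* continuation */
                buf[--s] = (uint8_t) (0xC0 | (c >> 6));   /* start byte   */
            }
        }
        return (size_t) -1;
    }

    return d;
}
```

With this sketch, the four bytes 'a', C3, A9, 'b' shrink to the three bytes 'a', E9, 'b', while the five bytes 'a', C3, A9, C3, 'b' (a start byte followed by an invariant) yield (size_t) -1 and leave the buffer as it was on entry.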