Fix issues with AsyncBufferSequence.LineSequence #91

jakepetroules · 2025-06-23T23:50:53Z

The main problem is that the internal buffer used by AsyncBufferSequence.LineSequence may end partway through a line ending sequence. This primarily affects the UTF-8 encoding, but it also affects UTF-16 and UTF-32 for the \r\n sequence specifically. When this happens, the range check blocks the peek-ahead to the next 1 or 2 bytes, so a complete line is not returned to the client.

The buffer size is based on the page size by default, which is 16384 on my macOS system and 4096 on my Linux system. This led to testLineSequence failing frequently on Linux, where the smaller buffer makes it more likely that a line ending is split across a buffer boundary.

To fix this, we always load more bytes into the buffer until the buffer no longer ends with a potential partial line ending sequence (unless we hit EOF, which correctly causes an early return).
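As a rough illustration of that loop for the UTF-8 case, here is a minimal sketch. The names `endsWithPartialLineEnding`, `fill`, and `readChunk` are hypothetical and are not taken from the patch; the sketch only covers \r\n and the line separator (E2 80 A8) and paragraph separator (E2 80 A9) sequences, and the real implementation may be structured quite differently.

```swift
/// Returns true when `buffer` ends with bytes that could be the start of a
/// longer line ending sequence, so more input is needed before deciding.
/// Hypothetical helper; covers only "\r\n", U+2028, and U+2029 in UTF-8.
func endsWithPartialLineEnding(_ buffer: [UInt8]) -> Bool {
    guard let last = buffer.last else { return false }
    // "\r" may be followed by "\n" to form "\r\n".
    if last == 0x0D { return true }
    // 0xE2 may begin U+2028 (E2 80 A8) or U+2029 (E2 80 A9).
    if last == 0xE2 { return true }
    // E2 80 may be completed by A8 or A9.
    if buffer.count >= 2, last == 0x80, buffer[buffer.count - 2] == 0xE2 {
        return true
    }
    return false
}

/// Keeps appending chunks while the buffer ends with a potential partial
/// line ending. `readChunk` is a stand-in for the underlying reader and
/// returns nil (or an empty chunk) at EOF, which ends the loop early.
func fill(_ buffer: inout [UInt8], from readChunk: () -> [UInt8]?) {
    while endsWithPartialLineEnding(buffer) {
        guard let chunk = readChunk(), !chunk.isEmpty else { return }
        buffer.append(contentsOf: chunk)
    }
}
```

With a chunk size of 1, `fill` may need to call `readChunk` twice in a row to complete a three-byte separator, which is the case the buffer-size-1 testing is meant to exercise.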

Additionally, the testLineSequence test could generate empty lines, which meant that it was possible to have a line ending with \r followed by an (empty) line ending with \n, indistinguishable from a single line ending with \r\n. This is a problem for any pair of line ending sequences where one is a prefix of the other; \r and \r\n are the only pair that meets this criterion. To fix that, the test no longer generates an empty buffer. Because it does so by restricting the character set to Latin, it can never again produce the problematic sequence.
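The ambiguity can be seen directly at the byte level. This small sketch (variable names are illustrative only) builds the two inputs described above and shows that they are identical, so no splitter could tell them apart:

```swift
// Line "a" terminated by "\r", followed by an empty line terminated by "\n".
let twoLines: [UInt8] = Array("a".utf8) + [0x0D] + Array("".utf8) + [0x0A]

// A single line "a" terminated by "\r\n".
let oneLine: [UInt8] = Array("a".utf8) + [0x0D, 0x0A]

// Both inputs are the same three bytes: 61 0D 0A.
let sameBytes = (twoLines == oneLine)
```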

Also switch testTeardownSequence to use AsyncBufferSequence.LineSequence instead of its custom line splitting logic. This ensures the test works correctly regardless of buffer size, even with a contrived buffer size of 1.

Closes #78

jakepetroules · 2025-06-23T23:55:57Z

To test this, I hardcoded the buffer size to 1 to help ensure it was working correctly, including in cases where it needed to read from the underlying buffer multiple times in a row for the UTF-8 line separator and paragraph separator sequences.
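The same idea can be tried outside the library with a toy splitter. The sketch below is not the LineSequence implementation: `ChunkedLineSplitter` and `splitLines` are hypothetical names, and only \n, \r, and \r\n are handled. It defers the decision on a trailing \r until the next byte or EOF arrives, then checks that the resulting lines do not depend on the chunk size.

```swift
/// Hypothetical streaming splitter handling only "\n", "\r", and "\r\n".
struct ChunkedLineSplitter {
    private var pending: [UInt8] = []

    init() {}

    /// Feeds one chunk and returns every line that is now complete.
    mutating func feed(_ chunk: [UInt8]) -> [String] {
        pending.append(contentsOf: chunk)
        return drain(atEOF: false)
    }

    /// Signals end of input and returns whatever lines remain.
    mutating func finish() -> [String] {
        return drain(atEOF: true)
    }

    private mutating func drain(atEOF: Bool) -> [String] {
        var lines: [String] = []
        var start = 0
        var i = 0
        while i < pending.count {
            let byte = pending[i]
            if byte == 0x0A {
                lines.append(String(decoding: pending[start..<i], as: UTF8.self))
                i += 1
                start = i
            } else if byte == 0x0D {
                // A trailing "\r" is ambiguous until the next byte or EOF.
                if i + 1 == pending.count && !atEOF { break }
                lines.append(String(decoding: pending[start..<i], as: UTF8.self))
                i += 1
                if i < pending.count && pending[i] == 0x0A { i += 1 }
                start = i
            } else {
                i += 1
            }
        }
        // At EOF, an unterminated final line is still returned.
        if atEOF && start < pending.count {
            lines.append(String(decoding: pending[start...], as: UTF8.self))
            start = pending.count
        }
        pending.removeFirst(start)
        return lines
    }
}

/// Splits `bytes` into lines, feeding the splitter `chunkSize` bytes at a time.
func splitLines(_ bytes: [UInt8], chunkSize: Int) -> [String] {
    var splitter = ChunkedLineSplitter()
    var lines: [String] = []
    var offset = 0
    while offset < bytes.count {
        let end = min(offset + chunkSize, bytes.count)
        lines += splitter.feed(Array(bytes[offset..<end]))
        offset = end
    }
    lines += splitter.finish()
    return lines
}
```

Calling `splitLines(Array("ab\r\ncd\re\nf".utf8), chunkSize: 1)` and again with a larger chunk size should give the same four lines, which is the property a hardcoded buffer size of 1 stresses hardest.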

jakepetroules · 2025-06-27T18:23:25Z

Following some feedback from @itingliu, I ended up restructuring the code to make it much easier to follow and validate. The previous iteration of this patch also still failed in rare cases, but with the latest version I've had the test running in a loop for probably 12 hours straight with no issues, so this should be quite solid now.

jakepetroules requested a review from iCharlesHu as a code owner June 23, 2025 23:50

jakepetroules force-pushed the eng/PR-fix-linesequence branch from bd41657 to 4be075c June 24, 2025 00:55

itingliu reviewed Jun 27, 2025

Tests/SubprocessTests/SubprocessTests+Unix.swift (resolved)

Sources/Subprocess/AsyncBufferSequence.swift (outdated, resolved)

jakepetroules force-pushed the eng/PR-fix-linesequence branch from 4be075c to 1caa7a6 June 27, 2025 08:28

jakepetroules force-pushed the eng/PR-fix-linesequence branch from 1caa7a6 to 6e8dab1 June 27, 2025 08:31

owenv approved these changes Jun 28, 2025

jakepetroules merged commit 5ab5f7b into main Jun 28, 2025
21 checks passed

jakepetroules deleted the eng/PR-fix-linesequence branch June 28, 2025 23:38
