Skip to content

Fix issues with AsyncBufferSequence.LineSequence #91

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 28, 2025

Conversation

jakepetroules
Copy link
Contributor

The main problem is that the internal buffer used by AsyncBufferSequence.LineSequence may end on the boundary of a line ending sequence -- this impacts primarily the UTF-8 encoding, but also impacts UTF-16 and UTF-32 with the \r\n sequence specifically. When this condition occurs, the range check prevents the peek-ahead to the next 1 or 2 bytes and prevents a complete line from being returned to the client.

The buffer size is based on the page size by default, which on my macOS and Linux systems respectively are 16384 and 4096. This led to testLineSequence failing frequently on Linux due to the greater likelihood of a line ending being split across a multiple of the buffer size.

To fix this, we always load more bytes into the buffer until the buffer no longer ends with a potential partial line ending sequence (unless we hit EOF which correctly causes an early return).

Additionally, the testLineSequence test could generate empty lines, which meant that it was possible to have a line ending with \r followed by an (empty) line ending with \n, indistinguishable from a single line ending with \r\n. This is a problem for any line ending sequences where one is a prefix of another -- \r and \r\n are the only ones which meet this criteria. To fix that, prevent the test from ever generating an empty buffer, and since it does so by restricting the character set to Latin, will never again produce the problematic sequence.

Also switch testTeardownSequence to use AsyncBufferSequence.LineSequence instead of its custom line splitting logic. This ensures the test works correctly regardless of buffer size, even with a contrived buffer size of 1.

Closes #78

@jakepetroules
Copy link
Contributor Author

To test this, I hardcoded the buffer size to 1 to help ensure it was working correctly including in cases where it needed to read from the underlying buffer multiple times in a row for the UTF-8 line separator and paragraph separator sequences.

@jakepetroules jakepetroules force-pushed the eng/PR-fix-linesequence branch from bd41657 to 4be075c Compare June 24, 2025 00:55
@jakepetroules jakepetroules force-pushed the eng/PR-fix-linesequence branch from 4be075c to 1caa7a6 Compare June 27, 2025 08:28
The main problem is that the internal buffer used by AsyncBufferSequence.LineSequence may end on the boundary of a line ending sequence -- this impacts primarily the UTF-8 encoding, but also impacts UTF-16 and UTF-32 with the \r\n sequence specifically. When this condition occurs, the range check prevents the peek-ahead to the next 1 or 2 bytes and prevents a complete line from being returned to the client.

The buffer size is based on the page size by default, which on my macOS and Linux systems respectively are 16384 and 4096. This led to testLineSequence failing frequently on Linux due to the greater likelihood of a line ending being split across a multiple of the buffer size.

To fix this, we always load more bytes into the buffer until the buffer no longer ends with a potential partial line ending sequence (unless we hit EOF which correctly causes an early return).

Additionally, the testLineSequence test could generate empty lines, which meant that it was possible to have a line ending with \r followed by an (empty) line ending with \n, indistinguishable from a single line ending with \r\n. This is a problem for any line ending sequences where one is a prefix of another -- \r and \r\n are the only ones which meet this criteria. To fix that, prevent the test from ever generating an empty buffer, and since it does so by restricting the character set to Latin, will never again produce the problematic sequence.

Also switch testTeardownSequence to use AsyncBufferSequence.LineSequence instead of its custom line splitting logic. This ensures the test works correctly regardless of buffer size, even with a contrived buffer size of 1.

Closes #78
@jakepetroules jakepetroules force-pushed the eng/PR-fix-linesequence branch from 1caa7a6 to 6e8dab1 Compare June 27, 2025 08:31
@jakepetroules
Copy link
Contributor Author

Following some feedback from @itingliu, I ended up restructuring the code to make it much easier to follow and validate. The previous iteration of this patch also still failed in rare cases, but the latest version I've had running the test in a loop for probably 12 hours straight with no issues, so this should be quite solid now.

@jakepetroules jakepetroules merged commit 5ab5f7b into main Jun 28, 2025
21 checks passed
@jakepetroules jakepetroules deleted the eng/PR-fix-linesequence branch June 28, 2025 23:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

testLineSequence is flaky, at least on Linux
3 participants