Fix issues with AsyncBufferSequence.LineSequence

jakepetroules · jakepetroules · commit 4be075c7a54a · 2025-06-23T17:55:04.000-07:00
The main problem is that the internal buffer used by AsyncBufferSequence.LineSequence may end on the boundary of a line ending sequence. This primarily affects the UTF-8 encoding, but it also affects UTF-16 and UTF-32 for the \r\n sequence specifically. When this condition occurs, the range check prevents the peek-ahead to the next 1 or 2 bytes, so a complete line is not returned to the client.

The buffer size is based on the page size by default, which is 16384 bytes on my macOS system and 4096 bytes on my Linux system. This led to testLineSequence failing frequently on Linux, where the smaller buffer makes it more likely that a line ending is split across a multiple of the buffer size.

To fix this, we keep loading bytes into the buffer until it no longer ends with a potential partial line ending sequence (unless we hit EOF, which correctly causes an early return).

Additionally, the testLineSequence test could generate empty lines, which meant that it was possible to have a line ending with \r followed by an (empty) line ending with \n, indistinguishable from a single line ending with \r\n. This is a problem for any pair of line ending sequences where one is a prefix of the other; \r and \r\n are the only pair that meets this criterion. To fix that, prevent the test from ever generating an empty buffer. Since the test also restricts generated content to the Latin character set, it can never again produce the problematic sequence.

Also switch testTeardownSequence to use AsyncBufferSequence.LineSequence instead of its custom line splitting logic. This ensures the test works correctly regardless of buffer size, even with a contrived buffer size of 1.

Closes #78
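
The two conditions described in this message can be illustrated with a minimal, standalone sketch. It is not the code from this commit: the helper names `endsWithPotentialPartialLineEnding` and `serialize` are hypothetical, and the sketch assumes UTF-8 input where the multi-byte line endings are \r\n (0D 0A), NEL (C2 85), LS (E2 80 A8), and PS (E2 80 A9).

```swift
/// Hypothetical helper (not part of Subprocess): reports whether a UTF-8
/// buffer ends with bytes that could be the start of a multi-byte line
/// ending, in which case more data is needed before peeking ahead.
func endsWithPotentialPartialLineEnding(_ buffer: [UInt8]) -> Bool {
    guard let last = buffer.last else { return false }
    switch last {
    case 0x0D, 0xC2, 0xE2:
        // \r may be followed by \n; C2 may start NEL; E2 may start LS or PS.
        return true
    default:
        // E2 80 is the shared two-byte prefix of LS (E2 80 A8) and PS (E2 80 A9).
        return Array(buffer.suffix(2)) == [0xE2, 0x80]
    }
}

/// Hypothetical helper: concatenates each line's content and its line ending,
/// the way a test might build its input file.
func serialize(_ lines: [(content: [UInt8], ending: [UInt8])]) -> [UInt8] {
    var bytes: [UInt8] = []
    for line in lines {
        bytes += line.content
        bytes += line.ending
    }
    return bytes
}
```

With these helpers, a buffer ending in `[0x61, 0x0D]` reports a potential partial line ending, while `[0x61, 0x0D, 0x0A]` does not. Serializing the line "a" with a \r ending followed by an empty line with a \n ending yields the same bytes as serializing "a" with a \r\n ending, which is the ambiguity the test now avoids by never generating empty lines.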
diff --git a/Sources/Subprocess/AsyncBufferSequence.swift b/Sources/Subprocess/AsyncBufferSequence.swift
@@ -214,14 +214,35 @@ extension AsyncBufferSequence {
                 }
 
                 while true {
+                    // Whether the buffer ends with a potential partial line ending sequence.
+                    // In this case we need to buffer more content until we have a complete line ending sequence so that peeking works properly.
+                    func bufferHasPotentialPartialLineSequence() -> Bool {
+                        guard Encoding.CodeUnit.self is UInt8.Type else {
+                            // Only the \r\n sequence requires multiple code units in the UTF-16 and UTF-32 encodings
+                            return self.buffer.last == carriageReturn
+                        }
+                        switch self.buffer.last {
+                        case carriageReturn, newLine1, lineSeparator1, paragraphSeparator1:
+                            return true
+                        default:
+                            switch self.buffer.suffix(2) {
+                            case [lineSeparator1, lineSeparator2], [paragraphSeparator1, paragraphSeparator2]:
+                                return true
+                            default:
+                                return false
+                            }
+                        }
+                    }
                     // Step 1: Load more buffer if needed
-                    if self.startIndex >= self.buffer.count {
+                    var partialLineSequence = bufferHasPotentialPartialLineSequence()
+                    while partialLineSequence || self.startIndex >= self.buffer.count {
                         guard let nextBuffer = try await loadBuffer() else {
                             // We have no more data
                             // Return the remaining data
                             return yield(at: self.buffer.count)
                         }
                         self.buffer += nextBuffer
+                        partialLineSequence = bufferHasPotentialPartialLineSequence()
                     }
                     // Step 2: Iterate through buffer to find next line
                     var currentIndex: Int = self.startIndex
diff --git a/Tests/SubprocessTests/SubprocessTests+Unix.swift b/Tests/SubprocessTests/SubprocessTests+Unix.swift
@@ -760,15 +760,8 @@ extension SubprocessUnixTests {
                 }
                 group.addTask {
                     var outputs: [String] = []
-                    for try await bit in standardOutput {
-                        let bitString = bit.withUnsafeBytes { ptr in
-                            return String(decoding: ptr, as: UTF8.self)
-                        }.trimmingCharacters(in: .whitespacesAndNewlines)
-                        if bitString.contains("\n") {
-                            outputs.append(contentsOf: bitString.split(separator: "\n").map { String($0) })
-                        } else {
-                            outputs.append(bitString)
-                        }
+                    for try await line in standardOutput.lines() {
+                        outputs.append(line.trimmingCharacters(in: .newlines))
                     }
                     #expect(outputs == ["saw SIGQUIT", "saw SIGTERM", "saw SIGINT"])
                 }
@@ -881,17 +874,21 @@ extension SubprocessUnixTests {
             let length: Int
             switch size {
             case .large:
-                length = Int(Double.random(in: 1.0 ..< 2.0) * Double(readBufferSize))
+                length = Int(Double.random(in: 1.0 ..< 2.0) * Double(readBufferSize)) + 1
             case .medium:
-                length = Int(Double.random(in: 0.2 ..< 1.0) * Double(readBufferSize))
+                length = Int(Double.random(in: 0.2 ..< 1.0) * Double(readBufferSize)) + 1
             case .small:
-                length = Int.random(in: 0 ..< 16)
+                length = Int.random(in: 1 ..< 16)
             }
 
             var buffer: [UInt8] = Array(repeating: 0, count: length)
             for index in 0 ..< length {
                 buffer[index] = UInt8.random(in: range)
             }
+            // Buffer cannot be empty or a line with a \r ending followed by an empty one with a \n ending would be indistinguishable.
+            // This matters for any line ending sequences where one line ending sequence is the prefix of another. \r and \r\n are the
+            // only two which meet this criteria.
+            precondition(!buffer.isEmpty)
             return buffer
         }
 
@@ -954,6 +951,8 @@ extension SubprocessUnixTests {
         ) { execution, standardOutput in
             var index = 0
             for try await line in standardOutput.lines(encoding: UTF8.self) {
+                defer { index += 1 }
+                try #require(index < testCases.count, "Received more lines than expected")
                 #expect(
                     line == testCases[index].value,
                     """
@@ -963,7 +962,6 @@ extension SubprocessUnixTests {
                     Line Ending \(Array(testCases[index].newLine.utf8))
                     """
                 )
-                index += 1
             }
         }
         try FileManager.default.removeItem(at: testFilePath)