Skip to content

fix(internal/syncwriter): stop corrupting test output on sink write errors#226

Merged
EhabY merged 1 commit into
mainfrom
fix/syncwriter-stderr-corruption
Jun 5, 2026
Merged

fix(internal/syncwriter): stop corrupting test output on sink write errors#226
EhabY merged 1 commit into
mainfrom
fix/syncwriter-stderr-corruption

Conversation

@EhabY
Copy link
Copy Markdown
Contributor

@EhabY EhabY commented Jun 3, 2026

The problem

internal/syncwriter reported sink write and sync failures with the builtin println. That call writes raw bytes to fd 2, and it emits the message and its trailing newline as two separate, unsynchronized syscalls. Under concurrency, a test framework's --- PASS: line can land between those two writes, so test2json never records the terminal action and tools like gotestsum flag an innocent, passing test as failed or unknown. We hit this in coder/coder, where a passing TestRetryWithInterval was reported as a failure.

Fixes #225.

The fix

Replace println with one fmt.Fprintf(os.Stderr, ...). The write can't be split across syscalls, so a reported error can no longer wedge into another test's result line, which keeps the result marker at the start of a line.

This is the minimal change. Suppressing the benign "reader gone" errors that triggered the original flake is being handled on the coder/coder side instead (a writer that opts out of those errors), so slog does not hide the fact that a user-assigned sink failed to receive output.

Evidence from a real CI failure

From this coder/coder run (2026-05-28, commit 094fe97). gotestsum marked a passing TestRetryWithInterval and all its subtests (unknown):

=== FAIL: cli TestRetryWithInterval (unknown)

because a concurrent port-forward -v test's teardown logged a closed-pipe error through the old println, which split the line:

sloghuman: failed to write entry: io: read/write on closed pipe--- PASS: TestRetryWithInterval (0.00s)

The --- PASS: marker is no longer at the start of a line, so test2json dropped the terminal action. This change keeps the newline attached so the marker always starts a line.

Tests

  • TestWriter_defaultErrorReportsThroughStderr: the default handler reports through the os.Stderr variable in one newline-terminated write, and fails if reverted to println.

go vet ./..., gofmt/gofumpt, and go test -race ./internal/syncwriter/ are clean, and the full module suite passes.

Picking this up in coder/coder

coder/coder pins cdr.dev/slog/v3 v3.0.0 with no replace, so it won't update on its own. Once this merges and a tag like v3.0.1 is cut, a follow-up PR there bumps the dependency with go get cdr.dev/slog/v3@v3.0.1 && go mod tidy.

Investigation and decision log

Root cause (internal/syncwriter/syncwriter.go on main): the New error handler ran println(fmt.Sprintf(...)). println writes to fd 2 out of sync with the test stream and splits the text and newline across two syscalls, which is what lets the interleave drop test2json terminal actions.

Scope: an earlier revision also suppressed benign io.ErrClosedPipe/syscall.EPIPE write errors inside slog. Per maintainer feedback, that was dropped: if a user-assigned sink returns an error, slog should not hide it. The opt-out belongs in the consumer (coder/coder), which is where the closing writer lives.

Test scope: reproducing the interleave itself is inherently racy, so the test asserts a deterministic, revert-sensitive property instead: diagnostics flow through the redirectable os.Stderr variable rather than the raw println builtin.


Generated by Coder Agents on behalf of @EhabY.

@EhabY EhabY self-assigned this Jun 3, 2026
@coveralls
Copy link
Copy Markdown

coveralls commented Jun 3, 2026

Coverage Report for CI Build 1

Coverage increased (+0.008%) to 96.034%

Details

  • Coverage increased (+0.008%) from the base build.
  • Patch coverage: 3 of 3 lines across 1 file are fully covered (100%).
  • No coverage regressions found.

Uncovered Changes

No uncovered changes found.

Coverage Regressions

No coverage regressions found.


Coverage Stats

Coverage Status
Relevant Lines: 933
Covered Lines: 896
Line Coverage: 96.03%
Coverage Strength: 25.36 hits per line

💛 - Coveralls

@EhabY EhabY force-pushed the fix/syncwriter-stderr-corruption branch 2 times, most recently from 4302134 to 93b5136 Compare June 3, 2026 13:50
Copy link
Copy Markdown
Contributor Author

EhabY commented Jun 3, 2026

Worth flagging that the single-write change to os.Stderr is what actually fixes the test2json corruption (it stops println from splitting a message across syscalls). On top of that, this PR also suppresses benign "reader gone" write errors (io.ErrClosedPipe, syscall.EPIPE), so that e.g. background goroutines logging into a pipe a test already closed don't emit anything.

That suppression is a judgment call, so I'd like your preference between:

  1. Keep it as is — suppress "reader gone" write errors and route everything else through a single os.Stderr write. Quietest; the tradeoff is that a destination dying unexpectedly (e.g. in production) drops logs with no signal.
  2. Single-write only — drop the suppression and keep reporting every write error as one clean, newline-terminated line. Still fixes the corruption; failures stay visible, at the cost of noise when a sink's reader goes away.

For what it's worth, I scoped the current suppression to the true "reader gone" errors and deliberately left os.ErrClosed reported, since that usually means a write to an already-closed writer and is worth surfacing.

@EhabY EhabY requested a review from mafredri June 4, 2026 10:01
Copy link
Copy Markdown
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The println change is fine, but can't make a judgement call about pipe handling without seeing the code that triggered it.

Comment thread internal/syncwriter/syncwriter.go Outdated
@EhabY EhabY requested a review from mafredri June 4, 2026 13:22
@EhabY EhabY force-pushed the fix/syncwriter-stderr-corruption branch 2 times, most recently from caf495c to 7674226 Compare June 4, 2026 22:08
Comment thread internal/syncwriter/syncwriter_test.go Outdated
The error handler used the builtin println, which writes to fd 2
unsynchronized with the test stream and emits the message and its
trailing newline as two separate syscalls. Under concurrency a test
framework's "--- PASS:"/"--- FAIL:" line can land between those two
writes, so test2json never records that test's terminal action and
tools like gotestsum report an innocent, passing test as unknown or
failed.

Format the message and its newline into one fmt.Fprintf to os.Stderr so
the write can't be split, which keeps the result marker at the start of
a line.

Fixes #225.
@EhabY EhabY force-pushed the fix/syncwriter-stderr-corruption branch from 7674226 to d24a8a4 Compare June 5, 2026 08:35
@EhabY EhabY merged commit 08c064d into main Jun 5, 2026
1 check passed
@EhabY EhabY deleted the fix/syncwriter-stderr-corruption branch June 5, 2026 08:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

syncwriter writes sink errors to raw stderr via println, corrupting go test/test2json output

3 participants