Skip to content

fix: advance read session state for ignored command records#259

Merged
infiniteregrets merged 1 commit intomainfrom
fix/advance-state-for-ignored-command-records
Mar 29, 2026
Merged

fix: advance read session state for ignored command records#259
infiniteregrets merged 1 commit intomainfrom
fix/advance-state-for-ignored-command-records

Conversation

@infiniteregrets
Copy link
Copy Markdown
Member

@infiniteregrets infiniteregrets commented Mar 29, 2026

closes #235

Summary

  • When IgnoreCommandRecords is set, the streaming read loop skipped handleRecord entirely for command records, failing to advance nextSeq, recordsRead, and bytesRead.
  • On retry/reconnect, the session would re-request from the stale position, re-processing the same command records (infinite loop if stream is all command records).
  • Extracted state advancement into advanceState() and call it for ignored command records before continue, so the session position always advances.

Test plan

  • Verify go build ./... passes
  • Confirm that a streaming read with IgnoreCommandRecords: true correctly advances NextReadPosition() past command records
  • Confirm retries after disconnect do not re-fetch already-seen command records

🤖 Generated with Claude Code

When IgnoreCommandRecords is set, the loop skipped handleRecord
entirely for command records, failing to advance nextSeq and counters.
On retry, the session would re-request from the stale position,
re-processing the same command records. Extract state advancement into
advanceState and call it for ignored command records before continuing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@infiniteregrets infiniteregrets requested a review from a team as a code owner March 29, 2026 09:42
@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Mar 29, 2026

Greptile Summary

This PR fixes a real bug in the streaming read loop: when IgnoreCommandRecords is set, command records were silently skipped with continue but nextSeq, recordsRead, and bytesRead were never advanced. On a disconnect/retry the session would re-request from the stale position, potentially looping forever if the stream consisted entirely of command records.

The fix extracts the state-advancement block from handleRecord into a new advanceState helper and calls it for ignored command records before the continue. The refactoring is clean, the mutex usage is unchanged, and the ordering (state advanced after sending for data records, immediately for ignored records) is correct.

Key changes

  • New advanceState(record SequencedRecord) helper encapsulates the stateMu-guarded state update.
  • handleRecord now delegates its state update to advanceState (no behavioral change for data records).
  • Ignored command records now call advanceState before continue, ensuring nextSeq always advances.
  • As a positive side-effect, madeProgress in the retry loop (recordsRead > recordsBefore) is now true even when a batch contains only command records, preventing spurious consecutiveFailures increments.

Observations

  • recordsRead/bytesRead now include ignored command records, which correctly tracks server-side Count/Bytes budget for reconnect arithmetic, but can cause limitsReached() to stop a session before the caller receives Count data records if the stream has a dense prefix of command records. This is worth a clarifying comment.
  • The PR description's test-plan checklist is entirely unchecked and no new unit tests cover the reconnect-with-command-records path.

Confidence Score: 5/5

Safe to merge — the core fix is correct and all remaining findings are P2 style/documentation suggestions.

The fix correctly addresses a real infinite-reconnect bug by ensuring nextSeq is advanced for ignored command records. The refactoring is clean, mutex usage is unchanged, and the retry/reconnect arithmetic becomes more correct. The two remaining comments are P2: one is a suggestion to document the Count/Bytes semantics for ignored records, and one requests unit-test coverage for the regression path. Neither blocks merge.

No files require special attention; all concerns are in s2/read.go and are minor.

Important Files Changed

Filename Overview
s2/read.go Extracts state-advancement into advanceState() and calls it for command records when IgnoreCommandRecords=true, fixing an infinite-reconnect loop; core logic is correct with minor semantic implications for Count/Bytes limits worth documenting.

Sequence Diagram

sequenceDiagram
    participant R as streamReader.run
    participant RO as runOnce
    participant S as Server

    Note over R,S: IgnoreCommandRecords=true

    R->>S: GET /records?seq=0&count=10
    S-->>RO: batch [cmd@0, cmd@1, data@2, ...]
    loop for each record
        alt IsCommandRecord (old)
            Note over RO: continue (state NOT advanced)<br/>nextSeq stays at 0!
        else IsCommandRecord (new)
            RO->>RO: advanceState(record)<br/>nextSeq = seqNum+1
            Note over RO: continue
        else data record
            RO->>RO: handleRecord → advanceState(record)
        end
    end
    RO-->>R: disconnect error

    alt Old (buggy)
        R->>S: GET /records?seq=0&count=10 ♻️ infinite loop
    else New (fixed)
        R->>R: limitsReached() check
        R->>S: GET /records?seq=nextSeq&count=remaining ✅
    end
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: s2/read.go
Line: 555-564

Comment:
**`recordsRead`/`bytesRead` count toward `Count`/`Bytes` limits for ignored records**

`advanceState` increments `recordsRead` and `bytesRead` for command records that are never delivered to the caller's `Records()` channel. This has two downstream effects worth verifying are intentional:

1. **`limitsReached()` can fire before the caller receives `Count` data records.** If the user creates a session with `Count=10, IgnoreCommandRecords=true` and the stream has 10 command records followed by 10 data records, `limitsReached()` returns `true` after the first 10 and the session stops — delivering 0 records to the consumer.

2. **`buildAttemptOptions` computes `remaining = baseCount - recordsRead`.** Because `recordsRead` now includes command records, the reconnect request's `Count` parameter is reduced by the number of ignored command records. Since the server counts all records (including command records) toward its own `Count` limit, this is the correct server-side arithmetic — but it means a `Count=10` session may ultimately surface fewer than 10 data records to the application when command records are present.

This is likely the intended trade-off (correct server-side budget tracking beats the previous infinite-loop), but it changes observable behavior for anyone relying on `Count` meaning "N data records". A comment explaining the semantics (or a note in the `ReadOptions.Count` doc) would help future readers.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: s2/read.go
Line: 542-547

Comment:
**No unit test for the reconnect-with-command-records regression**

The PR description lists three test-plan items but none are checked, and the existing test file (`read_session_retry_test.go`) does not include a case that exercises the specific scenario being fixed: a stream reader with `IgnoreCommandRecords=true` that receives command-only (or command-leading) batches and then reconnects.

Without this test the regression could silently return. Consider adding a test along the lines of:
- Simulate a batch of command records followed by a disconnect.
- Assert `NextReadPosition()` advances past all command records.
- Assert a second run (or the retry path) does not re-request from the stale position.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix: advance read session state for igno..." | Re-trigger Greptile

Comment on lines +555 to +564
func (r *streamReader) advanceState(record SequencedRecord) {
r.stateMu.Lock()
r.recordsRead++
r.bytesRead += MeteredSequencedRecordBytes(record)
r.lastRecordTime = time.Now()
r.nextSeq = record.SeqNum + 1
r.hasNextSeq = true
r.nextTS = record.Timestamp
r.stateMu.Unlock()
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 recordsRead/bytesRead count toward Count/Bytes limits for ignored records

advanceState increments recordsRead and bytesRead for command records that are never delivered to the caller's Records() channel. This has two downstream effects worth verifying are intentional:

  1. limitsReached() can fire before the caller receives Count data records. If the user creates a session with Count=10, IgnoreCommandRecords=true and the stream has 10 command records followed by 10 data records, limitsReached() returns true after the first 10 and the session stops — delivering 0 records to the consumer.

  2. buildAttemptOptions computes remaining = baseCount - recordsRead. Because recordsRead now includes command records, the reconnect request's Count parameter is reduced by the number of ignored command records. Since the server counts all records (including command records) toward its own Count limit, this is the correct server-side arithmetic — but it means a Count=10 session may ultimately surface fewer than 10 data records to the application when command records are present.

This is likely the intended trade-off (correct server-side budget tracking beats the previous infinite-loop), but it changes observable behavior for anyone relying on Count meaning "N data records". A comment explaining the semantics (or a note in the ReadOptions.Count doc) would help future readers.

Prompt To Fix With AI
This is a comment left during a code review.
Path: s2/read.go
Line: 555-564

Comment:
**`recordsRead`/`bytesRead` count toward `Count`/`Bytes` limits for ignored records**

`advanceState` increments `recordsRead` and `bytesRead` for command records that are never delivered to the caller's `Records()` channel. This has two downstream effects worth verifying are intentional:

1. **`limitsReached()` can fire before the caller receives `Count` data records.** If the user creates a session with `Count=10, IgnoreCommandRecords=true` and the stream has 10 command records followed by 10 data records, `limitsReached()` returns `true` after the first 10 and the session stops — delivering 0 records to the consumer.

2. **`buildAttemptOptions` computes `remaining = baseCount - recordsRead`.** Because `recordsRead` now includes command records, the reconnect request's `Count` parameter is reduced by the number of ignored command records. Since the server counts all records (including command records) toward its own `Count` limit, this is the correct server-side arithmetic — but it means a `Count=10` session may ultimately surface fewer than 10 data records to the application when command records are present.

This is likely the intended trade-off (correct server-side budget tracking beats the previous infinite-loop), but it changes observable behavior for anyone relying on `Count` meaning "N data records". A comment explaining the semantics (or a note in the `ReadOptions.Count` doc) would help future readers.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 542 to 547
ignoreCommandRecords := r.baseOpts != nil && r.baseOpts.IgnoreCommandRecords
for _, record := range batch.Records {
if ignoreCommandRecords && record.IsCommandRecord() {
r.advanceState(record)
continue
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 No unit test for the reconnect-with-command-records regression

The PR description lists three test-plan items but none are checked, and the existing test file (read_session_retry_test.go) does not include a case that exercises the specific scenario being fixed: a stream reader with IgnoreCommandRecords=true that receives command-only (or command-leading) batches and then reconnects.

Without this test the regression could silently return. Consider adding a test along the lines of:

  • Simulate a batch of command records followed by a disconnect.
  • Assert NextReadPosition() advances past all command records.
  • Assert a second run (or the retry path) does not re-request from the stale position.
Prompt To Fix With AI
This is a comment left during a code review.
Path: s2/read.go
Line: 542-547

Comment:
**No unit test for the reconnect-with-command-records regression**

The PR description lists three test-plan items but none are checked, and the existing test file (`read_session_retry_test.go`) does not include a case that exercises the specific scenario being fixed: a stream reader with `IgnoreCommandRecords=true` that receives command-only (or command-leading) batches and then reconnects.

Without this test the regression could silently return. Consider adding a test along the lines of:
- Simulate a batch of command records followed by a disconnect.
- Assert `NextReadPosition()` advances past all command records.
- Assert a second run (or the retry path) does not re-request from the stale position.

How can I resolve this? If you propose a fix, please make it concise.

@infiniteregrets infiniteregrets merged commit 2a65131 into main Mar 29, 2026
10 checks passed
@infiniteregrets infiniteregrets deleted the fix/advance-state-for-ignored-command-records branch March 29, 2026 09:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Detail Bug] Streaming reads with IgnoreCommandRecords can retry from the wrong position and loop on command-only streams

1 participant