Skip to content

fix: skip oversized JSONL lines instead of failing the session#20

Merged
wesm merged 3 commits intomainfrom
large-buffers
Feb 24, 2026
Merged

fix: skip oversized JSONL lines instead of failing the session#20
wesm merged 3 commits intomainfrom
large-buffers

Conversation

@wesm
Copy link
Copy Markdown
Owner

@wesm wesm commented Feb 24, 2026

Summary

  • Replace bufio.Scanner with a custom lineReader that uses bufio.Reader.ReadLine() to skip lines exceeding 64MB instead of aborting the entire session parse
  • Propagate I/O errors through lineReader.Err() (matching bufio.Scanner pattern) rather than silently swallowing them
  • Replace all three scan sites (Claude parser, Claude hints extractor, Codex parser)

Closes #12

Test plan

  • TestLineReader — normal lines, oversized skip, all oversized, empty input, blank lines, no trailing newline, exact limit, one over limit
  • TestLineReaderIOError — custom io.Reader that yields data then returns a non-EOF error; verifies lines are returned and Err() exposes the I/O error
  • TestParseCodexSessionOversizedLineSkipped — integration test placing an oversized line between normal JSONL entries
  • All Go tests pass (CGO_ENABLED=1 go test -tags fts5 ./...)

🤖 Generated with Claude Code

Replace bufio.Scanner with a lineReader that silently skips lines
exceeding 64MB and continues parsing. Previously, a single
oversized tool_result line (e.g. 22MB command output) caused the
entire session to be dropped with "bufio.Scanner: token too long".

The lineReader starts with a 64KB buffer and grows on demand.
Oversized lines are skipped — the session appears with a gap
but the remaining messages are preserved.

Closes #12

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Feb 24, 2026

roborev: Combined Review (608c5f4)

Summary: This commit introduces a custom lineReader to handle oversized JSONL lines, but introduces significant regressions related to silent I/O error swallowing and untracked data loss
.

High

1. Silent swallowing of non-EOF read errors

  • Locations: internal/parser/linereader.go:30-33, internal/parser/claude.go:47-48, internal/parser/codex.go:5 09-510
  • Description: The next() function in lineReader silently ignores all errors returned by reading operations, mapping any failure to a clean end-of-file condition (return "", false). This means underlying I/O errors (such as disk failures or dropped network shares) are
    completely hidden, causing partial or truncated data to be ingested as if the file was parsed successfully. This is a regression from the previous bufio.Scanner implementation, which allowed checking scanner.Err().
  • Remediation: Update next() to propagate errors to the caller (e.g.,
    func (lr *lineReader) next() (string, bool, error)), or add an Err() error method to lineReader to capture and surface non-EOF errors. Ensure that the caller parsers check for this error and fail the parsing process instead of silently returning partial results.

Medium

2. Silent discarding of oversized lines

  • Locations: internal/parser/linereader.go:26, internal/parser/linereader.go:65-71, internal/parser/parser_test.go:1432

  • Description: Lines exceeding the maxLineSize limit are silently discarded. Dropping data without telemetry or logging makes it difficult to debug missing session data or identify systemic issues with log generation. Furthermore, if the system relies on these parsed logs for tracking agent behavior or security auditing, this creates an evasion vector:
    an attacker could intentionally pad sensitive actions to exceed the size limit, effectively hiding the action from the system.

  • Remediation: Expose visibility into dropped lines by tracking skipped-line count, offset, and size. At a minimum, log a warning or emit a metric when an oversized line is skipped. For security-
    critical ingestion, consider adding a configurable strict mode that fails the parse entirely rather than quietly dropping data.


Synthesized from 4 reviews (agents: codex, gemini | types: default, security)

lineReader.next() was treating all read errors as EOF, silently
discarding I/O failures. Add err field and Err() method (matching
bufio.Scanner pattern), restore post-loop error checks in both
parsers, and add TestLineReaderIOError.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Feb 24, 2026

roborev: Combined Review (fdf516d)

Summary Verdict: The review identified 2 medium severity issues related to error handling and silent data skipping, while the core line-reader logic appears secure against memory exhaustion.

Medium

  1. Oversized lines are silently skipped
    • Files: internal/parser/linereader.go :77, internal/parser/linereader.go:80, internal/parser/codex.go:507, internal/parser/claude.go:46
    • Issue: Oversized lines are silently skipped and parsing continues successfully.
  • Risk: If session JSONL files are attacker-controlled (or used for audit/forensics), a malicious actor can place critical records in lines exceeding maxLineSize to evade detection or alter downstream security analysis without triggering failure.
    • Remediation: Fail closed by default on oversized lines,
      or return structured warnings/errors (line numbers + skipped count) that callers must handle explicitly. Consider a strict mode for security-sensitive consumers.
  1. I/O errors can be silently dropped in Claude project-hint extraction
    • Files: internal/parser/claude.go: 169
    • Issue: The ExtractClaudeProjectHints function uses lineReader but does not check lr.Err() after the lr.next() loop. If a read fails, it can return partial/empty hints without signaling an I/O failure.

Remediation: Add a post-loop lr.Err() check and propagate or log it. If the function currently cannot return an error, update its signature or add explicit error reporting.


Synthesized from 4 reviews (agents: codex, gemini | types: default, security)

The hints extractor used lineReader but didn't check lr.Err()
after the loop, silently dropping I/O failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Feb 24, 2026

roborev: Combined Review (b8180c0)

The PR introduces a medium-severity issue where oversized lines are silently dropped, potentially masking data loss or malicious activity.

Medium

Silent data loss when oversized lines are skipped
Files:

Issue: lineReader intentionally drops oversized JSONL lines (fail-open) with
no signal to callers. Sessions can appear "successful" while missing messages. If an attacker can influence session files, they can hide specific records by making them exceed maxLineSize, causing downstream analysis, audit, and security checks to miss those events without an explicit parse failure.

Remediation: Track the
skipped-line count in lineReader and surface it as an error or warning to callers (e.g., log once per file, or return in parser metadata). Consider adding a strict mode that fails on any skipped line for security/audit use cases, and optionally log sanitized line numbers for skipped records.


Synthesized from 4 reviews (agents: codex, gemini | types: default, security)

@wesm wesm merged commit 0af2152 into main Feb 24, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Sync fails on sessions with JSONL lines exceeding 20MB scanner buffer limit

1 participant