
Add --log-buffer for periodic buffer sync during scans #287

Draft
tbroadley wants to merge 7 commits into meridianlabs-ai:main from tbroadley:thomas/sync-buffer

Conversation


@tbroadley tbroadley commented Feb 17, 2026

TODO: think about races between flush and other threads that are actively writing scan results to disk, and possibly also writing to _summary.json and _errors.jsonl. Rafael left some notes about this below.

Summary

  • Adds --log-buffer N option to scout scan and scout scan resume that flushes intermediate results to the scan directory every N transcripts (see the sketch after this list)
  • Enables crash-resilient resume by reading already-synced transcript IDs from existing parquets on resume, even if the local buffer was lost
  • Runs scanner_table() compaction on a worker thread during flush to avoid blocking the event loop
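
For orientation, here is a minimal sketch of the every-N flush mechanism described above. The class name, the `_since_flush` counter, and the exact `record()`/`flush()` signatures are illustrative, not the actual inspect_scout implementation:

```python
class BufferedRecorder:
    """Illustrative sketch: flush buffered results every `log_buffer` transcripts."""

    def __init__(self, recorder, log_buffer: int | None = None) -> None:
        self.recorder = recorder        # underlying FileRecorder-style recorder
        self.log_buffer = log_buffer    # None disables periodic flushing
        self._since_flush = 0           # transcripts recorded since the last flush

    async def record(self, result) -> None:
        await self.recorder.record(result)
        if self.log_buffer is None:
            return
        self._since_flush += 1
        if self._since_flush >= self.log_buffer:
            # compact buffer parquets into the scan directory
            await self.recorder.flush()
            self._since_flush = 0
```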

Test plan

  • Existing tests pass (8/8 in test_recorder_buffer.py)
  • New tests: test_is_recorded_with_synced_ids, test_flush_writes_parquets_and_summary, test_read_synced_ids_from_parquets, test_resume_skips_synced_transcripts
  • ruff check and mypy pass on all modified files
  • Full test suite: 1102 passed (3 pre-existing failures unrelated to this change)

🤖 Generated with Claude Code

tbroadley and others added 3 commits February 16, 2026 16:21
Flush intermediate scan results to the destination directory every N
transcripts, providing crash-resilient resume and intermediate visibility.

- RecorderBuffer accepts synced_ids to skip already-synced transcripts
- FileRecorder.flush() compacts buffer parquets to scan dir
- FileRecorder.resume() reads synced transcript IDs from existing parquets
- --log-buffer / SCOUT_SCAN_LOG_BUFFER available on scan and scan resume CLI
- scanner_table() runs on a worker thread during flush to avoid blocking

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
scanner_table() compaction now runs on the event loop thread instead of
a worker thread. This prevents races with concurrent record() calls that
write new per-transcript parquets to the buffer directory. The remaining
await points (fs.write_file) can still allow summary drift, but this is
benign for resumption since _read_synced_ids reads from parquets.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

QuantumLove commented Feb 18, 2026

Hey @tbroadley, I looked over your PR to get it review-ready. Sadly, I can't push to it.

I'm a bit too lazy to fork and open a PR against your PR, but this is the only change I would make:

Validate that --log-buffer requires a positive integer

  Changes:

  - src/inspect_scout/_cli/scan.py: change --log-buffer from type=int to type=click.IntRange(min=1)
  - src/inspect_scout/_cli/scan_resume.py: same change
  - CHANGELOG.md: feature entry
  - docs/workflow.qmd: user documentation

  Why: --log-buffer 0 would cause a flush on every transcript (poor performance). Using click.IntRange(min=1) rejects zero/negative values at the CLI level while still allowing the option to be omitted. A sketch of the option declaration follows.
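
The click.IntRange(min=1) change and the SCOUT_SCAN_LOG_BUFFER env var come from the PR; the command function name and help text below are placeholders:

```python
import click


@click.command()
@click.option(
    "--log-buffer",
    type=click.IntRange(min=1),   # rejects zero/negative values at the CLI level
    default=None,                 # the option can still be omitted entirely
    envvar="SCOUT_SCAN_LOG_BUFFER",
    help="Flush intermediate results to the scan directory every N transcripts.",
)
def scan(log_buffer: int | None) -> None:
    ...
```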

Race conditions and self-inconsistent-state analysis

I thought through (with CC) whether scanner_table() should run inline (the current behavior; the PR description is outdated) or on a worker thread.

The Three Potential Inconsistencies

| Race | What happens | Impact |
| --- | --- | --- |
| 1. Parquet misses transcripts | Transcripts written during compaction not in parquet | Re-scanned on resume |
| 2. Summary inflated | Summary count > parquet row count | Downstream confusion |
| 3. Errors orphaned | Errors reference missing transcripts | Downstream confusion |

Analysis

Race 1 (inline execution eliminates this): losing a few transcripts per flush is acceptable. The real value of --log-buffer is avoiding the loss of 99% of the work on a crash, not perfect consistency.

Races 2/3 (these happen regardless, due to await points): more concerning if downstream services can't reconcile discrepancies, but the window is tiny (~1-10ms during the await points).

All races: fixed on resume. _read_synced_ids() reads the parquets as the source of truth.
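
For illustration, reading the synced IDs back from the flushed parquets could look roughly like this; the `transcript_id` column name and the flat `*.parquet` layout are assumptions, and the real `_read_synced_ids()` may differ:

```python
from pathlib import Path

import pyarrow.parquet as pq


def read_synced_ids(scan_dir: str) -> set[str]:
    """Collect transcript IDs already present in the scan directory's parquets."""
    ids: set[str] = set()
    for parquet_file in Path(scan_dir).glob("*.parquet"):
        # Only the ID column is needed, so skip loading full result rows.
        table = pq.read_table(parquet_file, columns=["transcript_id"])
        ids.update(table.column("transcript_id").to_pylist())
    return ids
```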

Decision

Keep inline execution. Reasons:

  1. The implementation is already correct; roughly 500ms per S3 write should be acceptable since you can tune the buffer size
  2. Lock/queue approaches are overkill for minimal benefit
  3. Resume reconciles all inconsistencies
  4. Current behavior is well-documented in code comments

Note

This is my first time looking at this codebase

@QuantumLove

Test Report: --log-buffer Feature (PR #287)

Feature Under Test

The --log-buffer N CLI option enables periodic flushing of scan results to the destination directory every N transcripts, providing crash-resilient resume for long-running scans.

Test Environment

  • Transcripts: Local test data (examples/scanner/logs)
  • Scanner: reward_hacking (boolean LLM scanner)
  • Destination: S3 bucket (staging environment)
  • Model: openai/gpt-4o-mini
  • Buffer interval: --log-buffer 1 (flush every transcript)

Test Procedure

  1. Started a scan with --log-buffer 1 writing to S3
  2. Monitored S3 and observed the parquet file growing incrementally (see the spot-check sketch after this list):
    - Initial: 89KB
    - After ~30 transcripts: 100KB
    - After ~50 transcripts: 103KB
  3. Killed the process at 50% completion (109/216 transcripts)
  4. Ran scout scan resume pointing to the S3 scan directory
  5. Verified that resume started exactly at 50% (109/216), skipping already-synced transcripts
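
A quick way to spot-check that growth during the run (the bucket and file name below are placeholders; this assumes pyarrow's S3 filesystem support, or s3fs, is available):

```python
import pyarrow.parquet as pq

# Placeholder path: point this at a results parquet inside the scan directory.
table = pq.read_table("s3://my-staging-bucket/scans/my-scan/results.parquet")
print(f"{table.num_rows} result rows flushed so far")
```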

Findings

| Test Case | Result |
| --- | --- |
| Parquets flush to S3 incrementally | ✅ Pass |
| Summary/errors sync to S3 | ✅ Pass |
| Resume reads synced IDs from parquets | ✅ Pass |
| Resume skips already-scanned transcripts | ✅ Pass |
| Resume continues from interruption point | ✅ Pass |

Conclusion

The --log-buffer feature works as designed. Crash-resilient resume from S3 storage is functional: interrupted scans can be resumed without re-processing already-flushed transcripts.

@QuantumLove

I also did this with more data, and it correctly flushed (does "flushed" have the right connotation here?) multiple parquet files.

tbroadley and others added 4 commits February 18, 2026 16:35
- Save log_buffer to ScanOptions so it persists in _scan.json and is
  automatically restored on resume (addresses tbroadley review)
- Change --log-buffer from type=int to click.IntRange(min=1) to reject
  zero/negative values at CLI level (addresses QuantumLove suggestion)
- Update docstrings per review suggestions
- Regenerate openapi.json and generated.ts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When resuming a scan on a different machine (or after disk loss), the
local buffer is empty. Without seeding, scanner_table() compaction
would overwrite remote parquets with only newly-scanned results.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>