
Add --log-buffer for periodic buffer sync during scans #287

Draft
tbroadley wants to merge 7 commits into meridianlabs-ai:main from tbroadley:thomas/sync-buffer

Conversation


@tbroadley tbroadley commented Feb 17, 2026

TODO: think about races between flush and other threads that are actively writing scan results to disk, and possibly also writing to _summary.json and _errors.jsonl. Rafael left some notes about this below.

Summary

  • Adds --log-buffer N option to scout scan and scout scan resume that flushes intermediate results to the scan directory every N transcripts (see the sketch after this list)
  • Enables crash-resilient resume by reading already-synced transcript IDs from existing parquets on resume, even if the local buffer was lost
  • Runs scanner_table() compaction on a worker thread during flush to avoid blocking the event loop
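
For orientation, here is a minimal sketch of the every-N flush mechanism described above. The class name, the `_since_flush` counter, and the exact `record()`/`flush()` signatures are illustrative, not the actual inspect_scout implementation:

```python
class BufferedRecorder:
    """Illustrative sketch: flush buffered results every `log_buffer` transcripts."""

    def __init__(self, recorder, log_buffer: int | None = None) -> None:
        self.recorder = recorder        # underlying FileRecorder-style recorder
        self.log_buffer = log_buffer    # None disables periodic flushing
        self._since_flush = 0           # transcripts recorded since the last flush

    async def record(self, result) -> None:
        await self.recorder.record(result)
        if self.log_buffer is None:
            return
        self._since_flush += 1
        if self._since_flush >= self.log_buffer:
            # compact buffer parquets into the scan directory
            await self.recorder.flush()
            self._since_flush = 0
```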

Test plan

  • Existing tests pass (8/8 in test_recorder_buffer.py)
  • New tests: test_is_recorded_with_synced_ids, test_flush_writes_parquets_and_summary, test_read_synced_ids_from_parquets, test_resume_skips_synced_transcripts
  • ruff check and mypy pass on all modified files
  • Full test suite: 1102 passed (3 pre-existing failures unrelated to this change)

🤖 Generated with Claude Code

tbroadley and others added 3 commits February 16, 2026 16:21
Flush intermediate scan results to the destination directory every N
transcripts, providing crash-resilient resume and intermediate visibility.

- RecorderBuffer accepts synced_ids to skip already-synced transcripts
- FileRecorder.flush() compacts buffer parquets to scan dir
- FileRecorder.resume() reads synced transcript IDs from existing parquets
- --log-buffer / SCOUT_SCAN_LOG_BUFFER available on scan and scan resume CLI
- scanner_table() runs on a worker thread during flush to avoid blocking

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
scanner_table() compaction now runs on the event loop thread instead of
a worker thread. This prevents races with concurrent record() calls that
write new per-transcript parquets to the buffer directory. The remaining
await points (fs.write_file) can still allow summary drift, but this is
benign for resumption since _read_synced_ids reads from parquets.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

QuantumLove commented Feb 18, 2026

Hey @tbroadley, I looked over your PR to get it review-ready. Sadly, I can't push to it.

I'm a bit too lazy to fork and open a PR against your PR, but this is the only change I would make:

Validate that --log-buffer requires a positive integer

  Changes:

  - src/inspect_scout/_cli/scan.py: change --log-buffer from type=int to type=click.IntRange(min=1)
  - src/inspect_scout/_cli/scan_resume.py: same change
  - CHANGELOG.md: feature entry
  - docs/workflow.qmd: user documentation

  Why: --log-buffer 0 would cause a flush on every transcript (poor performance). Using click.IntRange(min=1) rejects zero/negative values at the CLI level while still allowing the option to be omitted. A sketch of the option declaration follows.
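
The click.IntRange(min=1) change and the SCOUT_SCAN_LOG_BUFFER env var come from the PR; the command function name and help text below are placeholders:

```python
import click


@click.command()
@click.option(
    "--log-buffer",
    type=click.IntRange(min=1),   # rejects zero/negative values at the CLI level
    default=None,                 # the option can still be omitted entirely
    envvar="SCOUT_SCAN_LOG_BUFFER",
    help="Flush intermediate results to the scan directory every N transcripts.",
)
def scan(log_buffer: int | None) -> None:
    ...
```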

Race conditions and self-inconsistent-state analysis

I thought through (with CC) whether scanner_table() should run inline (the current behavior; the PR description is outdated) or on a worker thread.

The Three Potential Inconsistencies

| Race | What happens | Impact |
| --- | --- | --- |
| 1. Parquet misses transcripts | Transcripts written during compaction not in parquet | Re-scanned on resume |
| 2. Summary inflated | Summary count > parquet row count | Downstream confusion |
| 3. Errors orphaned | Errors reference missing transcripts | Downstream confusion |

Analysis

Race 1 (inline execution eliminates this): losing a few transcripts per flush is acceptable. The real value of --log-buffer is avoiding the loss of 99% of the work on a crash, not perfect consistency.

Races 2/3 (these happen regardless, due to await points): more concerning if downstream services can't reconcile discrepancies, but the window is tiny (~1-10ms during the await points).

All races: fixed on resume. _read_synced_ids() reads the parquets as the source of truth.
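
For illustration, reading the synced IDs back from the flushed parquets could look roughly like this; the `transcript_id` column name and the flat `*.parquet` layout are assumptions, and the real `_read_synced_ids()` may differ:

```python
from pathlib import Path

import pyarrow.parquet as pq


def read_synced_ids(scan_dir: str) -> set[str]:
    """Collect transcript IDs already present in the scan directory's parquets."""
    ids: set[str] = set()
    for parquet_file in Path(scan_dir).glob("*.parquet"):
        # Only the ID column is needed, so skip loading full result rows.
        table = pq.read_table(parquet_file, columns=["transcript_id"])
        ids.update(table.column("transcript_id").to_pylist())
    return ids
```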

Decision

Keep inline execution. Reasons:

  1. The implementation is already correct; roughly 500ms per S3 write should be acceptable since you can tune the buffer size
  2. Lock/queue approaches are overkill for minimal benefit
  3. Resume reconciles all inconsistencies
  4. Current behavior is well-documented in code comments

Note

This is my first time looking at this codebase

@QuantumLove

Test Report: --log-buffer Feature (PR #287)

Feature Under Test

The --log-buffer N CLI option enables periodic flushing of scan results to the destination directory every N transcripts, providing crash-resilient resume for long-running scans.

Test Environment

  • Transcripts: Local test data (examples/scanner/logs)
  • Scanner: reward_hacking (boolean LLM scanner)
  • Destination: S3 bucket (staging environment)
  • Model: openai/gpt-4o-mini
  • Buffer interval: --log-buffer 1 (flush every transcript)

Test Procedure

  1. Started a scan with --log-buffer 1 writing to S3
  2. Monitored S3 and observed the parquet file growing incrementally (see the spot-check sketch after this list):
    - Initial: 89KB
    - After ~30 transcripts: 100KB
    - After ~50 transcripts: 103KB
  3. Killed the process at 50% completion (109/216 transcripts)
  4. Ran scout scan resume pointing to the S3 scan directory
  5. Verified that resume started exactly at 50% (109/216), skipping already-synced transcripts
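
A quick way to spot-check that growth during the run (the bucket and file name below are placeholders; this assumes pyarrow's S3 filesystem support, or s3fs, is available):

```python
import pyarrow.parquet as pq

# Placeholder path: point this at a results parquet inside the scan directory.
table = pq.read_table("s3://my-staging-bucket/scans/my-scan/results.parquet")
print(f"{table.num_rows} result rows flushed so far")
```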

Findings

| Test Case | Result |
| --- | --- |
| Parquets flush to S3 incrementally | ✅ Pass |
| Summary/errors sync to S3 | ✅ Pass |
| Resume reads synced IDs from parquets | ✅ Pass |
| Resume skips already-scanned transcripts | ✅ Pass |
| Resume continues from interruption point | ✅ Pass |

Conclusion

The --log-buffer feature works as designed. Crash-resilient resume from S3 storage is functional: interrupted scans can be resumed without re-processing already-flushed transcripts.

@QuantumLove

I also did this with more data, and it correctly flushed (does "flushed" have the right connotation here?) multiple parquet files.

tbroadley and others added 4 commits February 18, 2026 16:35
- Save log_buffer to ScanOptions so it persists in _scan.json and is
  automatically restored on resume (addresses tbroadley review)
- Change --log-buffer from type=int to click.IntRange(min=1) to reject
  zero/negative values at CLI level (addresses QuantumLove suggestion)
- Update docstrings per review suggestions
- Regenerate openapi.json and generated.ts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When resuming a scan on a different machine (or after disk loss), the
local buffer is empty. Without seeding, scanner_table() compaction
would overwrite remote parquets with only newly-scanned results.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>