fix : Prevent unbounded recorder memory growth in long simulation runs with JSONL streaming mode #183

Open
apfine wants to merge 3 commits into mesa:main from apfine:recorder-memory-fix

@apfine apfine commented Mar 10, 2026

I found that the current event recorder keeps all events in memory until the final dump, which can become a serious memory issue for long-running simulations.

While keeping the default recorder behavior unchanged, I added an opt-in streaming mode so users can record simulation outputs without letting recorder memory grow unbounded.

Summary

In this PR, I added a bounded-memory streaming mode to SimulationRecorder to address recorder memory growth during long-running simulations.

I kept the default behavior unchanged, but users can now opt into a JSONL-backed recording mode that streams raw events to disk instead of retaining the full event history in memory.

Problem

Before this change, SimulationRecorder stored every recorded event in self.events until save() was called. In long simulations or high-frequency event scenarios, that made recorder memory usage grow linearly with runtime.

What I Added

1. Optional streaming storage mode

I added support for:

  • storage_mode="memory": existing behavior, still the default
  • storage_mode="jsonl": stream events to a JSONL file as they are recorded

This lets users avoid keeping all raw events in memory for large runs.
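As a rough illustration of the streaming idea (not the code in this PR), each event can be appended to a file as one JSON object per line. The class name `JsonlEventWriter`, the file name `events.jsonl`, and the event fields below are hypothetical.

```python
import json
import tempfile
from pathlib import Path


class JsonlEventWriter:
    """Hypothetical helper: appends each event to a JSONL file as it arrives."""

    def __init__(self, path):
        self.path = Path(path)
        self.path.parent.mkdir(parents=True, exist_ok=True)
        # Start from an empty file so a stale run is not mixed in.
        self.path.write_text("", encoding="utf-8")

    def record(self, event):
        # One JSON object per line; the event is not retained afterwards.
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(event) + "\n")


def demo_stream(n_events=3):
    """Write a few events, then read them back line by line."""
    with tempfile.TemporaryDirectory() as tmp:
        writer = JsonlEventWriter(Path(tmp) / "events.jsonl")
        for step in range(n_events):
            writer.record({"step": step, "event_type": "action"})
        lines = writer.path.read_text(encoding="utf-8").splitlines()
        return [json.loads(line) for line in lines]
```

Opening the file per event keeps the sketch simple; a real recorder would more likely hold one open handle and flush periodically.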

2. Bounded in-memory event retention

I added a max_events_in_memory option for streaming mode so users can keep only a recent rolling window of events in RAM.

This is useful for:

  • lightweight debugging
  • recent-event inspection
  • reducing memory growth while still preserving the full event stream on disk
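One common way to get a rolling window like this is `collections.deque` with `maxlen`. The sketch below assumes that approach; whether the PR uses a deque is not stated here, and `RollingEventBuffer` is a hypothetical name.

```python
from collections import deque
from typing import Optional


class RollingEventBuffer:
    """Hypothetical stand-in for the recorder's in-memory window."""

    def __init__(self, max_events_in_memory: Optional[int] = None):
        # maxlen=None means unbounded, which mirrors plain in-memory storage.
        self.events = deque(maxlen=max_events_in_memory)
        self.total_recorded = 0

    def record(self, event):
        # When the deque is full, the oldest event is evicted automatically.
        self.events.append(event)
        self.total_recorded += 1


buffer = RollingEventBuffer(max_events_in_memory=3)
for step in range(10):
    buffer.record({"step": step})

# Only the three most recent events remain in RAM.
recent_steps = [e["step"] for e in buffer.events]
```

The total count is tracked separately because the length of the window no longer reflects how many events were recorded.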

3. Summary tracking without depending on full in-memory history

I updated the recorder to maintain lightweight aggregate state incrementally, including:

  • total event count
  • unique agent IDs
  • per-agent summaries
  • event types
  • first/last timestamps
  • active steps

This keeps exported metadata, summaries, and stats correct even when the full event history is not retained in memory.
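A minimal sketch of incremental aggregation, assuming each event is a dict with `event_type`, `step`, `timestamp`, and an optional `agent_id`. These field names and the `RunningSummary` class are assumptions for illustration, not the recorder's actual schema.

```python
from collections import Counter, defaultdict


class RunningSummary:
    """Hypothetical aggregate state, updated once per recorded event."""

    def __init__(self):
        self.total_events = 0
        self.agent_ids = set()
        self.event_types = Counter()
        self.events_per_agent = defaultdict(int)
        self.active_steps = set()
        self.first_timestamp = None
        self.last_timestamp = None

    def update(self, event):
        self.total_events += 1
        self.event_types[event["event_type"]] += 1
        self.active_steps.add(event["step"])
        agent_id = event.get("agent_id")
        if agent_id is not None:
            self.agent_ids.add(agent_id)
            self.events_per_agent[agent_id] += 1
        # Events are assumed to arrive in time order.
        if self.first_timestamp is None:
            self.first_timestamp = event["timestamp"]
        self.last_timestamp = event["timestamp"]


summary = RunningSummary()
for ev in [
    {"event_type": "message", "step": 1, "timestamp": 10.0, "agent_id": 1},
    {"event_type": "action", "step": 1, "timestamp": 11.0, "agent_id": 2},
    {"event_type": "action", "step": 2, "timestamp": 12.0, "agent_id": 1},
]:
    summary.update(ev)
```

State of this kind grows with the number of distinct agents, event types, and steps rather than with the number of events.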

4. Streaming-compatible auto-save support

I updated record_model auto-save integration to use a recorder-level has_recorded_events check instead of relying on recorder.events, so the decorator still behaves correctly in streaming mode.
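To show why a recorder-level check matters, here is a simplified sketch. `RecorderSketch` and `should_auto_save` are hypothetical, and exposing `has_recorded_events` as a property is an assumption. In streaming mode `events` can be empty even though events were recorded, so checking `events` alone would wrongly skip the save.

```python
class RecorderSketch:
    """Hypothetical recorder reduced to the parts relevant to auto-save."""

    def __init__(self, storage_mode="memory"):
        self.storage_mode = storage_mode
        self.events = []
        self._total_events = 0

    def record(self, event):
        self._total_events += 1
        if self.storage_mode == "memory":
            self.events.append(event)
        # In "jsonl" mode the event would be written to disk instead.

    @property
    def has_recorded_events(self):
        # Based on the running count, not on what is held in memory.
        return self._total_events > 0


def should_auto_save(recorder):
    """Decision an auto-save hook could make at the end of a run."""
    return recorder.has_recorded_events
```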

Why I Chose JSONL

I chose JSONL because it is:

  • append-friendly
  • easy to inspect manually
  • dependency-free
  • compatible with the existing JSON-based recording flow

This keeps the change small, practical, and low-risk while solving the memory-growth issue.

Backward Compatibility

I designed this change to be non-breaking:

  • the default behavior is still storage_mode="memory"
  • existing users do not need to change anything
  • JSON and pickle export behavior remains intact
  • the new mode is opt-in

Tests I Added

I added targeted tests for:

  • initialization in jsonl streaming mode
  • bounded in-memory retention in streaming mode
  • exporting full JSON output from streamed event history

Validation

I ran:

  • targeted recorder tests
  • full repository test suite
  • repo-wide Ruff checks

All of them passed.

Example Usage

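# Opt in to streaming: events are written to a JSONL file,
# and only the 100 most recent events are kept in memory.
recorder = SimulationRecorder(
    model=my_model,
    output_dir="recordings",
    storage_mode="jsonl",
    max_events_in_memory=100,
)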
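For reference, a streamed JSONL file can be read back with the standard library alone. The sketch below is independent of the recorder's own export code; the helper names, the file names, and the `{"events": [...]}` output layout are assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path


def iter_jsonl(path):
    """Yield one decoded event per non-empty line."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def jsonl_to_json(jsonl_path, json_path):
    """Rebuild a single JSON document from the streamed events."""
    events = list(iter_jsonl(jsonl_path))
    Path(json_path).write_text(
        json.dumps({"events": events}, indent=2), encoding="utf-8"
    )
    return len(events)


def demo_export():
    with tempfile.TemporaryDirectory() as tmp:
        jsonl_path = Path(tmp) / "events.jsonl"
        json_path = Path(tmp) / "events.json"
        jsonl_path.write_text('{"step": 0}\n{"step": 1}\n', encoding="utf-8")
        count = jsonl_to_json(jsonl_path, json_path)
        return count, json.loads(json_path.read_text(encoding="utf-8"))
```

Note that `jsonl_to_json` loads every event at export time; for analysis that only needs one pass, iterating with `iter_jsonl` keeps memory flat.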

@apfine apfine changed the title from "fix : Prevent unbounded recorder memory growth with JSONL streaming mode" to "fix : Prevent unbounded recorder memory growth in long simulation runs with JSONL streaming mode" Mar 10, 2026

apfine commented Mar 10, 2026

@BhoomiAgrawal12 @wang-boyu @EwoutH
I would be happy to hear your thoughts!


apfine commented Mar 10, 2026

@IlamaranMagesh, what do you think?

I would be happy to hear your thoughts!

I noticed this PR includes changes from #181. To make the review cleaner, would you mind changing the base of this PR to the branch for #181? That way, we only see the 'diff' of the new changes.

Author

That was a silly mistake of mine!

Thank you for your review.

I will get back to you with the changes.

@IlamaranMagesh IlamaranMagesh left a comment

Thank you. I've added some comments. Additionally, I would wait for #181 to be closed by the maintainers. If this PR is not tied to #181, you need to make your changes on top of mesa:main, not on your feature branch.

@apfine apfine force-pushed the recorder-memory-fix branch from a54a396 to 90f2a58 Compare March 13, 2026 19:00
coderabbitai bot commented Mar 13, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

apfine commented Mar 13, 2026

Thank you. I've added some comments. Additionally, I would wait for #181 to be closed by the maintainers. If this PR is not tied to #181, you need to make your changes on top of mesa:main, not on your feature branch.

I have separated this PR from #181, so please review it on its own. I would be grateful, @IlamaranMagesh.

@apfine apfine requested a review from IlamaranMagesh March 13, 2026 19:01