fix : Prevent unbounded recorder memory growth in long simulation runs with JSONL streaming mode#183
fix : Prevent unbounded recorder memory growth in long simulation runs with JSONL streaming mode#183
Conversation
|
@BhoomiAgrawal12 @wang-boyu @EwoutH |
|
@IlamaranMagesh what do you say ?? Would be happy to know your thoughts !! |
There was a problem hiding this comment.
Its a silly mistake of mine !!
Thankyou for your review
I will revert back with changes
a54a396 to
90f2a58
Compare
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip Flake8 can be used to improve the quality of Python code reviews.Flake8 is a Python linter that wraps PyFlakes, pycodestyle and Ned Batchelder's McCabe script. To configure Flake8, add a '.flake8' or 'setup.cfg' file to your project root. See Flake8 Documentation for more details. |
I have separated it from the #181 so please review it separately , I would be grateful @IlamaranMagesh |
I found that the current event recorder keeps all events in memory until the final dump, which can become a serious memory issue for long-running simulations.
While keeping the default recorder behavior unchanged, I added an opt-in streaming mode so users can record simulation outputs without letting recorder memory grow unbounded.
Summary
In this PR, I added a bounded-memory streaming mode to
SimulationRecorderto address recorder memory growth during long-running simulations.I kept the default behavior unchanged, but users can now opt into a JSONL-backed recording mode that streams raw events to disk instead of retaining the full event history in memory.
Problem
Before this change,
SimulationRecorderstored every recorded event inself.eventsuntilsave()was called. In long simulations or high-frequency event scenarios, that made recorder memory usage grow linearly with runtime.What I Added
1. Optional streaming storage mode
I added support for:
storage_mode="memory": existing behavior, still the defaultstorage_mode="jsonl": stream events to a JSONL file as they are recordedThis lets users avoid keeping all raw events in memory for large runs.
2. Bounded in-memory event retention
I added a
max_events_in_memoryoption for streaming mode so users can keep only a recent rolling window of events in RAM.This is useful for:
3. Summary tracking without depending on full in-memory history
I updated the recorder to maintain lightweight aggregate state incrementally, including:
This keeps exported metadata, summaries, and stats correct even when the full event history is not retained in memory.
4. Streaming-compatible auto-save support
I updated
record_modelauto-save integration to use a recorder-levelhas_recorded_eventscheck instead of relying onrecorder.events, so the decorator still behaves correctly in streaming mode.Why I Chose JSONL
I chose JSONL because it is:
This keeps the change small, practical, and low-risk while solving the memory-growth issue.
Backward Compatibility
I designed this change to be non-breaking:
storage_mode="memory"Tests I Added
I added targeted tests for:
jsonlstreaming modeValidation
I ran:
All of them passed.
Example Usage