sim-rs: Improve simulation performance, observability, and memory management by sandtreader · Pull Request #821 · input-output-hk/ouroboros-leios

sandtreader · 2026-03-18T16:02:20Z

Summary

This PR addresses three categories of issues encountered during long-running simulations:
a clock coordinator performance bottleneck, unbounded memory growth from accumulated
transactions, and gaps in end-of-run observability.

Performance: Clock coordinator optimisation

The ClockCoordinator previously received FinishTask events through its mpsc channel —
the same channel used for actor registration and time-wait requests. In simulations with
high CPU task throughput, this created a bottleneck: every task completion required the
coordinator to wake, match the event, and update its counter.

Fix: Replace the channel-based FinishTask path with an AtomicUsize counter
decremented directly by the completing actor, plus a tokio::sync::Notify to wake the
coordinator only when it's actually blocked waiting for tasks to drain. This eliminates
per-task-completion channel traffic entirely.
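
A minimal sketch of the counter-plus-notification pattern described above, written against the standard library only so it runs without tokio. Here `std::sync::Condvar` stands in for `tokio::sync::Notify`, and `TaskTracker`, `start_task`, `wait_until_drained`, and `run_workers` are hypothetical names, not the simulator's actual API.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical stand-in for the coordinator's shared task state.
pub struct TaskTracker {
    outstanding: AtomicUsize, // tasks started but not yet finished
    lock: Mutex<()>,          // pairs with the Condvar; guards no data
    drained: Condvar,         // std analogue of tokio::sync::Notify
}

impl TaskTracker {
    pub fn new() -> Self {
        TaskTracker {
            outstanding: AtomicUsize::new(0),
            lock: Mutex::new(()),
            drained: Condvar::new(),
        }
    }

    pub fn start_task(&self) {
        self.outstanding.fetch_add(1, Ordering::AcqRel);
    }

    /// Called by the completing actor: no channel message, just a decrement.
    pub fn finish_task(&self) {
        // fetch_sub returns the previous value, so 1 means this was the last task.
        if self.outstanding.fetch_sub(1, Ordering::AcqRel) == 1 {
            // Take the lock before notifying so the wake-up cannot slip in
            // between the waiter's counter check and its call to wait().
            let _guard = self.lock.lock().unwrap();
            self.drained.notify_all();
        }
    }

    /// Called by the coordinator only when it must wait for tasks to drain.
    pub fn wait_until_drained(&self) {
        let mut guard = self.lock.lock().unwrap();
        while self.outstanding.load(Ordering::Acquire) != 0 {
            guard = self.drained.wait(guard).unwrap();
        }
    }

    pub fn outstanding(&self) -> usize {
        self.outstanding.load(Ordering::Acquire)
    }
}

/// Starts `n` tasks, finishes each on its own thread, waits for the
/// counter to drain, and returns the number of tasks still outstanding.
pub fn run_workers(n: usize) -> usize {
    let tracker = Arc::new(TaskTracker::new());
    for _ in 0..n {
        tracker.start_task();
    }
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let t = Arc::clone(&tracker);
            thread::spawn(move || t.finish_task())
        })
        .collect();
    tracker.wait_until_drained();
    for h in handles {
        h.join().unwrap();
    }
    tracker.outstanding()
}
```

In this sketch only the task that brings the counter to zero touches the notification primitive, so all other completions cost a single atomic decrement.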

Also fixes a bug in ClockBarrier::wait(): the check ts == self.now() should have been
ts <= self.now(). With the equality check, waits for already-passed timestamps blocked
instead of completing immediately.
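
The difference between the two comparisons can be shown in isolation. This is an illustrative sketch only: the function names are hypothetical and timestamps are modelled as plain `u64` ticks, which is an assumption rather than the simulator's timestamp type.

```rust
/// Corrected check: a wait for `ts` is already satisfied once the clock
/// has reached or passed it.
pub fn wait_is_satisfied(ts: u64, now: u64) -> bool {
    ts <= now
}

/// Pre-fix check, kept for contrast: only an exact match counts, so a
/// timestamp that has already passed is treated as still pending.
pub fn wait_is_satisfied_buggy(ts: u64, now: u64) -> bool {
    ts == now
}
```

With `now = 10`, a wait for `ts = 5` is satisfied under the corrected check but not under the equality check, which is the case that previously blocked.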

Memory: Slot-based transaction pruning

During long simulations (hundreds of slots), every node's
txs: HashMap<TransactionId, TransactionView> grows without bound, because nodes
accumulate every transaction they have ever seen. Memory use therefore scales with
node count × total transactions, and 100-node simulations running 500+ slots come
under significant memory pressure.

Fix: Add a prune_old_txs() pass at the start of each slot that removes transactions
older than linear-tx-max-age-slots (configurable, default disabled). Transactions still
in the mempool are retained regardless of age. The Mempool struct gains a
HashSet<TransactionId> for O(1) membership checks.
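
A rough sketch of how such a pruning pass could look, assuming each stored transaction records the slot in which it was first seen. The `first_seen_slot` field, the `u64` id type, the shape of `Mempool`, and the function signature are all assumptions made for illustration and are not taken from the PR's code.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Assumption: ids are modelled as plain integers here.
pub type TransactionId = u64;

/// Hypothetical view; only the field needed for pruning is modelled.
pub struct TransactionView {
    pub first_seen_slot: u64,
}

/// Queue of pending transactions plus a set for O(1) membership checks.
#[derive(Default)]
pub struct Mempool {
    queue: VecDeque<TransactionId>,
    ids: HashSet<TransactionId>,
}

impl Mempool {
    pub fn push(&mut self, id: TransactionId) {
        // Only enqueue ids that are not already present.
        if self.ids.insert(id) {
            self.queue.push_back(id);
        }
    }

    pub fn pop(&mut self) -> Option<TransactionId> {
        let id = self.queue.pop_front()?;
        self.ids.remove(&id);
        Some(id)
    }

    pub fn contains(&self, id: &TransactionId) -> bool {
        self.ids.contains(id)
    }
}

/// Removes transactions older than `max_age_slots` unless they are still
/// in the mempool. `None` disables pruning. Returns the number removed.
pub fn prune_old_txs(
    txs: &mut HashMap<TransactionId, TransactionView>,
    mempool: &Mempool,
    current_slot: u64,
    max_age_slots: Option<u64>,
) -> usize {
    let Some(max_age) = max_age_slots else {
        return 0;
    };
    let before = txs.len();
    txs.retain(|id, tx| {
        let age = current_slot.saturating_sub(tx.first_seen_slot);
        age <= max_age || mempool.contains(id)
    });
    before - txs.len()
}
```

Keeping the `HashSet` in sync with the queue is what makes the per-transaction mempool check constant time, so the pass stays linear in the size of `txs`.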

The age threshold of 23 slots used in linear.yaml is derived from: vote stage (5) +
diffuse stage (5) + 3× header diffusion time (3) + buffer (10), i.e. 5 + 5 + 3 + 10 = 23.

Observability improvements

- Suppress IB stats for non-IB variants: LeiosVariant::has_ibs() gates IB-related
  statistics, so variants like FullWithoutIbs and Linear no longer show misleading
  zero-IB stats.
- Report uncertified EBs: Counts and reports EBs whose votes fell below the
  certification threshold.
- Vote failure breakdown: End-of-run stats now show a per-reason breakdown of why
  votes were not generated (e.g., InvalidSlot, ExtraIB, MissingIB).
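
The gating and the per-reason breakdown described above can be sketched together as a small end-of-run report. The enum variants shown are an illustrative subset (`Full` in particular is an assumed name), and `Stats`, `record_no_vote`, `report`, and the output strings are hypothetical rather than the simulator's EventMonitor code.

```rust
use std::collections::BTreeMap;

/// Illustrative subset of variants; `Full` is an assumed name.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum LeiosVariant {
    Full,
    FullWithoutIbs,
    Linear,
}

impl LeiosVariant {
    /// True only for variants that produce input blocks (IBs).
    pub fn has_ibs(self) -> bool {
        matches!(self, LeiosVariant::Full)
    }
}

/// Reasons named in the PR text; the real enum may contain more.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum NoVoteReason {
    InvalidSlot,
    ExtraIB,
    MissingIB,
    LateRBHeader,
}

/// Hypothetical end-of-run counters.
#[derive(Default)]
pub struct Stats {
    pub ibs_generated: u64,
    pub no_vote: BTreeMap<NoVoteReason, u64>,
}

impl Stats {
    /// Counts one "vote not generated" event under its reason.
    pub fn record_no_vote(&mut self, reason: NoVoteReason) {
        *self.no_vote.entry(reason).or_insert(0) += 1;
    }

    /// Builds the report lines, omitting IB stats for non-IB variants.
    pub fn report(&self, variant: LeiosVariant) -> Vec<String> {
        let mut lines = Vec::new();
        if variant.has_ibs() {
            lines.push(format!("IBs generated: {}", self.ibs_generated));
        }
        // BTreeMap iterates in key order, so the breakdown is stable.
        for (reason, count) in &self.no_vote {
            lines.push(format!("no vote ({:?}): {}", reason, count));
        }
        lines
    }
}
```

Routing every IB-related line through a single `has_ibs()` check keeps the suppression rule in one place instead of repeating variant matches at each print site.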

Cleanup

Remove the vestigial -t/--timescale CLI flag — the simulator runs in virtual time and
this flag was unused.

New configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| `linear-tx-max-age-slots` | `number \| null` | `null` | Max slot age before a TX is eligible for pruning. `null` disables pruning. |

Test plan

- `cargo test` passes
- Run a long simulation (500+ slots) and verify memory usage stabilises
- Verify end-of-run stats show vote failure breakdown and uncertified EB count
- Verify IB stats are suppressed for Linear variant

🤖 Generated with Claude Code

The flag was parsed but never wired into the simulation. The sim uses virtual time and already runs as fast as possible.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Transactions in the per-node txs HashMap were never cleaned up, causing unbounded memory growth proportional to node count * total transactions. Prune transactions that are both older than a configurable max age and no longer present in the mempool. The mempool check ensures TXs that could still be included in future EBs (if previous voting failed) are retained. New config option `linear-tx-max-age-slots` (default: null/disabled).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

VTBundleNotGenerated events were already emitted but silently ignored by EventMonitor. Count them by NoVoteReason and print a breakdown after the vote stats, making issues like LateRBHeader immediately visible.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Linear Leios and Stracciatella variants don't produce IBs, so the end-of-run stats were printing empty/NaN IB lines. Gate all IB stats, IB-in-EB stats, IB latency, and IB network messages on a new LeiosVariant::has_ibs() method.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

finish_task() previously sent a ClockEvent::FinishTask through the same mpsc channel as Wait/CancelWait events, creating contention. Now it does an atomic fetch_sub and signals a Notify, letting the coordinator wake without channel round-trips. Also handles the resulting race where time can advance before a Wait event arrives.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

sandtreader and others added 6 commits March 12, 2026 11:28

Remove dead -t/--timescale CLI flag

8aeaac6

Report uncertified EBs that didn't reach vote threshold

5df7249

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

sandtreader requested review from ch1bo March 18, 2026 16:28

Bump version to 1.4.1

9a4009c

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

sandtreader wants to merge 7 commits into main from prc/sim-performance
