Skip to content

quic: fix crash scenario by handling stale pkt_meta in stream retry path, add unit tests#8923

Open
cmoyes-jump wants to merge 3 commits intomainfrom
cmoyes/fix-quic-actor-crash
Open

quic: fix crash scenario by handling stale pkt_meta in stream retry path, add unit tests#8923
cmoyes-jump wants to merge 3 commits intomainfrom
cmoyes/fix-quic-actor-crash

Conversation

@cmoyes-jump
Copy link
Contributor

@cmoyes-jump cmoyes-jump commented Mar 16, 2026

Our QUIC protocol implementation keeps a ledger of every packet it sent (pkt_meta). When a peer ACK's a packet, its ledger entry is crossed off. When a packet is presumed lost, the ledger entry triggers a re-transmit.

RFC 9002 Section 6 "Loss Detection" Subsection 1 "Acknowledgment-Based Detection" states that "a packet is declared lost if ... the packet was sent kPacketThreshold packets before an acknowledged packet (Section 6.1.1), or it was sent long enough in the past (Section 6.1.2)."

A single stream's data can span multiple packets. Suppose e.g. packet 5 carries bytes [0, 100) and that packet 8 carries bytes [100, 200). Each gets its own ledger entry. With kPacketThreshold=3, this means an ACK for e.g. for packets 9 through 11 force-declares packet 5 lost (three packets above it were ACK'ed), but packet 8 is not declared lost because it's only one below the ACK'ed range. Note that QUIC does not require contiguous packets to contain contiguous byte ranges. For example, packets 6-7 could be another ACK, a PING, or data for a different stream on the same connection.

Here's an example sequence that currently crashes with FD_LOG_CRIT due to the assertion on ( !( (stream->tx_sent < stream->tx_buf.head) | (stream->state & FD_QUIC_STREAM_STATE_TX_FIN) ) ):

  1. Packet 5's ledger entry is force-retried. The data is re-sent in new packets 12-13, and packet 5's entry is ripped off the ledger.
  2. Packets 12-13 are ACK'ed. The stream's "delivered" watermark (unacked_low) advances to 200. Every byte is now confirmed delivered.
  3. Packet 8's ledger entry is still on the ledger. It was never ACK'ed and never force-declared lost. When it eventually times out, the retry code looks at it and panics because the stream has nothing left to send, but we have a ledger entry that says it does.

Fix: Before the panic, check whether the ledger entry is simply stale (i.e. the delivery watermark already covers its byte range). If so, skip the retry, and let the entry be cleaned up normally. With this fix, the existing FD_FD_LOG_CRIT assertion can still be preserved below this newly-added check as a safety net for any genuinely unexpected state.

@cmoyes-jump cmoyes-jump self-assigned this Mar 16, 2026
@cmoyes-jump cmoyes-jump marked this pull request as ready for review March 16, 2026 21:13
Copilot AI review requested due to automatic review settings March 16, 2026 21:13
@github-actions
Copy link

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-368528500-perf per slot 0.060219 s 0.06021 s -0.015%
backtest mainnet-368528500-perf snapshot load 3.178 s 2.503 s -21.240%
backtest mainnet-368528500-perf total elapsed 60.218505 s 60.209728 s -0.015%
firedancer mem usage with mainnet.toml 985.41 GiB 985.41 GiB 0.000%

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a QUIC crash in fd_quic_pkt_meta_retry when ACK-driven loss detection leaves behind a stale pkt_meta for stream data that has already been fully acknowledged via retransmission. It adds a targeted unit-test suite to exercise stream ACK/loss/retry lifecycle behavior and prevent regressions.

Changes:

  • Add a stale-pkt_meta guard in the stream retry path to skip retries when the stream has no remaining data/FIN to send.
  • Introduce a new unit test test_quic_pkt_meta_lifecycle covering stream lifecycle, ACK ordering/gaps, retry behavior, and the stale-pkt_meta regression scenario.
  • Register the new unit test in the QUIC tests makefile.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
src/waltz/quic/fd_quic.c Adds a stale-pkt_meta skip check in the stream retry path to prevent a crash on completed streams.
src/waltz/quic/tests/test_quic_pkt_meta_lifecycle.c New lifecycle/regression unit tests for pkt_meta handling, including the stale-pkt_meta crash reproduction.
src/waltz/quic/tests/Local.mk Adds the new unit test target to the build/test list.

You can also share your feedback on Copilot code review. Take the survey.

@github-actions
Copy link

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-368528500-perf per slot 0.083002 s 0.083023 s 0.025%
backtest mainnet-368528500-perf snapshot load 3.825 s 3.389 s -11.399%
backtest mainnet-368528500-perf total elapsed 83.001841 s 83.022605 s 0.025%
firedancer mem usage with mainnet.toml 985.41 GiB 985.41 GiB 0.000%

@cmoyes-jump cmoyes-jump marked this pull request as draft March 16, 2026 22:26
@cmoyes-jump cmoyes-jump marked this pull request as ready for review March 17, 2026 21:31
Copilot AI review requested due to automatic review settings March 17, 2026 21:31
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a QUIC retransmission crash by treating certain expired pkt_meta entries as stale when the associated stream data has already been fully acknowledged, and adds a new unit test intended to prevent regressions in pkt_meta lifecycle handling.

Changes:

  • Add a stale-pkt_meta guard in fd_quic_pkt_meta_retry() to skip retrying stream data that is already fully ACKed.
  • Introduce a new test_quic_pkt_meta_lifecycle unit test covering stream send/ACK/retry flows and the stale-pkt_meta scenario.
  • Register the new unit test target in the QUIC tests Local.mk.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
src/waltz/quic/fd_quic.c Skips retry work for stale stream pkt_meta entries to avoid completed-stream assertions/crashes.
src/waltz/quic/tests/test_quic_pkt_meta_lifecycle.c Adds lifecycle + regression coverage for ACK/loss/retry interactions around pkt_meta.
src/waltz/quic/tests/Local.mk Adds the new unit test build target.

You can also share your feedback on Copilot code review. Take the survey.

@github-actions
Copy link

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-406545575-perf per slot 0.121277 s 0.121138 s -0.115%
backtest mainnet-406545575-perf snapshot load 5.224 s 3.528 s -32.466%
backtest mainnet-406545575-perf total elapsed 121.276869 s 121.138305 s -0.114%
firedancer mem usage with mainnet.toml 1105.43 GiB 1105.43 GiB 0.000%

Copilot AI review requested due to automatic review settings March 19, 2026 18:10
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a QUIC retransmit crash by detecting and skipping stale pkt_meta entries during stream retry, and expands/refactors multiple runtime components (program cache, leaders/stakes plumbing, shred validation, etc.) to align with newer protocol/runtime behaviors.

Changes:

  • QUIC: Skip retries for stale stream pkt_meta whose byte range is already covered by the stream delivery watermark.
  • Runtime: Large refactor of program cache APIs/record handling (introducing explicit record closing), plus substantial updates across system/stake/vote plumbing and leader/stake weight handling.
  • Tests/tooling: Add/adjust unit tests, fuzz target(s), and CI/backtest scripts/configuration.

Reviewed changes

Copilot reviewed 150 out of 195 changed files in this pull request and generated no comments.

Show a summary per file
File Description
src/waltz/quic/tests/Local.mk Adds new QUIC unit test target for pkt_meta lifecycle
src/waltz/quic/fd_quic.c Fixes stale pkt_meta retry crash; tightens free list validation
src/util/wksp/fd_wksp_user.c Adds TLS flag to silence OOM warnings (conditionally)
src/util/tmpl/fd_map_chain_para.c Adds MAP_CUSTOM_CHAIN option to customize chain header generation
src/flamenco/vm/syscall/test_cpi_shared_data_addr.c Updates progcache initialization/join APIs used by tests
src/flamenco/types/fd_types_custom.h Adds dummy pubkey constant for compressed stake weights
src/flamenco/stakes/fd_vote_stakes.c Minor formatting change
src/flamenco/stakes/fd_stakes.h Adds stakes APIs and constants; removes old prototypes
src/flamenco/stakes/Local.mk Builds fd_stakes only when FD_HAS_DOUBLE
src/flamenco/runtime/tests/test_accounts_resize_delta.c Updates progcache + runtime stack allocation APIs in tests
src/flamenco/runtime/tests/run_ledger_backtest.sh Changes epoch timing reporting
src/flamenco/runtime/tests/fd_txn_harness.c Updates progcache usage/compaction to new APIs
src/flamenco/runtime/tests/fd_solfuzz.h Removes progcache admin from runner struct
src/flamenco/runtime/tests/fd_instr_harness.c Updates progcache usage + ALUT/v4 loader ownership plumbing
src/flamenco/runtime/tests/fd_dump_pb.c Switches to fd_alut.h; fixes stake delegation iteration constness; adjusts epoch credits source
src/flamenco/runtime/test_runtime_alut.c Switches ALUT include to fd_alut.h
src/flamenco/runtime/test_instr_acct_bounds.c Updates progcache lifecycle in tests
src/flamenco/runtime/test_hashes.c Adds unit test for fd_hashes_apply_hard_forks
src/flamenco/runtime/test_deprecate_rent_exemption_threshold.c Updates stake delegations API usage; updates runtime stack allocation
src/flamenco/runtime/test_bundle_exec.c Updates runtime stack allocation APIs in test
src/flamenco/runtime/program/vote/fd_vote_state_v4.c Uses bank rent query instead of sysvar cache rent read
src/flamenco/runtime/program/vote/fd_vote_state_v3.c Uses bank rent query instead of sysvar cache rent read
src/flamenco/runtime/program/test_vote_program.c Updates progcache + runtime stack allocation in tests
src/flamenco/runtime/program/fd_vote_program.h Removes fd_vote_convert_to_current from public header
src/flamenco/runtime/program/fd_vote_program.c Removes fd_vote_convert_to_current implementation
src/flamenco/runtime/program/fd_system_program.h Adds create-account-allow-prefund handler decl; reflows prototypes
src/flamenco/runtime/program/fd_system_program.c Implements create_account_allow_prefund and dispatch case
src/flamenco/runtime/program/fd_stake_program.h Deletes old stake program header
src/flamenco/runtime/program/fd_loader_v4_program.c Updates progcache pull API and ensures record close
src/flamenco/runtime/program/fd_config_program.h Deletes old config program header
src/flamenco/runtime/program/fd_bpf_loader_program.c Updates progcache record accessors and ensures record close
src/flamenco/runtime/program/fd_address_lookup_table_program.h Deletes old ALUT program header
src/flamenco/runtime/program/Local.mk Removes config/stake/alut program build; adds new unit test target for allow-prefund
src/flamenco/runtime/fd_system_ids_pp.h Removes some buffer IDs no longer referenced
src/flamenco/runtime/fd_system_ids.h Removes externs for deprecated buffer addresses
src/flamenco/runtime/fd_system_ids.c Removes deprecated buffer constants; keeps remaining IDs
src/flamenco/runtime/fd_runtime_const.h Updates runtime vote/stake account bounds and documentation
src/flamenco/runtime/fd_runtime.h Removes stake_program scratch union from runtime struct
src/flamenco/runtime/fd_hashes.h Adds public API doc + prototype for hard fork mixing
src/flamenco/runtime/fd_hashes.c Implements fd_hashes_apply_hard_forks
src/flamenco/runtime/fd_executor.c Removes old native programs; updates perfect-hash table; adds bundle duplicate-msg-hash check
src/flamenco/runtime/fd_cost_tracker.c Extends account-data allocation size costing for new system instruction
src/flamenco/runtime/fd_alut_interp.h Moves ALUT helper logic into interpreter header (and removes old dependency)
src/flamenco/rewards/fd_stake_rewards.c Replaces index pool impl with linear array allocation and counter
src/flamenco/progcache/test_progcache_common.c Updates progcache test env to new shmem + join APIs
src/flamenco/progcache/fd_progcache_user.h Major progcache API changes: join/leave signature, record handle semantics, metrics, record close
src/flamenco/progcache/fd_progcache_rec.h Refactors record layout; adds lock; moves value storage to wksp gaddr; refactors accessors
src/flamenco/progcache/fd_progcache_rec.c Implements new value alloc/free and record load/noexec helpers
src/flamenco/progcache/fd_progcache_admin.h Refactors admin API to operate on fd_progcache_join_t; adds metrics + wksp metrics update
src/flamenco/progcache/Local.mk Adds new progcache core object; removes racesan unit test target
src/flamenco/leaders/test_leaders.c Adds tests for leader membership bitset
src/flamenco/leaders/fd_multi_epoch_leaders.h Updates stake weight storage bound for compressed stake weights
src/flamenco/leaders/fd_multi_epoch_leaders.c Validates against new compressed stake weight bound
src/flamenco/leaders/fd_leaders_base.h Adjusts stake/vote bounds and documents compressed stake weights scheme
src/flamenco/leaders/fd_leaders.h Adds leader membership bitset APIs + footprint update
src/flamenco/leaders/fd_leaders.c Builds and stores leader membership bitset
src/flamenco/gossip/fd_gossip.h Updates stakes update API to consume fd_vote_stake_weight_t
src/flamenco/gossip/fd_gossip.c Refactors stake map/pool sizing; dedups stakes keyed on identity; skips dummy weights
src/flamenco/genesis/fd_genesis_create.c Switches stake include to new stakes header
src/flamenco/features/fd_features_generated.h Adds new feature flag: create_account_allow_prefund; updates set size/id
src/flamenco/features/fd_features_generated.c Adds new feature ID; updates an existing feature ID and marks some cleaned_up
src/flamenco/fd_rwlock.h Adds tryread/trywrite/demote; fixes fd_rwlock_new return; adds fences/volatile stores
src/flamenco/accdb/fd_accdb_lineage.h Treats root XID as always present in lineage queries
src/flamenco/accdb/fd_accdb_lineage.c Removes implicit root insertion into fork list
src/flamenco/accdb/fd_accdb_admin_v1.c Fixes typo; moves child head/tail reset after clearing list
src/discoh/bank/fd_bank_tile.c Removes temporary “simple vote” compute override logic
src/discof/txsend/fd_txsend_tile.c Updates epoch stake count bound check
src/discof/tower/fd_tower_tile.c Replaces stake_ci usage with multi-epoch leaders
src/discof/rpc/fd_rpc_tile.c Fixes JSON-RPC error code sign
src/discof/restore/utils/fd_ssmsg.h Hard-codes snapshot manifest bounds with FIXMEs for larger support
src/discof/restore/utils/fd_ssload.c Uses root stake delegations update; bounds epoch credits loading
src/discof/restore/fd_snapin_tile_funk.c Updates OOM error message to point to accounts sizing
src/discof/restore/fd_snapin_tile.c Applies hard fork mixing into computed bank hash
src/discof/resolv/fd_resolv_tile.c Switches to fd_alut.h include
src/discof/replay/fd_sched.h Adds fd_sched_metrics_write prototype
src/discof/repair/fuzz_repair_serde.c Adds fuzz target for ping serde roundtrip
src/discof/repair/fd_repair.h Exposes ping de/ser APIs
src/discof/repair/fd_repair.c Implements ping de/ser
src/discof/repair/fd_policy.h Updates high-level policy docs; changes peer ping state field
src/discof/repair/fd_policy.c Updates peer init for ping state field
src/discof/repair/Local.mk Adds fuzz test target for repair serde
src/discof/poh/fd_poh.h Clarifies microblock limit semantics; removes slot_done comment/field
src/discof/poh/fd_poh.c Adjusts done_packing accounting to reserve phantom “microblock”
src/discof/gossip/fd_gossip_tile.h Removes stake conversion scratch; hard-codes snapshot bounds with FIXMEs
src/discof/gossip/fd_gossip_tile.c Uses vote stake weights directly; adjusts snapshot bounds assertion; removes prior workspace alloc
src/discof/forest/fd_forest.h Updates documentation for new repair policy/forest iteration behavior
src/discof/forest/fd_forest.c Removes some FD_TEST assertions; replaces one with comment
src/discof/fd_accdb_topo.c Updates progcache topo init to new join signature (no locks obj)
src/discof/execrp/fd_execrp_tile.c Updates progcache metrics; adds TLS OOM silencing
src/discof/execle/fd_execle_tile.c Fixes typo in comment
src/disco/topo/fd_topob.c Resets agave affinity count during auto-layout
src/disco/shred/test_stake_ci.c Updates test bounds for stake/validator counts
src/disco/shred/test_shred_dest_conformance.c Increases stake_ci buffers to 64 MiB
src/disco/shred/test_shred_dest.c Minor assignment formatting
src/disco/shred/test_fec_resolver.c Fixes parity shred test setup for fec_set_idx/idx
src/disco/shred/fd_stake_ci.h Updates MAX footprint and stake weight array sizing bound
src/disco/shred/fd_stake_ci.c Accepts compressed weights; filters dummy stakes; updates bounds checks
src/disco/shred/fd_shred_tile.c Wires discard_unexpected_data_complete_shreds feature into fec_resolver
src/disco/shred/fd_shred_dest.c Reformat scratch alloc append lines
src/disco/shred/fd_fec_resolver.c Tightens shred validation (idx/fec_set_idx invariants, merkle depth checks, recovered shred checks)
src/disco/plugin/fd_plugin_tile.c Updates peer count bounds checks
src/disco/metrics/generated/fd_metrics_shred.h Updates histogram max for contact info count
src/disco/metrics/generated/fd_metrics_replay.c Adds scheduler + progcache metrics entries
src/disco/metrics/generated/fd_metrics_repair.h Adds repair failed-sigverify metric; updates total
src/disco/metrics/generated/fd_metrics_repair.c Adds repair failed-sigverify metric
src/disco/metrics/generated/fd_metrics_execrp.c Replaces old progcache metrics with expanded set
src/disco/gui/fd_gui_peers.h Sizes GUI epoch stakes by MAX_STAKED_LEADERS
src/disco/gui/fd_gui_peers.c Filters dummy stake weights; enforces new bounds
src/disco/gui/fd_gui.h Updates GUI peer count bound
src/disco/gui/fd_gui.c Updates GUI bound handling and txn flags init logic
src/disco/gui/dist_stable/version Updates bundled GUI build version hash
src/disco/gui/dist_stable/index.html Updates bundled GUI asset URLs/preloads for new build
src/ballet/secp256r1/fd_secp256r1_s2n.c Adds mixed-add helper to handle doubling/identity case in double-scalar mul; whitespace cleanups
src/app/shared/fd_config.c Validates new program cache config parameters
src/app/firedancer/topology.c Replaces progcache backing objects with new progcache topo obj model
src/app/firedancer/main.c Registers progcache topology callbacks
src/app/firedancer/config/default.toml Changes default program cache heap size
src/app/firedancer/callbacks.c Adds progcache object callbacks for topo init
src/app/firedancer-dev/main.h Registers progcache topology callbacks in dev build
src/app/firedancer-dev/commands/sim.c Increases replay_epoch link MTU for compressed stakes
src/app/firedancer-dev/commands/send_test/send_test_helpers.c Minor formatting
src/app/firedancer-dev/commands/backtest.c Updates progcache topo uses to new obj model
src/app/fdctl/with-version.mk Switches to Agave-aware version parsing/encoding
src/app/fdctl/with-agave.mk Adds Agave semver/prerelease parsing helpers
src/app/fdctl/topology.c Updates link MTUs for new larger stake/peer bounds
contrib/test/test-vectors-commit-sha.txt Updates test vectors commit SHA
contrib/test/run_solcap_tests.sh Updates ledger/test configuration; changes solcap-tools usage
contrib/test/ledger_common.sh Makes echo_error robust when arg2 missing
contrib/tag-release.py Updates prerelease patch parsing logic
config/everything.mk Excludes fuzz targets from coverage collection
agave Updates Agave submodule revision
.github/workflows/benchmark.yml Updates benchmark ledger slots
.github/workflows/backtest.yml Increases CI timeouts
Comments suppressed due to low confidence (1)

src/discof/repair/fd_repair.c:1

  • fd_repair_ping_de rejects non-FD_REPAIR_KIND_PING kinds, but fd_repair_ping_ser will serialize whatever ping->kind contains. For a pair of *_ping_* APIs, this asymmetry is surprising and can lead to non-roundtrippable output. Consider enforcing ping->kind == FD_REPAIR_KIND_PING in the serializer (return error otherwise), or rename/generalize the APIs if the intent is to support additional kinds.

You can also share your feedback on Copilot code review. Take the survey.

@cmoyes-jump cmoyes-jump force-pushed the cmoyes/fix-quic-actor-crash branch from c597e64 to 89f6550 Compare March 19, 2026 18:17
@github-actions
Copy link

Test Vectors Coverage

Overall (lines / functions / branches across all instrumented files)

Metric Base PR Delta
Lines 15.30% (35172/229153) 16.00% (35358/220840) +0.70% 📈
Functions 0.40% (4354/1088855) 0.40% (4461/1084295) +0.00% ➡️
Branches N/A N/A N/A

Per directory (line coverage, recursive)

Directory Base PR Delta
flamenco/vm 61.21% (2167/3540) 61.21% (2167/3540) +0.00% ➡️
flamenco/vm/syscall 80.27% (1790/2230) 80.27% (1790/2230) +0.00% ➡️
flamenco/runtime 47.53% (9402/19780) 54.81% (9315/16996) +7.27% 📈
flamenco/runtime/program 46.09% (4391/9527) 63.17% (4309/6821) +17.08% 📈
ballet/bls 0.00% (0/294) 0.00% (0/294) +0.00% ➡️
ballet/bn254 91.38% (1760/1926) 91.38% (1760/1926) +0.00% ➡️
ballet/ed25519 50.63% (1129/2230) 52.73% (1129/2141) +2.10% 📈
ballet/secp256k1 92.44% (379/410) 97.18% (379/390) +4.74% 📈
ballet/secp256r1 93.46% (143/153) 92.47% (172/186) -0.99% 📉
ballet/zksdk 80.95% (973/1202) 80.95% (973/1202) +0.00% ➡️

Full HTML report: coverage report.

@github-actions
Copy link

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-406545575-perf per slot 0.121185 s 0.121181 s -0.003%
backtest mainnet-406545575-perf snapshot load 5.235 s 3.587 s -31.480%
backtest mainnet-406545575-perf total elapsed 121.18523 s 121.180571 s -0.004%
firedancer mem usage with mainnet.toml 1105.43 GiB 1105.43 GiB 0.000%

Copilot AI review requested due to automatic review settings March 19, 2026 23:29
@cmoyes-jump cmoyes-jump force-pushed the cmoyes/fix-quic-actor-crash branch from 89f6550 to 7d5acd2 Compare March 19, 2026 23:29
@github-actions
Copy link

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-406545575-perf per slot 0.121054 s 0.121247 s 0.159%
backtest mainnet-406545575-perf snapshot load 5.11 s 3.377 s -33.914%
backtest mainnet-406545575-perf total elapsed 121.053665 s 121.246995 s 0.160%
firedancer mem usage with mainnet.toml 1105.43 GiB 1105.43 GiB 0.000%

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a QUIC retransmission crash caused by stale pkt_meta entries that can remain after stream data has already been fully acknowledged via retransmission. It adds a guard in the retry path to skip retrying such stale entries and introduces a dedicated unit test suite covering stream lifecycle, ACK handling, and pkt_meta retry/loss behavior (including the regression).

Changes:

  • Prevent fd_quic_pkt_meta_retry from attempting to retry stale stream pkt_meta entries when the stream has no remaining data to send.
  • Add test_quic_pkt_meta_lifecycle unit test covering stream send/ACK cleanup, ACK ordering/gaps, retry behavior, and a regression test for the stale pkt_meta crash.
  • Register the new unit test target in the QUIC tests Local.mk.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
src/waltz/quic/fd_quic.c Adds a stale-pkt-meta guard in the stream retry path to avoid retrying completed streams.
src/waltz/quic/tests/test_quic_pkt_meta_lifecycle.c Adds comprehensive unit tests for pkt_meta lifecycle and the crash regression scenario.
src/waltz/quic/tests/Local.mk Hooks the new lifecycle unit test into the build and test run list.

Comment on lines +112 to +122
ulong last_pkt = conn->pkt_number[ APP_PN_SPACE ];
for( uint j=0; j<n; j++ ) {
conn->flags = ( conn->flags & ~FD_QUIC_CONN_FLAGS_PING_SENT ) | FD_QUIC_CONN_FLAGS_PING;
conn->upd_pkt_number = FD_QUIC_PKT_NUM_PENDING;
sandbox->wallclock += (long)1e6;
conn->svc_meta.next_timeout = sandbox->wallclock;
fd_quic_svc_timers_schedule( state->svc_timers, conn, sandbox->wallclock );
fd_quic_service( quic, sandbox->wallclock );
while( fd_quic_sandbox_next_packet( sandbox ) ) {}
last_pkt = conn->pkt_number[ APP_PN_SPACE ] - 1UL;
}
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

send_pings sets last_pkt = conn->pkt_number[ APP_PN_SPACE ] - 1UL without verifying that fd_quic_service actually transmitted a packet. If no packet is sent (e.g. due to pkt_meta exhaustion or other gating), this underflows and returns a bogus packet number. Consider asserting conn->pkt_number[APP_PN_SPACE] advanced (similar to other QUIC tests), or compute last_pkt from emitted packets / remove the return value if it’s not needed.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants