
Conversation

sashaodessa

Replaces unsafe port-finding functions with secure bound socket APIs to eliminate race conditions.

Problem

The existing unused_tcp*_port() and unused_udp*_port() functions had a classic Time-of-Check-Time-of-Use (TOCTOU) vulnerability:

  • Functions would bind to port 0, get the assigned port number, then immediately close the socket
  • Between getting the port number and later binding to it, another process could claim the same port
  • The LRU cache provided a false sense of security and didn't prevent the race condition

Solution

Added new safe APIs that return already-bound sockets:

  • bind_tcp4_any() / bind_tcp6_any() → returns TcpListener
  • bind_udp4_any() / bind_udp6_any() → returns UdpSocket
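A minimal sketch of what such a bound-socket API looks like using only `std::net` (the function names mirror the PR description; the real signatures may differ):

```rust
use std::net::{TcpListener, UdpSocket};

// Bind to port 0 so the OS assigns a free ephemeral port, then return the
// still-bound socket. Because the socket stays bound, no other process can
// claim the port, eliminating the TOCTOU race.
fn bind_tcp4_any() -> std::io::Result<TcpListener> {
    TcpListener::bind("127.0.0.1:0")
}

fn bind_udp4_any() -> std::io::Result<UdpSocket> {
    UdpSocket::bind("127.0.0.1:0")
}

fn main() -> std::io::Result<()> {
    let listener = bind_tcp4_any()?;
    // The port stays reserved for as long as `listener` is alive.
    let port = listener.local_addr()?.port();
    assert!(port > 0);
    Ok(())
}
```

The caller hands the bound socket (not just the port number) to whatever needs it, so there is no window in which another process can grab the port.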

* Add co-author to mergify commits.

* Remove unnecessary pull request rules from mergify config.

* Revert automation removals

cla-assistant bot commented Sep 9, 2025

CLA assistant check
All committers have signed the CLA.


@michaelsproul
Member

Thanks!

We are aware of this issue and had been phasing these functions out in favour of using 0 ports explicitly. The hard part is replacing the uses of the old functions in code that assumes it can determine an unused port without binding a socket, e.g. here:

```rust
let port = unused_tcp4_port().expect("Unable to find unused port.");
CommandLineTest::new()
    .flag("port", Some(port.to_string().as_str()))
    .flag("allow-insecure-genesis-sync", None)
    .run()
    .with_config(|config| {
        assert_eq!(
            config.network.listen_addrs().v4().map(|listen_addr| (
                listen_addr.disc_port,
                listen_addr.quic_port,
                listen_addr.tcp_port
            )),
            // quic_port should be (tcp_port + 1) if tcp_port is given as non-zero.
            Some((port, port + 1, port))
        );
    });
```

If you would be interested in coming up with clever fixes for these cases, that would be extremely helpful. That would then allow us to delete rather than deprecate the unsafe functions.
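One hedged sketch of a fix for such tests: bind first, read the assigned port back, and build the expected values from it, dropping the reservation only at the last moment. This narrows the race window rather than closing it completely, since a test like the one above still needs to pass a bare port number on the command line:

```rust
use std::net::TcpListener;

fn main() -> std::io::Result<()> {
    // Reserve a port by binding, then read the OS-assigned number back.
    let reserved = TcpListener::bind("127.0.0.1:0")?;
    let port = reserved.local_addr()?.port() as u32;

    // Build the expected (disc_port, quic_port, tcp_port) tuple up front,
    // as in the assertion above: quic_port = tcp_port + 1.
    let expected = (port, port + 1, port);

    // Release the reservation just before the node binds. The race window
    // shrinks to this instant instead of spanning the whole test setup.
    drop(reserved);

    assert_eq!(expected, (port, port + 1, port));
    Ok(())
}
```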

@michaelsproul
Member

Most of the uses are in tests, but it's still important for us to get rid of them (if possible), as they sometimes contribute to CI flakiness.

@michaelsproul michaelsproul added waiting-on-author The reviewer has suggested changes and awaits their implementation. Networking labels Sep 9, 2025
@sashaodessa sashaodessa requested a review from jxs as a code owner September 10, 2025 08:26
@michaelsproul
Member

@sashaodessa Can you also please rebase on unstable? All PRs need to be merged to unstable.


@michaelsproul
Member

Looks good at a glance. I've requested a review from some of our networking experts on the changes that impact prod.

@michaelsproul michaelsproul changed the base branch from stable to unstable September 14, 2025 12:34
Fulu uses `getPayloadV5`; this PR updates the notifier logging prior to the fork.

Co-Authored-By: Jimmy Chen <[email protected]>
Member

@AgeManning AgeManning left a comment


These changes look good to me.

Agreed, a future PR to remove the deprecated functions would be nice.

Comment on lines +108 to +111
{% for commit in commits | unique(attribute='email_author') %}
Co-Authored-By: {{ commit.author }} <{{ commit.email_author }}>
{% endfor %}
Member


Are these changes relevant?

Member


No this is some leftover stuff from stable that hasn't been backmerged to unstable yet (see https://github.com/sigp/lighthouse/commits/stable/), it should be deleted from this PR.

@michaelsproul
Member

I think we can probably just delete the deprecated functions in this PR, AFAICT all (?) usages have been removed

michaelsproul and others added 11 commits September 16, 2025 04:10
Attempt to address performance issues caused by importing the same block multiple times.


  - Check fork choice "after" obtaining the fork choice write lock in `BeaconChain::import_block`. We actually use an upgradable read lock, but this is semantically equivalent (the upgradable read has the advantage of not excluding regular reads).

The hope is that this change has several benefits:

1. By preventing duplicate block imports we avoid repeating unnecessary work inside `import_block`, e.g. writing the state to disk. Although the store itself now takes some measures to avoid re-writing diffs, it is even better if we avoid a disk write entirely.
2. By returning `DuplicateFullyImported`, we reduce some duplicated work downstream. E.g. if multiple threads importing columns trigger `import_block`, now only _one_ of them will get a notification of the block import completing successfully, and only this one will run `recompute_head`. This should help avoid a situation where multiple beacon processor workers are consumed by threads blocking on the `recompute_head_lock`. However, a similar block-fest is still possible with the upgradable fork choice lock (a large number of threads can be blocked waiting for the first thread to complete block import).
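The duplicate-import check can be sketched with a double-checked pattern. The real code uses an upgradable read lock (e.g. `parking_lot`'s), which `std` lacks, so this sketch re-checks under a plain write lock; the names are illustrative:

```rust
use std::collections::HashSet;
use std::sync::RwLock;

#[derive(Debug, PartialEq)]
enum ImportOutcome {
    Imported,
    DuplicateFullyImported,
}

fn import_block(known_blocks: &RwLock<HashSet<u64>>, block_root: u64) -> ImportOutcome {
    // Fast path: a cheap read-only check that doesn't block other readers.
    if known_blocks.read().unwrap().contains(&block_root) {
        return ImportOutcome::DuplicateFullyImported;
    }
    let mut guard = known_blocks.write().unwrap();
    // Re-check under the write lock: another thread may have imported the
    // block between our read and our write. This is what stops two threads
    // from both doing the expensive import work.
    if !guard.insert(block_root) {
        return ImportOutcome::DuplicateFullyImported;
    }
    // ... expensive import work (state writes, etc.) would happen here ...
    ImportOutcome::Imported
}

fn main() {
    let known = RwLock::new(HashSet::new());
    assert_eq!(import_block(&known, 1), ImportOutcome::Imported);
    assert_eq!(import_block(&known, 1), ImportOutcome::DuplicateFullyImported);
}
```

An upgradable read has the advantage described above: readers are not excluded while the first check runs.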


Co-Authored-By: Michael Sproul <[email protected]>
…nd instrument tracing (sigp#8052)

Co-Authored-By: Jimmy Chen <[email protected]>

Co-Authored-By: Jimmy Chen <[email protected]>
Lookup sync has a cache of block roots, `failed_chains`. If a peer triggers a lookup for a block in, or a descendant of, a root in `failed_chains`, the lookup is dropped and the peer penalized. However, blocks are inserted into `failed_chains` for a single reason:
- If a chain is longer than 32 blocks, the lookup is dropped to prevent OOM risks. However, the peer is not at fault, since discovering an unknown chain longer than 32 blocks is not malicious. We just drop the lookup and sync the blocks via range forward sync.

This discrepancy is probably an oversight when changing old code. Before we used to add blocks that failed too many times to process to that cache. However, we don't do that anymore. Adding a block that fails too many times to process is an optimization to save resources in rare cases where peers keep sending us invalid blocks. In case that happens, today we keep trying to process the block, downscoring the peers and eventually disconnecting them. _IF_ we found that optimization to be necessary we should merge this PR (_Stricter match of BlockError in lookup sync_) first. IMO we are fine without the failed_chains cache and the ignored_chains cache will be obsolete with [tree sync](sigp#7678) as the OOM risk of long lookup chains does not exist anymore.

Closes sigp#7577


  Rename `failed_chains` to `ignored_chains` and don't penalize peers that trigger lookups for those blocks


Co-Authored-By: dapplion <[email protected]>
* Update issue template

* Delete old issue template
Fix a memory leak in the reprocess queue.


  If the vec of attestation IDs for a block is never evicted from the reprocess queue by a `BlockImported` event, then it stays in the map forever consuming memory. The fix is to remove the entry when its last attestation times out. We do similarly for light client updates.

In practice this will only occur if there is a race between adding an attestation to the queue and processing the `BlockImported` event, or if there are attestations for block roots that we never import (e.g. random block roots, block roots of invalid blocks).
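The eviction rule can be sketched as a simplified model of the reprocess queue (the real types, and the light client update path, are more involved):

```rust
use std::collections::HashMap;

type BlockRoot = u64; // stand-in for a 32-byte root
type AttestationId = u64;

/// Called when a queued attestation times out. If it was the last one queued
/// for its block root, drop the whole entry so the map cannot grow forever
/// for roots that are never imported.
fn on_attestation_timeout(
    queued: &mut HashMap<BlockRoot, Vec<AttestationId>>,
    root: BlockRoot,
    id: AttestationId,
) {
    if let Some(ids) = queued.get_mut(&root) {
        ids.retain(|queued_id| *queued_id != id);
        if ids.is_empty() {
            // Last attestation for this root timed out: evict the entry.
            queued.remove(&root);
        }
    }
}

fn main() {
    let mut queued = HashMap::new();
    queued.insert(7, vec![1, 2]);
    on_attestation_timeout(&mut queued, 7, 1);
    assert!(queued.contains_key(&7)); // one attestation still pending
    on_attestation_timeout(&mut queued, 7, 2);
    assert!(!queued.contains_key(&7)); // entry evicted, no leak
}
```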


Co-Authored-By: Michael Sproul <[email protected]>
michaelsproul and others added 25 commits September 22, 2025 11:03
Ensure that we don't log a warning for HTTP 202s, which are expected on the blinded block endpoints after Fulu.


Co-Authored-By: Michael Sproul <[email protected]>
No issue


  Use HTTPS for dependency


Co-Authored-By: Antonio Viggiano <[email protected]>
Co-Authored-By: Jimmy Chen <[email protected]>

Co-Authored-By: Eitan Seri-Levi <[email protected]>

Co-Authored-By: Eitan Seri-Levi <[email protected]>
…uled (sigp#8109)

sigp#8105 (to be confirmed)

I noticed a large number of failed discovery requests after deploying the latest `unstable` to some of our testnet and mainnet nodes. This is because of a recent PeerDAS change to attempt to maintain sufficient peers across data column subnets - this shouldn't be enabled on networks without PeerDAS scheduled, otherwise it will keep retrying discovery on these subnets and never succeed.

Also removed some unused files.


  


Co-Authored-By: Jimmy Chen <[email protected]>

Co-Authored-By: Jimmy Chen <[email protected]>
…igp#8112)

- PR sigp#8045 introduced a regression in how lookup sync interacts with the da_checker.

Now, on unstable, block import from the HTTP API also inserts the block into the da_checker while the block is being execution verified. If lookup sync finds the block in the da_checker in `NotValidated` state, it expects a `GossipBlockProcessResult` message sometime later. That message is only sent after block import in gossip.

I confirmed in our node's logs that 4/4 cases of stuck lookups are caused by this sequence of events:
- Receive block through API, insert into da_checker in fn process_block in put_pre_execution_block
- Create lookup and leave in AwaitingDownload(block in processing cache) state
- Block from HTTP API finishes importing
- Lookup is left stuck

Closes sigp#8104


  - sigp#8110 was my initial solution attempt but we can't send the `GossipBlockProcessResult` event from the `http_api` crate without adding new channels, which seems messy.

For a given node it's rare that a lookup is created at the same time that a block is being published. This PR solves sigp#8104 by allowing lookup sync to import the block twice in that case.


Co-Authored-By: dapplion <[email protected]>
Holesky - sigp#8096
Hoodi - sigp#8097
Sepolia - sigp#8099


  Testnet configs for Holesky, Hoodi and Sepolia

Holesky - eth-clients/holesky#132
Hoodi - eth-clients/hoodi#21
Sepolia - eth-clients/sepolia#111


Co-Authored-By: Eitan Seri-Levi <[email protected]>
As identified by a researcher during the Fusaka security competition, we were computing the proposer index incorrectly in some places by computing without lookahead.


  - [x] Add "low level" checks to computation functions in `consensus/types` to ensure they error cleanly
- [x] Re-work the determination of proposer shuffling decision roots, which are now fork aware.
- [x] Re-work and simplify the beacon proposer cache to be fork-aware.
- [x] Optimise `with_proposer_cache` to use `OnceCell`.
- [x] All tests passing.
- [x] Resolve all remaining `FIXME(sproul)`s.
- [x] Unit tests for `ProtoBlock::proposer_shuffling_root_for_child_block`.
- [x] End-to-end regression test.
- [x] Test on pre-Fulu network.
- [x] Test on post-Fulu network.


Co-Authored-By: Michael Sproul <[email protected]>
N/A


  In sigp#8101, when we modified the logic to get the proposer index post-Fulu, we seem to have missed advancing the state at the fork boundaries to get the right `Fork` for signature verification.
This led to Lighthouse failing all gossip verification right after transitioning to Fulu, which was observed on the Holesky shadow fork:
```
Sep 26 14:24:00.088 DEBUG Rejected gossip block                         error: "InvalidSignature(ProposerSignature)", graffiti: "grandine-geth-super-1", slot: 640
Sep 26 14:24:00.099 WARN  Could not verify block for gossip. Rejecting the block  error: InvalidSignature(ProposerSignature)
```

I'm not completely sure this is the correct fix, but this fixes the issue with `InvalidProposerSignature` on the holesky shadow fork.

Thanks to @eserilev for helping debug this


Co-Authored-By: Pawan Dhananjay <[email protected]>
Follow-up to the bug fixed in:

- sigp#8121

This fixes the root cause of that bug, which was introduced by me in:

- sigp#8101

Lion identified the issue here:

- sigp#8101 (comment)


  In the methods that compute the proposer shuffling decision root, ensure we don't use lookahead for the Fulu fork epoch itself. This is accomplished by checking if Fulu is enabled at `epoch - 1`, i.e. if `epoch > fulu_fork_epoch`.

I haven't updated the methods that _compute_ shufflings to use these new corrected bounds (e.g. `BeaconState::compute_proposer_indices`), although we could make this change in future. The `get_beacon_proposer_indices` method already gracefully handles the Fulu boundary case by using the `proposer_lookahead` field (if initialised).
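The boundary rule described above reduces to a small predicate (illustrative names, not Lighthouse's actual API):

```rust
/// Lookahead applies only if Fulu was already enabled at `epoch - 1`,
/// i.e. strictly after the Fulu fork epoch itself.
fn shuffling_uses_lookahead(epoch: u64, fulu_fork_epoch: u64) -> bool {
    epoch > fulu_fork_epoch
}

fn main() {
    let fulu_fork_epoch = 100;
    // At the fork epoch itself, Fulu was not enabled at epoch - 1,
    // so the pre-Fulu (no-lookahead) computation applies.
    assert!(!shuffling_uses_lookahead(100, fulu_fork_epoch));
    // One epoch later, lookahead applies.
    assert!(shuffling_uses_lookahead(101, fulu_fork_epoch));
}
```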


Co-Authored-By: Michael Sproul <[email protected]>
Testnet release for the upcoming Fusaka fork.


  


Co-Authored-By: Jimmy Chen <[email protected]>

Co-Authored-By: Jimmy Chen <[email protected]>
Bump `superstruct` to the latest release `0.10.0`.
This version uses a later version of `darling` which is helpful for sigp#8125


Co-Authored-By: Mac L <[email protected]>
- [x] Remove the unnecessary `_MILLIS` suffix from `MAXIMUM_GOSSIP_CLOCK_DISPARITY`
- [x] Add missing Deneb preset `KZG_COMMITMENT_INCLUSION_PROOF_DEPTH`, not to be confused with `KZG_COMMITMENTS_INCLUSION_PROOF_DEPTH` (plural) from Fulu...


Co-Authored-By: Michael Sproul <[email protected]>
@michaelsproul noticed this warning on a devnet-3 node

```
Oct 01 16:37:29.896 WARN  Error when importing rpc custody columns      error: ParentUnknown { parent_root: 0xe4cc85a2137b76eb083d7076255094a90f10caaec0afc8fd36807db742f6ff13 }, block_hash: 0x43ce63b2344990f5f4d8911b8f14e3d3b6b006edc35bbc833360e667df0edef7
```

We're also seeing similar `WARN` logs for blobs on our live nodes.

It's normal to get parent unknown in lookups and it's handled here
https://github.com/sigp/lighthouse/blob/a134d43446f776fe2a84f420854afbff76ca93d8/beacon_node/network/src/sync/block_lookups/mod.rs#L611-L619

These shouldn't be a `WARN`, and we also log the same error in block lookups at `DEBUG` level here:
https://github.com/sigp/lighthouse/blob/a134d43446f776fe2a84f420854afbff76ca93d8/beacon_node/network/src/sync/block_lookups/mod.rs#L643-L648

So I've removed these extra `WARN` logs.

I've also lowered the level of an `ERROR` log when unable to serve data column root requests - it's unexpected, but unlikely to impact the node's performance, so I think we can downgrade it.


  


Co-Authored-By: Jimmy Chen <[email protected]>
Co-Authored-By: Eitan Seri-Levi <[email protected]>
N/A


  Post-Fulu, we should be calling the v2 API on the relays, which doesn't return blobs/data columns.

However, we decided to start hitting the v2 API as soon as Fulu is scheduled, to avoid unexpected surprises at the fork.
On the ACDT call, it seemed most clients call v2 only after the Fulu fork.
This PR aims for the best of both worlds: we fall back to the v1 API if v2 fails. This way, we know beforehand if relays don't support it and can potentially alert them.
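The fallback can be sketched generically, with closures standing in for the real relay HTTP calls (names are illustrative):

```rust
/// Try the v2 endpoint first; if it errors for any reason, fall back to v1.
/// A real implementation would also log the v2 failure so relay operators
/// can be alerted before the fork makes v2 mandatory.
fn get_payload_with_fallback<T, E>(
    v2: impl FnOnce() -> Result<T, E>,
    v1: impl FnOnce() -> Result<T, E>,
) -> Result<T, E> {
    v2().or_else(|_v2_err| v1())
}

fn main() {
    // Relay that supports v2: v1 is never consulted.
    let r = get_payload_with_fallback(|| Ok::<_, &str>("v2 payload"), || Ok("v1 payload"));
    assert_eq!(r, Ok("v2 payload"));

    // Relay without v2 support: we fall back to v1.
    let r = get_payload_with_fallback(|| Err("unsupported"), || Ok("v1 payload"));
    assert_eq!(r, Ok("v1 payload"));
}
```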


Co-Authored-By: Pawan Dhananjay <[email protected]>
Use quoted integers for `state.proposer_lookahead` when serializing JSON. This is standard for all integer fields, but was missed for the newly added proposer lookahead. I noticed this issue while inspecting the head state on a local devnet.

I'm glad we found this before someone reported it :P


Co-Authored-By: Michael Sproul <[email protected]>
Closes sigp#8131

- [x] Remove deprecated flags from beacon_node/src/cli.rs:
- [x] eth1-purge-cache
- [x] eth1-blocks-per-log-query
- [x] eth1-cache-follow-distance
- [x] disable-deposit-contract-sync
- [x] light-client-server
- [x] Remove deprecated flags from lighthouse/src/main.rs:
- [x] logfile
- [x] terminal-total-difficulty-override
- [x] terminal-block-hash-override
- [x] terminal-block-hash-epoch-override
- [x] safe-slots-to-import-optimistically
- [x] Remove references to deprecated flags in config.rs files
- [x] Remove warning messages for deprecated flags in main.rs
- [x] Update/remove related tests in beacon_node.rs


  


Co-Authored-By: Jimmy Chen <[email protected]>

Co-Authored-By: Jimmy Chen <[email protected]>
Closes:

- sigp#4412

This should reduce Lighthouse's block proposal times on Holesky and prevent us getting reorged.


  - [x] Allow the head state to be advanced further than 1 slot. This lets us avoid epoch processing on hot paths including block production, by having new epoch boundaries pre-computed and available in the state cache.
- [x] Use the finalized state to prune the op pool. We were previously using the head state and trying to infer slashing/exit relevance based on `exit_epoch`. However some exit epochs are far in the future, despite occurring recently.


Co-Authored-By: Michael Sproul <[email protected]>
N/A


  Update c-kzg and rust-eth-kzg to their latest versions. Also removes the patch version hardcoding in Cargo.toml.


Co-Authored-By: Pawan Dhananjay <[email protected]>
* sigp#8085


  


Co-Authored-By: Tan Chee Keong <[email protected]>

Co-Authored-By: chonghe <[email protected]>
…of ignoring it (sigp#8179)

This issue was identified during the fusaka audit competition.

The [`verify_parent_block_and_finalized_descendant`](https://github.com/sigp/lighthouse/blob/62d9302e0f9dd9f94d0325411a3029b36ad90685/beacon_node/beacon_chain/src/data_column_verification.rs#L606-L627) function in data column gossip verification currently loads the parent before checking whether the column descends from the finalized root.

However, the `fork_choice.get_block(&block_parent_root)` function also makes the same check internally:

https://github.com/sigp/lighthouse/blob/8a4f6cf0d5b6b261b2c3439ce7c05383a53d30c5/consensus/fork_choice/src/fork_choice.rs#L1242-L1249

Therefore, if the column does not descend from the finalized root, we return an `UnknownParent` error before hitting the `is_finalized_checkpoint_or_descendant` check just below.

This means we `IGNORE` the gossip message instead of `REJECT`ing it, and the gossip peer is not _immediately_ penalised. This deviates from the spec.

However, it's worth noting that Lighthouse will currently attempt to request the parent from this peer, and if the peer cannot serve the parent, it is penalised with a `LowToleranceError` and will be banned after ~5 occurrences.

https://github.com/sigp/lighthouse/blob/ffa7b2b2b9e3b4e70678e2c749b8bc45234febd7/beacon_node/network/src/sync/network_context.rs#L1530-L1532

This PR will penalise the bad peer immediately instead of performing block lookups before penalising it.


  


Co-Authored-By: Jimmy Chen <[email protected]>
Update `ForkName::latest_stable` to Fulu, reflecting our plan to stabilise Fulu in the immediate future!

This will lead to some more tests running with Fulu rather than Electra.


Co-Authored-By: Michael Sproul <[email protected]>
@michaelsproul
Member

@sashaodessa Can you please update this PR with the latest changes from unstable?

@sashaodessa
Author

sashaodessa commented Oct 10, 2025

@sashaodessa Can you please update this PR with the latest changes from unstable?

@michaelsproul Hi, I water-damaged my MacBook and took it to a repair shop. Unfortunately, I only have my phone now. I'll try from another device.

@michaelsproul
Member

This diff is way too large now

