Fix TOCTOU vulnerability in unused_port module #8016
base: unstable
Conversation
* Add co-author to mergify commits.
* Remove unnecessary pull request rules from mergify config.
* Revert automation removals.
Thanks! We are aware of this issue and have been phasing these functions out in favour of binding to port 0 explicitly. The hard part is replacing the uses of the old functions in code that assumes it can determine an unused port without binding a socket, e.g. here: lighthouse/lighthouse/tests/beacon_node.rs Lines 955 to 970 in fd10b63
If you would be interested in coming up with clever fixes for these cases, that would be extremely helpful. That would then allow us to delete rather than deprecate the unsafe functions.
Most of the uses are in tests, but it's still important for us to get rid of them (if possible), as they sometimes contribute to CI flakiness.
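The safe replacement pattern under discussion, binding to port 0 and letting the OS pick a free port, can be sketched with the standard library alone. This is a minimal illustration, not Lighthouse's actual helpers:

```rust
use std::net::{TcpListener, UdpSocket};

fn main() -> std::io::Result<()> {
    // Ask the OS for any free TCP port by binding to port 0. The socket
    // stays bound, so no other process can claim the port in the meantime.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let tcp_port = listener.local_addr()?.port();
    assert!(tcp_port > 0);

    // The same pattern works for UDP.
    let socket = UdpSocket::bind("127.0.0.1:0")?;
    let udp_port = socket.local_addr()?.port();
    assert!(udp_port > 0);

    Ok(())
}
```

The key point is that the caller holds the bound socket rather than a bare port number, so there is no gap between "check" and "use".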
@sashaodessa Can you also please rebase on
Looks good at a glance. I've requested a review from some of our networking experts on the changes that impact prod.
Fulu uses `getPayloadV5`, this PR updates the notifier logging prior to the fork. Co-Authored-By: Jimmy Chen <[email protected]>
These changes look good to me.
Agree, a future PR to remove the deprecated functions would be nice.
```
{% for commit in commits | unique(attribute='email_author') %}
Co-Authored-By: {{ commit.author }} <{{ commit.email_author }}>
{% endfor %}
```
Are these changes relevant?
No, this is some leftover stuff from stable that hasn't been backmerged to unstable yet (see https://github.com/sigp/lighthouse/commits/stable/); it should be deleted from this PR.
I think we can probably just delete the deprecated functions in this PR; AFAICT all (?) usages have been removed.
Attempt to address performance issues caused by importing the same block multiple times.

- Check fork choice *after* obtaining the fork choice write lock in `BeaconChain::import_block`. We actually use an upgradable read lock, but this is semantically equivalent (the upgradable read has the advantage of not excluding regular reads).

The hope is that this change has several benefits:

1. By preventing duplicate block imports we save time repeating work inside `import_block` that is unnecessary, e.g. writing the state to disk. Although the store itself now takes some measures to avoid re-writing diffs, it is even better if we avoid a disk write entirely.
2. By returning `DuplicateFullyImported`, we reduce some duplicated work downstream. E.g. if multiple threads importing columns trigger `import_block`, now only _one_ of them will get a notification of the block import completing successfully, and only this one will run `recompute_head`. This should help avoid a situation where multiple beacon processor workers are consumed by threads blocking on the `recompute_head_lock`. However, a similar block-fest is still possible with the upgradable fork choice lock (a large number of threads can be blocked waiting for the first thread to complete block import).

Co-Authored-By: Michael Sproul <[email protected]>
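The check-again-under-the-lock idea above can be sketched as follows. This is a simplified stand-in: it uses a std `RwLock` write lock where the real code uses parking_lot's upgradable read lock, and `ImportOutcome`, `import_block`, and the `HashSet` of roots are hypothetical stand-ins for fork choice:

```rust
use std::collections::HashSet;
use std::sync::RwLock;

#[derive(Debug, PartialEq)]
enum ImportOutcome {
    Imported,
    DuplicateFullyImported,
}

// Hypothetical stand-in for fork choice, keyed by block root.
fn import_block(fork_choice: &RwLock<HashSet<u64>>, block_root: u64) -> ImportOutcome {
    let mut fc = fork_choice.write().unwrap();
    // Re-check *after* acquiring the lock: another thread may have imported
    // the same block while we were waiting to acquire it.
    if fc.contains(&block_root) {
        return ImportOutcome::DuplicateFullyImported;
    }
    fc.insert(block_root);
    ImportOutcome::Imported
}

fn main() {
    let fc = RwLock::new(HashSet::new());
    assert_eq!(import_block(&fc, 42), ImportOutcome::Imported);
    // A second import of the same root is detected under the lock.
    assert_eq!(import_block(&fc, 42), ImportOutcome::DuplicateFullyImported);
}
```

Only one caller ever sees `Imported` for a given root, which is what lets downstream work like head recomputation run exactly once.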
Co-Authored-By: Eitan Seri- Levi <[email protected]>
Co-Authored-By: Eitan Seri- Levi <[email protected]>
Co-Authored-By: Eitan Seri- Levi <[email protected]>
…nd instrument tracing (sigp#8052) Co-Authored-By: Jimmy Chen <[email protected]> Co-Authored-By: Jimmy Chen <[email protected]>
…igp#8056) Bump discv5 version Co-Authored-By: Josh King <[email protected]>
Lookup sync has a cache of block roots, "failed_chains". If a peer triggers a lookup for a block or descendant of a root in failed_chains, the lookup is dropped and the peer penalized. However, blocks are inserted into failed_chains for a single reason:

- If a chain is longer than 32 blocks, the lookup is dropped to prevent OOM risks. However, the peer is not at fault, since discovering an unknown chain longer than 32 blocks is not malicious. We just drop the lookup and sync the blocks via range forward sync.

This discrepancy is probably an oversight from changing old code. We used to add blocks that failed too many times to process to that cache, but we don't do that anymore. Adding a block that fails too many times to process is an optimization to save resources in rare cases where peers keep sending us invalid blocks. In case that happens, today we keep trying to process the block, downscoring the peers and eventually disconnecting them. _IF_ we found that optimization to be necessary we should merge this PR (_Stricter match of BlockError in lookup sync_) first.

IMO we are fine without the failed_chains cache, and the ignored_chains cache will be obsolete with [tree sync](sigp#7678) as the OOM risk of long lookup chains no longer exists.

Closes sigp#7577

Rename `failed_chains` to `ignored_chains` and don't penalize peers that trigger lookups for those blocks.

Co-Authored-By: dapplion <[email protected]>
…sigp#8058) Co-Authored-By: Jimmy Chen <[email protected]> Co-Authored-By: gitToki <[email protected]>
* Update issue template.
* Delete old issue template.
Co-Authored-By: Eitan Seri- Levi <[email protected]>
Fix a memory leak in the reprocess queue. If the vec of attestation IDs for a block is never evicted from the reprocess queue by a `BlockImported` event, then it stays in the map forever consuming memory. The fix is to remove the entry when its last attestation times out. We do similarly for light client updates. In practice this will only occur if there is a race between adding an attestation to the queue and processing the `BlockImported` event, or if there are attestations for block roots that we never import (e.g. random block roots, block roots of invalid blocks). Co-Authored-By: Michael Sproul <[email protected]>
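The eviction rule described above can be sketched in a few lines. The map shape and function names here are hypothetical simplifications of the reprocess queue, using plain integers for block roots and attestation IDs:

```rust
use std::collections::HashMap;

// When a queued attestation times out, drop it from its block's entry, and
// remove the entry itself once it becomes empty, so roots we never import
// do not accumulate in the map forever.
fn on_attestation_timeout(
    awaiting: &mut HashMap<u64, Vec<u64>>, // block_root -> queued attestation ids
    block_root: u64,
    att_id: u64,
) {
    if let Some(ids) = awaiting.get_mut(&block_root) {
        ids.retain(|id| *id != att_id);
        if ids.is_empty() {
            awaiting.remove(&block_root);
        }
    }
}

fn main() {
    let mut awaiting = HashMap::new();
    awaiting.insert(1u64, vec![10u64, 11]);

    on_attestation_timeout(&mut awaiting, 1, 10);
    assert!(awaiting.contains_key(&1)); // one attestation still queued

    on_attestation_timeout(&mut awaiting, 1, 11);
    // Last attestation timed out: the whole entry is gone, no leak.
    assert!(!awaiting.contains_key(&1));
}
```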
Ensure that we don't log a warning for HTTP 202s, which are expected on the blinded block endpoints after Fulu. Co-Authored-By: Michael Sproul <[email protected]>
Co-Authored-By: Eitan Seri- Levi <[email protected]>
No issue. Use HTTPS for dependency. Co-Authored-By: Antonio Viggiano <[email protected]>
Co-Authored-By: Jimmy Chen <[email protected]> Co-Authored-By: Eitan Seri- Levi <[email protected]> Co-Authored-By: Eitan Seri-Levi <[email protected]>
…uled (sigp#8109) sigp#8105 (to be confirmed). I noticed a large number of failed discovery requests after deploying the latest `unstable` to some of our testnet and mainnet nodes. This is because of a recent PeerDAS change to attempt to maintain sufficient peers across data column subnets - this shouldn't be enabled on networks without PeerDAS scheduled, otherwise it will keep retrying discovery on these subnets and never succeed. Also removed some unused files. Co-Authored-By: Jimmy Chen <[email protected]> Co-Authored-By: Jimmy Chen <[email protected]>
…igp#8112) - PR sigp#8045 introduced a regression in how lookup sync interacts with the da_checker. Now in unstable, block import from the HTTP API also inserts the block into the da_checker while the block is being execution verified. If lookup sync finds the block in the da_checker in `NotValidated` state, it expects a `GossipBlockProcessResult` message sometime later. That message is only sent after block import via gossip. I confirmed in our node's logs that 4/4 cases of stuck lookups are caused by this sequence of events:

- Receive block through the API, insert into da_checker in fn process_block in put_pre_execution_block
- Create lookup and leave it in AwaitingDownload(block in processing cache) state
- Block from HTTP API finishes importing
- Lookup is left stuck

Closes sigp#8104 - sigp#8110 was my initial solution attempt, but we can't send the `GossipBlockProcessResult` event from the `http_api` crate without adding new channels, which seems messy. For a given node it's rare that a lookup is created at the same time that a block is being published. This PR solves sigp#8104 by allowing lookup sync to import the block twice in that case. Co-Authored-By: dapplion <[email protected]>
Testnet configs for Holesky, Hoodi and Sepolia.

Holesky - sigp#8096, eth-clients/holesky#132
Hoodi - sigp#8097, eth-clients/hoodi#21
Sepolia - sigp#8099, eth-clients/sepolia#111

Co-Authored-By: Eitan Seri- Levi <[email protected]>
As identified by a researcher during the Fusaka security competition, we were computing the proposer index incorrectly in some places by computing without lookahead.

- [x] Add "low level" checks to computation functions in `consensus/types` to ensure they error cleanly.
- [x] Re-work the determination of proposer shuffling decision roots, which are now fork-aware.
- [x] Re-work and simplify the beacon proposer cache to be fork-aware.
- [x] Optimise `with_proposer_cache` to use `OnceCell`.
- [x] All tests passing.
- [x] Resolve all remaining `FIXME(sproul)`s.
- [x] Unit tests for `ProtoBlock::proposer_shuffling_root_for_child_block`.
- [x] End-to-end regression test.
- [x] Test on pre-Fulu network.
- [x] Test on post-Fulu network.

Co-Authored-By: Michael Sproul <[email protected]>
N/A In sigp#8101, when we modified the logic to get the proposer index post-Fulu, we seem to have missed advancing the state at the fork boundaries to get the right `Fork` for signature verification. This led to Lighthouse failing all gossip verification right after transitioning to Fulu, as observed on the Holesky shadow fork:
```
Sep 26 14:24:00.088 DEBUG Rejected gossip block error: "InvalidSignature(ProposerSignature)", graffiti: "grandine-geth-super-1", slot: 640
Sep 26 14:24:00.099 WARN Could not verify block for gossip. Rejecting the block error: InvalidSignature(ProposerSignature)
```
I'm not completely sure this is the correct fix, but it fixes the issue with `InvalidProposerSignature` on the Holesky shadow fork. Thanks to @eserilev for helping debug this. Co-Authored-By: Pawan Dhananjay <[email protected]>
Follow-up to the bug fixed in:
- sigp#8121

This fixes the root cause of that bug, which was introduced by me in:
- sigp#8101

Lion identified the issue here:
- sigp#8101 (comment)

In the methods that compute the proposer shuffling decision root, ensure we don't use lookahead for the Fulu fork epoch itself. This is accomplished by checking if Fulu is enabled at `epoch - 1`, i.e. if `epoch > fulu_fork_epoch`. I haven't updated the methods that _compute_ shufflings to use these new corrected bounds (e.g. `BeaconState::compute_proposer_indices`), although we could make this change in future. The `get_beacon_proposer_indices` method already gracefully handles the Fulu boundary case by using the `proposer_lookahead` field (if initialised). Co-Authored-By: Michael Sproul <[email protected]>
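The boundary rule stated above (`epoch > fulu_fork_epoch`, i.e. Fulu enabled at `epoch - 1`) can be written out as a tiny predicate. `uses_lookahead` is a hypothetical helper for illustration, not the actual Lighthouse method:

```rust
// Lookahead only applies when Fulu was already enabled at `epoch - 1`,
// i.e. strictly after the Fulu fork epoch itself. `None` means Fulu is
// not scheduled on this network.
fn uses_lookahead(epoch: u64, fulu_fork_epoch: Option<u64>) -> bool {
    match fulu_fork_epoch {
        Some(fork_epoch) => epoch > fork_epoch,
        None => false,
    }
}

fn main() {
    let fulu = Some(100);
    assert!(!uses_lookahead(99, fulu)); // pre-Fulu: no lookahead
    assert!(!uses_lookahead(100, fulu)); // Fulu fork epoch itself: no lookahead
    assert!(uses_lookahead(101, fulu)); // strictly after: lookahead applies
    assert!(!uses_lookahead(101, None)); // Fulu not scheduled
}
```

The off-by-one fixed here is exactly the fork-epoch case: using lookahead at the Fulu fork epoch itself was the bug.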
Testnet release for the upcoming Fusaka fork. Co-Authored-By: Jimmy Chen <[email protected]> Co-Authored-By: Jimmy Chen <[email protected]>
Co-Authored-By: Michael Sproul <[email protected]>
Bump `superstruct` to the latest release `0.10.0`. This version uses a later version of `darling` which is helpful for sigp#8125 Co-Authored-By: Mac L <[email protected]>
- [x] Remove the unnecessary `_MILLIS` suffix from `MAXIMUM_GOSSIP_CLOCK_DISPARITY`.
- [x] Add missing Deneb preset `KZG_COMMITMENT_INCLUSION_PROOF_DEPTH`, not to be confused with `KZG_COMMITMENTS_INCLUSION_PROOF_DEPTH` (plural) from Fulu...

Co-Authored-By: Michael Sproul <[email protected]>
@michaelsproul noticed this warning on a devnet-3 node:
```
Oct 01 16:37:29.896 WARN Error when importing rpc custody columns error: ParentUnknown { parent_root: 0xe4cc85a2137b76eb083d7076255094a90f10caaec0afc8fd36807db742f6ff13 }, block_hash: 0x43ce63b2344990f5f4d8911b8f14e3d3b6b006edc35bbc833360e667df0edef7
```
We're also seeing similar `WARN` logs for blobs on our live nodes. It's normal to get parent unknown in lookups, and it's handled here: https://github.com/sigp/lighthouse/blob/a134d43446f776fe2a84f420854afbff76ca93d8/beacon_node/network/src/sync/block_lookups/mod.rs#L611-L619 These shouldn't be a `WARN`, and we also log the same error in block lookups at `DEBUG` level here: https://github.com/sigp/lighthouse/blob/a134d43446f776fe2a84f420854afbff76ca93d8/beacon_node/network/src/sync/block_lookups/mod.rs#L643-L648 So I've removed these extra WARN logs. I've also lowered the level of an `ERROR` log when unable to serve data column root requests - it's unexpected, but is unlikely to impact the node's performance, so I think we can downgrade this. Co-Authored-By: Jimmy Chen <[email protected]>
Co-Authored-By: Eitan Seri- Levi <[email protected]>
N/A Post-Fulu, we should be calling the v2 API on the relays, which doesn't return the blobs/data columns. However, we decided to start hitting the v2 API as soon as Fulu is scheduled, to avoid unexpected surprises at the fork. On the ACDT call, it seems like most clients are calling v2 only after the Fulu fork. This PR aims for the best of both worlds: we fall back to hitting the v1 API if v2 fails. This way, we know beforehand if relays don't support it and can potentially alert them. Co-Authored-By: Pawan Dhananjay <[email protected]>
Use quoted integers for `state.proposer_lookahead` when serializing JSON. This is standard for all integer fields, but was missed for the newly added proposer lookahead. I noticed this issue while inspecting the head state on a local devnet. I'm glad we found this before someone reported it :P Co-Authored-By: Michael Sproul <[email protected]>
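The reason quoted integers matter can be shown with a tiny sketch. `quote_u64` below is a hypothetical helper for illustration, not Lighthouse's serde code: JSON consumers commonly parse numbers as 64-bit floats, which silently lose precision above 2^53, so large u64 values are encoded as decimal strings.

```rust
// Encoding a u64 as a decimal string sidesteps JSON parsers that read
// numbers as f64 and silently round values above 2^53.
fn quote_u64(x: u64) -> String {
    format!("\"{x}\"")
}

fn main() {
    // 2^53 + 1 is not exactly representable as an f64.
    let v: u64 = 9_007_199_254_740_993;
    assert_eq!(v as f64 as u64, 9_007_199_254_740_992); // precision lost
    assert_eq!(quote_u64(v), "\"9007199254740993\""); // string survives intact
}
```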
Closes sigp#8131

- [x] Remove deprecated flags from beacon_node/src/cli.rs:
  - [x] eth1-purge-cache
  - [x] eth1-blocks-per-log-query
  - [x] eth1-cache-follow-distance
  - [x] disable-deposit-contract-sync
  - [x] light-client-server
- [x] Remove deprecated flags from lighthouse/src/main.rs:
  - [x] logfile
  - [x] terminal-total-difficulty-override
  - [x] terminal-block-hash-override
  - [x] terminal-block-hash-epoch-override
  - [x] safe-slots-to-import-optimistically
- [x] Remove references to deprecated flags in config.rs files
- [x] Remove warning messages for deprecated flags in main.rs
- [x] Update/remove related tests in beacon_node.rs

Co-Authored-By: Jimmy Chen <[email protected]> Co-Authored-By: Jimmy Chen <[email protected]>
Closes:
- sigp#4412

This should reduce Lighthouse's block proposal times on Holesky and prevent us getting reorged.

- [x] Allow the head state to be advanced further than 1 slot. This lets us avoid epoch processing on hot paths including block production, by having new epoch boundaries pre-computed and available in the state cache.
- [x] Use the finalized state to prune the op pool. We were previously using the head state and trying to infer slashing/exit relevance based on `exit_epoch`. However some exit epochs are far in the future, despite occurring recently.

Co-Authored-By: Michael Sproul <[email protected]>
N/A Update c-kzg and rust-eth-kzg to their latest versions. Also removes the patch version hardcoding in Cargo.toml. Co-Authored-By: Pawan Dhananjay <[email protected]>
* sigp#8085 Co-Authored-By: Tan Chee Keong <[email protected]> Co-Authored-By: chonghe <[email protected]>
…of ignoring it (sigp#8179) This issue was identified during the Fusaka audit competition. The [`verify_parent_block_and_finalized_descendant`](https://github.com/sigp/lighthouse/blob/62d9302e0f9dd9f94d0325411a3029b36ad90685/beacon_node/beacon_chain/src/data_column_verification.rs#L606-L627) function in data column gossip verification currently loads the parent first before checking if the column descends from the finalized root. However, the `fork_choice.get_block(&block_parent_root)` function also makes the same check internally: https://github.com/sigp/lighthouse/blob/8a4f6cf0d5b6b261b2c3439ce7c05383a53d30c5/consensus/fork_choice/src/fork_choice.rs#L1242-L1249 Therefore, if the column does not descend from the finalized root, we return an `UnknownParent` error before hitting the `is_finalized_checkpoint_or_descendant` check just below. This means we `IGNORE` the gossip message instead of `REJECT`, and the gossip peer is not _immediately_ penalised, which deviates from the spec. However, it's worth noting that Lighthouse will currently attempt to request the parent from this peer, and if the peer is not able to serve the parent, it gets penalised with a `LowToleranceError` and will get banned after ~5 occurrences. https://github.com/sigp/lighthouse/blob/ffa7b2b2b9e3b4e70678e2c749b8bc45234febd7/beacon_node/network/src/sync/network_context.rs#L1530-L1532 This PR will penalise the bad peer immediately instead of performing block lookups before penalising it. Co-Authored-By: Jimmy Chen <[email protected]>
Update `ForkName::latest_stable` to Fulu, reflecting our plan to stabilise Fulu in the immediate future! This will lead to some more tests running with Fulu rather than Electra. Co-Authored-By: Michael Sproul <[email protected]>
@sashaodessa Can you please update this PR with the latest changes from
@michaelsproul Hi, I flooded my MacBook and took it to a repair shop. Unfortunately, I only have my phone now. I'll try from another device.
This diff is way too large now.
Replaces unsafe port-finding functions with secure bound socket APIs to eliminate race conditions.
Problem

The existing `unused_tcp*_port()` and `unused_udp*_port()` functions had a classic Time-of-Check-Time-of-Use (TOCTOU) vulnerability: the port they reported as free could be bound by another process between the check and the moment the caller actually bound it.

Solution

Added new safe APIs that return already-bound sockets:

- `bind_tcp4_any()` / `bind_tcp6_any()` → returns `TcpListener`
- `bind_udp4_any()` / `bind_udp6_any()` → returns `UdpSocket`