Skip to content

Conversation

alxiong
Copy link
Contributor

@alxiong alxiong commented Sep 11, 2025

Closes https://app.asana.com/1/1208976916964769/project/1210099833957576/task/1211291499200146

An alternative approach to #495

This PR:

  • Add a websocket endpoint in configs (requiring chain node to expose both http and ws endpoint)
  • Use pubsub transport provider to subscribe to the specific event CommitteeCreated, and trigger Sequencer.set_next_committee() upon receiving info about the next committee
  • update integration test to test this trigger logic
  • Remove test-configs/c1 after the just test-dyn-comm test,

Copy link
Contributor

@lukeiannucci lukeiannucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just took an initial pass looks good so far

@alxiong
Copy link
Contributor Author

alxiong commented Sep 12, 2025

This is my attempt at using your cool task runner. 3d05908

lmk where can I improve or incorrect usage? @twittner

@lukeiannucci
Copy link
Contributor

This is my attempt at using your cool task runner. 3d05908

lmk where can I improve or incorrect usage? @twittner

so i figured out it doesnt like passing in mnemonic, that messes with the parsing. also if you kill a task like you were killing yapper that the will kill the group of processes and fail.

@alxiong
Copy link
Contributor Author

alxiong commented Sep 15, 2025

✅ DONE:
I have some local changes, here's my plan:

  • add "register threshold enc key" function to register.rs binary
  • add "optionally fetch enc key from contract instead of directly from nodes' http-api" to yapper
  • add an extra step to register threshold enc key in just test-dyn-comm and later force yapper to read from contract, which ensure that second committee is working properly (with key reshared, not just an parallel new timeboost instance)

@alxiong
Copy link
Contributor Author

alxiong commented Sep 15, 2025

Based on the log, the problem is that we are not setting the prev_sailfish_committee and prev_decrypt_committee properly for the c1 committee.

Fixed in d07f98e.

Got a new error: when new decrypt committee started running, there are unknown peer,
Screenshot 2025-09-16 at 12 12 45 AM

I probably messed some logic up, will continue tmr morning.


Update: problem fixed ✅
f9651fa

Previous problems are:

  • previously I sleep for too long that I didn't leave enough time for c0 to finish resharing.
  • a committee of 3 is too small for resharing threshold to work, so change to 4 node committee.
  • c0 must run long enough, other wise the default until = 1000 might end before UseCommittee happens. thus I set to --until 1600 which is safe. (in CI it's slower, so 20 sec only elapse 300+ rounds, but locally it's fast, i want both to pass, thus set it conservative)

(Ah, btw, && rm -rf test-configs/c1 suffix does work. @lukeiannucci, see 9e56d2f)

#[clap(long)]
parent_url: Option<Url>,

/// KeyManager contract address on the parent chain
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would recommend a rename to key_manager_contract_proxy so that devs are aware that we're interacting with the key manager contract implementation via a proxy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the intention, but from the perspective of non-contract dev/ops, they shouldn't care about proxy. They are given an address (by the devops or from the deploy-contract script) and treat that as the KeyManager contract. Proxy pattern should be hidden from them imo.

in the *-contract code, I'm fine with being more explicit about _proxy for clarity tho.
So I will the naming as is here.

Copy link
Contributor

@lukeiannucci lukeiannucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks good to me!

Comment on lines 171 to 173
let tag = if chain_id == 31337 || chain_id == 1337 {
// local test chain, we start scanning from the genesis
BlockNumberOrTag::Number(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could those test chain configs not directly set the block tag to 0?

Copy link
Contributor Author

@alxiong alxiong Sep 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, they could. but i thought that tag was used for some other purpose? but yah, I will update the test-config instead.

Copy link
Contributor

@lukeiannucci lukeiannucci Sep 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they are used for the delayed inbox, i dont think we should set it to 0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have an additional field in the config, would you like that?

I hope there's an api like block_number_from_timestamp() provided by alloy Provider, but they don't. Otherwise we could set the default starting to effective timestamp of the last committee.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eventually i implemented a binary-search-based get_block_number_from_timestamp() in 4547bcb so that users don't need to set that extra field in config.

    /// Returns the smallest block number whose timestamp is >= `target_ts` through binary search.
    /// Useful to determine `from_block` input of `Self::event_stream()` subscription.
    pub async fn get_block_number_by_timestamp(
        &self,
        target_ts: u64,
    ) -> anyhow::Result<Option<u64>> {

// then later:
        let from_block = provider
            .get_block_number_by_timestamp(start_ts.into())
            .await?
            .unwrap_or_default();
        let events = provider
            .event_stream::<CommitteeCreated>(
                config.key_manager_contract,
                BlockNumberOrTag::Number(from_block),
            )
            .await
            .map_err(|e| {
                error!("Failed to create CommitteeCreated stream: {:?}", e);
                e
            })?;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update: switch to cleaner contract-assisted approach in c153532, as suggested in #502 (comment)

Co-authored-by: Toralf Wittner <[email protected]>
Copy link
Contributor

@twittner twittner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alxiong!

@alxiong alxiong merged commit a8ea3d3 into main Sep 19, 2025
7 of 8 checks passed
@alxiong alxiong deleted the ax/event-stream branch September 19, 2025 14:13
twittner added a commit that referenced this pull request Sep 30, 2025
This test is redundant since [PR #502][1] was merged and is often
failing on CI. The reason for its failure is a race condition: sometimes
not all nodes finish the resharing because of all the DKG requests they
send, less than t + 1 are answered because peers are themselves in an
incomplete DKG state or they are not in a running state. In any case they
are not replying. The test is then ultimately hanging and timing out
because it attempts to subsequently collect the inclusion list from all
decrypters and some can not provideit.

The test would need to be fixed by allowing enough time for the
decrypters of committee 1 to have their local DKG state ready before
the members of committee 2 send their DKG requests. For production it is
assumed that between scheduling the next committee and activating it,
sufficient time is given.

Rather than fixing the test this PR removes it because -- as mentioned
above -- it is redundant with another test.

[1]: #502
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants