Suzuka Mempool Horizontal Scaling #341
Mempool roles

In our architecture each node has its own mempool, so it is more a local Tx store than a mempool in the usual blockchain sense. Txs are not distributed between mempools. Each sequencer owns its Txs, and they are distributed to the execution nodes via the DA layer.
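As a minimal sketch of this role (the names `LocalMempool` and `SignedTx` are hypothetical, not actual Suzuka types), the mempool holds only the node's own Txs and drains them into blocks bound for the DA layer:

```rust
// Hypothetical sketch of the node-local mempool role described above.
use std::collections::HashMap;

type TxHash = [u8; 32];

struct SignedTx {
    hash: TxHash,
    payload: Vec<u8>,
}

/// Each node keeps only its own transactions; they are never gossiped
/// to other mempools. Distribution happens through the DA layer.
struct LocalMempool {
    owned: HashMap<TxHash, SignedTx>,
}

impl LocalMempool {
    fn new() -> Self {
        Self { owned: HashMap::new() }
    }

    fn insert(&mut self, tx: SignedTx) {
        self.owned.insert(tx.hash, tx);
    }

    /// Drain up to `max` owned transactions into a block destined for
    /// the DA layer, which is how execution nodes eventually see them.
    fn drain_for_block(&mut self, max: usize) -> Vec<SignedTx> {
        let hashes: Vec<TxHash> = self.owned.keys().take(max).copied().collect();
        hashes.iter().filter_map(|h| self.owned.remove(h)).collect()
    }
}

fn main() {
    let mut pool = LocalMempool::new();
    pool.insert(SignedTx { hash: [1; 32], payload: vec![0xAB] });
    let block = pool.drain_for_block(128);
    println!("proposing {} txs to the DA layer", block.len());
}
```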
Aptos component use

As I understand it, the initial idea was to reuse most of the Aptos components to save time, plugging our processing into them, because we thought they could be adapted to our needs. From the last month of dev experience, I see that this is more difficult than expected. We make a lot of changes to the Aptos code, and it is hard to plug in, for example, our reverse block algo, which is a key specific component of our design. It makes me think that we are starting to spend more time adapting than developing the core components we need. The mempool is one; another is the block state and storage.

In the end, the Aptos components that we really need are BlockSTM Tx execution with the Move framework, plus an Aptos-compatible REST API. Short term we can use the Aptos ledger, but as we want to plug in other Move frameworks, we will perhaps need our own ledger at some point.

I think we are currently at a key moment, because we can evaluate the effort needed to adapt the Aptos code for our purpose, evaluate the effort needed to develop some of these components ourselves and adapt the architecture more to our needs, and define a sort of migration plan. Perhaps we should take time to evaluate both paths.
If that seems like too large a development effort, one thing the roles above show is that we can separate the mempool into two distinct components: one for Tx processing and one for history. This way we can optimize each for its real use. A sketch of this split follows.
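Here is an illustrative sketch of that two-component split; all names are hypothetical, not actual Suzuka components:

```rust
// Hot path and cold path separated so each can be optimized independently.
use std::collections::{HashMap, VecDeque};

type TxHash = [u8; 32];

/// Hot path: pending transactions waiting to be pulled into a block.
/// Can be tuned purely for insertion and draining throughput.
struct TxProcessingPool {
    pending: VecDeque<(TxHash, Vec<u8>)>,
}

/// Cold path: processed transactions kept for lookups such as
/// transaction_by_hash. Can be tuned for reads, or moved to disk.
struct TxHistory {
    by_hash: HashMap<TxHash, Vec<u8>>,
}

impl TxProcessingPool {
    /// Process the next pending transaction and hand it to history,
    /// keeping the hot path free of historical data.
    fn pop_into_history(&mut self, history: &mut TxHistory) -> Option<TxHash> {
        let (hash, payload) = self.pending.pop_front()?;
        history.by_hash.insert(hash, payload);
        Some(hash)
    }
}

fn main() {
    let mut pool = TxProcessingPool { pending: VecDeque::new() };
    let mut history = TxHistory { by_hash: HashMap::new() };
    pool.pending.push_back(([7; 32], vec![0x01]));
    while pool.pop_into_history(&mut history).is_some() {}
    println!("history holds {} txs", history.by_hash.len());
}
```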
Summary
The mempool has been a consistent source of failure for the Suzuka Full Node. However, it may be possible to scale it horizontally without drastic modifications. How do we do this?
Below are some initial suggestions.
Just Add More Full Nodes
The architecture is already such that each full node provides a horizontal scaling benefit on transaction ingress. That is, each full node will run its own mempool and independently be able to validate transactions, pipe them into blocks, and propose those blocks to the DA. The only issue with this is that these nodes are not particularly inexpensive to operate.
Light and Proxy Nodes
Assume the following requirements:

1. The node should be cheaper to operate than a full node.
2. Transactions must still be fully validated before entering the mempool.
The second of these requirements makes this especially tricky, because full transaction validation requires a view into Aptos state via the DB context. That is, we would still need full execution state. The only way to make this lighter would be to perform state synchronization. As proposed in RFC17, this can evolve into more complete light nodes.
With the synchronization in place, we could then remove the cost of execution and actual database writes, instead serving an Aptos API based on synchronized state everywhere except the mempool.
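As a hedged sketch of that split, the routing below answers reads from locally synchronized state while submissions still go through a validating mempool; all trait and type names here are hypothetical, not actual Suzuka or Aptos APIs:

```rust
// Proxy-node request routing: reads from synced state, writes to mempool.
enum ApiRequest {
    /// Read path: answered from synced state, no execution needed.
    GetAccount(Vec<u8>),
    /// Write path: still requires validation against full state.
    SubmitTransaction(Vec<u8>),
}

trait SyncedStateReader {
    fn get_account(&self, address: &[u8]) -> Option<Vec<u8>>;
}

trait MempoolClient {
    fn submit(&self, raw_tx: Vec<u8>) -> Result<(), String>;
}

fn route(
    state: &impl SyncedStateReader,
    mempool: &impl MempoolClient,
    req: ApiRequest,
) -> Result<Option<Vec<u8>>, String> {
    match req {
        // Reads never touch execution or the database writers.
        ApiRequest::GetAccount(addr) => Ok(state.get_account(&addr)),
        // Only submission needs the (heavier) validating mempool.
        ApiRequest::SubmitTransaction(tx) => mempool.submit(tx).map(|_| None),
    }
}

// Trivial stand-in implementations so the sketch runs.
struct EmptyState;
impl SyncedStateReader for EmptyState {
    fn get_account(&self, _address: &[u8]) -> Option<Vec<u8>> { None }
}

struct NoopMempool;
impl MempoolClient for NoopMempool {
    fn submit(&self, _raw_tx: Vec<u8>) -> Result<(), String> { Ok(()) }
}

fn main() {
    let resp = route(&EmptyState, &NoopMempool, ApiRequest::SubmitTransaction(vec![0x01]));
    println!("submission routed to mempool: {}", resp.is_ok());
}
```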
Concurrent Hash Map
The Aptos mempool API relies on two patterns which are not ideal for mempool throughput:

- The `HashMap` used for `transaction_by_hash` does not enable concurrent reads.
- We currently, for example, pass along one mutable reference into a processing loop.

To better scale the mempool, we could:
- Use a `concurrent_hash_map` to process submissions and requests for confirmation simultaneously (a sketch follows below).

We may also be able to re-evaluate our usage of the Aptos mempool to determine whether we actually need to rely on it. Currently, we use it for transaction validation, while the actual block building reads from a pipe which doesn't rely on mempool internals.
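As a minimal sketch of the concurrent-map idea, the example below uses the `dashmap` crate (one common concurrent hash map in Rust; assumed here, not necessarily the `concurrent_hash_map` the Aptos code refers to) so that submission writes and `transaction_by_hash`-style reads can proceed from multiple threads without a single `&mut` borrow. The types are hypothetical stand-ins for the mempool internals.

```rust
// Concurrent reads and writes to the tx map, with no exclusive borrow.
use dashmap::DashMap;
use std::sync::Arc;
use std::thread;

type TxHash = u64;

fn main() {
    // Shared map: many threads can read and write concurrently.
    let txs: Arc<DashMap<TxHash, Vec<u8>>> = Arc::new(DashMap::new());

    // Writer thread: transaction submission.
    let writer = {
        let txs = Arc::clone(&txs);
        thread::spawn(move || {
            for i in 0..1000u64 {
                txs.insert(i, vec![0u8; 32]);
            }
        })
    };

    // Reader thread: confirmation lookups proceed while submissions
    // are still in flight, instead of waiting on one mutable reference.
    let reader = {
        let txs = Arc::clone(&txs);
        thread::spawn(move || {
            let mut hits = 0usize;
            for i in 0..1000u64 {
                if txs.get(&i).is_some() {
                    hits += 1;
                }
            }
            hits
        })
    };

    writer.join().unwrap();
    let hits = reader.join().unwrap();
    println!("confirmed {hits} lookups while submissions were in flight");
}
```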
At best, we would expect performance to increase by a multiple of the number of cores.