Transaction propagation problem under high load #2728

@mcrakhman

When we spam txs into the mempool of one node, we can basically halt the network: the mempool fills up and txs are propagated very slowly. In my test I spammed txs from two accounts to two nodes (acc A from node X1 to node X2, acc B from node Y1 to node Y2). Each account was producing 11 MB/s of txs (10 txs per second, 1.1 MB each), so around 22 MB/s in total. Even after I switched off the second account, leaving only 11 MB/s, the mempool was still working very slowly.

I started looking into the causes, and it seems to be partially caused by the mempool dropping the highest seen transactions in excess of the buffer size, defaultPendingSeenPerSigner = 128. This creates a gap: the node that broadcasts the transactions (the first node to receive them from the client) holds significantly more txs than the other nodes. The slower nodes drop the txs in excess of the buffer size, forget that they exist, and only update the sequence later, when they receive the block. Bear in mind that SeenTx messages are broadcast fairly quickly.

Simply increasing the seen tx buffer (defaultPendingSeenPerSigner) does not seem to be the correct approach, because any such number is arbitrary and depends on the size of the mempool. I think we should note, however, that if a node sends us a SeenTx message for a particular signer, it must have seen that tx and all the txs before it (of course, it could have dropped some of them after they were added to a block). So the idea is that the node should hold a consecutive range of txs. We can therefore create a data structure that stores, for each peer, the range of txs it holds (like map[signer][peer]range, or whatever is more convenient).
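To make the map[signer][peer]range idea concrete, here is a minimal sketch in Go. All names (seqRange, peerRanges, observe, peersWith) are hypothetical and do not come from the existing mempool code. It assumes signers and peers can be identified by strings, and that a SeenTx for sequence N implies the peer holds everything between the lowest and highest sequence it has announced.

```go
package main

import (
	"fmt"
	"sort"
)

// seqRange is a closed interval [Min, Max] of sequences that a peer
// is assumed to hold for one signer. Hypothetical type.
type seqRange struct {
	Min, Max uint64
}

// contains reports whether seq lies inside the range.
func (r seqRange) contains(seq uint64) bool {
	return seq >= r.Min && seq <= r.Max
}

// peerRanges maps signer -> peer -> range of held sequences.
// Storage grows with the number of peers, not the number of txs in flight.
type peerRanges map[string]map[string]seqRange

// observe records a SeenTx from peer for (signer, seq) by widening
// the peer's range so that it covers seq.
func (p peerRanges) observe(signer, peer string, seq uint64) {
	peers, ok := p[signer]
	if !ok {
		peers = make(map[string]seqRange)
		p[signer] = peers
	}
	r, ok := peers[peer]
	if !ok {
		peers[peer] = seqRange{Min: seq, Max: seq}
		return
	}
	if seq < r.Min {
		r.Min = seq
	}
	if seq > r.Max {
		r.Max = seq
	}
	peers[peer] = r
}

// peersWith returns, in sorted order, the peers whose range covers seq.
func (p peerRanges) peersWith(signer string, seq uint64) []string {
	var out []string
	for peer, r := range p[signer] {
		if r.contains(seq) {
			out = append(out, peer)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	p := peerRanges{}
	p.observe("A", "X1", 5)
	p.observe("A", "X1", 200) // far beyond a 128-entry buffer
	p.observe("A", "X2", 10)
	fmt.Println(p.peersWith("A", 150))
}
```

With this shape, a slow node never has to forget that sequence 150 exists on X1, even though it only received two announcements from that peer. Whether the range should also shrink when a block is committed is left open here.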

then we can extend SeenTx:

message SeenTx {
  bytes  tx_key   = 1;
  uint64 sequence = 2;
  bytes  signer   = 3;
  uint64 min_sequence = 4; // lowest sequence the node still holds, since it can drop txs that were already added to a block
}

As a fallback, if the node doesn't support this new parameter, we can just remember the lowest tx sequence we have received from that node (or the lowest tx sequence in the state, whichever is higher) and deem this the beginning of the range.
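The fallback rule can be sketched as a small Go helper. The function name rangeStart and its parameters are hypothetical, and how a missing min_sequence is detected (for example, treating an unset field as absent) is an assumption left to the caller.

```go
package main

import "fmt"

// rangeStart returns the sequence assumed to be the beginning of a
// peer's range for one signer. Hypothetical helper.
//
//   - advertisedMin, hasMin: min_sequence from SeenTx, if the peer sent it.
//   - lowestSeen: lowest sequence received from this peer so far.
//   - stateSeq: lowest tx sequence for the signer in the state.
func rangeStart(advertisedMin uint64, hasMin bool, lowestSeen, stateSeq uint64) uint64 {
	if hasMin {
		// The peer supports the new field, so trust what it advertises.
		return advertisedMin
	}
	// Fallback for older peers: take whichever of the two is higher.
	if stateSeq > lowestSeen {
		return stateSeq
	}
	return lowestSeen
}

func main() {
	fmt.Println(rangeStart(7, true, 20, 15))  // peer advertises min_sequence
	fmt.Println(rangeStart(0, false, 20, 15)) // old peer, lowest seen is higher
	fmt.Println(rangeStart(0, false, 10, 15)) // old peer, state sequence is higher
}
```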

Projects

Status

In Progress
