During IBD of a new SPV wallet, the same set of headers and GCS filters is fetched from every connected peer. This is less than ideal from a performance standpoint: it wastes bandwidth, memory, and CPU processing, and it triggers a large number of concurrent goroutines (2000+ for each peer).
Here are two sample CPU profiles from a mainnet wallet: one with no peer restriction (between 6 and 8 peers performing IBD) and one using `spvconnect` to restrict the wallet to a single peer:
```
$ pprof -top regular/cpu.pprof | head
File: dcrwallet
Type: cpu
Time: Oct 19, 2023 at 11:06am (-03)
Duration: 14.79mins, Total samples = 737.47s (83.12%)
Showing nodes accounting for 600.61s, 81.44% of 737.47s total
Dropped 1028 nodes (cum <= 3.69s)
      flat  flat%   sum%        cum   cum%
    57.23s  7.76%  7.76%     58.27s  7.90%  github.com/decred/dcrd/gcs/v4.(*bitReader).readNBits
    45.90s  6.22% 13.98%     45.90s  6.22%  runtime/internal/syscall.Syscall6
    40.02s  5.43% 19.41%     40.20s  5.45%  github.com/decred/dcrd/crypto/blake256.block
```

```
$ pprof -top spvconnect/cpu.pprof | head
File: dcrwallet
Type: cpu
Time: Oct 19, 2023 at 11:29am (-03)
Duration: 6.34mins, Total samples = 445.08s (117.06%)
Showing nodes accounting for 381.27s, 85.66% of 445.08s total
Dropped 938 nodes (cum <= 2.23s)
      flat  flat%   sum%        cum   cum%
    56.36s 12.66% 12.66%     57.15s 12.84%  github.com/decred/dcrd/gcs/v4.(*bitReader).readNBits
    34.36s  7.72% 20.38%     34.36s  7.72%  github.com/dchest/siphash.Hash
    30.01s  6.74% 27.13%     30.01s  6.74%  runtime/internal/syscall.Syscall6
```
Going by the total Duration, reducing the duplication for the initial sync can (potentially) halve the time IBD takes.
My strategy for improving this is to:
1- Refactor the `FetchMissingCfilters` logic to make it syncer based instead of peer based #2291
- This will allow the SPV syncer to request the cfilters from multiple peers without duplication during startup (see the sketch below).
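
A rough sketch of what the syncer-driven version could look like. Everything here is illustrative rather than the actual dcrwallet API: the `peer` and `store` interfaces and the `heightRange` type stand in for whatever shapes the refactor ends up using.

```go
package spv

import (
	"context"
	"errors"

	"golang.org/x/sync/errgroup"
)

// peer abstracts the subset of the peering API this sketch needs.
// CFilters is assumed to fetch the cfilters for a height range from a
// single remote peer.
type peer interface {
	CFilters(ctx context.Context, start, end int32) ([][]byte, error)
}

// store abstracts persisting fetched cfilters to the wallet database.
type store interface {
	SaveCFilters(ctx context.Context, start int32, filters [][]byte) error
}

// heightRange is a contiguous run of heights with missing cfilters.
type heightRange struct{ start, end int32 }

// fetchMissingCFilters assigns each missing range to exactly one peer
// (round-robin), so no range is ever requested twice during startup.
func fetchMissingCFilters(ctx context.Context, peers []peer, db store, ranges []heightRange) error {
	if len(peers) == 0 {
		return errors.New("no peers available")
	}
	g, ctx := errgroup.WithContext(ctx)
	for i, r := range ranges {
		p, r := peers[i%len(peers)], r // rebind for the closure
		g.Go(func() error {
			filters, err := p.CFilters(ctx, r.start, r.end)
			if err != nil {
				return err
			}
			return db.SaveCFilters(ctx, r.start, filters)
		})
	}
	return g.Wait()
}
```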
2- Dedupe header logic
- `getHeaders` and `handleBlockAnnouncements` have a lot of logic in common, so it might be useful to dedupe/refactor the code (see the sketch after this list).
- The minimum peer version already implies `sendHeaders` support, so we can drop block inv announcements and rely only on the headers logic (we should never receive block announcements via inv messages).
- Peer init in `connectToCandidates` and `connectToPersistent` may also be somewhat deduped.
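
A sketch of the deduped shape, assuming a hypothetical `connectHeaders` helper that both paths delegate to (`wire.BlockHeader` is the real dcrd type; the function names here are illustrative):

```go
package spv

import (
	"context"

	"github.com/decred/dcrd/wire"
)

// connectHeaders would be the single place where a batch of headers is
// validated and attached to the wallet's view of the chain, no matter
// how the batch arrived. Validation is elided in this sketch.
func connectHeaders(ctx context.Context, headers []*wire.BlockHeader) error {
	// Validate difficulty, continuity, checkpoints, etc., then commit.
	return nil
}

// getHeaders would drive the explicit getheaders round-trips during
// initial sync, delegating all processing to connectHeaders.
func getHeaders(ctx context.Context, batch []*wire.BlockHeader) error {
	return connectHeaders(ctx, batch)
}

// handleHeadersAnnouncement would handle unsolicited headers messages
// from peers that negotiated sendheaders; with the minimum protocol
// version implying sendheaders, the inv-based block announcement path
// can simply be deleted rather than duplicated here.
func handleHeadersAnnouncement(ctx context.Context, batch []*wire.BlockHeader) error {
	return connectHeaders(ctx, batch)
}
```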
3- Decouple header and cfilter fetching
- The previously mentioned `FetchMissingCfilters` already gives the wallet the ability to have headers and cfilters at different sync heights.
- This means we can decouple syncing, so that headers and cfilters may be synced at different speeds and from different peers (sketched below).
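
One possible decoupled structure, sketched with hypothetical `fetchHeaders`/`fetchCFilters` callbacks; the only point is that the two tips advance independently:

```go
package spv

import (
	"context"
	"sync/atomic"
	"time"
)

// tips tracks the header and cfilter sync heights independently, which
// is what lets the two pipelines run at different speeds and against
// different peers.
type tips struct {
	header  atomic.Int32
	cfilter atomic.Int32
}

// headerLoop advances the header tip as fast as its peers allow.
// fetchHeaders is assumed to extend the chain past the given height
// and return the new tip height.
func headerLoop(ctx context.Context, t *tips, fetchHeaders func(from int32) (int32, error)) error {
	for ctx.Err() == nil {
		tip, err := fetchHeaders(t.header.Load())
		if err != nil {
			return err
		}
		t.header.Store(tip)
	}
	return ctx.Err()
}

// cfilterLoop trails headerLoop, only requesting cfilters for heights
// whose headers are already stored, potentially from different peers.
func cfilterLoop(ctx context.Context, t *tips, fetchCFilters func(from, to int32) error) error {
	for ctx.Err() == nil {
		from, to := t.cfilter.Load(), t.header.Load()
		if from >= to {
			time.Sleep(100 * time.Millisecond) // wait for more headers
			continue
		}
		if err := fetchCFilters(from, to); err != nil {
			return err
		}
		t.cfilter.Store(to)
	}
	return ctx.Err()
}
```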
4- Add an async `GetCFilterV2` version
- One of the limitations of the P2P network is that only a single cfilter may be requested per message.
- This leads to the current design, where up to 2k goroutines are initialized for each IBD peer and header range.
- Adding an async version of `GetCFilterV2` to the peering pkg means we can drop the need for so many goroutines (see the sketch below).
- The tradeoff is that syncing becomes more complex, because the syncer will have to handle out-of-order and missing cfilters.
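
A sketch of what the async bookkeeping might look like, assuming a hypothetical request/deliver pair layered over the existing wire messages (`chainhash.Hash` is the real dcrd type; the rest is illustrative):

```go
package spv

import (
	"sync"

	"github.com/decred/dcrd/chaincfg/chainhash"
)

// cfilterResult is delivered when a remote peer answers one request.
type cfilterResult struct {
	filter []byte
	err    error
}

// asyncCFilters tracks in-flight cfilter requests so a single message
// reader can match responses, which may arrive out of order, to their
// waiters, instead of parking one goroutine per outstanding request.
type asyncCFilters struct {
	mu      sync.Mutex
	pending map[chainhash.Hash]chan cfilterResult
}

// request registers interest in a block's cfilter and returns a channel
// the caller can select on. send is the underlying wire-message send.
func (a *asyncCFilters) request(h *chainhash.Hash, send func(*chainhash.Hash) error) (<-chan cfilterResult, error) {
	c := make(chan cfilterResult, 1)
	a.mu.Lock()
	a.pending[*h] = c
	a.mu.Unlock()
	if err := send(h); err != nil {
		a.mu.Lock()
		delete(a.pending, *h)
		a.mu.Unlock()
		return nil, err
	}
	return c, nil
}

// deliver is called from the peer's message reader when a cfilter (or
// an error such as a disconnect) arrives, completing its request.
func (a *asyncCFilters) deliver(h *chainhash.Hash, filter []byte, err error) {
	a.mu.Lock()
	c, ok := a.pending[*h]
	delete(a.pending, *h)
	a.mu.Unlock()
	if ok {
		c <- cfilterResult{filter: filter, err: err}
	}
}
```

Buffering the result channel with capacity 1 lets deliver complete without blocking even when the requester has already given up on the response.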
5- Abstract syncing from peer to syncer #2291
- Instead of having each peer perform header sync during its startup procedure, abstract the logic into the syncer.
- This means a central syncer func will fetch headers in sequence, as needed, from a selected peer instead of from all peers (sketched below).
- Rescanning and address discovery logic should also be abstracted away from individual peers and support fetching blocks from multiple peers.
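
A minimal sketch of such a central loop, with a hypothetical `headerSource` interface and `pick` callback standing in for real peer selection:

```go
package spv

import (
	"context"
	"errors"
)

// headerSource is the minimal peer surface this sketch needs: fetch the
// next batch of headers, reporting done once the peer has nothing newer.
type headerSource interface {
	NextHeaders(ctx context.Context) (done bool, err error)
}

// syncHeaders is one central loop that drives header sync from a single
// selected peer at a time, instead of every peer racing to sync during
// its own startup procedure. pick returns the current best candidate.
func syncHeaders(ctx context.Context, pick func() (headerSource, error)) error {
	for ctx.Err() == nil {
		p, err := pick()
		if err != nil {
			return err
		}
		for {
			done, err := p.NextHeaders(ctx)
			if errors.Is(err, context.Canceled) {
				return err
			}
			if err != nil {
				break // this peer failed or stalled; pick another
			}
			if done {
				return nil // caught up to the peer's best chain
			}
		}
	}
	return ctx.Err()
}
```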
6- Add perf tracking and peer rotation
- One of the challenges of syncing from a single peer is that the peer may be slow compared to others.
- The syncer logic should periodically rotate peers, dropping peers that are behind the best known chain and checking whether other peers have better performance statistics.
- The syncer should be biased towards the best performing peers, with occasional peer rotation for exploration (a sketch follows).
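
A sketch of the tracking and selection, using an EWMA of request latency and epsilon-greedy selection as one plausible policy (all names hypothetical):

```go
package spv

import (
	"math/rand"
	"time"
)

// peerStats keeps a smoothed estimate of how quickly a peer serves
// requests.
type peerStats struct {
	addr    string
	latency time.Duration // exponentially weighted moving average
}

// observe folds a new request round-trip time into the moving average.
func (p *peerStats) observe(rtt time.Duration) {
	const alpha = 0.2
	if p.latency == 0 {
		p.latency = rtt
		return
	}
	p.latency = time.Duration(alpha*float64(rtt) + (1-alpha)*float64(p.latency))
}

// pickPeer is epsilon-greedy: most of the time the fastest known peer
// is chosen, but a small fraction of requests go to a random peer so
// the syncer keeps discovering peers that have become faster.
func pickPeer(peers []*peerStats, epsilon float64) *peerStats {
	if len(peers) == 0 {
		return nil
	}
	if rand.Float64() < epsilon {
		return peers[rand.Intn(len(peers))] // exploration
	}
	best := peers[0]
	for _, p := range peers[1:] {
		if p.latency < best.latency {
			best = p
		}
	}
	return best // exploitation
}
```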
The above is the rough plan, but it might be subject to change, depending on how easy it is to keep each step's refactoring reasonably small.