Skip to content

Conversation

holgerd77
Copy link
Member

This PR makes the necessary preparations to allow for a local client connection with discovery and bootnode connection deactivated which will help on debugging networking issues (@ryanio that was also the source of my interest in #1145 since before it was not possible to deactivate the bootnode connections 😛 ).

For convenience I added two new npm start scripts along with a dedicated README section on how to easily use (see respective commits).

I managed to get a connection work (more hurdles than I expected though). Scarily enough this directly breaks though with:

INFO [03-11|14:37:25] Listener up transport=rlpx url=enode://66b5a3615e5a21901dd89966491ef7da8898693ef3a1fc4d86acdf8cb7dd8452cf551e716cc8ea1c4b1ddd26221bf07fcf18675dd2f3e29e5ab36b7783a4264e@[::]:30303
INFO [03-11|14:37:25] Listener up transport=libp2p url=/ip4/  devp2p:rlpx Restart connection refill .. with selector 0 peers: 0, queue size: 0, open slots: 25 +10s
  devp2p:rlpx Restart connection refill .. with selector 0 peers: 0, queue size: 0, open slots: 25 +10s
  devp2p:rlpx Restart connection refill .. with selector 0 peers: 0, queue size: 0, open slots: 25 +10s
  devp2p:rlpx Restart connection refill .. with selector 0 peers: 0, queue size: 0, open slots: 25 +10s
  devp2p:dpt:server received ping from 127.0.0.1:30304 (peerId: 84db925) +7s
  devp2p:dpt:server send pong to 127.0.0.1:30304 (peerId: 84db925) +1ms
  devp2p:rlpx connected to 127.0.0.1:59507, handshake waiting.. +13ms
  devp2p:rlpx:peer [ERROR] should have valid tag: e8cf20e1b9c1604cddae43b10ff9e0acb63ff04dbf3230caca01925e8be5bd17 / 55a0c304031331d3ec531752047f556176b7334353e49d456166fddd063872fa +15ms
  devp2p:rlpx:peer Send DISCONNECT to 127.0.0.1:59507 (reason: SUBPROTOCOL_ERROR) +124ms
  devp2p:rlpx:peer Error on peer socket data handling: AssertionError [ERR_ASSERTION]: should have valid tag: e8cf20e1b9c1604cddae43b10ff9e0acb63ff04dbf3230caca01925e8be5bd17 / 55a0c304031331d3ec531752047f556176b7334353e49d456166fddd063872fa +0ms
ERROR [03-11|14:39:22] AssertionError [ERR_ASSERTION]: should have valid tag: e8cf20e1b9c1604cddae43b10ff9e0acb63ff04dbf3230caca01925e8be5bd17 / 55a0c304031331d3ec531752047f556176b7334353e49d456166fddd063872fa
    at new AssertionError (internal/assert/assertion_error.js:447:11)
    at Object.assertEq (/EthereumJS/ethereumjs-monorepo/packages/devp2p/src/util.ts:74:11)
    at ECIES._decryptMessage (/EthereumJS/ethereumjs-monorepo/packages/devp2p/src/rlpx/ecies.ts:138:5)
    at ECIES.parseAuthPlain (/EthereumJS/ethereumjs-monorepo/packages/devp2p/src/rlpx/ecies.ts:214:28)
    at ECIES.parseAuthEIP8 (/EthereumJS/ethereumjs-monorepo/packages/devp2p/src/rlpx/ecies.ts:266:10)
    at Peer._handleAuth (/EthereumJS/ethereumjs-monorepo/packages/devp2p/src/rlpx/peer.ts:293:26)
    at Peer._onSocketData (/EthereumJS/ethereumjs-monorepo/packages/devp2p/src/rlpx/peer.ts:522:18)
    at Socket.emit (events.js:314:20)
    at Socket.EventEmitter.emit (domain.js:483:12)
    at addChunk (_stream_readable.js:297:12)

(this might be a major left source for our networking issues)

Will have a look into this tomorrow.

@holgerd77
Copy link
Member Author

(and I have no idea where this 59507 port is coming from?)

@holgerd77 holgerd77 force-pushed the local-client-connectivity branch from d3962ae to ff44614 Compare March 11, 2021 14:01
@jochem-brouwer
Copy link
Member

Hi @holgerd77, just saw this and wanted to chime in that it could possibly be this?

@jochem-brouwer
Copy link
Member

(For this random port? If you restart your client you probably get a different port?)

@codecov
Copy link

codecov bot commented Mar 11, 2021

Codecov Report

Merging #1147 (8861290) into master (2c455a7) will increase coverage by 0.03%.
The diff coverage is 45.45%.

Impacted file tree graph

Flag Coverage Δ
block 81.78% <ø> (-0.04%) ⬇️
blockchain 84.13% <ø> (-0.07%) ⬇️
client 84.05% <45.45%> (+0.01%) ⬆️
common 86.12% <ø> (ø)
devp2p 84.33% <ø> (+0.14%) ⬆️
ethash 82.08% <ø> (ø)
tx 84.01% <ø> (ø)
vm 81.67% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Base automatically changed from monorepo-add-util to master March 11, 2021 18:34
@holgerd77
Copy link
Member Author

@jochem-brouwer thanks, my assumption on the port question is that we might mix up the discovery (DPT) port and the RLPx port. The fixed and deterministic port is the discovery port (so mostly: 30303), since the discovery handshake is always used to start an initial peer connection, also for known peers, since one first has to find out if a peer is alive and open for a connection. Then a subsequently following RLPx connection is done via a randomly chosen RLPx port (I guess the protocol is just not meant for using the same socket here to decouple or avoid concurrency issues or something?).

All this might need some additional confirmation, but that's my current understanding. Setting a reliable discovery port for the client is also fixed here in the PR (this is working), although fixing this wasn't my primary intention for the PR. 😄 This port can then also be used for a local connection with geth.

… provide feedback on peer searching when discovery enabled
…ection debugging, added networking debugging section in developer README
@holgerd77 holgerd77 mentioned this pull request Mar 12, 2021
@holgerd77 holgerd77 force-pushed the local-client-connectivity branch from ff44614 to 0959546 Compare March 12, 2021 10:13
@holgerd77
Copy link
Member Author

@ryanio very strange coincidence: apart from this datadir bug it seems that after merging in #1067 and rebasing this on top it very much seems to be the case that the bug from above is gone, two local clients are now connecting repeatedly to each other and the second client using the first as a bootnode is syncing the chain. 😄 😜

I just have no idea what caused this. I first had private key generation not being done with random() directly any more but coming from devp2p and using the genPrivateKey() which does multiple iterations if a generated key is not correct. I tested this however a couple of times, and it seems that normally this method also simply returns on the first try. If someone has an idea here I would be interested to know.

Anyhow. 😛

Will apart from that do some more tests and eventual 1-2 follow-up commits here and than open up for review.

@holgerd77
Copy link
Member Author

And generally it's really cool to see a local sync, following sync process is e.g. solely done through the local RLPx connection:

image

And this gives a much cleaner view on the local network protocol exchange process:

image

@holgerd77 holgerd77 force-pushed the local-client-connectivity branch from 809125b to 63e7780 Compare March 12, 2021 11:37
@holgerd77
Copy link
Member Author

holgerd77 commented Mar 12, 2021

I've added some instructions on how to locally connect to a geth instance in both ways:

  1. EthereumJS <- Geth (likely with EthereumJS having some chain data)
  2. Geth <- EthereumJS (likely with Geth having some chain data)

Both generally work, I am not totally satisfied with the Geth options for 2. though yet. If the --nodiscover option on Geth is used there is just no connection at all possible, so I opted for a suggestion to run with geth [--syncmode=full] [--verbosity=5] and turn the internet connection off to somewhat isolate on the debug traffic. This nevertheless still produces a lot of discovery debug output. I also experimented with --netrestrict=127.0.0.1/24 but this leaves one with a lot of netrestrict error debug messages on discovery. 😜 If someone has some better way (always with the target to best isolate the network connection process and debug output) let me know, also maybe thinking in different directions (filtering of some sort or whatever)?

Both described ways are nevertheless also sufficient for a start and we can also further evolve on this later on (if possible).

@holgerd77
Copy link
Member Author

On the client behavior itself on the two connection setups (not necessarily a part of this PR):

Both connections are not working sufficiently yet. 😜

  1. Gets disconnected from Geth because we are not serving any headers on request (from a fast syncing node):

image

This should be relatively easily fixable though.

  1. This is dropped from Geth with:
WARN [03-12|13:13:53.010] Dropping unsynced node during fast sync  id=5b58fe45d523f96d conn=inbound addr=127.0.0.1:64023       type=ethereumjs-devp2p/v3...

This seems to be expected behavior I guess (?). Assumption (not proven): geth doesn't want to accept unsynced connections while still being in a sync process itself.

@holgerd77
Copy link
Member Author

Case 1. might actually also be some default behavior:

WARN [03-12|13:59:22.319] Synchronisation failed, dropping peer    peer=c843cc42d11456df err="unsynced peer: remote head 16500 below checkpoint 11337727"

@holgerd77
Copy link
Member Author

Anyhow, apart from all potential debugging: this is open for review now. 😄

Copy link
Contributor

@ryanio ryanio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, lgtm!

@holgerd77 holgerd77 merged commit 8c656da into master Mar 13, 2021
@holgerd77 holgerd77 deleted the local-client-connectivity branch March 13, 2021 09:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants