[WIP] Feature - rework comms utils / add fluxd zmq #1361
Draft
MorningLightMountain713 wants to merge 16 commits into development from
Work in progress: the code is dirty, tests will be broken, and it's untested on old versions.
I am running this on a node now.
What this pull does
- `fluxCommunicationsUtils` module - renamed to `networkStateService`
- `fullnode` inline package - reworked to be more flexible (and to parse / write the config)
- `fluxCommunication` stuff

This pull kind of got away on me a little. It's too big. I will split out the zmq / fluxd stuff so we can enable it in stages, as I need to make sure I get the fluxd config file handling correct - for obvious reasons.
Background
I'll do some diagrams and stuff to make it easier to show what is going on, both before and after.
Historically, fluxOS polls fluxd for various data, mainly the deterministic fluxnode list, which I am now calling the `networkState` (I think that is a more appropriate identifier). The other heavy user of the fluxd api is the `blockProcessor`, which polls every 5 seconds. The blockProcessor is out of scope here - but it could benefit greatly from using the pub / sub zmq feature in the future.

The deterministic fluxnode list is currently 8.22 MB, so every call to fetch it is a lot of i/o.
fluxOS has multiple levels of caching. One is at the rpc level (`daemonFluxNodeRpcs`), where there is a 20 second LRU cache covering both the entire networkState and, if a specific pubkey is searched, that individual result. There was a second level of caching in the fluxCommsUtils - this cached the same thing, but for 4 minutes instead. However, when pulling the full list, it wasn't indexed, so any pubkey searches had to go out to the api again and were subsequently cached.
fluxOS would then go out every 2 minutes and update the full list. Due to the way blocks can come in, an LRU cache isn't really a good fit for what we're trying to achieve here: if a bunch of blocks arrive over a short period, the list can get stale (and invalid) quite quickly.
In this update, we subscribe to the fluxd zmq endpoint for updates. Whenever a block is generated, we get notified immediately and fetch the full list. There is no more polling for the list, except as a fallback if there is an issue with the zmq socket.
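For context, Bitcoin-derived daemons typically enable zmq block notifications via a config option. A hedged sketch of what the fluxd side might look like - the option name and port assume fluxd inherits the standard `zmqpub*` options, and are illustrative, not taken from this PR:

```ini
# flux.conf - publish new block hashes over zmq (assumed option name / port)
zmqpubhashblock=tcp://127.0.0.1:28332
```

fluxOS would then subscribe to the `hashblock` topic on that endpoint and treat each message as a "refresh the networkState" signal.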
So now we get up-to-date data. We don't use any LRU caching at all (it's enabled by default in the daemonFluxNodeRpcs, but I've added an override feature). As soon as a block is generated, the old data is invalidated - i.e. old nodes may have been removed, etc.
We don't need caching because we build our own pubkey and endpoint indexes. This means you can look up a pubkey or ip:port locally without having to go out to the rpc. During testing, with ~13k nodes, it takes approx 8 ms to build the indexes. Normally this would block the event loop, which is bad. However, I've partitioned the index building so each synchronous chunk is at most 1,000 nodes instead of all n: we build the index 1,000 nodes at a time, yield to the event loop, then move on to the next chunk. This way, no matter how big the node count gets, it won't block the event loop. I was initially using worker threads for this, but that was overkill for our application.
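The chunking approach above can be sketched like this - a minimal illustration, not the PR's actual code; the function name, node shape, and chunk size are assumptions for the example:

```javascript
// Build pubkey and endpoint (ip:port) indexes over the network state,
// yielding to the event loop after each chunk so no single synchronous
// pass touches more than ~1,000 nodes.
async function buildIndexes(nodes, chunkSize = 1000) {
  const byPubkey = new Map();
  const byEndpoint = new Map();

  for (let start = 0; start < nodes.length; start += chunkSize) {
    const chunk = nodes.slice(start, start + chunkSize);
    for (const node of chunk) {
      byPubkey.set(node.pubkey, node);
      byEndpoint.set(node.ip, node); // ip is "host:port" here
    }
    // Yield between chunks: timers, sockets, etc. get a chance to run.
    await new Promise((resolve) => setImmediate(resolve));
  }

  return { byPubkey, byEndpoint };
}
```

Once built, a pubkey or ip:port lookup is a constant-time `Map.get` instead of an rpc round trip.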
Old way:
New way:
Will add more detail here soon.