clientContextImpl: Cap the number and age of beacons #191
mdavidsaver
left a comment
I am happy to see a PR addressing this issue. I think some further work is required though.
|
Since it is not super obvious: the beacon TX timing of pvAccessCPP differs from what experience with CA servers might lead you to expect. See pvAccessCPP/src/server/beaconEmitter.cpp, lines 31 to 33 in 581d100.
A server sends the first 10 beacons (not configurable) at a 15 second interval (by default), then switches to a 180 second period (not configurable). So I think the beacon tracking lifetime must be >= 360 seconds, i.e. two of the 180 second periods. (FYI: with PVXS I try to follow the same model and timings, with a non-configurable limit of 20k servers. Of course, there I only allocate ~64 bytes per server.)
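The timing described above can be sketched as a small function. This is only an illustration of the schedule as stated in the comment, not the code in beaconEmitter.cpp: the names `kFastBeaconCount`, `kFastPeriodSec`, `kSlowPeriodSec`, `beaconPeriodSec` and `minTrackingLifetimeSec` are hypothetical, and treating the 360 second figure as "two slow periods" is an assumption.

```cpp
#include <cstddef>

// Hypothetical constants mirroring the timings quoted in the review comment.
constexpr std::size_t kFastBeaconCount = 10;  // beacons sent at the fast rate
constexpr double kFastPeriodSec = 15.0;       // default fast interval
constexpr double kSlowPeriodSec = 180.0;      // interval used afterwards

// Interval (seconds) used for beacon number `index` (0-based).
// Assumption: beacons 0..9 use the fast interval, later ones the slow one.
inline double beaconPeriodSec(std::size_t index) {
    return index < kFastBeaconCount ? kFastPeriodSec : kSlowPeriodSec;
}

// Assumed reading of the ">= 360 seconds" requirement: a client should
// keep a server tracked for at least two slow beacon periods.
inline double minTrackingLifetimeSec() {
    return 2.0 * kSlowPeriodSec;
}
```

Under these assumptions, a client that expired entries after less than `minTrackingLifetimeSec()` could drop a healthy server that merely had one slow-rate beacon lost on the network.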
|
The Windows CI failures are due to epics-base/ci-scripts#84. When you update, please rebase to pick up ed7eae5.
|
✅ Build pvAccessCPP 1.0.70 completed (commit cdf3720715 by @JJL772) |
2b51b97 to 777d68d
|
@mdavidsaver Thanks for the feedback! I finally got around to applying the requested changes. |
|
❌ Build pvAccessCPP 1.0.74 failed (commit 73c3932b45 by @JJL772) |
777d68d to afca135
|
❌ Build pvAccessCPP 1.0.75 failed (commit 4e054f2e07 by @JJL772) |
afca135 to d44adc9
|
✅ Build pvAccessCPP 1.0.79 completed (commit 2ae88b70f1 by @JJL772) |
|
@mdavidsaver Just wanted to follow up, are there any other changes required for this? |
if (m_beaconHandlers.size() >= maxTrackedBeacons)
{
    char ipa[64];
    sockAddrToDottedIP(&responseFrom->sa, ipa, sizeof(ipa));
    LOG(logLevelDebug, "Tracked beacon limit reached (%d), ignoring %s\n", maxTrackedBeacons, ipa);
To minimize log spam, it would be friendlier to log only when size()==max: once each time the limit is reached, but not again until the count has fallen below the limit. E.g. consider what happens if some PVA server gets stuck in a reset loop.
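One way to get this "log once per limit crossing" behavior is a small latch that re-arms when the count drops below the limit. The sketch below is a hypothetical helper (`LimitLogLatch` and `shouldLog` are invented names, not part of pvAccessCPP) and is not necessarily how the PR implements it.

```cpp
#include <cstddef>

// Hypothetical helper: decides whether a "limit reached" message should
// be logged, so that it appears once per crossing instead of per beacon.
class LimitLogLatch {
public:
    explicit LimitLogLatch(std::size_t limit) : limit_(limit), logged_(false) {}

    // Call with the current number of tracked beacons each time a beacon
    // arrives. Returns true only the first time the limit is hit since
    // the count was last seen below the limit.
    bool shouldLog(std::size_t currentSize) {
        if (currentSize < limit_) {
            logged_ = false;  // back under the limit: re-arm the latch
            return false;
        }
        if (logged_)
            return false;     // already reported this crossing
        logged_ = true;
        return true;
    }

private:
    std::size_t limit_;
    bool logged_;
};
```

Note that the latch only re-arms if `shouldLog()` is also called while the count is below the limit, so in this sketch the caller would invoke it on every received beacon, not just inside the limit-reached branch.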
|
This PR now depends on some changes made to pvData: epics-base/pvDataCPP#94. I'm going to mark this as a draft for now because I'm not exactly happy with these changes yet.
|
❌ Build pvAccessCPP 1.0.108 failed (commit c4e4658381 by @JJL772) |
|
✅ Build pvAccessCPP 1.0.109 completed (commit 9651462441 by @JJL772) |
epicsTimeStamp ts;
epicsTimeGetCurrent(&ts);
if (epicsTimeDiffInSeconds(&ts, &m_handler.m_lastTime) > maxBeaconLifetime)
    m_handler.remove();
I don't see any need for precision here. And calling epicsTimeGetCurrent() potentially 20,000 times could be avoided.
Replace epicsTimeStamp m_lastTime with an integer counter. Set the timer to e.g. maxBeaconLifetime/2. Increment the counter on expiration. If it reaches 3, then remove(). Have touch() zero the counter.
That said, this is an optimization. What you have here will work. So please let me know if you will not have time for this change.
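The counter scheme suggested above might look roughly like the following. `BeaconAge`, `onTimerExpired` and `kMaxMissed` are hypothetical names used for illustration; the real handler class and its timer callback in pvAccessCPP are not reproduced here.

```cpp
// Hypothetical stand-in for the per-beacon aging state described above.
// No clock is read: the periodic timer itself provides the time base.
class BeaconAge {
public:
    BeaconAge() : missed_(0) {}

    // Called whenever a beacon from this server is received.
    void touch() { missed_ = 0; }

    // Called on each timer expiration (period e.g. maxBeaconLifetime/2).
    // Returns true when the entry has gone stale and should be removed.
    bool onTimerExpired() { return ++missed_ >= kMaxMissed; }

private:
    static constexpr unsigned kMaxMissed = 3;  // threshold from the suggestion
    unsigned missed_;
};
```

The trade-off is precision: the entry is removed somewhere between two and three timer periods after the last touch(), in exchange for avoiding a call to epicsTimeGetCurrent() per tracked beacon.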
This is a good optimization, though it does introduce a bit of jitter to the beacon timer. I'm setting the timer to maxBeaconLifetime/4 so that the beacon lifetime is 360-450s (worst case).
I've also reduced the beacon cap to 2048, which is still excessive imo. 20k was just unnecessary.
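The jitter can be made concrete with a little arithmetic. The sketch below assumes maxBeaconLifetime is 360 s (so the timer period is 90 s) and that an entry is removed on the 5th expiration without a touch(); both values are assumptions chosen because they reproduce the 360-450 s range quoted above, not values read from the PR. `LifetimeBounds` and `lifetimeBounds` are hypothetical names.

```cpp
// Effective lifetime range of a counter-aged entry.
struct LifetimeBounds {
    double minSec;
    double maxSec;
};

// periodSec:     timer period in seconds
// ticksToRemove: number of consecutive expirations without touch()
//                after which the entry is removed (must be >= 1)
inline LifetimeBounds lifetimeBounds(double periodSec, unsigned ticksToRemove) {
    // Shortest case: the last touch() lands just before a timer tick, so
    // the first of the n ticks arrives almost immediately: (n-1) periods.
    // Longest case: the last touch() lands just after a tick, so all n
    // ticks are a full period apart: n periods.
    return LifetimeBounds{(ticksToRemove - 1) * periodSec,
                          ticksToRemove * periodSec};
}
```

With these assumptions, `lifetimeBounds(90.0, 5)` gives 360 to 450 seconds, while the earlier suggestion of a period of maxBeaconLifetime/2 and a threshold of 3, `lifetimeBounds(180.0, 3)`, gives 360 to 540 seconds. A shorter period narrows the jitter at the cost of more timer callbacks.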
I don't have a good feel for what this cap should be. My most recent solid point of reference is NSLS2 circa 2015, when I snooped on beacon traffic to identify ~1000 CA server instances running on ~100 hosts. Well short of 20k, but only a factor of 2 from 2048.
PVXS currently tracks up to 20k addresses. I picked this number from thin air, although I opted for a larger number because PVXS uses fewer resources per server (~256 bytes per server).
Assuming this cap is the maximum number of servers that could be generating beacons visible to a client, the APS accelerator currently has 1136 IOCs in production (mostly CA currently) spread across 5 VLANs, with 881 IOCs in the largest VLAN. I don't see our total number increasing beyond 2048 IOCs, but we do have 13 VMs that run multiple soft IOCs (the largest VM has 214) so the number of visible servers isn't limited to the subnet size. Presumably configuring PVA servers through a shared multicast address would include the servers from multiple subnets too.
I'm fine with the cap being 2048 for pvAccessCPP. It might be worth having the PVXS cap be configurable by an env-var at some point. How many IOCs is ITER expecting to run?
(* looking into the crystal ball *)
Maybe 2500 on 500 hosts across 15 networks.
But that's 10 years from now, with an uncertainty of maybe 30%.
7a0921d to e21830f
Each beacon has an associated mutex. If we allocate too many beacons on resource-constrained systems, e.g. RTEMS, we may run out of resources and crash.
e21830f to ca85002
mdavidsaver
left a comment
Looks ok. Thanks for your persistence!
Each beacon has an associated mutex. If we don't cap the beacon count, IOCs running on resource-limited platforms like RTEMS may eventually run out of resources and crash.
Closes #184
The configuration options are probably unnecessary; let me know if I should remove them.
Requires some changes to pvData: epics-base/pvDataCPP#94