@JJL772
Contributor

@JJL772 JJL772 commented Oct 25, 2023

Each beacon has an associated mutex. If we don't cap the beacon count, IOCs running on resource-limited platforms like RTEMS may eventually run out of resources and crash.

Closes #184

The configuration options are probably unnecessary, let me know if I should remove them.

Requires some changes to pvData: epics-base/pvDataCPP#94

Member

@mdavidsaver mdavidsaver left a comment

I am happy to see a PR addressing this issue. I think some further work is required though.

@mdavidsaver
Member

Since it is not super obvious: the beacon TX timing of pvAccessCPP differs from what experience with CA servers might lead you to expect.

_fastBeaconPeriod(std::max(context->getBeaconPeriod(), EPICS_PVA_MIN_BEACON_PERIOD)),
_slowBeaconPeriod(std::max(180.0, _fastBeaconPeriod)), // TODO configurable
_beaconCountLimit((int16)std::max(10.0f, EPICS_PVA_MIN_BEACON_COUNT_LIMIT)), // TODO configurable

A server will send out the first 10 beacons (not configurable) at a 15 second interval (by default), then switch to a 180 second period (not configurable). While $EPICS_PVAS_BEACON_PERIOD can override the first "fast" period, I don't think this is useful in practice.

So I think the beacon tracking lifetime must be >= 360 seconds.

(fyi. with PVXS I try to follow the same model and timings, with a non-configurable limit of 20k servers. Of course, there I only allocate ~64 bytes per server)
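To make the timing concrete, here is a small C++ model of the two-phase schedule described above. It is a sketch, not pvAccessCPP code: the `BeaconSchedule` struct and its members are hypothetical, the defaults are the 15 s / 10 beacons / 180 s figures quoted in this comment, and the exact point at which the real emitter switches periods is an assumption. `lifetimeTolerating()` encodes one plausible reading of the 360 s figure: twice the slow period, so a client can miss one slow-phase beacon before expiring the server.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the two-phase beacon schedule (not pvAccessCPP code).
// Defaults are the values quoted in this thread.
struct BeaconSchedule {
    double fastPeriod = 15.0;   // seconds between early beacons
    double slowPeriod = 180.0;  // seconds between later beacons
    int fastCount = 10;         // number of beacons sent at the fast period

    // Send times (seconds since server start) of the first n beacons.
    // Assumption: beacon 0 goes out at t=0 and beacons 1..fastCount-1 follow
    // at fastPeriod spacing; every later gap is slowPeriod.
    std::vector<double> sendTimes(int n) const {
        std::vector<double> times;
        double t = 0.0;
        for (int i = 0; i < n; ++i) {
            times.push_back(t);
            t += (i + 1 < fastCount) ? fastPeriod : slowPeriod;
        }
        return times;
    }

    // Longest silence a client must tolerate if up to `missed` consecutive
    // beacons are lost once the server is in the slow phase.
    double lifetimeTolerating(int missed) const {
        return (missed + 1) * slowPeriod;
    }
};
```

With these defaults, beacons 0 to 9 go out at 0, 15, ..., 135 s, beacon 10 follows 180 s later at 315 s, and `lifetimeTolerating(1)` is 360 s.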

@mdavidsaver
Member

The windows CI failures are due to epics-base/ci-scripts#84. When you update, please rebase to pick up ed7eae5.

@AppVeyorBot

Build pvAccessCPP 1.0.70 completed (commit cdf3720715 by @JJL772)

@JJL772 JJL772 force-pushed the fix_semaphore branch 2 times, most recently from 2b51b97 to 777d68d on December 13, 2023 00:27
@JJL772
Contributor Author

JJL772 commented Dec 13, 2023

@mdavidsaver Thanks for the feedback! I finally got around to applying the requested changes.
I ended up using pvxs as a reference and copied the max beacon lifetime (360s) and beacon limit (20000).

@AppVeyorBot

Build pvAccessCPP 1.0.74 failed (commit 73c3932b45 by @JJL772)

@AppVeyorBot

Build pvAccessCPP 1.0.75 failed (commit 4e054f2e07 by @JJL772)

@AppVeyorBot

Build pvAccessCPP 1.0.79 completed (commit 2ae88b70f1 by @JJL772)

@JJL772
Contributor Author

JJL772 commented Feb 28, 2024

@mdavidsaver Just wanted to follow up, are there any other changes required for this?

Comment on lines +4389 to +4399
if (m_beaconHandlers.size() >= maxTrackedBeacons)
{
char ipa[64];
sockAddrToDottedIP(&responseFrom->sa, ipa, sizeof(ipa));
LOG(logLevelDebug, "Tracked beacon limit reached (%d), ignoring %s\n", maxTrackedBeacons, ipa);
Member

To minimize log spam it would be friendlier to only log when size()==max. So once each time the limit is reached, but not again until falling below the limit. eg. consider if some PVA server gets stuck in a reset loop.
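One way to get the "log once per saturation" behavior suggested above is a flag that is set on the first rejected beacon and cleared when an entry is removed and the table drops back below the cap. The sketch below is hypothetical: `CappedBeaconTable`, its methods, and the use of a string address as the key are illustrative stand-ins for pvAccessCPP's `m_beaconHandlers` map and `maxTrackedBeacons`, not its actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <set>
#include <string>

// Hypothetical capped table of beacon sources. Not pvAccessCPP code: the
// class name, the string key, and the counters are illustrative only.
class CappedBeaconTable {
public:
    explicit CappedBeaconTable(std::size_t maxTracked) : maxTracked_(maxTracked) {}

    // Handle a beacon from `addr`. Returns true if the source is (now) tracked,
    // false if it was ignored because the cap has been reached.
    bool onBeacon(const std::string& addr) {
        if (sources_.count(addr) != 0)
            return true;                       // already tracked, nothing to do
        if (sources_.size() >= maxTracked_) {
            if (!limitLogged_) {               // warn only once per saturation episode
                std::fprintf(stderr, "Tracked beacon limit reached (%zu), ignoring %s\n",
                             maxTracked_, addr.c_str());
                limitLogged_ = true;
                ++limitMessages_;
            }
            return false;
        }
        sources_.insert(addr);
        return true;
    }

    // Stop tracking `addr` (e.g. after it expires); re-arm the warning once
    // the table is back below the cap.
    void remove(const std::string& addr) {
        sources_.erase(addr);
        if (sources_.size() < maxTracked_)
            limitLogged_ = false;
    }

    std::size_t size() const { return sources_.size(); }
    int limitMessages() const { return limitMessages_; }  // warnings emitted so far

private:
    std::size_t maxTracked_;
    std::set<std::string> sources_;
    bool limitLogged_ = false;
    int limitMessages_ = 0;
};
```

A server stuck in a reset loop then produces at most one message per time the table fills, rather than one per beacon.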

@JJL772
Contributor Author

JJL772 commented Jun 10, 2024

This PR now depends on some changes made to pvData: epics-base/pvDataCPP#94

I'm going to mark this as a draft for now because I'm not exactly happy with these changes yet.

@JJL772 JJL772 marked this pull request as draft June 10, 2024 19:43
@AppVeyorBot

Build pvAccessCPP 1.0.108 failed (commit c4e4658381 by @JJL772)

@JJL772 JJL772 marked this pull request as ready for review June 12, 2024 20:44

Comment on lines 4631 to 4634
epicsTimeStamp ts;
epicsTimeGetCurrent(&ts);
if (epicsTimeDiffInSeconds(&ts, &m_handler.m_lastTime) > maxBeaconLifetime)
m_handler.remove();
Member

I don't see any need for precision here. And calling epicsTimeGetCurrent() potentially 20,000 times could be avoided.

Replace epicsTimeStamp m_lastTime with an integer counter. Set the timer to e.g. maxBeaconLifetime/2. Increment the counter on each expiration. If it reaches 3, then remove(). Have touch() zero the counter.

That said, this is an optimization. What you have here will work. So please let me know if you will not have time for this change.
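The counter scheme suggested above can be sketched as follows. This is an illustration under stated assumptions: `AgingBeaconTable`, `touch()`, and `onTimer()` are hypothetical names, the string key stands in for the server address, and a real implementation would keep the counter on each beacon handler rather than in a separate map. The point is that one timer pass costs one increment per entry, with no `epicsTimeGetCurrent()` call per entry.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical counter-based expiry (not pvAccessCPP code). A single periodic
// timer ages every entry; a beacon resets its source's age to zero.
class AgingBeaconTable {
public:
    explicit AgingBeaconTable(unsigned expireAt) : expireAt_(expireAt) {
        assert(expireAt_ > 0);
    }

    // Beacon received from `addr`: start tracking it, or reset its age.
    void touch(const std::string& addr) { age_[addr] = 0; }

    // Called once per timer expiration (e.g. every maxBeaconLifetime/2 seconds).
    void onTimer() {
        for (auto it = age_.begin(); it != age_.end();) {
            if (++it->second >= expireAt_)
                it = age_.erase(it);   // silent for expireAt_ ticks: drop it
            else
                ++it;
        }
    }

    bool tracked(const std::string& addr) const { return age_.count(addr) != 0; }

private:
    unsigned expireAt_;
    std::map<std::string, unsigned> age_;
};
```

With `expireAt = 3` and a timer period of maxBeaconLifetime/2, as suggested above, an entry survives two silent ticks and is removed on the third.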

Contributor Author

This is a good optimization, though it does introduce a bit of jitter to the beacon timer. I'm setting the timer to maxBeaconLifetime/4 so that the beacon lifetime is 360-450s (worst case).

I've also reduced the beacon cap to 2048, which is still excessive imo. 20k was just unnecessary.
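The jitter comes from the tick period and the counter threshold: a beacon can arrive anywhere within a tick, so the entry disappears somewhere between `expireAt - 1` and `expireAt` full ticks after its last beacon. The hypothetical helper below computes that window. The thresholds in the usage note are inferred from the numbers in this thread under this model, not read from the merged code.

```cpp
#include <cassert>

// Hypothetical helper: with a free-running timer of period `tick` seconds and
// an entry removed on the `expireAt`-th expiration after its last beacon, the
// removal delay falls in ((expireAt - 1) * tick, expireAt * tick].
struct ExpiryWindow {
    double minDelay;  // exclusive lower bound, seconds
    double maxDelay;  // inclusive upper bound, seconds
};

ExpiryWindow expiryWindow(double tick, unsigned expireAt) {
    assert(expireAt >= 1);  // a threshold of 0 would expire entries immediately
    return ExpiryWindow{(expireAt - 1) * tick, expireAt * tick};
}
```

`expiryWindow(180.0, 3)` (a maxBeaconLifetime/2 tick with removal at 3, as suggested in review) gives (360, 540] s, while a 90 s tick (maxBeaconLifetime/4) needs a threshold of 5 to give the 360-450 s window quoted above.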

Member

I don't have a good feel for what this cap should be. My most recent solid point of reference is NSLS2 circa 2015, when I snooped on beacon traffic to identify ~1000 CA server instances running on ~100 hosts. Well short of 20k, but only a factor of 2 from 2048.

PVXS currently tracks up to 20k addresses. I picked this number out of thin air, though I opted for a larger number because PVXS uses fewer resources per server (~256 bytes each).

Member

Assuming this cap is the maximum number of servers that could be generating beacons visible to a client, the APS accelerator currently has 1136 IOCs in production (mostly CA currently) spread across 5 VLANs, with 881 IOCs in the largest VLAN. I don't see our total number increasing beyond 2048 IOCs, but we do have 13 VMs that run multiple soft IOCs (the largest VM has 214) so the number of visible servers isn't limited to the subnet size. Presumably configuring PVA servers through a shared multicast address would include the servers from multiple subnets too.

I'm fine with the cap being 2048 for pvAccessCPP. It might be worth making the PVXS cap configurable by an env-var at some point. How many IOCs is ITER expecting to run?

Contributor

(* looking into the crystal ball *)
Maybe 2500 on 500 hosts across 15 networks.
But that's 10 years from now, with an uncertainty of maybe 30%.

@JJL772 JJL772 force-pushed the fix_semaphore branch 2 times, most recently from 7a0921d to e21830f on November 21, 2025 01:09
Each beacon has an associated mutex. If we allocate
too many beacons on resource-constrained systems, e.g.
RTEMS, we may run out of resources and crash.
Member

@mdavidsaver mdavidsaver left a comment

Looks ok. Thanks for your persistence!

@mdavidsaver mdavidsaver merged commit 205dc58 into epics-base:master Nov 21, 2025
16 of 25 checks passed
@JJL772 JJL772 deleted the fix_semaphore branch November 21, 2025 17:11

Development

Successfully merging this pull request may close these issues.

Resource leak in Beacon tracking

5 participants