
X-pack feature usage tracking may be heavily contended #111473

@DaveCTurner

Description

We saw a large cluster in which all the management threads were spending essentially all of their CPU recording the fact that they had just used WRITE_LOAD_FORECAST_FEATURE, a feature that is checked in a tight loop over the shards:

```
   99.1% [cpu=99.1%, other=0.0%] (495.5ms out of 500ms) cpu usage by thread 'elasticsearch[REDACTED][management][T#1]'
     5/10 snapshots sharing following 20 elements
       [email protected]/java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1031)
       [email protected]/java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006)
       [email protected]/org.elasticsearch.license.XPackLicenseState.featureUsed(XPackLicenseState.java:421)
       [email protected]/org.elasticsearch.license.LicensedFeature$Momentary.check(LicensedFeature.java:32)
       org.elasticsearch.xpack.writeloadforecaster.WriteLoadForecasterPlugin.hasValidLicense(WriteLoadForecasterPlugin.java:44)
       org.elasticsearch.xpack.writeloadforecaster.WriteLoadForecasterPlugin$$Lambda/0x00007f396f4c0d40.getAsBoolean(Unknown Source)
       org.elasticsearch.xpack.writeloadforecaster.LicensedWriteLoadForecaster.getForecastedWriteLoad(LicensedWriteLoadForecaster.java:146)
       app/[email protected]/org.elasticsearch.cluster.routing.allocation.AllocationStatsService.stats(AllocationStatsService.java:64)
       app/[email protected]/org.elasticsearch.action.admin.cluster.allocation.TransportGetAllocationStatsAction.masterOperation(TransportGetAllocationStatsAction.java:80)
       app/[email protected]/org.elasticsearch.action.admin.cluster.allocation.TransportGetAllocationStatsAction.masterOperation(TransportGetAllocationStatsAction.java:37)
       app/[email protected]/org.elasticsearch.action.support.master.TransportMasterNodeAction.executeMasterOperation(TransportMasterNodeAction.java:125)
       app/[email protected]/org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.lambda$doStart$3(TransportMasterNodeAction.java:236)
       app/[email protected]/org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$$Lambda/0x00007f3970689ad0.accept(Unknown Source)
       app/[email protected]/org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:100)
       app/[email protected]/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)
       app/[email protected]/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       [email protected]/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
       [email protected]/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
       [email protected]/java.lang.Thread.runWith(Thread.java:1583)
        [email protected]/java.lang.Thread.run(Thread.java:1570)
```
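
For context, the contended call boils down to something like the following simplified sketch (the names and structure here are illustrative, not the actual XPackLicenseState code): every momentary license check unconditionally writes the current timestamp into a shared map keyed by the feature, so many management threads looping over tens of thousands of shards all hammer the same entry.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of per-node "last used" tracking for licensed features;
// this is roughly the shape of the code implicated in the hot threads above.
class FeatureUsageTracker {
    private final Map<String, Long> lastUsed = new ConcurrentHashMap<>();

    // Invoked on every momentary check, e.g. once per shard in AllocationStatsService.stats.
    void featureUsed(String feature) {
        // Unconditional put: every caller takes the write path on the same key,
        // so concurrent threads contend inside ConcurrentHashMap.putVal, as seen above.
        lastUsed.put(feature, System.currentTimeMillis());
    }
}
```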

I think we should change this usage tracking so that individual keys are not updated so heavily, even when they are checked this frequently by multiple threads (one possible approach is sketched below). But maybe I'm wrong and we should instead treat the license check as expensive and avoid calling it in a tight loop like this, in which case we're missing some docs to that effect.
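
One way to reduce the contention, using the same illustrative names as the sketch above, would be to make hot keys read-mostly: only record a new timestamp when the stored one is older than some coarse granularity, so repeated checks within the same interval take ConcurrentHashMap's lock-free read path instead of the write path. GRANULARITY_MILLIS below is a made-up tuning knob, not an existing setting.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: skip the write when the recorded timestamp is fresh enough.
class LowContentionFeatureUsageTracker {
    private static final long GRANULARITY_MILLIS = 1_000; // hypothetical coarseness of "last used"

    private final Map<String, Long> lastUsed = new ConcurrentHashMap<>();

    void featureUsed(String feature) {
        long now = System.currentTimeMillis();
        Long previous = lastUsed.get(feature); // lock-free read on the hot path
        if (previous == null || now - previous >= GRANULARITY_MILLIS) {
            // Rarely taken: at most a handful of writes per feature per interval; racing
            // threads may both write, which is harmless for "last used" bookkeeping.
            lastUsed.put(feature, now);
        }
    }
}
```

The other option, treating the check as expensive, would instead mean callers such as LicensedWriteLoadForecaster.getForecastedWriteLoad hoisting the license check out of the per-shard loop and reusing the result, which only works if that expectation is documented.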

Metadata

    Labels

    :Security/License (License functionality for commercial features), >bug, Team:Security (Meta label for security team)
