Closed
Labels
:Security/License, >bug, Team:Security
Description
We saw a large cluster in which all the management threads were stuck trying to record the fact that they had just used the WRITE_LOAD_FORECAST_FEATURE, which is something that's used in a tight loop over the shards:
99.1% [cpu=99.1%, other=0.0%] (495.5ms out of 500ms) cpu usage by thread 'elasticsearch[REDACTED][management][T#1]'
5/10 snapshots sharing following 20 elements
[email protected]/java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1031)
[email protected]/java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006)
[email protected]/org.elasticsearch.license.XPackLicenseState.featureUsed(XPackLicenseState.java:421)
[email protected]/org.elasticsearch.license.LicensedFeature$Momentary.check(LicensedFeature.java:32)
org.elasticsearch.xpack.writeloadforecaster.WriteLoadForecasterPlugin.hasValidLicense(WriteLoadForecasterPlugin.java:44)
org.elasticsearch.xpack.writeloadforecaster.WriteLoadForecasterPlugin$$Lambda/0x00007f396f4c0d40.getAsBoolean(Unknown Source)
org.elasticsearch.xpack.writeloadforecaster.LicensedWriteLoadForecaster.getForecastedWriteLoad(LicensedWriteLoadForecaster.java:146)
app/[email protected]/org.elasticsearch.cluster.routing.allocation.AllocationStatsService.stats(AllocationStatsService.java:64)
app/[email protected]/org.elasticsearch.action.admin.cluster.allocation.TransportGetAllocationStatsAction.masterOperation(TransportGetAllocationStatsAction.java:80)
app/[email protected]/org.elasticsearch.action.admin.cluster.allocation.TransportGetAllocationStatsAction.masterOperation(TransportGetAllocationStatsAction.java:37)
app/[email protected]/org.elasticsearch.action.support.master.TransportMasterNodeAction.executeMasterOperation(TransportMasterNodeAction.java:125)
app/[email protected]/org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.lambda$doStart$3(TransportMasterNodeAction.java:236)
app/[email protected]/org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$$Lambda/0x00007f3970689ad0.accept(Unknown Source)
app/[email protected]/org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:100)
app/[email protected]/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)
app/[email protected]/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
[email protected]/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
[email protected]/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
[email protected]/java.lang.Thread.runWith(Thread.java:1583)
[email protected]/java.lang.Thread.run(Thread.java:1570)
I think we should change this usage tracking to avoid such heavy updates to individual keys even when checked so frequently by multiple threads. But maybe I'm wrong and we should treat the license check as expensive and avoid calling it in a tight loop like this, in which case we're missing some docs about that.
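To make the first option concrete, here is a minimal sketch of usage tracking that skips the write when the recorded timestamp is still fresh. It is not the actual XPackLicenseState code: the class CoarseUsageTracker, its resolutionMillis parameter, and the String feature keys are all hypothetical names chosen for illustration. The idea relies on ConcurrentHashMap.get being a non-blocking read, whereas put on an existing key has to lock that key's bin, which is where the threads in the trace above are spending their time.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical sketch, not the real XPackLicenseState implementation.
class CoarseUsageTracker {
    // feature name -> last recorded usage time in millis
    private final Map<String, Long> lastUsed = new ConcurrentHashMap<>();
    private final LongSupplier clockMillis;
    private final long resolutionMillis;

    CoarseUsageTracker(LongSupplier clockMillis, long resolutionMillis) {
        this.clockMillis = clockMillis;
        this.resolutionMillis = resolutionMillis;
    }

    // Records usage of a feature; returns true only if the map was written.
    boolean featureUsed(String feature) {
        long now = clockMillis.getAsLong();
        // Non-blocking read: this is all that callers in a tight loop pay
        // while the recorded timestamp is still within the resolution.
        Long previous = lastUsed.get(feature);
        if (previous != null && now - previous < resolutionMillis) {
            return false;
        }
        // Racing threads may both reach this point; each writes a recent
        // timestamp, so the recorded value stays accurate to the resolution.
        lastUsed.put(feature, now);
        return true;
    }

    Long lastUsedMillis(String feature) {
        return lastUsed.get(feature);
    }
}
```

With this shape, a loop over the shards writes at most once per resolution window per thread instead of once per shard. The second option from the paragraph above needs no new data structure: the caller would evaluate the license check once before the loop and reuse the result for every shard.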