
Conversation

@nibix
Collaborator

@nibix nibix commented Jul 16, 2025

Description

This moves the reloading process of the security configuration (including reading the index, parsing the configuration, and activating it) to a dedicated thread. Any code that wants to trigger a configuration reload now only signals that thread to perform the reload.

The main such caller is the TransportConfigUpdateAction. So far, this action waited on a lock for any ongoing configuration reload to complete. Thus, triggering the TransportConfigUpdateAction several times in environments with slow configuration reload processes could exhaust the MANAGEMENT thread pool and lead to node failures.

The new code tries to minimize the number of reloads: if a reload request is already queued, any further reload requests are merged into the queued one.
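
For illustration, a minimal sketch of this signal-and-coalesce pattern (class, thread, and method names are hypothetical and simplified, not the actual implementation in this PR):

import java.util.concurrent.Semaphore;

// Hypothetical, simplified sketch of a dedicated reload thread with request coalescing.
class ReloadThreadSketch {
    private final Semaphore reloadRequested = new Semaphore(0);

    ReloadThreadSketch() {
        Thread thread = new Thread(this::reloadLoop, "security-config-reload");
        thread.setDaemon(true);
        thread.start();
    }

    // Called by any code that wants the configuration reloaded; returns immediately.
    void requestReload() {
        reloadRequested.release();
    }

    private void reloadLoop() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                reloadRequested.acquire();      // wait for the next reload signal
                reloadRequested.drainPermits(); // merge all requests that queued up in the meantime
                performReload();                // read the index, parse and activate the configuration
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    private void performReload() {
        // placeholder for the actual reload logic
    }
}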

  • Category: Enhancement
  • Why these changes are required?: Doing many config updates in a row could exhaust the MANAGEMENT thread pool, which can lead to node failures.
  • What is the old behavior before changes and new behavior after changes? No behavioral changes

Issues Resolved

Testing

  • New unit test
  • Existing integration tests

Check List

  • New functionality includes testing
  • New functionality has been documented
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@nibix
Collaborator Author

nibix commented Jul 16, 2025

@cwperks @kumargu @shikharj05 See here for a draft of the changes for async config updates.

There's one issue which I did not anticipate:

Many tests rely on the fact that the changed configuration is instantly available after the TransportConfigUpdateAction has been called and returned. With the async handling, this is not the case any more. Thus, we have a number of test failures at the moment.

Of course, one could go ahead and adapt these tests to use Awaitility or something similar to wait for the updated configuration. But, in the end, this is also a behavioral change, and I am wondering whether we need to preserve this behavior.

It might be possible to preserve it by creating an async version of nodeOperation in TransportNodesAction ... I have always wondered why this is not async. I am not sure whether there are any reasons against it.

@cwperks
Member

cwperks commented Jul 16, 2025

@nibix should we consider keeping the behavior where REST API calls to the security APIs only return to the caller when all nodes have finished updating? If there is an async version of each security API, then I think it needs to be similar to async search, where a task ID is returned that you can then use to get the status of the update.

One thing that was discussed was creating a dedicated threadpool for security config updates (The size of the pool can even be 1 to ensure a single thread is performing updates at any given moment). Plugins can define threadpools like this: #5464 (comment)

Then you can use the name of the threadpool in the constructor for TransportConfigUpdateAction.
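
For reference, a rough sketch of what such a plugin-defined pool could look like, assuming the standard getExecutorBuilders extension point; the pool name and sizes here are made up for illustration and this is not what the PR ended up implementing:

import java.util.List;
import org.opensearch.common.settings.Settings;
import org.opensearch.threadpool.ExecutorBuilder;
import org.opensearch.threadpool.FixedExecutorBuilder;

// Illustrative only: a fixed pool of size 1 would serialize config updates,
// but, as discussed below, it would not deduplicate redundant update requests.
class SecurityThreadPoolSketch {
    static final String CONFIG_UPDATE_POOL = "security_config_update"; // hypothetical name

    // Would be returned from the plugin's getExecutorBuilders(Settings) override.
    static List<ExecutorBuilder<?>> executorBuilders(Settings settings) {
        return List.of(new FixedExecutorBuilder(settings, CONFIG_UPDATE_POOL, 1, 100, "thread_pool." + CONFIG_UPDATE_POOL));
    }
}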

@nibix
Collaborator Author

nibix commented Jul 16, 2025

should we consider keeping the behavior where REST API calls to security apis only return to the caller when all nodes have finished updating?

Yes, I am looking into this at the moment.

One thing that was discussed was creating a dedicated threadpool for security config updates (The size of the pool can even be 1 to ensure a single thread is performing updates at any given moment).

I do not think thread pools are a good way to go. With a thread pool, you have no way to deduplicate redundant update requests. If you do 100 REST API calls for config updates in quick succession, you will also trigger 100 config updates (provided your thread pool queue size is configured to be that big), even though completing all of them might take several minutes after the final REST API call has returned.

With the solution in this PR, we have a deduplication of redundant update requests.

@nibix
Collaborator Author

nibix commented Jul 22, 2025

should we consider keeping the behavior where REST API calls to security apis only return to the caller when all nodes have finished updating?

This is now implemented by creating an async version of TransportNodesAction. Potentially, we can/should move that to the core project.
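
As a rough sketch of the idea (names are illustrative, not the actual TransportNodesAsyncAction API): the per-node operation registers a callback with the reload thread instead of blocking on a lock, and the reload thread completes all registered callbacks only once the reload has finished. The REST caller thus still only gets its response after the node has updated, while no MANAGEMENT thread is blocked in the meantime:

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch extending the earlier coalescing idea with completion callbacks.
class AsyncReloadSketch {
    private final List<Runnable> pending = new ArrayList<>();
    private boolean reloadRequested = false;

    // Called from the (async) node operation; returns immediately.
    synchronized void requestReload(Runnable onComplete) {
        pending.add(onComplete);
        reloadRequested = true;
        notifyAll(); // wake the reload thread
    }

    // Runs on the dedicated reload thread.
    void reloadLoop() throws InterruptedException {
        while (true) {
            List<Runnable> toComplete;
            synchronized (this) {
                while (!reloadRequested) {
                    wait();
                }
                reloadRequested = false;
                toComplete = new ArrayList<>(pending); // these callers are covered by this reload
                pending.clear();
            }
            performReload();
            toComplete.forEach(Runnable::run); // respond only after the reload has finished
        }
    }

    private void performReload() {
        // placeholder for reading, parsing and activating the configuration
    }
}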

@codecov

codecov bot commented Jul 22, 2025

Codecov Report

❌ Patch coverage is 88.69048% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.29%. Comparing base (acadb8c) to head (0555c84).
⚠️ Report is 18 commits behind head on main.

Files with missing lines Patch % Lines
...earch/security/util/TransportNodesAsyncAction.java 83.75% 6 Missing and 7 partials ⚠️
...ecurity/configuration/ConfigurationRepository.java 92.30% 2 Missing and 4 partials ⚠️
Additional details and impacted files


@@            Coverage Diff             @@
##             main    #5479      +/-   ##
==========================================
+ Coverage   73.15%   73.29%   +0.14%     
==========================================
  Files         412      412              
  Lines       26168    26299     +131     
  Branches     3963     3984      +21     
==========================================
+ Hits        19142    19276     +134     
+ Misses       5125     5113      -12     
- Partials     1901     1910       +9     
Files with missing lines Coverage Δ
.../opensearch/security/OpenSearchSecurityPlugin.java 85.41% <100.00%> (+0.01%) ⬆️
...tion/configupdate/TransportConfigUpdateAction.java 88.88% <100.00%> (+1.79%) ⬆️
...ecurity/configuration/ConfigurationRepository.java 88.34% <92.30%> (+4.82%) ⬆️
...earch/security/util/TransportNodesAsyncAction.java 83.75% <83.75%> (ø)

... and 9 files with indirect coverage changes


@nibix nibix marked this pull request as ready for review July 22, 2025 15:45
Member

@DarshitChanpura DarshitChanpura left a comment


Thank you @nibix for this fix! Left a few questions.

shikharj05
shikharj05 previously approved these changes Aug 4, 2025
@nibix
Collaborator Author

nibix commented Aug 6, 2025

@cwperks @kumargu @kkhatua

We invested some time into reproducing the issue in order to verify our fixes. Of course, we had to make some assumptions to fill in the unknowns.

We reproduced the issue both on a Docker-based cluster and with a Java-based integration test. This is the code for the test:

@RunWith(com.carrotsearch.randomizedtesting.RandomizedRunner.class)
@ThreadLeakScope(ThreadLeakScope.Scope.NONE)
public class ManagementThreadPoolExhaustionTest2 {

    @ClassRule
    public static LocalCluster cluster = new LocalCluster.Builder().clusterManager(ClusterManager.DEFAULT)
            .authc(TestSecurityConfig.AuthcDomain.AUTHC_HTTPBASIC_INTERNAL)
            .users(TestSecurityConfig.User.USER_ADMIN)
            .roles(createTestRoles())
            .nodeSettings(Map.of("cluster_manager.throttling.thresholds.auto-create.value", 3000, "cluster.max_shards_per_node", 10000))
            .build();

    @BeforeClass
    public static void createTestData() throws Exception {
        try (Client client = cluster.getInternalNodeClient()) {
            IndicesAliasesRequest indicesAliasesRequest = new IndicesAliasesRequest();

            for (int i = 0; i < 2000; i++) {
                String index = ".kibana_t_" + i + "_001";
                CreateIndexRequest request = new CreateIndexRequest(index).settings(Map.of("index.number_of_shards", 1, "index.number_of_replicas", 0));
                CreateIndexResponse response = client.admin().indices().create(request).actionGet();
                System.out.println(Strings.toString(XContentType.JSON, response));
                indicesAliasesRequest.addAliasAction(IndicesAliasesRequest.AliasActions.add().alias(".kibana_t_" + i).indices(index));
            }

            for (int i = 0; i < 200; i++) {
                CreateIndexRequest request = new CreateIndexRequest("my_example_index_" + i).settings(Map.of("index.number_of_shards", 1, "index.number_of_replicas", 0));
                CreateIndexResponse response = client.admin().indices().create(request).actionGet();
            }

            client.admin().indices().aliases(indicesAliasesRequest).actionGet();

        }
    }

    @Test
    public void test() throws Exception {
        State state = new State();
        Thread nodeStatsThread = new Thread(() -> {
            try (TestRestClient client = cluster.getRestClient(TestSecurityConfig.User.USER_ADMIN)) {
                for (int i = 0; i < 1000; i++) {
                    int initialPendingRequests = state.pendingCreateUserRequests;
                    long start = System.currentTimeMillis();
                    TestRestClient.HttpResponse response = client.get("_nodes/stats");
                    System.out.println("_nodes/stats called when " + initialPendingRequests + " were pending took " + (System.currentTimeMillis() - start) + " ms");
                    System.out.println(parseNodeStatsResponse(response));
                    System.out.println(response.getBody());
                    Thread.sleep(100);
                }
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        });
        nodeStatsThread.setDaemon(true);
        nodeStatsThread.start();

        for (int i = 0; i < 10; i++) {
            String userName = "user_" + i;
            Thread createUserThread = new Thread(() -> {
                state.pendingCreateUserRequests++;
                try (TestRestClient client = cluster.getRestClient(TestSecurityConfig.User.USER_ADMIN, cluster.getAdminCertificate())) {
                    System.out.println("Creating " + userName);
                    long start = System.currentTimeMillis();
                    TestRestClient.HttpResponse response = client.putJson("_plugins/_security/api/internalusers/" + userName, """
                            {
                                "password": "secret+1234A",
                                "backend_roles": ["role"]
                            }
                            """);
                    System.out.println("Finished creating " + userName + "; took " + (System.currentTimeMillis() - start) + "ms \n" + response.getBody());
                } finally {
                    state.pendingCreateUserRequests--;
                }
            });
            createUserThread.start();
            Thread.sleep(500);
        }

        Thread.sleep(100 * 1000);
    }

    static String parseNodeStatsResponse(TestRestClient.HttpResponse response) {
        if (response.getBody().contains("receive_timeout_transport_exception")) {
            return "TIMEOUT\n";
        } else {
            JsonNode responseJsonNode = response.bodyAsJsonNode();
            JsonNode nodes = responseJsonNode.get("nodes");
            Iterator<String> fieldNames = nodes.fieldNames();
            StringBuilder result = new StringBuilder();
            while (fieldNames.hasNext()) {
                String nodeId = fieldNames.next();
                JsonNode node = nodes.get(nodeId);
                JsonNode threadPool = node.get("thread_pool");
                JsonNode managementThreadPool = threadPool.get("management");
                result.append(nodeId + ": management thread pool: active: " + managementThreadPool.get("active") + "/5" + "; queue: " + managementThreadPool.get("queue") + "\n");
            }

            return result.toString();
        }
    }

    static TestSecurityConfig.Role [] createTestRoles() {
        List<TestSecurityConfig.Role> result = new ArrayList<>();

        for (int i = 0; i < 2500; i++) {
            result.add(new TestSecurityConfig.Role("role" + i).indexPermissions("crud").on("*example*", ".*example*"));
        }

        return result.toArray(new TestSecurityConfig.Role[0]);
    }

    static class State {
        int pendingCreateUserRequests = 0;
    }
}

Our manual tests were executed on OpenSearch 2.19.2; the Java test was executed on a recent OpenSearch main snapshot, however with the optimizations from #5470 and #5471 reverted.

Both the manual test and the automatic test follow the same structure:

  • Create 2500 indices with corresponding aliases
  • Create 2500 roles of the form
role_«index»:
  index_permissions:
  - index_patterns: ['*example*', '.*example*']
    allowed_actions: ['crud']
  tenant_permissions:
  - tenant_patterns: ['tenant_«index»']
    allowed_actions: ['foo']
  • Create a thread that polls the _nodes/stats REST API every 100 ms. We print the result from the API (which includes thread pool stats) and the latency.
  • Repeatedly call one of the security config REST APIs to modify the security configuration. The calls are spaced at 500 ms intervals, but we do not wait for them to return.

The output of the test (without full node stats due to the length):

_nodes/stats called when 1 were pending took 1129 ms
7HexEAAAQACkUO79_____w: management thread pool: active: 1/5; queue: 0
JpyKdwAAQACbBJGZ_____w: management thread pool: active: 1/5; queue: 0
a10khf__T_-0LCuR_____w: management thread pool: active: 1/5; queue: 0

_nodes/stats called when 3 were pending took 46 ms
7HexEAAAQACkUO79_____w: management thread pool: active: 1/5; queue: 0
JpyKdwAAQACbBJGZ_____w: management thread pool: active: 1/5; queue: 0
a10khf__T_-0LCuR_____w: management thread pool: active: 1/5; queue: 0

_nodes/stats called when 3 were pending took 49 ms
7HexEAAAQACkUO79_____w: management thread pool: active: 1/5; queue: 0
JpyKdwAAQACbBJGZ_____w: management thread pool: active: 1/5; queue: 0
a10khf__T_-0LCuR_____w: management thread pool: active: 1/5; queue: 0

_nodes/stats called when 2 were pending took 38 ms
7HexEAAAQACkUO79_____w: management thread pool: active: 2/5; queue: 0
JpyKdwAAQACbBJGZ_____w: management thread pool: active: 2/5; queue: 0
a10khf__T_-0LCuR_____w: management thread pool: active: 2/5; queue: 0

_nodes/stats called when 2 were pending took 39 ms
7HexEAAAQACkUO79_____w: management thread pool: active: 2/5; queue: 0
JpyKdwAAQACbBJGZ_____w: management thread pool: active: 2/5; queue: 0
a10khf__T_-0LCuR_____w: management thread pool: active: 2/5; queue: 0

_nodes/stats called when 3 were pending took 48 ms
7HexEAAAQACkUO79_____w: management thread pool: active: 3/5; queue: 0
JpyKdwAAQACbBJGZ_____w: management thread pool: active: 3/5; queue: 0
a10khf__T_-0LCuR_____w: management thread pool: active: 3/5; queue: 0

_nodes/stats called when 3 were pending took 39 ms
7HexEAAAQACkUO79_____w: management thread pool: active: 3/5; queue: 0
JpyKdwAAQACbBJGZ_____w: management thread pool: active: 3/5; queue: 0
a10khf__T_-0LCuR_____w: management thread pool: active: 3/5; queue: 0

_nodes/stats called when 4 were pending took 33 ms
7HexEAAAQACkUO79_____w: management thread pool: active: 3/5; queue: 0
JpyKdwAAQACbBJGZ_____w: management thread pool: active: 3/5; queue: 0
a10khf__T_-0LCuR_____w: management thread pool: active: 5/5; queue: 1

_nodes/stats called when 4 were pending took 34 ms
7HexEAAAQACkUO79_____w: management thread pool: active: 4/5; queue: 0
JpyKdwAAQACbBJGZ_____w: management thread pool: active: 4/5; queue: 0
a10khf__T_-0LCuR_____w: management thread pool: active: 4/5; queue: 0

_nodes/stats called when 4 were pending took 35 ms
7HexEAAAQACkUO79_____w: management thread pool: active: 4/5; queue: 0
JpyKdwAAQACbBJGZ_____w: management thread pool: active: 4/5; queue: 0
a10khf__T_-0LCuR_____w: management thread pool: active: 4/5; queue: 0

_nodes/stats called when 5 were pending took 34 ms
7HexEAAAQACkUO79_____w: management thread pool: active: 5/5; queue: 0
JpyKdwAAQACbBJGZ_____w: management thread pool: active: 5/5; queue: 0
a10khf__T_-0LCuR_____w: management thread pool: active: 5/5; queue: 1

_nodes/stats called when 5 were pending took 29 ms
7HexEAAAQACkUO79_____w: management thread pool: active: 5/5; queue: 0
JpyKdwAAQACbBJGZ_____w: management thread pool: active: 5/5; queue: 1
a10khf__T_-0LCuR_____w: management thread pool: active: 5/5; queue: 4

_nodes/stats called when 6 were pending took 30013 ms
TIMEOUT
{"_nodes":{"total":3,"successful":0,"failed":3,"failures":[{"type":"failed_node_exception","reason":"Failed node [7HexEAAAQACkUO79_____w]","node_id":"7HexEAAAQACkUO79_____w","caused_by":{"type":"receive_timeout_transport_exception","reason":"[cluster_manager_0][127.0.0.1:47320][cluster:monitor/nodes/stats[n]] request_id [96782] timed out after [30026ms]"}},{"type":"failed_node_exception","reason":"Failed node [JpyKdwAAQACbBJGZ_____w]","node_id":"JpyKdwAAQACbBJGZ_____w","caused_by":{"type":"receive_timeout_transport_exception","reason":"[data_1][127.0.0.1:47331][cluster:monitor/nodes/stats[n]] request_id [96783] timed out after [30026ms]"}},{"type":"failed_node_exception","reason":"Failed node [a10khf__T_-0LCuR_____w]","node_id":"a10khf__T_-0LCuR_____w","caused_by":{"type":"receive_timeout_transport_exception","reason":"[data_0][127.0.0.1:47330][cluster:monitor/nodes/stats[n]] request_id [96784] timed out after [30026ms]"}}]},"cluster_name":"local_cluster_1","nodes":{}}

In the output, the phrase "when 5 were pending" refers to the sent config REST API requests which had not received a response yet. Theoretically, we should already get a timeout with 5 pending requests, because the management thread pool has a size of 5. In practice, we seem to get one more, because one of the pending threads finishes while we are waiting for the node stats response; that thread is then freed up for the node stats request.

We then executed the same test with the commit from this PR included. This time, we could not reproduce the timeout, even though we had up to 8 pending config REST API calls:

[...]

_nodes/stats called when 8 were pending took 26 ms
Pk_1HgAAQACaKVY0AAAAAA: management thread pool: active: 1/5; queue: 0
EyUArP__T_-QZZiM_____w: management thread pool: active: 1/5; queue: 0
M6q7QAAAQACdMpQfAAAAAA: management thread pool: active: 1/5; queue: 0

We also captured thread dumps to check for any deadlocks or other issues. We only found what we expected: In the version without the optimization, all management threads were busy or waiting for the ConfigurationRepository lock:

"opensearch[opensearch-node1][management][T#1]" #67 [304] daemon prio=5 os_prio=0 cpu=706242.20ms elapsed=8109.25s tid=0x0000fffec000a680 nid=304 waiting on condition  [0x0000fffeea57f000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
	- parking to wait for  <0x00000000c56b75f8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.parkNanos([email protected]/LockSupport.java:269)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire([email protected]/AbstractQueuedSynchronizer.java:756)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos([email protected]/AbstractQueuedSynchronizer.java:1038)
	at java.util.concurrent.locks.ReentrantLock$Sync.tryLockNanos([email protected]/ReentrantLock.java:168)
	at java.util.concurrent.locks.ReentrantLock.tryLock([email protected]/ReentrantLock.java:479)
	at org.opensearch.security.configuration.ConfigurationRepository.loadConfigurationWithLock(ConfigurationRepository.java:537)
	at org.opensearch.security.configuration.ConfigurationRepository.reloadConfiguration(ConfigurationRepository.java:532)
	at org.opensearch.security.configuration.ConfigurationRepository.reloadConfiguration(ConfigurationRepository.java:523)
	at org.opensearch.security.action.configupdate.TransportConfigUpdateAction.nodeOperation(TransportConfigUpdateAction.java:128)
	at org.opensearch.security.action.configupdate.TransportConfigUpdateAction.nodeOperation(TransportConfigUpdateAction.java:52)
	at org.opensearch.action.support.nodes.TransportNodesAction.nodeOperation(TransportNodesAction.java:200)
	at org.opensearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:334)
	at org.opensearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:330)
	at org.opensearch.security.ssl.transport.SecuritySSLRequestHandler.messageReceivedDecorate(SecuritySSLRequestHandler.java:207)
	at org.opensearch.security.transport.SecurityRequestHandler.messageReceivedDecorate(SecurityRequestHandler.java:314)
	at org.opensearch.security.ssl.transport.SecuritySSLRequestHandler.messageReceived(SecuritySSLRequestHandler.java:155)
	at org.opensearch.security.OpenSearchSecurityPlugin$6$1.messageReceived(OpenSearchSecurityPlugin.java:873)
	at org.opensearch.indexmanagement.rollup.interceptor.RollupInterceptor$interceptHandler$1.messageReceived(RollupInterceptor.kt:113)
	at org.opensearch.performanceanalyzer.transport.PerformanceAnalyzerTransportRequestHandler.messageReceived(PerformanceAnalyzerTransportRequestHandler.java:44)
	at org.opensearch.performanceanalyzer.transport.RTFPerformanceAnalyzerTransportRequestHandler.messageReceived(RTFPerformanceAnalyzerTransportRequestHandler.java:63)
	at org.opensearch.wlm.WorkloadManagementTransportInterceptor$RequestHandler.messageReceived(WorkloadManagementTransportInterceptor.java:63)
	at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:108)
	at org.opensearch.transport.NativeMessageHandler$RequestHandler.doRun(NativeMessageHandler.java:487)
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1014)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
	at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1144)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:642)
	at java.lang.Thread.runWith([email protected]/Thread.java:1596)
	at java.lang.Thread.run([email protected]/Thread.java:1583)

@shikharj05
Collaborator

@nibix anything pending on this PR? @cwperks any concerns with moving ahead?

@nibix nibix force-pushed the config-update-thread branch 2 times, most recently from 4223cb6 to a740b13 on September 8, 2025 19:35
@nibix
Collaborator Author

nibix commented Sep 8, 2025

@shikharj05

anything pending on this PR? @cwperks any concerns with moving ahead?

From my side, this is ready to go. Just now, I have rebased the branch and fixed the conflict in the changelog.

cwperks
cwperks previously approved these changes Sep 16, 2025
@cwperks
Member

cwperks commented Sep 16, 2025

Thank you @nibix ! The changes LGTM. It would also be nice to complete #5501 in the future so many security config changes can happen in a single request.

@shikharj05
Collaborator

I think there's one last conflict to fix, changes LGTM as well. Thanks @nibix !

@nibix
Collaborator Author

nibix commented Sep 17, 2025

I think there's one last conflict to fix

I have resolved the conflict. Please check again and re-approve if needed :)

nibix and others added 2 commits September 17, 2025 12:54
@nibix nibix requested review from cwperks and shikharj05 September 29, 2025 12:04
shikharj05
shikharj05 previously approved these changes Oct 8, 2025
@shikharj05 shikharj05 merged commit 3b491d5 into opensearch-project:main Oct 27, 2025
68 checks passed
toepkerd pushed a commit to toepkerd/security that referenced this pull request Oct 27, 2025
…h-project#5479)

Signed-off-by: Nils Bandener <[email protected]>
Signed-off-by: Nils Bandener <[email protected]>
Signed-off-by: Darshit Chanpura <[email protected]>
Co-authored-by: Darshit Chanpura <[email protected]>
Signed-off-by: Dennis Toepker <[email protected]>

Development

Successfully merging this pull request may close these issues.

[BUG] ActionPrivileges initialization can take a long time in clusters with a large number of roles with repeated index patterns
