
Commit 44844dd

Add Get Node List in the Cluster API (#13015)
1 parent 302b365 commit 44844dd


7 files changed: 126 additions & 0 deletions


docs/en/changes/changes.md

Lines changed: 2 additions & 0 deletions
@@ -66,6 +66,7 @@
 * OAP self observability: Add watermark circuit break/recover metrics.
 * Add Baseline module for support alarm module query baseline data.
 * BaseLine: Support query baseline metrics names.
+* Add `Get Node List in the Cluster` API.
 
 #### UI
 
@@ -102,6 +103,7 @@
 * Add Status APIs docs.
 * Simplified the release process with removing maven central publish relative processes.
 * Add Circuit Breaking mechanism doc.
+* Add `Get Node List in the Cluster` API doc.
 
 
 All issues and pull requests are [here](https://github.com/apache/skywalking/milestone/224?closed=1)

docs/en/setup/backend/backend-cluster.md

Lines changed: 7 additions & 0 deletions
@@ -25,6 +25,13 @@ There are various ways to manage the cluster in the backend. Choose the one that
 In the `application.yml` file, there are default configurations for the aforementioned coordinators under the
 section `cluster`. You can specify any of them in the `selector` property to enable it.
 
+___
+**NOTICE**:
+Before you set up the cluster, please read the [Query Cluster Nodes](../../status/query_cluster_nodes.md) API to understand how to
+verify the cluster node list. If the nodes don't match your expectation, the cluster is not working properly, and there could
+be many impacts on features, e.g., the metrics could be inaccurate and the alarms might not be triggered correctly.
+___
+
 # Cloud Native
 ## Kubernetes
 
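The NOTICE asks readers to verify the cluster node list before relying on the cluster. As a minimal sketch of what "matching the expectation" could mean, the Java code below compares the node lists returned by several OAP nodes. It assumes each response has already been parsed into `NodeView` entries, and that in a healthy cluster every node returns a list whose size equals the number of deployed OAP nodes, contains exactly one `isSelf` entry (as in the sample response), and names the same set of `host:port` addresses. `NodeView` and `ClusterConsistencyCheck` are hypothetical names, not SkyWalking classes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of one entry in the /status/cluster/nodes response.
record NodeView(String host, int port, boolean isSelf) {
    String address() {
        return host + ":" + port;
    }
}

final class ClusterConsistencyCheck {
    private ClusterConsistencyCheck() {
    }

    /**
     * Compares the node lists reported by several OAP nodes.
     *
     * @param views        node list returned by each queried OAP node, keyed by a label for that node
     * @param expectedSize number of OAP nodes you deployed
     * @return human-readable problems; an empty list means the views look consistent
     */
    static List<String> check(Map<String, List<NodeView>> views, int expectedSize) {
        List<String> problems = new ArrayList<>();
        Set<String> reference = null;
        for (Map.Entry<String, List<NodeView>> e : views.entrySet()) {
            String queried = e.getKey();
            List<NodeView> nodes = e.getValue();
            // Assumption: the list size should equal the deployed cluster size.
            if (nodes.size() != expectedSize) {
                problems.add(queried + " sees " + nodes.size() + " nodes, expected " + expectedSize);
            }
            // Assumption: each node reports exactly one entry for itself.
            long selfCount = nodes.stream().filter(NodeView::isSelf).count();
            if (selfCount != 1) {
                problems.add(queried + " reports " + selfCount + " self entries, expected 1");
            }
            // Assumption: every node sees the same set of addresses.
            Set<String> addresses = new HashSet<>();
            for (NodeView n : nodes) {
                addresses.add(n.address());
            }
            if (reference == null) {
                reference = addresses;
            } else if (!reference.equals(addresses)) {
                problems.add(queried + " sees " + addresses + ", which differs from " + reference);
            }
        }
        return problems;
    }
}
```

An empty result means the views agree; each reported problem names the queried node whose view differs, which hints that the cluster coordinator needs attention.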
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+# Get Node List in the Cluster
+
+The OAP cluster is a set of OAP servers that work together to provide a scalable and reliable service. The OAP cluster
+supports [various cluster coordinators](../setup/backend/backend-cluster.md) to manage the cluster membership and the
+communication.
+This API provides the capability to query the node list in the cluster from each OAP node's perspective. If the cluster
+coordinator doesn't work properly, the node list may be incomplete or incorrect, so we recommend checking the
+node list when setting up a cluster.
+
+This API is used to get the list of nodes in the cluster as seen by the OAP node that receives the request.
+
+- URL: `http://{core restHost}:{core restPort}/status/cluster/nodes`
+- HTTP GET method.
+
+```json
+{
+  "nodes": [
+    {
+      "host": "10.0.12.23",
+      "port": 11800,
+      "isSelf": true
+    },
+    {
+      "host": "10.0.12.25",
+      "port": 11800,
+      "isSelf": false
+    },
+    {
+      "host": "10.0.12.37",
+      "port": 11800,
+      "isSelf": false
+    }
+  ]
+}
+```
+
+The `nodes` field lists all the nodes in the cluster. The size of the list should be exactly the same as the number of OAP nodes in your cluster setup.
+The `host` and `port` form the address of the OAP node, which OAP nodes use to communicate with each other. The
+`isSelf` flag indicates whether the node is the current node; the others are remote nodes.
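The endpoint is a plain HTTP GET, so any HTTP client can call it. The following Java sketch uses the JDK's `java.net.http.HttpClient` (Java 11+; the nested record needs Java 16+) to fetch the node list and extract the `host`, `port`, and `isSelf` fields. To stay dependency-free it matches node objects with a regular expression that assumes the field order shown in the sample response; a real client would use a JSON library instead. `ClusterNodesClient` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ClusterNodesClient {
    // One entry of the "nodes" array (hypothetical type name).
    record Node(String host, int port, boolean isSelf) {
    }

    // Assumes the field order host, port, isSelf used in the sample response.
    private static final Pattern NODE = Pattern.compile(
        "\\{\\s*\"host\"\\s*:\\s*\"([^\"]*)\"\\s*,\\s*\"port\"\\s*:\\s*(\\d+)\\s*,\\s*\"isSelf\"\\s*:\\s*(true|false)\\s*\\}");

    private ClusterNodesClient() {
    }

    // Builds http://{restHost}:{restPort}/status/cluster/nodes.
    static URI endpoint(String restHost, int restPort) {
        return URI.create("http://" + restHost + ":" + restPort + "/status/cluster/nodes");
    }

    // Sends the HTTP GET request and returns the raw JSON body.
    static String fetch(HttpClient client, String restHost, int restPort) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(endpoint(restHost, restPort)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    // Extracts every node object from the response body, in order.
    static List<Node> parse(String body) {
        List<Node> nodes = new ArrayList<>();
        Matcher m = NODE.matcher(body);
        while (m.find()) {
            nodes.add(new Node(m.group(1), Integer.parseInt(m.group(2)), Boolean.parseBoolean(m.group(3))));
        }
        return nodes;
    }
}
```

For example, `ClusterNodesClient.parse(ClusterNodesClient.fetch(HttpClient.newHttpClient(), "127.0.0.1", 12800))`, assuming the core REST port is left at its default of 12800, returns one `Node` per cluster member; compare its size with the number of OAP nodes you deployed, and repeat the call against every OAP node.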

docs/en/status/status_apis.md

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ logs and self-observability solutions.
 - [Dump Effective Initial Configurations API](../debugging/config_dump.md)
 - [Tracing Query Execution APIs](../debugging/query-tracing.md)
 - [Get Effective TTL Configurations API](query_ttl_setup.md)
+- [Query Cluster Nodes API](query_cluster_nodes.md)
 
 If you have a proposal about new status API, please don't hesitate
 to [create a discussion](https://github.com/apache/skywalking/discussions/new?category=ideas).

docs/menu.yml

Lines changed: 2 additions & 0 deletions
@@ -344,6 +344,8 @@ catalog:
 path: "/en/debugging/query-tracing"
 - name: "Get Effective TTL Configurations"
 path: "/en/status/query_ttl_setup"
+- name: "Get Node List in the Cluster"
+path: "/en/status/query_cluster_nodes"
 - name: "Customization"
 catalog:
 - name: "Overview"
@@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.skywalking.oap.query.debug;
+
+import com.google.gson.JsonArray;
+import com.google.gson.JsonObject;
+import com.linecorp.armeria.common.HttpRequest;
+import com.linecorp.armeria.common.HttpResponse;
+import com.linecorp.armeria.common.MediaType;
+import com.linecorp.armeria.server.annotation.ExceptionHandler;
+import com.linecorp.armeria.server.annotation.Get;
+import lombok.extern.slf4j.Slf4j;
+import org.apache.skywalking.oap.server.core.CoreModule;
+import org.apache.skywalking.oap.server.core.remote.client.Address;
+import org.apache.skywalking.oap.server.core.remote.client.RemoteClientManager;
+import org.apache.skywalking.oap.server.library.module.ModuleManager;
+
+@Slf4j
+@ExceptionHandler(StatusQueryExceptionHandler.class)
+public class ClusterStatusQueryHandler {
+    private final ModuleManager moduleManager;
+    private RemoteClientManager remoteClientManager;
+
+    public ClusterStatusQueryHandler(final ModuleManager manager) {
+        this.moduleManager = manager;
+    }
+
+    private RemoteClientManager getRemoteClientManager() {
+        if (remoteClientManager == null) {
+            remoteClientManager = moduleManager.find(CoreModule.NAME)
+                                               .provider()
+                                               .getService(RemoteClientManager.class);
+        }
+        return remoteClientManager;
+    }
+
+    @Get("/status/cluster/nodes")
+    public HttpResponse buildClusterNodeList(HttpRequest request) {
+        JsonObject clusterInfo = new JsonObject();
+
+        JsonArray nodeList = new JsonArray();
+        clusterInfo.add("nodes", nodeList);
+        getRemoteClientManager().getRemoteClient().stream().map(c -> {
+            final Address address = c.getAddress();
+            JsonObject node = new JsonObject();
+            node.addProperty("host", address.getHost());
+            node.addProperty("port", address.getPort());
+            node.addProperty("isSelf", address.isSelf());
+            return node;
+        }).forEach(nodeList::add);
+
+        return HttpResponse.of(MediaType.JSON_UTF_8, clusterInfo.toString());
+    }
+
+}
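The handler turns each remote client known to `RemoteClientManager` into one `{host, port, isSelf}` object inside a `nodes` array. The sketch below reproduces that mapping in compact JSON without Gson, Armeria, or SkyWalking on the classpath, which may be handy for unit-testing code that consumes the response. `NodeAddress` is a hypothetical stand-in for SkyWalking's `Address`, `ClusterNodeListJson` is likewise a hypothetical name, and the hand-written serializer only escapes quotes and backslashes.

```java
import java.util.List;
import java.util.stream.Collectors;

final class ClusterNodeListJson {
    // Hypothetical stand-in for SkyWalking's Address: host, port, and whether it is this OAP node.
    record NodeAddress(String host, int port, boolean self) {
    }

    private ClusterNodeListJson() {
    }

    // Minimal JSON string escaping for quotes and backslashes; enough for host names and IPs.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Mirrors the handler's stream: one node object per address, wrapped in {"nodes":[...]}.
    static String toJson(List<NodeAddress> addresses) {
        return addresses.stream()
            .map(a -> "{\"host\":" + quote(a.host()) + ",\"port\":" + a.port() + ",\"isSelf\":" + a.self() + "}")
            .collect(Collectors.joining(",", "{\"nodes\":[", "]}"));
    }
}
```

Like the handler, the output keeps the order of the input list, and an empty list still yields a `nodes` array (`{"nodes":[]}`), so consumers can always read that field.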

oap-server/server-query-plugin/status-query-plugin/src/main/java/org/apache/skywalking/oap/query/debug/StatusQueryProvider.java

Lines changed: 4 additions & 0 deletions
@@ -71,6 +71,10 @@ public void start() throws ServiceNotProvidedException {
             new TTLConfigQueryHandler(getManager()),
             Collections.singletonList(HttpMethod.GET)
         );
+        service.addHandler(
+            new ClusterStatusQueryHandler(getManager()),
+            Collections.singletonList(HttpMethod.GET)
+        );
     }
 
     public void notifyAfterCompleted() throws ServiceNotProvidedException, ModuleStartException {
