Commit 10e239f

Introduce index mode and refactor banyandb group settings (#12790)
1 parent e1698a5 commit 10e239f

36 files changed, +364 -268 lines changed

docs/en/changes/changes.md

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@
 * Adapt the new metadata standardization in Istio 1.24.
 * Bump up netty to 4.1.115, grpc to 1.68.1, boringssl to 2.0.69.
 * BanyanDB: Support update the Group settings when OAP starting.
+* BanyanDB: Introduce index mode and refactor banyandb group settings.

 #### UI

Lines changed: 89 additions & 44 deletions
@@ -1,70 +1,115 @@
-
 ## BanyanDB
-[BanyanDB](https://github.com/apache/skywalking-banyandb) is a dedicated storage implementation developed by the SkyWalking Team and the community.
-Activate BanyanDB as the storage, and set storage provider to **banyandb**.

-The OAP requires BanyanDB 0.7 server. From this version, BanyanDB provides general compatibility.
+[BanyanDB](https://github.com/apache/skywalking-banyandb) is a dedicated storage implementation developed by the SkyWalking Team and the community. Activate BanyanDB as the storage by setting the storage provider to **banyandb**.
+
+The OAP requires BanyanDB version **0.8** or later. From this version onwards, BanyanDB provides general compatibility.
+
+### Configuration

 ```yaml
 storage:
   banyandb:
     # Targets is the list of BanyanDB servers, separated by commas.
-    # Each target is a BanyanDB server in the format of `host:port`
-    # If the BanyanDB is deployed as a standalone server, the target should be the IP address or domain name and port of the BanyanDB server.
-    # If the BanyanDB is deployed in a cluster, the targets should be the IP address or domain name and port of the `liaison` nodes, separated by commas.
+    # Each target is a BanyanDB server in the format of `host:port`.
+    # If BanyanDB is deployed as a standalone server, the target should be the IP address or domain name and port of the BanyanDB server.
+    # If BanyanDB is deployed in a cluster, the targets should be the IP address or domain name and port of the `liaison` nodes, separated by commas.
     targets: ${SW_STORAGE_BANYANDB_TARGETS:127.0.0.1:17912}
-    # The max number of records in a bulk write request.
-    # Bigger value can improve the write performance, but also increase the OAP and BanyanDB Server memory usage.
+
+    # The maximum number of records in a bulk write request.
+    # A larger value can improve write performance but also increases OAP and BanyanDB Server memory usage.
     maxBulkSize: ${SW_STORAGE_BANYANDB_MAX_BULK_SIZE:10000}
+
     # The minimum seconds between two bulk flushes.
     # If the data in a bulk is less than maxBulkSize, the data will be flushed after this period.
-    # If the data in a bulk is more than maxBulkSize, the data will be flushed immediately.
-    # Bigger value can reduce the write pressure on BanyanDB Server, but also increase the latency of the data.
+    # If the data in a bulk exceeds maxBulkSize, the data will be flushed immediately.
+    # A larger value can reduce write pressure on BanyanDB Server but increase data latency.
     flushInterval: ${SW_STORAGE_BANYANDB_FLUSH_INTERVAL:15}
-    # The timeout seconds of a bulk flush.
+
+    # The timeout in seconds for a bulk flush.
     flushTimeout: ${SW_STORAGE_BANYANDB_FLUSH_TIMEOUT:10}
-    # The shard number of `measure` groups that store the metrics data.
-    metricsShardsNumber: ${SW_STORAGE_BANYANDB_METRICS_SHARDS_NUMBER:1}
-    # The shard number of `stream` groups that store the trace, log and profile data.
-    recordShardsNumber: ${SW_STORAGE_BANYANDB_RECORD_SHARDS_NUMBER:1}
-    # The multiplier of the number of shards of the super dataset.
-    # Super dataset is a special dataset that stores the trace or log data that is too large to be stored in the normal dataset.
-    # If the normal dataset has `n` shards, the super dataset will have `n * superDatasetShardsFactor` shards.
-    # For example, supposing `recordShardsNumber` is 3, and `superDatasetShardsFactor` is 2,
-    # `segment-default` is a normal dataset that has 3 shards, and `segment-minute` is a super dataset that has 6 shards.
-    superDatasetShardsFactor: ${SW_STORAGE_BANYANDB_SUPERDATASET_SHARDS_FACTOR:2}
+
     # The number of threads that write data to BanyanDB concurrently.
-    # Bigger value can improve the write performance, but also increase the OAP and BanyanDB Server CPU usage.
+    # A higher value can improve write performance but also increases CPU usage on both OAP and BanyanDB Server.
     concurrentWriteThreads: ${SW_STORAGE_BANYANDB_CONCURRENT_WRITE_THREADS:15}
-    # The maximum size of dataset when the OAP loads cache, such as network aliases.
+
+    # The maximum size of the dataset when the OAP loads cache, such as network aliases.
     resultWindowMaxSize: ${SW_STORAGE_BANYANDB_QUERY_MAX_WINDOW_SIZE:10000}
+
     # The maximum size of metadata per query.
     metadataQueryMaxSize: ${SW_STORAGE_BANYANDB_QUERY_MAX_SIZE:10000}
-    # The maximum size of trace segments per query.
+
+    # The maximum number of trace segments per query.
     segmentQueryMaxSize: ${SW_STORAGE_BANYANDB_QUERY_SEGMENT_SIZE:200}
-    # The max number of profile task query in a request.
+
+    # The maximum number of profile task queries in a request.
     profileTaskQueryMaxSize: ${SW_STORAGE_BANYANDB_QUERY_PROFILE_TASK_SIZE:200}
-    # The batch size of query profiling data.
+
+    # The batch size for querying profile data.
     profileDataQueryBatchSize: ${SW_STORAGE_BANYANDB_QUERY_PROFILE_DATA_BATCH_SIZE:100}
-    # Data is stored in BanyanDB in segments. A segment is a time range of data.
-    # The segment interval is the time range of a segment.
-    # The value should be less or equal to data TTL relevant settings.
-    segmentIntervalDays: ${SW_STORAGE_BANYANDB_SEGMENT_INTERVAL_DAYS:1}
-    # The super dataset segment interval is the time range of a segment in the super dataset.
-    superDatasetSegmentIntervalDays: ${SW_STORAGE_BANYANDB_SUPER_DATASET_SEGMENT_INTERVAL_DAYS:1}
-    # Specific groups settings.
-    # For example, {"group1": {"blockIntervalHours": 4, "segmentIntervalDays": 1}}
-    # Please refer to https://github.com/apache/skywalking-banyandb/blob/${BANYANDB_RELEASE}/docs/crud/group.md#create-operation
-    # for group setting details.
-    specificGroupSettings: ${SW_STORAGE_BANYANDB_SPECIFIC_GROUP_SETTINGS:""}
-    # If the BanyanDB server is configured with TLS, config the TLS cert file path and open tls connection.
+
+    # If the BanyanDB server is configured with TLS, configure the TLS cert file path and enable TLS connection.
     sslTrustCAPath: ${SW_STORAGE_BANYANDB_SSL_TRUST_CA_PATH:""}
+
+    # The group settings of record.
+    # `gr` is the short name of the group settings of record.
+    #
+    # The "normal" section defines settings for datasets not specified in "super".
+    # Each dataset will be grouped under a single group named "normal".
+    grNormalShardNum: ${SW_STORAGE_BANYANDB_GR_NORMAL_SHARD_NUM:1}
+    grNormalSIDays: ${SW_STORAGE_BANYANDB_GR_NORMAL_SI_DAYS:1}
+    grNormalTTLDays: ${SW_STORAGE_BANYANDB_GR_NORMAL_TTL_DAYS:3}
+    # "super" is a special dataset designed to store trace or log data that is too large for normal datasets.
+    # Each super dataset will be a separate group in BanyanDB, following the settings defined in the "super" section.
+    grSuperShardNum: ${SW_STORAGE_BANYANDB_GR_SUPER_SHARD_NUM:2}
+    grSuperSIDays: ${SW_STORAGE_BANYANDB_GR_SUPER_SI_DAYS:1}
+    grSuperTTLDays: ${SW_STORAGE_BANYANDB_GR_SUPER_TTL_DAYS:3}
+
+    # The group settings of metrics.
+    # `gm` is the short name of the group settings of metrics.
+    #
+    # OAP stores metrics based on their granularity.
+    # Valid values are "day", "hour", and "minute". That means metrics will be stored in three separate groups.
+    # Granularities other than "minute" are governed by the "core.downsampling" setting.
+    # For example, if "core.downsampling" is set to "hour", the "hour" group is used, while the "day" group is ignored.
+    gmMinuteShardNum: ${SW_STORAGE_BANYANDB_GM_MINUTE_SHARD_NUM:2}
+    gmMinuteSIDays: ${SW_STORAGE_BANYANDB_GM_MINUTE_SI_DAYS:1}
+    gmMinuteTTLDays: ${SW_STORAGE_BANYANDB_GM_MINUTE_TTL_DAYS:7}
+    gmHourShardNum: ${SW_STORAGE_BANYANDB_GM_HOUR_SHARD_NUM:1}
+    gmHourSIDays: ${SW_STORAGE_BANYANDB_GM_HOUR_SI_DAYS:1}
+    gmHourTTLDays: ${SW_STORAGE_BANYANDB_GM_HOUR_TTL_DAYS:15}
+    gmDayShardNum: ${SW_STORAGE_BANYANDB_GM_DAY_SHARD_NUM:1}
+    gmDaySIDays: ${SW_STORAGE_BANYANDB_GM_DAY_SI_DAYS:1}
+    gmDayTTLDays: ${SW_STORAGE_BANYANDB_GM_DAY_TTL_DAYS:30}
+    # If a metric is marked as "index_mode", it will be stored in the "index" group.
+    # The "index" group is designed to store metrics that are used for indexing without value columns.
+    # Examples include `service_traffic` and `network_address_alias`.
+    # "index_mode" requires BanyanDB *0.8.0* or later.
+    gmIndexShardNum: ${SW_STORAGE_BANYANDB_GM_INDEX_SHARD_NUM:1}
+    gmIndexSIDays: ${SW_STORAGE_BANYANDB_GM_INDEX_SI_DAYS:1}
+    gmIndexTTLDays: ${SW_STORAGE_BANYANDB_GM_INDEX_TTL_DAYS:30}
+
 ```
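A minimal sketch of how the removed shard settings might carry over to the new per-group ones. The mapping is an assumption inferred from the comments in this diff, not something the commit states: the old effective super dataset shard count was `recordShardsNumber * superDatasetShardsFactor`, while `grSuperShardNum` now appears to set that count directly. The values 3 and 2 are the ones used in the removed comment's example.

```yaml
storage:
  banyandb:
    # Assumed equivalent of the old `recordShardsNumber: 3`.
    grNormalShardNum: ${SW_STORAGE_BANYANDB_GR_NORMAL_SHARD_NUM:3}
    # Assumed equivalent of the old 3 * 2 = 6 shards
    # (recordShardsNumber * superDatasetShardsFactor).
    grSuperShardNum: ${SW_STORAGE_BANYANDB_GR_SUPER_SHARD_NUM:6}
```

With the defaults, the old product (1 * 2) and the new `grSuperShardNum` default (2) give the same count, which is consistent with this reading.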

-BanyanDB Server supports two installation modes: standalone and cluster. The standalone mode is suitable for small-scale deployments, while the cluster mode is suitable for large-scale deployments.
+### Installation Modes
+
+BanyanDB Server supports two installation modes:
+
+- **Standalone Mode**: Suitable for small-scale deployments.
+  - **Configuration**: `targets` is the IP address/hostname and port of the BanyanDB server.
+
+- **Cluster Mode**: Suitable for large-scale deployments.
+  - **Configuration**: `targets` is the IP address/hostname and port of the `liaison` nodes, separated by commas. `Liaison` nodes are the entry points of the BanyanDB cluster.
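As an illustration of the cluster mode setting, the fragment below lists two liaison nodes in `targets`. The host names are hypothetical placeholders, and reusing port 17912 from the standalone default is an assumption; substitute the addresses of the actual liaison nodes.

```yaml
storage:
  banyandb:
    # Hypothetical liaison addresses, comma-separated in `host:port` form.
    targets: ${SW_STORAGE_BANYANDB_TARGETS:liaison-1.example.com:17912,liaison-2.example.com:17912}
```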
+
+### Group Settings
+
+BanyanDB supports **group settings** to configure storage groups, shards, segment intervals, and TTL (Time-To-Live). The group settings file is a YAML file required when using BanyanDB as the storage.
+
+#### Basic Group Settings
+
+- `ShardNum`: Number of shards in the group. Shards are the basic units of data storage in BanyanDB. Data is distributed across shards based on the hash value of the series ID. Refer to the [BanyanDB Shard](https://skywalking.apache.org/docs/skywalking-banyandb/latest/concept/clustering/#52-data-sharding) documentation for more details.
+- `SIDays`: Interval in days for creating a new segment. Segments are time-based, allowing efficient data retention and querying. `SI` stands for Segment Interval.
+- `TTLDays`: Time-to-live for the data in the group, in days. Data exceeding the TTL will be deleted.
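A short sketch of how the three suffixes combine with a group prefix from the configuration (`grNormal`, `grSuper`, `gmMinute`, `gmHour`, `gmDay`, `gmIndex`). The keys and environment variable names are taken from this diff; the values are illustrative assumptions, not recommendations.

```yaml
storage:
  banyandb:
    # Minute-granularity metrics: 4 shards, a new segment every day,
    # and data kept for 14 days before it is deleted.
    gmMinuteShardNum: ${SW_STORAGE_BANYANDB_GM_MINUTE_SHARD_NUM:4}
    gmMinuteSIDays: ${SW_STORAGE_BANYANDB_GM_MINUTE_SI_DAYS:1}
    gmMinuteTTLDays: ${SW_STORAGE_BANYANDB_GM_MINUTE_TTL_DAYS:14}
```

Because each key uses the `${ENV_VAR:default}` form, the same override can presumably be applied by exporting the environment variable instead of editing the file.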

-* Standalone mode: `targets` is the IP address/host name and port of the BanyanDB server.
-* Cluster mode: `targets` is the IP address/host name and port of the `liaison` nodes, separated by commas. `liaison` nodes are the entry points of the BanyanDB cluster.
+For more details on setting `segmentIntervalDays` and `ttlDays`, refer to the [BanyanDB Rotation](https://skywalking.apache.org/docs/skywalking-banyandb/latest/concept/rotation/) documentation.

-For more details, please refer to the documents of [BanyanDB](https://skywalking.apache.org/docs/skywalking-banyandb/latest/readme/)
-and [BanyanDB Java Client](https://github.com/apache/skywalking-banyandb-java-client) subprojects.
+For more details, refer to the documentation of [BanyanDB](https://skywalking.apache.org/docs/skywalking-banyandb/latest/readme/) and the [BanyanDB Java Client](https://github.com/apache/skywalking-banyandb-java-client) subprojects.

oap-server-bom/pom.xml

Lines changed: 1 addition & 1 deletion
@@ -73,7 +73,7 @@
         <httpcore.version>4.4.13</httpcore.version>
         <httpasyncclient.version>4.1.5</httpasyncclient.version>
         <commons-compress.version>1.21</commons-compress.version>
-        <banyandb-java-client.version>0.7.0</banyandb-java-client.version>
+        <banyandb-java-client.version>0.8.0-rc0</banyandb-java-client.version>
         <kafka-clients.version>3.4.0</kafka-clients.version>
         <spring-kafka-test.version>2.4.6.RELEASE</spring-kafka-test.version>
         <consul.client.version>1.5.3</consul.client.version>

oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/manual/cache/TopNCacheReadCommand.java

Lines changed: 1 addition & 0 deletions
@@ -38,6 +38,7 @@
  */
 @Stream(name = TopNCacheReadCommand.INDEX_NAME, scopeId = DefaultScopeDefine.CACHE_SLOW_ACCESS, builder = TopNCacheReadCommand.Builder.class, processor = TopNStreamProcessor.class)
 @BanyanDB.TimestampColumn(TopN.TIMESTAMP)
+@BanyanDB.IndexMode
 public class TopNCacheReadCommand extends TopN {
     public static final String INDEX_NAME = "top_n_cache_read_command";

oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/manual/cache/TopNCacheWriteCommand.java

Lines changed: 1 addition & 0 deletions
@@ -38,6 +38,7 @@
  */
 @Stream(name = TopNCacheWriteCommand.INDEX_NAME, scopeId = DefaultScopeDefine.CACHE_SLOW_ACCESS, builder = TopNCacheWriteCommand.Builder.class, processor = TopNStreamProcessor.class)
 @BanyanDB.TimestampColumn(TopN.TIMESTAMP)
+@BanyanDB.IndexMode
 public class TopNCacheWriteCommand extends TopN {
     public static final String INDEX_NAME = "top_n_cache_write_command";

oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/manual/database/TopNDatabaseStatement.java

Lines changed: 1 addition & 0 deletions
@@ -38,6 +38,7 @@
  */
 @Stream(name = TopNDatabaseStatement.INDEX_NAME, scopeId = DefaultScopeDefine.DATABASE_SLOW_STATEMENT, builder = TopNDatabaseStatement.Builder.class, processor = TopNStreamProcessor.class)
 @BanyanDB.TimestampColumn(TopN.TIMESTAMP)
+@BanyanDB.IndexMode
 public class TopNDatabaseStatement extends TopN {
     public static final String INDEX_NAME = "top_n_database_statement";

oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/manual/endpoint/EndpointTraffic.java

Lines changed: 1 addition & 0 deletions
@@ -45,6 +45,7 @@
     "serviceId",
     "name"
 })
+@BanyanDB.IndexMode
 public class EndpointTraffic extends Metrics {

     public static final String INDEX_NAME = "endpoint_traffic";

oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/manual/instance/InstanceTraffic.java

Lines changed: 1 addition & 0 deletions
@@ -48,6 +48,7 @@
     "serviceId",
     "name"
 })
+@BanyanDB.IndexMode
 public class InstanceTraffic extends Metrics {
     public static final String INDEX_NAME = "instance_traffic";
     public static final String SERVICE_ID = "service_id";

oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/manual/networkalias/NetworkAddressAlias.java

Lines changed: 1 addition & 0 deletions
@@ -44,6 +44,7 @@
 @EqualsAndHashCode(of = {
     "address"
 })
+@BanyanDB.IndexMode
 public class NetworkAddressAlias extends Metrics {
     public static final String INDEX_NAME = "network_address_alias";
     public static final String ADDRESS = "address";

oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/manual/process/ProcessTraffic.java

Lines changed: 1 addition & 0 deletions
@@ -51,6 +51,7 @@
     "name",
 })
 @BanyanDB.StoreIDAsTag
+@BanyanDB.IndexMode
 public class ProcessTraffic extends Metrics {
     public static final String INDEX_NAME = "process_traffic";
     public static final String SERVICE_ID = "service_id";
