
Commit 3cef02b

Update banyandb doc (#12792)
1 parent 10e239f commit 3cef02b

7 files changed, +98 -39 lines changed


docs/en/banyandb/ttl.md

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+# Native TTL
+
+BanyanDB employs a Time-To-Live (TTL) mechanism to automatically delete data older than the specified duration. When using BanyanDB as the storage backend, the `recordDataTTL` and `metricsDataTTL` configurations are deprecated. Instead, TTL settings should be configured directly within `storage.banyandb`.
+
+For detailed information, please refer to the [Storage BanyanDB](../setup/backend/storages/banyandb.md) documentation.
+
+## Segment Interval and TTL
+
+BanyanDB's data rotation mechanism manages data storage based on **Segment Interval** and **TTL** settings:
+
+- **Segment Interval (`SIDays`)**: Specifies the time interval in days for creating a new data segment. Segments are time-based, facilitating efficient data retention and querying.
+- **TTL (`TTLDays`)**: Defines the time-to-live for data within a group, in days. Data that exceeds the TTL will be automatically deleted.
+
+### Best Practices for Setting `SIDays` and `TTLDays`
+
+- **Data Retention Requirements**: Set the TTL based on how long you need to retain your data. For instance, to retain data for 30 days, set the TTL to 30 days.
+- **Segment Management**: Avoid generating too many segments, as this increases the overhead for data management and querying.
+- **Query Requirements**: Align segment intervals with your query patterns. For example:
+  - If you frequently query data for the last 30 minutes, set `SIDays` to 1 day.
+  - For querying data from the last 7 days, set `SIDays` to 7 days.
+
+## Configuration Guidelines
+
+### Record Data
+
+For both standard and super datasets:
+
+- **Recommended `SIDays`**: `1`
+  - Most queries are performed within a day.
+- **`TTLDays`**: Set according to your data retention needs.
+
+### Metrics Data
+
+Configure `SIDays` and `TTLDays` based on data retention and query requirements. Recommended settings include:
+
+| Group | `SIDays` | `TTLDays` |
+|----------------|----------|-----------|
+| Minute (`gmMinute`) | 1 | 7 |
+| Hour (`gmHour`) | 5 | 15 |
+| Day (`gmDay`) | 15 | 15 |
+| Index (`gmIndex`) | 15 | 15 |
+
+**Group Descriptions:**
+
+- **Minute (`gmMinute`)**: Stores metrics with a 1-minute granularity. Suitable for recent data queries requiring minute-level detail. Consequently, it has shorter `SIDays` and `TTLDays` compared to other groups.
+- **Hour (`gmHour`)**: Stores metrics with a 1-hour granularity. Designed for queries that need hour-level detail over a longer period than minute-level data.
+- **Day (`gmDay`)**: Stores metrics with a 1-day granularity. This group handles the longest segment intervals and TTLs among all granularity groups.
+- **Index (`gmIndex`)**: Stores metrics used solely for indexing without value columns. Since queries often scan all segments in the `index` group, it shares the same `SIDays` and `TTLDays` as the `day` group to optimize performance. This group's `TTL` must be set to the **max** value of all groups.
+
+For more details on configuring `segmentIntervalDays` and `ttlDays`, refer to the [BanyanDB Rotation](https://skywalking.apache.org/docs/skywalking-banyandb/latest/concept/rotation/) documentation.
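
As a rough illustration of the trade-off between `SIDays` and `TTLDays` described in the new `ttl.md`, the sketch below estimates how many segments a group keeps at once. It assumes the count is about `ceil(TTLDays / SIDays)`, which is a simplification and is not taken from BanyanDB's implementation. The class and method names are hypothetical.

```java
// Hypothetical helper: not part of SkyWalking or BanyanDB.
final class SegmentEstimate {
    // Assumption: a group retains roughly ceil(ttlDays / siDays) segments at a time.
    static int approxSegments(int siDays, int ttlDays) {
        if (siDays <= 0 || ttlDays <= 0) {
            throw new IllegalArgumentException("siDays and ttlDays must be positive");
        }
        return (ttlDays + siDays - 1) / siDays; // integer ceiling division
    }

    public static void main(String[] args) {
        // Values from the recommended metrics table in ttl.md.
        System.out.println("minute: " + approxSegments(1, 7));
        System.out.println("hour:   " + approxSegments(5, 15));
        System.out.println("day:    " + approxSegments(15, 15));
    }
}
```

Under this assumption, the recommended values give about 7 segments for `minute`, 3 for `hour`, and 1 each for `day` and `index`, which is consistent with the advice to avoid generating too many segments.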

docs/en/changes/changes.md

Lines changed: 1 addition & 0 deletions
@@ -40,5 +40,6 @@

 #### Documentation
 * Update release document to adopt newly added revision-based process.
+* Improve BanyanDB documentation.

 All issues and pull requests are [here](https://github.com/apache/skywalking/milestone/224?closed=1)

docs/en/setup/backend/storages/banyandb.md

Lines changed: 21 additions & 19 deletions
@@ -14,42 +14,31 @@ storage:
     # If BanyanDB is deployed as a standalone server, the target should be the IP address or domain name and port of the BanyanDB server.
     # If BanyanDB is deployed in a cluster, the targets should be the IP address or domain name and port of the `liaison` nodes, separated by commas.
     targets: ${SW_STORAGE_BANYANDB_TARGETS:127.0.0.1:17912}
-
     # The maximum number of records in a bulk write request.
     # A larger value can improve write performance but also increases OAP and BanyanDB Server memory usage.
     maxBulkSize: ${SW_STORAGE_BANYANDB_MAX_BULK_SIZE:10000}
-
     # The minimum seconds between two bulk flushes.
     # If the data in a bulk is less than maxBulkSize, the data will be flushed after this period.
     # If the data in a bulk exceeds maxBulkSize, the data will be flushed immediately.
     # A larger value can reduce write pressure on BanyanDB Server but increase data latency.
     flushInterval: ${SW_STORAGE_BANYANDB_FLUSH_INTERVAL:15}
-
     # The timeout in seconds for a bulk flush.
     flushTimeout: ${SW_STORAGE_BANYANDB_FLUSH_TIMEOUT:10}
-
     # The number of threads that write data to BanyanDB concurrently.
     # A higher value can improve write performance but also increases CPU usage on both OAP and BanyanDB Server.
     concurrentWriteThreads: ${SW_STORAGE_BANYANDB_CONCURRENT_WRITE_THREADS:15}
-
     # The maximum size of the dataset when the OAP loads cache, such as network aliases.
     resultWindowMaxSize: ${SW_STORAGE_BANYANDB_QUERY_MAX_WINDOW_SIZE:10000}
-
     # The maximum size of metadata per query.
     metadataQueryMaxSize: ${SW_STORAGE_BANYANDB_QUERY_MAX_SIZE:10000}
-
     # The maximum number of trace segments per query.
     segmentQueryMaxSize: ${SW_STORAGE_BANYANDB_QUERY_SEGMENT_SIZE:200}
-
     # The maximum number of profile task queries in a request.
     profileTaskQueryMaxSize: ${SW_STORAGE_BANYANDB_QUERY_PROFILE_TASK_SIZE:200}
-
     # The batch size for querying profile data.
     profileDataQueryBatchSize: ${SW_STORAGE_BANYANDB_QUERY_PROFILE_DATA_BATCH_SIZE:100}
-
     # If the BanyanDB server is configured with TLS, configure the TLS cert file path and enable TLS connection.
     sslTrustCAPath: ${SW_STORAGE_BANYANDB_SSL_TRUST_CA_PATH:""}
-
     # The group settings of record.
     # `gr` is the short name of the group settings of record.
     #
@@ -63,7 +52,6 @@ storage:
     grSuperShardNum: ${SW_STORAGE_BANYANDB_GR_SUPER_SHARD_NUM:2}
     grSuperSIDays: ${SW_STORAGE_BANYANDB_GR_SUPER_SI_DAYS:1}
     grSuperTTLDays: ${SW_STORAGE_BANYANDB_GR_SUPER_TTL_DAYS:3}
-
     # The group settings of metrics.
     # `gm` is the short name of the group settings of metrics.
     #
@@ -75,18 +63,18 @@ storage:
     gmMinuteSIDays: ${SW_STORAGE_BANYANDB_GM_MINUTE_SI_DAYS:1}
     gmMinuteTTLDays: ${SW_STORAGE_BANYANDB_GM_MINUTE_TTL_DAYS:7}
     gmHourShardNum: ${SW_STORAGE_BANYANDB_GM_HOUR_SHARD_NUM:1}
-    gmHourSIDays: ${SW_STORAGE_BANYANDB_GM_HOUR_SI_DAYS:1}
+    gmHourSIDays: ${SW_STORAGE_BANYANDB_GM_HOUR_SI_DAYS:5}
     gmHourTTLDays: ${SW_STORAGE_BANYANDB_GM_HOUR_TTL_DAYS:15}
     gmDayShardNum: ${SW_STORAGE_BANYANDB_GM_DAY_SHARD_NUM:1}
-    gmDaySIDays: ${SW_STORAGE_BANYANDB_GM_DAY_SI_DAYS:1}
-    gmDayTTLDays: ${SW_STORAGE_BANYANDB_GM_DAY_TTL_DAYS:30}
+    gmDaySIDays: ${SW_STORAGE_BANYANDB_GM_DAY_SI_DAYS:15}
+    gmDayTTLDays: ${SW_STORAGE_BANYANDB_GM_DAY_TTL_DAYS:15}
     # If the metrics is marked as "index_mode", the metrics will be stored in the "index" group.
     # The "index" group is designed to store metrics that are used for indexing without value columns.
     # Such as `service_traffic`, `network_address_alias`, etc.
     # "index_mode" requires BanyanDB *0.8.0* or later.
-    gmIndexShardNum: ${SW_STORAGE_BANYANDB_GM_INDEX_SHARD_NUM:1}
-    gmIndexSIDays: ${SW_STORAGE_BANYANDB_GM_INDEX_SI_DAYS:1}
-    gmIndexTTLDays: ${SW_STORAGE_BANYANDB_GM_INDEX_TTL_DAYS:30}
+    gmIndexShardNum: ${SW_STORAGE_BANYANDB_GM_INDEX_SHARD_NUM:2}
+    gmIndexSIDays: ${SW_STORAGE_BANYANDB_GM_INDEX_SI_DAYS:15}
+    gmIndexTTLDays: ${SW_STORAGE_BANYANDB_GM_INDEX_TTL_DAYS:15}

 ```

@@ -110,6 +98,20 @@ BanyanDB supports **group settings** to configure storage groups, shards, segmen
 - `SIDays`: Interval in days for creating a new segment. Segments are time-based, allowing efficient data retention and querying. `SI` stands for Segment Interval.
 - `TTLDays`: Time-to-live for the data in the group, in days. Data exceeding the TTL will be deleted.

-For more details on setting `segmentIntervalDays` and `ttlDays`, refer to the [BanyanDB Rotation](https://skywalking.apache.org/docs/skywalking-banyandb/latest/concept/rotation/) documentation.
+For more details on setting `segmentIntervalDays` and `ttlDays`, refer to the [BanyanDB TTL](../../../banyandb/ttl.md) documentation.
+
+#### Record Group Settings
+
+The `gr` prefix is used for record group settings. The `normal` and `super` sections are used to define settings for normal and super datasets, respectively.
+
+Super datasets are used to store trace or log data that is too large for normal datasets. Each super dataset is stored in a separate group in BanyanDB. The settings defined in the `super` section are applied to all super datasets.
+
+Normal datasets are stored in a single group named `normal`. The settings defined in the `normal` section are applied to all normal datasets.
+
+#### Metrics Group Settings
+
+The `gm` prefix is used for metrics group settings. The `minute`, `hour`, and `day` sections are used to define settings for metrics stored based on granularity.
+
+The `index` group is designed to store metrics used for indexing without value columns. For example, `service_traffic`, `network_address_alias`, etc.

 For more details, refer to the documentation of [BanyanDB](https://skywalking.apache.org/docs/skywalking-banyandb/latest/readme/) and the [BanyanDB Java Client](https://github.com/apache/skywalking-banyandb-java-client) subprojects.
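
The setting names in the YAML hunks above follow a prefix + group + field pattern, for example `gmHourSIDays` or `grSuperShardNum`. A small sketch of that composition follows; the class and method names are hypothetical and only illustrate the naming, they are not OAP code.

```java
// Hypothetical illustration of how the group setting keys are composed.
final class GroupSettingKeys {
    // Builds a key such as "gmHourTTLDays" from prefix ("gr"/"gm"), group, and field.
    static String key(String prefix, String group, String field) {
        String capitalized = Character.toUpperCase(group.charAt(0)) + group.substring(1);
        return prefix + capitalized + field;
    }

    public static void main(String[] args) {
        // Metrics groups named in the documentation.
        for (String group : new String[] {"minute", "hour", "day", "index"}) {
            System.out.println(key("gm", group, "SIDays") + ", " + key("gm", group, "TTLDays"));
        }
        // A record group key.
        System.out.println(key("gr", "super", "ShardNum"));
    }
}
```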

docs/en/setup/backend/ttl.md

Lines changed: 5 additions & 0 deletions
@@ -10,3 +10,8 @@ These are the settings for the different types:
     metricsDataTTL: ${SW_CORE_METRICS_DATA_TTL:7} # Unit is day
 ```

+## BanyanDB TTL
+
+BanyanDB has a TTL mechanism to automatically delete data that is older than the specified time. When you use BanyanDB as the storage backend, `recordDataTTL` and `metricsDataTTL` are not used. Instead, you should configure the TTL settings in `storage.banyandb`.
+
+Please refer to the [Storage BanyanDB](storages/banyandb.md) document for more information.

docs/menu.yml

Lines changed: 4 additions & 0 deletions
@@ -198,6 +198,10 @@ catalog:
 path: "/en/setup/backend/backend-telemetry"
 - name: "OAP Health Check"
 path: "/en/setup/backend/backend-health-check"
+- name: "BanyanDB Exclusive Setup"
+catalog:
+- name: "Native TTL"
+path: "/en/banyandb/ttl"
 - name: "Tracing"
 catalog:
 - name: "Trace Sampling"

oap-server/server-starter/src/main/resources/application.yml

Lines changed: 8 additions & 20 deletions
@@ -223,42 +223,31 @@ storage:
     # If BanyanDB is deployed as a standalone server, the target should be the IP address or domain name and port of the BanyanDB server.
     # If BanyanDB is deployed in a cluster, the targets should be the IP address or domain name and port of the `liaison` nodes, separated by commas.
     targets: ${SW_STORAGE_BANYANDB_TARGETS:127.0.0.1:17912}
-
     # The maximum number of records in a bulk write request.
     # A larger value can improve write performance but also increases OAP and BanyanDB Server memory usage.
     maxBulkSize: ${SW_STORAGE_BANYANDB_MAX_BULK_SIZE:10000}
-
     # The minimum seconds between two bulk flushes.
     # If the data in a bulk is less than maxBulkSize, the data will be flushed after this period.
     # If the data in a bulk exceeds maxBulkSize, the data will be flushed immediately.
     # A larger value can reduce write pressure on BanyanDB Server but increase data latency.
     flushInterval: ${SW_STORAGE_BANYANDB_FLUSH_INTERVAL:15}
-
     # The timeout in seconds for a bulk flush.
     flushTimeout: ${SW_STORAGE_BANYANDB_FLUSH_TIMEOUT:10}
-
     # The number of threads that write data to BanyanDB concurrently.
     # A higher value can improve write performance but also increases CPU usage on both OAP and BanyanDB Server.
     concurrentWriteThreads: ${SW_STORAGE_BANYANDB_CONCURRENT_WRITE_THREADS:15}
-
     # The maximum size of the dataset when the OAP loads cache, such as network aliases.
     resultWindowMaxSize: ${SW_STORAGE_BANYANDB_QUERY_MAX_WINDOW_SIZE:10000}
-
     # The maximum size of metadata per query.
     metadataQueryMaxSize: ${SW_STORAGE_BANYANDB_QUERY_MAX_SIZE:10000}
-
     # The maximum number of trace segments per query.
     segmentQueryMaxSize: ${SW_STORAGE_BANYANDB_QUERY_SEGMENT_SIZE:200}
-
     # The maximum number of profile task queries in a request.
     profileTaskQueryMaxSize: ${SW_STORAGE_BANYANDB_QUERY_PROFILE_TASK_SIZE:200}
-
     # The batch size for querying profile data.
     profileDataQueryBatchSize: ${SW_STORAGE_BANYANDB_QUERY_PROFILE_DATA_BATCH_SIZE:100}
-
     # If the BanyanDB server is configured with TLS, configure the TLS cert file path and enable TLS connection.
     sslTrustCAPath: ${SW_STORAGE_BANYANDB_SSL_TRUST_CA_PATH:""}
-
     # The group settings of record.
     # `gr` is the short name of the group settings of record.
     #
@@ -272,7 +261,6 @@ storage:
     grSuperShardNum: ${SW_STORAGE_BANYANDB_GR_SUPER_SHARD_NUM:2}
     grSuperSIDays: ${SW_STORAGE_BANYANDB_GR_SUPER_SI_DAYS:1}
     grSuperTTLDays: ${SW_STORAGE_BANYANDB_GR_SUPER_TTL_DAYS:3}
-
     # The group settings of metrics.
     # `gm` is the short name of the group settings of metrics.
     #
@@ -281,21 +269,21 @@ storage:
     # Non-"minute" are governed by the "core.downsampling" setting.
     # For example, if "core.downsampling" is set to "hour", the "hour" will be used, while "day" are ignored.
     gmMinuteShardNum: ${SW_STORAGE_BANYANDB_GM_MINUTE_SHARD_NUM:2}
-    gmMinuteSIDays: ${SW_STORAGE_BANYANDB_GM_MINUTE_SI_DAYS:7}
+    gmMinuteSIDays: ${SW_STORAGE_BANYANDB_GM_MINUTE_SI_DAYS:1}
     gmMinuteTTLDays: ${SW_STORAGE_BANYANDB_GM_MINUTE_TTL_DAYS:7}
     gmHourShardNum: ${SW_STORAGE_BANYANDB_GM_HOUR_SHARD_NUM:1}
-    gmHourSIDays: ${SW_STORAGE_BANYANDB_GM_HOUR_SI_DAYS:1}
-    gmHourTTLDays: ${SW_STORAGE_BANYANDB_GM_HOUR_TTL_DAYS:7}
+    gmHourSIDays: ${SW_STORAGE_BANYANDB_GM_HOUR_SI_DAYS:5}
+    gmHourTTLDays: ${SW_STORAGE_BANYANDB_GM_HOUR_TTL_DAYS:15}
     gmDayShardNum: ${SW_STORAGE_BANYANDB_GM_DAY_SHARD_NUM:1}
-    gmDaySIDays: ${SW_STORAGE_BANYANDB_GM_DAY_SI_DAYS:1}
-    gmDayTTLDays: ${SW_STORAGE_BANYANDB_GM_DAY_TTL_DAYS:30}
+    gmDaySIDays: ${SW_STORAGE_BANYANDB_GM_DAY_SI_DAYS:15}
+    gmDayTTLDays: ${SW_STORAGE_BANYANDB_GM_DAY_TTL_DAYS:15}
     # If the metrics is marked as "index_mode", the metrics will be stored in the "index" group.
     # The "index" group is designed to store metrics that are used for indexing without value columns.
     # Such as `service_traffic`, `network_address_alias`, etc.
     # "index_mode" requires BanyanDB *0.8.0* or later.
-    gmIndexShardNum: ${SW_STORAGE_BANYANDB_GM_INDEX_SHARD_NUM:1}
-    gmIndexSIDays: ${SW_STORAGE_BANYANDB_GM_INDEX_SI_DAYS:1}
-    gmIndexTTLDays: ${SW_STORAGE_BANYANDB_GM_INDEX_TTL_DAYS:30}
+    gmIndexShardNum: ${SW_STORAGE_BANYANDB_GM_INDEX_SHARD_NUM:2}
+    gmIndexSIDays: ${SW_STORAGE_BANYANDB_GM_INDEX_SI_DAYS:15}
+    gmIndexTTLDays: ${SW_STORAGE_BANYANDB_GM_INDEX_TTL_DAYS:15}

 agent-analyzer:
   selector: ${SW_AGENT_ANALYZER:default}

oap-server/server-storage-plugin/storage-banyandb-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/banyandb/BanyanDBStorageProvider.java

Lines changed: 9 additions & 0 deletions
@@ -121,6 +121,15 @@ public void onInitialized(final BanyanDBStorageConfig initialized) {

     @Override
     public void prepare() throws ServiceNotProvidedException, ModuleStartException {
+        if (config.getGmDayTTLDays() > config.getGmIndexTTLDays()) {
+            throw new ModuleStartException("gmDayTTLDays must be less than or equal to gmIndexTTLDays");
+        }
+        if (config.getGmHourTTLDays() > config.getGmIndexTTLDays()) {
+            throw new ModuleStartException("gmHourTTLDays must be less than or equal to gmIndexTTLDays");
+        }
+        if (config.getGmMinuteTTLDays() > config.getGmIndexTTLDays()) {
+            throw new ModuleStartException("gmMinuteTTLDays must be less than or equal to gmIndexTTLDays");
+        }
         this.registerServiceImplementation(StorageBuilderFactory.class, new StorageBuilderFactory.Default());

         this.client = new BanyanDBStorageClient(config);
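
The three checks added to `prepare()` all enforce one rule stated in `ttl.md`: the `index` group's TTL must be the maximum across the metrics groups. Below is a standalone sketch of that rule. It uses a plain map instead of `BanyanDBStorageConfig` and throws `IllegalStateException` rather than `ModuleStartException` so that it runs without SkyWalking on the classpath. Both substitutions, and the class name, are assumptions made for illustration, not the committed code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the checks in BanyanDBStorageProvider.prepare().
final class IndexTtlCheck {
    // Throws if any granularity group's TTL exceeds the index group's TTL.
    static void validate(Map<String, Integer> groupTtlDays, int indexTtlDays) {
        for (Map.Entry<String, Integer> e : groupTtlDays.entrySet()) {
            if (e.getValue() > indexTtlDays) {
                throw new IllegalStateException(
                    e.getKey() + " must be less than or equal to gmIndexTTLDays");
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> ttl = new LinkedHashMap<>();
        ttl.put("gmMinuteTTLDays", 7);
        ttl.put("gmHourTTLDays", 15);
        ttl.put("gmDayTTLDays", 15);
        validate(ttl, 15); // the new defaults pass
        try {
            validate(ttl, 7); // an index TTL below the hour/day TTL is rejected
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

Iterating over a map keeps the rule in one place if more granularity groups are added later; the committed code spells out each comparison explicitly, which keeps the error messages easy to search for.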
