|
| 1 | +# Data Lifecycle Stages (Hot/Warm/Cold) |
| 2 | + |
| 3 | +Lifecycle stages provide a mechanism to optimize storage costs and query performance based on the time granularity of records/metrics, |
| 4 | +especially if you need to keep a large volume of data for a long time. |
| 5 | + |
| 6 | +The data lifecycle includes hot, warm, and cold stages. Each stage has different TTL settings and Segment Creation Policies. |
| 7 | +Each group of records/metrics can be automatically migrated and stored in different stages according to the configuration. |
| 8 | + |
| 9 | +## Stages Definition |
| 10 | +- **hot**: The default first stage of data storage. The data is the newest, can be updated (metrics), and is the most frequently queried. |
| 11 | +- **warm**: Optional, the second stage of data storage. The data is queried less frequently than in the hot stage and cannot be updated, but queries still perform well. |
| 12 | +- **cold**: Optional, the third stage of data storage. The data is rarely queried and is stored for a long time. Query performance is significantly lower than in the hot/warm stages. |
| 13 | + |
| 14 | +If necessary, you can also skip the warm stage and use only the hot and cold stages. The data is then moved to the cold stage after the TTL of the hot stage. |
| 15 | + |
| 16 | +## Configuration Guidelines |
| 17 | +The lifecycle stage configuration is located under each group's settings in the `bydb.yml` file, for example, the `metricsMin` group: |
| 18 | + |
| 19 | +```yaml |
| 20 | +  metricsMin: |
| 21 | +    # The settings for the default `hot` stage. |
| 22 | +    shardNum: ${SW_STORAGE_BANYANDB_GM_MINUTE_SHARD_NUM:2} |
| 23 | +    segmentInterval: ${SW_STORAGE_BANYANDB_GM_MINUTE_SI_DAYS:1} |
| 24 | +    ttl: ${SW_STORAGE_BANYANDB_GM_MINUTE_TTL_DAYS:7} |
| 25 | +    enableWarmStage: ${SW_STORAGE_BANYANDB_GM_MINUTE_ENABLE_WARM_STAGE:false} |
| 26 | +    enableColdStage: ${SW_STORAGE_BANYANDB_GM_MINUTE_ENABLE_COLD_STAGE:false} |
| 27 | +    warm: |
| 28 | +      shardNum: ${SW_STORAGE_BANYANDB_GM_MINUTE_WARM_SHARD_NUM:2} |
| 29 | +      segmentInterval: ${SW_STORAGE_BANYANDB_GM_MINUTE_WARM_SI_DAYS:3} |
| 30 | +      ttl: ${SW_STORAGE_BANYANDB_GM_MINUTE_WARM_TTL_DAYS:15} |
| 31 | +      nodeSelector: ${SW_STORAGE_BANYANDB_GM_MINUTE_WARM_NODE_SELECTOR:"type=warm"} |
| 32 | +    cold: |
| 33 | +      shardNum: ${SW_STORAGE_BANYANDB_GM_MINUTE_COLD_SHARD_NUM:2} |
| 34 | +      segmentInterval: ${SW_STORAGE_BANYANDB_GM_MINUTE_COLD_SI_DAYS:5} |
| 35 | +      ttl: ${SW_STORAGE_BANYANDB_GM_MINUTE_COLD_TTL_DAYS:60} |
| 36 | +      nodeSelector: ${SW_STORAGE_BANYANDB_GM_MINUTE_COLD_NODE_SELECTOR:"type=cold"} |
| 37 | +``` |
| 38 | + |
| 39 | +1. **shardNum**: The number of shards for the group. |
| 40 | +2. **segmentInterval**: The time interval, in days, for creating a new data segment. |
| 41 | +   - Because older data is accessed less often, the `segmentInterval` should increase from stage to stage: `hot` < `warm` < `cold`. |
| 42 | +3. **ttl**: The time-to-live for data within the group, in days. |
| 43 | +4. **enableWarmStage/enableColdStage**: Enable the warm/cold stage for the group. |
| 44 | +   - The `hot` stage is always enabled. |
| 45 | +   - If the `warm` stage is enabled, the data will be moved to the `warm` stage after the TTL of the `hot` stage. |
| 46 | +   - If the `cold` stage is enabled and the `warm` stage is disabled, the data will be moved to the `cold` stage after the TTL of the `hot` stage. |
| 47 | +   - If both the `warm` and `cold` stages are enabled, the data will be moved to the `warm` stage after the TTL of the `hot` stage, and then to the `cold` stage after the TTL of the `warm` stage. |
| 48 | +   - OAP will query the data from the `hot` and `warm` stages by default if the `warm` stage is enabled. |
| 49 | +5. **nodeSelector**: Specifies the target nodes for this stage. |
| 50 | + |
| 51 | +For more details on configuring `segmentInterval` and `ttl`, refer to the [BanyanDB Rotation](https://skywalking.apache.org/docs/skywalking-banyandb/latest/concept/rotation/) documentation. |
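|  | + |
|  | +As an illustrative sketch rather than a recommended setting, the fragment below skips the warm stage for the `metricsMin` group by changing only the default of `enableColdStage` to `true`. All other values are the defaults shown above; `shardNum` and the unused `warm` block are omitted for brevity, and the sketch assumes the corresponding environment variables are not set. |
|  | + |
|  | +```yaml |
|  | +  metricsMin: |
|  | +    segmentInterval: ${SW_STORAGE_BANYANDB_GM_MINUTE_SI_DAYS:1} |
|  | +    ttl: ${SW_STORAGE_BANYANDB_GM_MINUTE_TTL_DAYS:7} |
|  | +    # The warm stage stays disabled, so data skips it. |
|  | +    enableWarmStage: ${SW_STORAGE_BANYANDB_GM_MINUTE_ENABLE_WARM_STAGE:false} |
|  | +    # Default changed from false to true for this example only. |
|  | +    enableColdStage: ${SW_STORAGE_BANYANDB_GM_MINUTE_ENABLE_COLD_STAGE:true} |
|  | +    cold: |
|  | +      segmentInterval: ${SW_STORAGE_BANYANDB_GM_MINUTE_COLD_SI_DAYS:5} |
|  | +      ttl: ${SW_STORAGE_BANYANDB_GM_MINUTE_COLD_TTL_DAYS:60} |
|  | +      nodeSelector: ${SW_STORAGE_BANYANDB_GM_MINUTE_COLD_NODE_SELECTOR:"type=cold"} |
|  | +``` |
|  | + |
|  | +With these values, data would be moved from the `hot` stage directly to the `cold` stage after 7 days. |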
| 52 | + |
| 53 | +## Procedure and the TTL for Stages |
| 54 | +For details about the TTL, refer to [Progressive TTL](ttl.md). |
| 55 | +The following diagram illustrates the lifecycle stages, assuming the TTL settings for the hot, warm, and cold stages are `TTL1`, `TTL2`, and `TTL3` days respectively: |
| 56 | + |
| 57 | +```mermaid |
| 58 | +sequenceDiagram |
| 59 | + Data(T0) ->> Hot Data(TTL1): Input |
| 60 | + Hot Data(TTL1) -->+ Hot Data(TTL1): TTL1 |
| 61 | + Hot Data(TTL1) ->>- Warm Data(TTL2): Migrate |
| 62 | + Warm Data(TTL2) -->+ Warm Data(TTL2): TTL2 - TTL1 |
| 63 | + Warm Data(TTL2) ->>- Cold Data(TTL3): Migrate |
| 64 | + Cold Data(TTL3) -->+ Cold Data(TTL3): TTL3 - TTL2 |
| 65 | + Cold Data(TTL3) ->>- Deleted: Delete |
| 66 | + Data(T0) --> Hot Data(TTL1): Live TTL1 Days |
| 67 | + Data(T0) --> Warm Data(TTL2): Live TTL2 Days |
| 68 | + Data(T0) --> Cold Data(TTL3): Live TTL3 Days |
| 69 | +``` |
| 70 | + |
| 71 | +- When the data is input, it will be stored in the hot stage and live for `TTL1` days. |
| 72 | +- After `TTL1` days, the data will be migrated to the warm stage and live until `TTL2` days. **This means the data stays in this stage for (TTL2 - TTL1) days**. |
| 73 | +- After `TTL2` days, the data will be migrated to the cold stage and live until `TTL3` days. **This means the data stays in this stage for (TTL3 - TTL2) days**. |
| 74 | +- After `TTL3` days, the data will be deleted. |
| 75 | +- The data will live for `TTL3` days in total. |
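|  | + |
|  | +As a worked example of the rules above, the fragment below takes the default `ttl` values from the `metricsMin` configuration (7, 15, and 60 days), assumes both the warm and cold stages are enabled, and annotates each `ttl` with the resulting time spent in that stage. It is abbreviated to the relevant keys, and literal values are written instead of the `${...}` placeholders for readability only. |
|  | + |
|  | +```yaml |
|  | +  metricsMin: |
|  | +    ttl: 7        # TTL1: data stays in the hot stage from day 0 to day 7 (7 days) |
|  | +    enableWarmStage: true |
|  | +    enableColdStage: true |
|  | +    warm: |
|  | +      ttl: 15     # TTL2: data stays in the warm stage from day 7 to day 15 (15 - 7 = 8 days) |
|  | +    cold: |
|  | +      ttl: 60     # TTL3: data stays in the cold stage from day 15 to day 60 (60 - 15 = 45 days), then is deleted |
|  | +``` |
|  | + |
|  | +The three stage durations add up to 7 + 8 + 45 = 60 days, which matches the total lifetime of `TTL3` days. |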
| 76 | + |
| 77 | +## Querying |
| 78 | +- According to the lifecycle stages configuration, OAP will query the data from the `hot` and `warm` stages by default if the `warm` stage is enabled. |
| 79 | +Otherwise, OAP will query the data from the `hot` stage only. |
| 80 | +- If the `cold` stage is enabled, you should specify the stage in the query for better query performance, and OAP will limit the query time range. |
| 81 | + |
| 82 | + |