Merged
@@ -40,11 +40,20 @@ cluster, and they use the
[`influxd-ctl` tool](/enterprise_influxdb/v1/tools/influxd-ctl/) available on
all meta nodes.

{{% warn %}}
Before you begin, stop writing historical data to InfluxDB.
Historical data have timestamps that occur at any time in the past.
Performing a rebalance while writing historical data can lead to data loss.
{{% /warn %}}
> [!Warning]
> #### Stop writing data before rebalancing
>
> Before you begin, stop writing historical data to InfluxDB.
> Historical data have timestamps that occur at any time in the past.
> Performing a rebalance while writing historical data can lead to data loss.

> [!Caution]
> #### Risks of rebalancing with future data
>
> Truncating shards that contain data with future timestamps (such as forecast or prediction data)
> can lead to overlapping shards and data duplication.
> For more information, see [`truncate-shards` and future data](/enterprise_influxdb/v1/tools/influxd-ctl/truncate-shards/#understand-the-risks-with-future-data)
> or [contact InfluxData support](https://support.influxdata.com).

## Rebalance Procedure 1: Rebalance a cluster to create space

@@ -61,18 +70,23 @@ data node to expand the total disk capacity of the cluster.
In the next steps, you will safely move shards from one of the two original data
nodes to the new data node.

### Step 1: Truncate Hot Shards
### Step 1: Truncate hot shards

Hot shards are shards that are currently receiving writes.
Hot shards are shards that currently receive writes.
Performing any action on a hot shard can lead to data inconsistency within the
cluster, which requires manual intervention from the user.

To prevent data inconsistency, truncate hot shards before moving any shards
> [!Caution]
> #### Risks of rebalancing with future data
>
> Truncating shards that contain data with future timestamps (such as forecast or prediction data)
> can lead to overlapping shards and data duplication.
> For more information, see [`truncate-shards` and future data](/enterprise_influxdb/v1/tools/influxd-ctl/truncate-shards/#understand-the-risks-with-future-data)
> or [contact InfluxData support](https://support.influxdata.com).

To prevent data inconsistency, truncate shards before moving any shards
across data nodes.
The command below creates a new hot shard which is automatically distributed
across all data nodes in the cluster, and the system writes all new points to
that shard.
All previous writes are now stored in cold shards.
The following command truncates all hot shards and creates new shards to write data to:

```
influxd-ctl truncate-shards
@@ -84,10 +98,11 @@ The expected output of this command is:
Truncated shards.
```

Once you truncate the shards, you can work on redistributing the cold shards
without the threat of data inconsistency in the cluster.
Any hot or new shards are now evenly distributed across the cluster and require
no further intervention.
New shards are automatically distributed across all data nodes, and InfluxDB writes new points to them.
Previous writes are stored in cold shards.

After truncating shards, you can redistribute cold shards without risking data inconsistency.
Hot and new shards are evenly distributed and require no further intervention.

### Step 2: Identify Cold Shards

@@ -292,18 +307,23 @@ name duration shardGroupDuration replicaN default
autogen 0s 1h0m0s 3 #👍 true
```

### Step 2: Truncate Hot Shards
### Step 2: Truncate hot shards

Hot shards are shards that are currently receiving writes.
Hot shards are shards that currently receive writes.
Performing any action on a hot shard can lead to data inconsistency within the
cluster, which requires manual intervention from the user.

To prevent data inconsistency, truncate hot shards before copying any shards
> [!Caution]
> #### Risks of rebalancing with future data
>
> Truncating shards that contain data with future timestamps (such as forecast or prediction data)
> can lead to overlapping shards and data duplication.
> For more information, see [`truncate-shards` and future data](/enterprise_influxdb/v1/tools/influxd-ctl/truncate-shards/#understand-the-risks-with-future-data)
> or [contact InfluxData support](https://support.influxdata.com).

To prevent data inconsistency, truncate shards before copying any shards
to the new data node.
The command below creates a new hot shard which is automatically distributed
across the three data nodes in the cluster, and the system writes all new points
to that shard.
All previous writes are now stored in cold shards.
The following command truncates all hot shards and creates new shards to write data to:

```
influxd-ctl truncate-shards
@@ -315,10 +335,11 @@ The expected output of this command is:
Truncated shards.
```

Once you truncate the shards, you can work on distributing the cold shards
without the threat of data inconsistency in the cluster.
Any hot or new shards are now automatically distributed across the cluster and
require no further intervention.
New shards are automatically distributed across all data nodes, and InfluxDB writes new points to them.
Previous writes are stored in cold shards.

After truncating shards, you can redistribute cold shards without risking data inconsistency.
Hot and new shards are evenly distributed and require no further intervention.

### Step 3: Identify Cold Shards

@@ -16,6 +16,7 @@ We recommend the following design guidelines for most use cases:
- [Where to store data (tag or field)](#where-to-store-data-tag-or-field)
- [Avoid too many series](#avoid-too-many-series)
- [Use recommended naming conventions](#use-recommended-naming-conventions)
- [Writing data with future timestamps](#writing-data-with-future-timestamps)
- [Shard Group Duration Management](#shard-group-duration-management)

## Where to store data (tag or field)
@@ -209,6 +210,38 @@ from(bucket:"<database>/<retention_policy>")
> SELECT mean("temp") FROM "weather_sensor" WHERE region = 'north'
```

## Writing data with future timestamps

When designing schemas for applications that write data with future timestamps (such as forecast data from machine learning models, predictions, or scheduled events), consider the following implications for InfluxDB Enterprise v1 cluster operations and data integrity.

### Understanding future data behavior

InfluxDB Enterprise v1 creates shards based on time ranges.
When you write data with future timestamps, InfluxDB creates shards that cover future time periods.
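
To make the time-range idea concrete, the following minimal Python sketch computes which time window a timestamp falls into.
It is an illustration only, not InfluxDB code: the `shard_group_range` helper is hypothetical, and it assumes windows are aligned to whole multiples of the shard group duration (a one-hour duration is used here).

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def shard_group_range(ts, duration):
    """Return the (start, end) window of width `duration` that contains `ts`.

    Assumption for this sketch: windows are aligned to whole multiples of
    `duration`. This is a simplified model, not the InfluxDB implementation.
    """
    periods = (ts - EPOCH) // duration  # timedelta // timedelta yields an int
    start = EPOCH + periods * duration
    return start, start + duration


now = datetime(2025, 8, 15, 14, 30, tzinfo=timezone.utc)
forecast = now + timedelta(days=20, minutes=10)  # a point written for a future time

one_hour = timedelta(hours=1)
current_window = shard_group_range(now, one_hour)
future_window = shard_group_range(forecast, one_hour)

print("current point lands in:", current_window)
print("forecast point lands in:", future_window)
```

In this model, the forecast point lands in a window that starts almost three weeks after the window that holds current writes, which is why future timestamps lead to shards for future time periods.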

> [!Caution]
> #### Risks of rebalancing with future data
>
> Truncating shards that contain data with future timestamps (such as forecast or prediction data)
> can lead to overlapping shards and data duplication.
> For more information, see [`truncate-shards` and future data](/enterprise_influxdb/v1/tools/influxd-ctl/truncate-shards/#understand-the-risks-with-future-data)
> or [contact InfluxData support](https://support.influxdata.com).

### Use separate databases for future data

When planning for data that contains future timestamps, consider isolating it in dedicated databases to:

- Minimize impact on real-time data operations
- Allow targeted maintenance operations on current vs. future data
- Simplify backup and recovery strategies for different data types

```sql
-- Example: Separate databases for different data types
CREATE DATABASE "realtime_metrics"
CREATE DATABASE "ml_forecasts"
CREATE DATABASE "scheduled_predictions"
```
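
If a single application writes both current and future data, one way to keep them apart is to choose the target database per point.
The following Python sketch shows that idea under stated assumptions: `pick_database` and its `tolerance` parameter are hypothetical names, the database names come from the example above, and the five-minute tolerance for clock skew is an arbitrary illustrative value.

```python
from datetime import datetime, timedelta, timezone

REALTIME_DB = "realtime_metrics"
FORECAST_DB = "ml_forecasts"


def pick_database(point_time, now, tolerance=timedelta(minutes=5)):
    """Return the database name for a point based on its timestamp.

    Points later than `now + tolerance` are treated as future data.
    The tolerance (an assumption of this sketch) absorbs small clock skew
    so that near-real-time points are not routed to the forecast database.
    """
    if point_time > now + tolerance:
        return FORECAST_DB
    return REALTIME_DB


now = datetime(2025, 8, 15, 14, 30, tzinfo=timezone.utc)
print(pick_database(now - timedelta(seconds=10), now))  # a recent measurement
print(pick_database(now + timedelta(days=7), now))      # a one-week-ahead forecast
```

Your client code would then pass the returned name as the target database for the write.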

## Shard group duration management

### Shard group duration overview
@@ -17,6 +17,14 @@ The `influxd-ctl truncate-shards` command truncates all shards that are currently
being written to (also known as "hot" shards) and creates new shards to write
new data to.

> [!Caution]
> #### Overlapping shards with forecast and future data
>
> Running `truncate-shards` on shards containing future timestamps can create
> overlapping shards with duplicate data points.
>
> [Understand the risks with future data](#understand-the-risks-with-future-data).

## Usage

```sh
@@ -40,3 +48,34 @@ _Also see [`influxd-ctl` global flags](/enterprise_influxdb/v1/tools/influxd-ctl
```bash
influxd-ctl truncate-shards -delay 3m
```

## Understand the risks with future data

> [!Important]
> If you need to rebalance shards that contain future data, contact [InfluxData support](https://www.influxdata.com/contact/) for assistance.

When you write data points with timestamps in the future (for example, forecast data from machine learning models),
the `truncate-shards` command behaves differently and can cause data duplication issues.

### How truncate-shards normally works

For shards containing current data:

1. The command creates an artificial stop point in the shard at the truncation timestamp
2. Creates a new shard starting from the truncation point
3. Example: A one-week shard (Sunday to Saturday) becomes:
   - Shard A: Sunday to truncation point (Wednesday 2pm)
   - Shard B: Truncation point (Wednesday 2pm) to Saturday

This works correctly because the meta nodes understand the boundaries and route queries appropriately.
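
The split described above can be modeled in a few lines of Python.
This is a simplified sketch, not the `truncate-shards` implementation: the `truncate` helper is hypothetical, shards are represented only by their start and end times, and the dates are arbitrary values chosen so that the week starts on a Sunday.

```python
from datetime import datetime, timezone


def truncate(shard, at):
    """Split a (start, end) time range at `at`.

    Returns the closed part (start, at) and the new part (at, end).
    """
    start, end = shard
    if not start < at < end:
        raise ValueError("truncation point must fall inside the shard")
    return (start, at), (at, end)


utc = timezone.utc
# A one-week range from Sunday 2025-08-03 up to (not including) the next Sunday.
week = (datetime(2025, 8, 3, tzinfo=utc), datetime(2025, 8, 10, tzinfo=utc))
truncation_point = datetime(2025, 8, 6, 14, 0, tzinfo=utc)  # Wednesday 2pm

shard_a, shard_b = truncate(week, truncation_point)
print("Shard A:", shard_a)
print("Shard B:", shard_b)
```

Shard A ends exactly where Shard B begins, so each timestamp belongs to only one of the two ranges.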

### The problem with future data

For shards containing future timestamps:

1. The truncation doesn't cleanly split the shard at a point in time
2. Instead, it creates overlapping shards that cover the same time period
3. Example: If you're writing September forecast data in August:
   - Original shard: September 1-7
   - After truncation:
     - Shard A: September 1-7 (with data up to truncation)
     - Shard B: September 1-7 (for new data after truncation)
   - **Result**: Duplicate data points for the same timestamps
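
To reason about whether a set of shards overlaps, you can compare their time ranges pairwise.
The following Python sketch does this for hand-entered ranges. The `overlaps` and `overlapping_pairs` helpers are hypothetical, the shard names and dates are illustrative, and the sketch does not query a cluster. In practice, you would take the start and end times from your own shard listing.

```python
from datetime import datetime, timezone
from itertools import combinations


def overlaps(a, b):
    """Return True if two half-open ranges [start, end) share any instant."""
    return a[0] < b[1] and b[0] < a[1]


def overlapping_pairs(shards):
    """Return the names of every pair of shards whose time ranges overlap."""
    return [
        (x, y)
        for x, y in combinations(sorted(shards), 2)
        if overlaps(shards[x], shards[y])
    ]


utc = timezone.utc

# A clean split: Shard A ends where Shard B begins.
clean_split = {
    "A": (datetime(2025, 8, 3, tzinfo=utc), datetime(2025, 8, 6, 14, tzinfo=utc)),
    "B": (datetime(2025, 8, 6, 14, tzinfo=utc), datetime(2025, 8, 10, tzinfo=utc)),
}

# The future-data case: both shards cover the same September week.
future_data = {
    "A": (datetime(2025, 9, 1, tzinfo=utc), datetime(2025, 9, 8, tzinfo=utc)),
    "B": (datetime(2025, 9, 1, tzinfo=utc), datetime(2025, 9, 8, tzinfo=utc)),
}

print(overlapping_pairs(clean_split))
print(overlapping_pairs(future_data))
```

An empty result means no two ranges share a timestamp. Any pair that is reported covers at least some of the same time period.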