Merged
Changes from 4 commits
@@ -40,11 +40,20 @@ cluster, and they use the
[`influxd-ctl` tool](/enterprise_influxdb/v1/tools/influxd-ctl/) available on
all meta nodes.

{{% warn %}}
Before you begin, stop writing historical data to InfluxDB.
Historical data have timestamps that occur at any time in the past.
Performing a rebalance while writing historical data can lead to data loss.
{{% /warn %}}
> [!Warning]
> #### Stop writing data before rebalancing
>
> Before you begin, stop writing historical data to InfluxDB.
> Historical data have timestamps that occur at any time in the past.
> Performing a rebalance while writing historical data can lead to data loss.

> [!Caution]
> #### Risks of rebalancing with future data
>
> Truncating shards that contain data with future timestamps (such as forecast or prediction data)
> can lead to overlapping shards and data duplication.
> For more information, see [`truncate-shards` and future data](/enterprise_influxdb/v1/tools/influxd-ctl/truncate-shards/#understand-the-risks-with-future-data)
> or [contact InfluxData support](https://support.influxdata.com).

## Rebalance Procedure 1: Rebalance a cluster to create space

@@ -67,6 +76,14 @@ Hot shards are shards that are currently receiving writes.
Performing any action on a hot shard can lead to data inconsistency within the
cluster which requires manual intervention from the user.

> [!Caution]
> #### Risks of rebalancing with future data
>
> Truncating shards that contain data with future timestamps (such as forecast or prediction data)
> can lead to overlapping shards and data duplication.
> For more information, see [`truncate-shards` and future data](/enterprise_influxdb/v1/tools/influxd-ctl/truncate-shards/#understand-the-risks-with-future-data)
> or [contact InfluxData support](https://support.influxdata.com).

To prevent data inconsistency, truncate hot shards before moving any shards
across data nodes.
The command below creates a new hot shard which is automatically distributed
@@ -298,6 +315,14 @@ Hot shards are shards that are currently receiving writes.
Performing any action on a hot shard can lead to data inconsistency within the
cluster which requires manual intervention from the user.

> [!Caution]
> #### Risks of rebalancing with future data
>
> Truncating shards that contain data with future timestamps (such as forecast or prediction data)
> can lead to overlapping shards and data duplication.
> For more information, see [`truncate-shards` and future data](/enterprise_influxdb/v1/tools/influxd-ctl/truncate-shards/#understand-the-risks-with-future-data)
> or [contact InfluxData support](https://support.influxdata.com).

To prevent data inconsistency, truncate hot shards before copying any shards
to the new data node.
The command below creates a new hot shard which is automatically distributed
@@ -16,6 +16,7 @@ We recommend the following design guidelines for most use cases:
- [Where to store data (tag or field)](#where-to-store-data-tag-or-field)
- [Avoid too many series](#avoid-too-many-series)
- [Use recommended naming conventions](#use-recommended-naming-conventions)
- [Writing data with future timestamps](#writing-data-with-future-timestamps)
- [Shard Group Duration Management](#shard-group-duration-management)

## Where to store data (tag or field)
@@ -209,6 +210,38 @@ from(bucket:"<database>/<retention_policy>")
> SELECT mean("temp") FROM "weather_sensor" WHERE region = 'north'
```

## Writing data with future timestamps

When designing schemas for applications that write data with future timestamps--such as forecast data from machine learning models, predictions, or scheduled events--consider the following implications for InfluxDB Enterprise v1 cluster operations and data integrity.

### Understanding future data behavior

InfluxDB Enterprise v1 creates shards based on time ranges.
When you write data with future timestamps, InfluxDB creates shards that cover future time periods.

> [!Caution]
> #### Risks of rebalancing with future data
>
> Truncating shards that contain data with future timestamps (such as forecast or prediction data)
> can lead to overlapping shards and data duplication.
> For more information, see [`truncate-shards` and future data](/enterprise_influxdb/v1/tools/influxd-ctl/truncate-shards/#understand-the-risks-with-future-data)
> or [contact InfluxData support](https://support.influxdata.com).

### Use separate databases for future data

When planning for data that contains future timestamps, consider isolating it in dedicated databases to:

- Minimize impact on real-time data operations
- Allow targeted maintenance operations on current vs. future data
- Simplify backup and recovery strategies for different data types

```sql
-- Example: Separate databases for different data types
CREATE DATABASE "realtime_metrics"
CREATE DATABASE "ml_forecasts"
CREATE DATABASE "scheduled_predictions"
```
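
A client can apply this separation at write time by choosing the target database from the point's timestamp. The following is a minimal sketch, not an InfluxDB API: `choose_database` is a hypothetical helper, and the database names are the illustrative ones from the `CREATE DATABASE` example above.

```python
from datetime import datetime, timezone

# Illustrative database names, assumed from the CREATE DATABASE example.
REALTIME_DB = "realtime_metrics"
FORECAST_DB = "ml_forecasts"


def choose_database(point_time: datetime, now: datetime) -> str:
    """Return the database a point should be written to.

    Points stamped later than `now` are treated as future data and
    kept out of the real-time database.
    """
    return FORECAST_DB if point_time > now else REALTIME_DB


now = datetime(2024, 8, 15, tzinfo=timezone.utc)
print(choose_database(datetime(2024, 8, 14, tzinfo=timezone.utc), now))  # realtime_metrics
print(choose_database(datetime(2024, 9, 1, tzinfo=timezone.utc), now))   # ml_forecasts
```

Passing `now` explicitly keeps the helper deterministic and easy to test; a real writer would typically supply the current UTC time.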

## Shard group duration management

### Shard group duration overview
@@ -17,6 +17,14 @@ The `influxd-ctl truncate-shards` command truncates all shards that are currently
being written to (also known as "hot" shards) and creates new shards to write
new data to.

> [!Caution]
> #### Overlapping shards with forecast and future data
>
> Running `truncate-shards` on shards containing future timestamps can create
> overlapping shards with duplicate data points.
>
> [Understand the risks with future data](#understand-the-risks-with-future-data).

## Usage

```sh
@@ -40,3 +48,34 @@ _Also see [`influxd-ctl` global flags](/enterprise_influxdb/v1/tools/influxd-ctl
```bash
influxd-ctl truncate-shards -delay 3m
```

## Understand the risks with future data

> [!Important]
> If you need to rebalance shards that contain future data, contact [InfluxData support](https://www.influxdata.com/contact/) for assistance.

When you write data points with timestamps in the future (for example, forecast data from machine learning models),
the `truncate-shards` command behaves differently and can cause data duplication issues.

### How truncate-shards normally works

For shards containing current data:
1. The command creates an artificial stop point in the shard at the truncation timestamp
2. Creates a new shard starting from the truncation point
3. Example: A one-week shard (Sunday to Saturday) becomes:
   - Shard A: Sunday to truncation point (Wednesday 2pm)
   - Shard B: Truncation point (Wednesday 2pm) to Saturday

This works correctly because the meta nodes understand the boundaries and route queries appropriately.

### The problem with future data

For shards containing future timestamps:
1. The truncation doesn't cleanly split the shard at a point in time
2. Instead, it creates overlapping shards that cover the same time period
3. Example: If you're writing September forecast data in August:
   - Original shard: September 1-7
   - After truncation:
     - Shard A: September 1-7 (with data up to truncation)
     - Shard B: September 1-7 (for new data after truncation)
   - **Result**: Duplicate data points for the same timestamps
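
The difference between the two cases can be sketched as a check for intersecting time ranges. This is a simplified model under stated assumptions, not part of `influxd-ctl`: each shard is treated as a single half-open time range, and the `Shard` type, the shard IDs, and the dates are hypothetical values chosen to mirror the two examples above.

```python
from datetime import datetime
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Shard(NamedTuple):
    shard_id: int    # hypothetical ID, for illustration only
    start: datetime  # inclusive start of the shard's time range
    end: datetime    # exclusive end of the shard's time range


def overlapping_pairs(shards: List[Shard]) -> List[Tuple[int, int]]:
    """Return ID pairs of shards whose time ranges intersect."""
    return [
        (a.shard_id, b.shard_id)
        for a, b in combinations(shards, 2)
        if a.start < b.end and b.start < a.end
    ]


# Normal truncation: a one-week shard split cleanly at Wednesday 2pm.
clean = [
    Shard(1, datetime(2024, 8, 4), datetime(2024, 8, 7, 14)),
    Shard(2, datetime(2024, 8, 7, 14), datetime(2024, 8, 11)),
]
# Future data: both shards cover September 1-7.
future = [
    Shard(3, datetime(2024, 9, 1), datetime(2024, 9, 8)),
    Shard(4, datetime(2024, 9, 1), datetime(2024, 9, 8)),
]

print(overlapping_pairs(clean))   # []
print(overlapping_pairs(future))  # [(3, 4)]
```

Because the ranges are half-open, two shards that meet exactly at the truncation point do not count as overlapping, which matches the clean split described for current data.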