Commit d961c8e

fix(partitioning): improve clarity and consistency in partitioning, apply suggestions from @reidkaufmann
1 parent 3718389 commit d961c8e

2 files changed: +18 -27 lines changed

content/shared/v3-distributed-admin-custom-partitions/best-practices.md

Lines changed: 15 additions & 24 deletions
@@ -33,39 +33,30 @@ If points don't have a value for the tag, InfluxDB can't store them in the corre

 ## Avoid over-partitioning

-As you plan your partitioning strategy, keep in mind that data can be
-"over-partitioned"--meaning partitions are so granular that queries end up
-having to retrieve and read many partitions from the object store, which
-hurts query performance.
-
-- Balance the partition time interval with the actual amount of data written
-during each interval. If a single interval doesn't contain a lot of data,
-it is better to partition by larger time intervals.
-- Don't partition by tags that you typically don't use in your query workload.
-- Don't partition by distinct values of high-cardinality tags.
-Instead, [use tag buckets](#use-tag-buckets-for-high-cardinality-tags) to
-partition by these tags.
+As you plan your partitioning strategy, keep in mind that over-partitioning your data can hurt query performance. If partitions are too granular, queries may need to retrieve and read many partitions from the [Object store](/influxdb/cloud-dedicated/reference/internals/storage-engine/#object-store).
+
+- Balance the partition time interval with the actual amount of data written during each interval. If a single interval doesn't contain a lot of data, partition by larger time intervals.
+- Avoid partitioning by tags that you typically don't use in your query workload.
+- Avoid partitioning by distinct values of high-cardinality tags. Instead, [use tag buckets](#use-tag-buckets-for-high-cardinality-tags) to partition by these tags.

 ## Limit the number of partition files

-Avoid exceeding **10,000** total partition files.
+Avoid exceeding **10,000** total partitions.
 Limiting the total partition count can help manage system performance and costs.

-While planning your strategy include the following steps to keep the total
-partition count below 10,000 files over the next few years:
+While planning your strategy, take the following steps to limit your total
+partition count.
+We currently recommend planning to keep the total partition count below 10,000.

 - [Estimate the total partition count](#estimate-the-total-partition-count) for the lifespan of your data
-- Take the following steps to limit the total partition count:
-
-- **Set a [database retention period](/influxdb/cloud-dedicated/admin/databases/#retention-period)**
-to prevent the number of files from growing unbounded.
-- **Partition by month or year** to [avoid over-partitioning](#avoid-over-partitioning)
-and creating too many partition files.
-- **Don't partition on high cardinality tags** unless you also use [tag buckets](#use-tag-buckets-for-high-cardinality-tags)
+- **Set a [database retention period](/influxdb/cloud-dedicated/admin/databases/#retention-period)**
+to prevent the number of partitions from growing unbounded
+- **Partition by month or year** to [avoid over-partitioning](#avoid-over-partitioning)
+- **Don't partition on high cardinality tags** unless you also use [tag buckets](#use-tag-buckets-for-high-cardinality-tags)

 ### Estimate the total partition count

-Use the following formula to estimate the total partition file count over the
+Use the following formula to estimate the total partition count over the
 lifetime of the database (or retention period):

 ```text
@@ -75,4 +66,4 @@ total_partition_count = (cardinality_of_partitioned_tag) * (data_lifespan / part
 - `total_partition_count`: The number of partition files in [Object storage](/influxdb/cloud-dedicated/reference/internals/storage-engine/#object-storage)
 - `cardinality_of_partitioned_tag`: The number of distinct values for a tag
 - `data_lifespan`: The [database retention period](/influxdb/cloud-dedicated/admin/databases/#retention-period), if set, or the expected lifetime of the database
-- `partition_duration`: The partition time interval, defined by the [tine part template](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/#time-part-templates)
+- `partition_duration`: The partition time interval, defined by the [time part template](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/#time-part-templates)
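
For reviewers who want to sanity-check the estimate against the 10,000 recommendation, here is a minimal Rust sketch of the formula. It assumes the truncated formula in the hunk header divides `data_lifespan` by `partition_duration`, as the variable list suggests. The function name, the choice of days as the unit, the round-up of partial intervals, and the sample workload are all illustrative assumptions, not part of the documentation.

```rust
/// Recommended upper bound on total partitions, taken from the changed docs.
const RECOMMENDED_MAX_PARTITIONS: u64 = 10_000;

/// Estimates cardinality_of_partitioned_tag * (data_lifespan / partition_duration).
/// Both durations are in days (an assumption for this sketch). The quotient is
/// rounded up on the assumption that a partial interval still yields a partition.
fn estimate_total_partition_count(
    cardinality_of_partitioned_tag: u64,
    data_lifespan_days: u64,
    partition_duration_days: u64,
) -> u64 {
    assert!(partition_duration_days > 0, "partition duration must be non-zero");
    let intervals =
        (data_lifespan_days + partition_duration_days - 1) / partition_duration_days;
    cardinality_of_partitioned_tag * intervals
}

fn main() {
    // Hypothetical workload: 100 distinct tag values, 730-day retention period.
    let daily = estimate_total_partition_count(100, 730, 1);
    // Months are approximated as 30 days here.
    let monthly = estimate_total_partition_count(100, 730, 30);

    for (label, count) in [("daily", daily), ("monthly", monthly)] {
        let verdict = if count > RECOMMENDED_MAX_PARTITIONS {
            "exceeds"
        } else {
            "is within"
        };
        println!("{label}: {count} partitions, {verdict} the recommended limit");
    }
}
```

With these hypothetical inputs, daily partitioning lands well above the recommended limit while monthly partitioning stays below it, which is the trade-off the "Partition by month or year" bullet describes.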

content/shared/v3-distributed-admin-custom-partitions/partition-templates.md

Lines changed: 3 additions & 3 deletions
@@ -79,7 +79,7 @@ customerID,500
 ```

 Values of the `customerID` tag are bucketed into 500 distinct "buckets."
-Each bucket is identified by the remainder of the tag value hashed into a 32bit
+Each bucket is identified by the remainder of the tag value hashed into a 32-bit
 integer divided by the specified number of buckets:

 ```rust
@@ -108,8 +108,8 @@ Time part templates use a limited subset of the
 [Rust strftime date and time formatting syntax](https://docs.rs/chrono/latest/chrono/format/strftime/index.html)
 to specify time format in partition keys.
 Time part templates can be daily (`%Y-%m-%d`), monthly (`%Y-%m`), or yearly (`%Y`).
-InfluxDB uses the smallest unit of time included in the time part template as
-the partition interval.
+InfluxDB partitions data by the smallest unit of time included in the time part
+template.

 InfluxDB supports only [date specifiers](#date-specifiers) in time part templates.
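
The tag bucketing sentence touched in the first hunk of this file (hash the tag value into a 32-bit integer, then take the remainder after dividing by the bucket count) can be sketched as follows. The file's own Rust snippet is cut off by the diff context, so this is not a copy of it. The FNV-1a hash is a stand-in chosen only because it is short and produces a `u32`; the hash InfluxDB actually uses is not shown here. The names `hash32` and `bucket_for` and the sample tag values are hypothetical.

```rust
/// Stand-in 32-bit hash (FNV-1a). An assumption for illustration only;
/// the hash function InfluxDB uses is not visible in this diff.
fn hash32(value: &str) -> u32 {
    let mut hash: u32 = 0x811c_9dc5; // FNV-1a 32-bit offset basis
    for byte in value.bytes() {
        hash ^= u32::from(byte);
        hash = hash.wrapping_mul(0x0100_0193); // FNV-1a 32-bit prime
    }
    hash
}

/// Bucket ID: the 32-bit hash of the tag value, modulo the number of buckets.
fn bucket_for(tag_value: &str, num_buckets: u32) -> u32 {
    assert!(num_buckets > 0, "bucket count must be non-zero");
    hash32(tag_value) % num_buckets
}

fn main() {
    // Hypothetical customerID values, bucketed into 500 buckets
    // as in the `customerID,500` template from the hunk header.
    for id in ["customer-001", "customer-002", "customer-003"] {
        println!("{id} -> bucket {}", bucket_for(id, 500));
    }
}
```

Whatever hash is used, the bucket ID always falls in the range 0 to 499 for 500 buckets, so the number of partitions contributed by the tag is capped at the bucket count rather than at the tag's cardinality.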