Commit 3718389

Merge pull request #5734 from influxdata/jstirnaman/DAR-463
docs(partitioning): enhance best practices and time part templates do…
2 parents c2911bf + 5ddffb2 commit 3718389

20 files changed: +1071 -1842 lines changed

content/influxdb/cloud-dedicated/admin/custom-partitions/_index.md

Lines changed: 4 additions & 404 deletions
Large diffs are not rendered by default.

content/influxdb/cloud-dedicated/admin/custom-partitions/best-practices.md

Lines changed: 4 additions & 44 deletions
@@ -8,49 +8,9 @@ menu:
 name: Best practices
 parent: Manage data partitioning
 weight: 202
+source: /shared/v3-distributed-admin-custom-partitions/best-practices.md
 ---
 
-Use the following best practices when defining custom partitioning strategies
-for your data stored in {{< product-name >}}.
-
-- [Partition by tags that you commonly query for a specific value](#partition-by-tags-that-you-commonly-query-for-a-specific-value)
-- [Only partition by tags that _always_ have a value](#only-partition-by-tags-that-always-have-a-value)
-- [Avoid over-partitioning](#avoid-over-partitioning)
-
-## Partition by tags that you commonly query for a specific value
-
-Custom partitioning primarily benefits queries that look for a specific tag
-value in the `WHERE` clause. For example, if you often query data related to a
-specific ID, partitioning by the tag that stores the ID helps the InfluxDB
-query engine to more quickly identify what partitions contain the relevant data.
-
-{{% note %}}
-
-#### Use tag buckets for high-cardinality tags
-
-Partitioning using distinct values of tags with many (10K+) unique values can
-actually hurt query performance as partitions are created for each unique tag value.
-Instead, use [tag buckets](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/#tag-bucket-part-templates)
-to partition by high-cardinality tags.
-This method of partitioning groups tag values into "buckets" and partitions by bucket.
-{{% /note %}}
-
-## Only partition by tags that _always_ have a value
-
-You should only partition by tags that _always_ have a value.
-If points don't have a value for the tag, InfluxDB can't store them in the correct partitions and, at query time, must read all the partitions.
-
-## Avoid over-partitioning
-
-As you plan your partitioning strategy, keep in mind that data can be
-"over-partitioned"--meaning partitions are so granular that queries end up
-having to retrieve and read many partitions from the object store, which
-hurts query performance.
-
-- Balance the partition time interval with the actual amount of data written
-during each interval. If a single interval doesn't contain a lot of data,
-it is better to partition by larger time intervals.
-- Don't partition by tags that you typically don't use in your query workload.
-- Don't partition by distinct values of high-cardinality tags.
-Instead, [use tag buckets](#use-tag-buckets-for-high-cardinality-tags) to
-partition by these tags.
+<!--
+The content of this page is at /content/shared/v3-distributed-admin-custom-partitions/best-practices.md
+-->

content/influxdb/cloud-dedicated/admin/custom-partitions/define-custom-partitions.md

Lines changed: 3 additions & 155 deletions
@@ -10,161 +10,9 @@ weight: 202
 related:
 - /influxdb/cloud-dedicated/reference/cli/influxctl/database/create/
 - /influxdb/cloud-dedicated/reference/cli/influxctl/table/create/
+source: /shared/v3-distributed-admin-custom-partitions/define-custom-partitions.md
 ---
 
-Use the [`influxctl` CLI](/influxdb/cloud-dedicated/reference/cli/influxctl/)
-to define custom partition strategies when creating a database or table.
-By default, {{< product-name >}} partitions data by day.
-
-The partitioning strategy of a database or table is determined by a
-[partition template](/influxdb/cloud-dedicated/admin/custom-partitions/#partition-templates)
-which defines the naming pattern for [partition keys](/influxdb/cloud-dedicated/admin/custom-partitions/#partition-keys).
-Partition keys uniquely identify each partition.
-When a partition template is applied to a database, it becomes the default template
-for all tables in that database, but can be overridden when creating a
-table.
-
-- [Create a database with a custom partition template](#create-a-database-with-a-custom-partition-template)
-- [Create a table with a custom partition template](#create-a-table-with-a-custom-partition-template)
-- [Example partition templates](#example-partition-templates)
-
-{{% warn %}}
-
-#### Partition templates can only be applied on create
-
-You can only apply a partition template when creating a database or table.
-You can't update a partition template on an existing resource.
-{{% /warn %}}
-
-Use the following command flags to identify
-[partition template parts](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/#tag-part-templates):
-
-- `--template-tag`: An [InfluxDB tag](/influxdb/cloud-dedicated/reference/glossary/#tag)
-to use in the partition template.
-- `--template-tag-bucket`: An [InfluxDB tag](/influxdb/cloud-dedicated/reference/glossary/#tag)
-and number of "buckets" to group tag values into.
-Provide the tag key and the number of buckets to bucket tag values into
-separated by a comma: `tagKey,N`.
-- `--template-timeformat`: A [Rust strftime date and time](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/#time-part-templates)
-string that specifies the time format in the partition template and determines
-the time interval to partition by.
-
-{{% note %}}
-A partition template can include up to 7 total tag and tag bucket parts
-and only 1 time part.
-{{% /note %}}
-
-_View [partition template part restrictions](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/#restrictions)._
-
-{{% note %}}
-#### Always provide a time format when using custom partitioning
-
-When defining a custom partition template for your database or table using any
-of the `influxctl` `--template-*` flags, always include the `--template-timeformat`
-flag with a time format to use in your partition template.
-Otherwise, InfluxDB omits time from the partition template and won't compact partitions.
-{{% /note %}}
-
-## Create a database with a custom partition template
-
-The following example creates a new `example-db` database and applies a partition
-template that partitions by distinct values of two tags (`room` and `sensor-type`),
-bucketed values of the `customerID` tag, and by day using the time format `%Y-%m-%d`:
-
-<!--Skip database create and delete tests: namespaces aren't reusable-->
-<!--pytest.mark.skip-->
-
-```sh
-influxctl database create \
---template-tag room \
---template-tag sensor-type \
---template-tag-bucket customerID,500 \
---template-timeformat '%Y-%m-%d' \
-example-db
-```
-
-## Create a table with a custom partition template
-
-The following example creates a new `example-table` table in the specified
-database and applies a partition template that partitions by distinct values of
-two tags (`room` and `sensor-type`), bucketed values of the `customerID` tag,
-and by month using the time format `%Y-%m`:
-
-<!--Skip database create and delete tests: namespaces aren't reusable-->
-<!--pytest.mark.skip-->
-
-{{% code-placeholders "DATABASE_NAME" %}}
-
-```sh
-influxctl table create \
---template-tag room \
---template-tag sensor-type \
---template-tag-bucket customerID,500 \
---template-timeformat '%Y-%m' \
-DATABASE_NAME \
-example-table
-```
-
-{{% /code-placeholders %}}
-
-Replace the following in your command:
-
-- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}: your {{% product-name %}} [database](/influxdb/cloud-dedicated/admin/databases/)
-
-<!--actual test
-
-```sh
-
-# Test the preceding command outside of the code block.
-# influxctl authentication requires TTY interaction--
-# output the auth URL to a file that the host can open.
-
-TABLE_NAME=table_TEST_RUN
-script -c "influxctl table create \
---template-tag room \
---template-tag sensor-type \
---template-tag-bucket customerID,500 \
---template-timeformat '%Y-%m' \
-DATABASE_NAME \
-$TABLE_NAME" \
-/dev/null > /shared/urls.txt
-
-script -c "influxctl query \
---database DATABASE_NAME \
---token DATABASE_TOKEN \
-'SHOW TABLES'" > /shared/temp_tables.txt
-grep -q $TABLE_NAME /shared/temp_tables.txt
-rm /shared/temp_tables.txt
-```
-
+<!--
+The content of this page is at /content/shared/v3-distributed-admin-custom-partitions/_define-custom-partitions.md
 -->
-
-## Example partition templates
-
-Given the following [line protocol](/influxdb/cloud-dedicated/reference/syntax/line-protocol/)
-with a `2024-01-01T00:00:00Z` timestamp:
-
-```text
-prod,line=A,station=weld1 temp=81.9,qty=36i 1704067200000000000
-```
-
-##### Partitioning by distinct tag values
-
-| Description | Tag parts | Time part | Resulting partition key |
-| :---------------------- | :---------------- | :--------- | :----------------------- |
-| By day (default) | | `%Y-%m-%d` | 2024-01-01 |
-| By month | | `%Y-%m` | 2024-01 |
-| By year | | `%Y` | 2024 |
-| Single tag, by day | `line` | `%Y-%m-%d` | A \| 2024-01-01 |
-| Single tag, by month | `line` | `%Y-%m` | A \| 2024-01 |
-| Single tag, by year | `line` | `%Y` | A \| 2024 |
-| Multiple tags, by day | `line`, `station` | `%Y-%m-%d` | A \| weld1 \| 2024-01-01 |
-| Multiple tags, by month | `line`, `station` | `%Y-%m` | A \| weld1 \| 2024-01 |
-| Multiple tags, by year | `line`, `station` | `%Y` | A \| weld1 \| 2024 |
-
-##### Partition by tag buckets
-
-| Description | Tag part | Tag bucket part | Time part | Resulting partition key |
-| :---------------------------------- | :------- | :-------------- | :--------- | :---------------------- |
-| Distinct tag, tag buckets, by day | `line` | `station,100` | `%Y-%m-%d` | A \| 3 \| 2024-01-01 |
-| Distinct tag, tag buckets, by month | `line` | `station,500` | `%Y-%m` | A \| 303 \| 2024-01 |
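
The removed "Partitioning by distinct tag values" table maps one line-protocol point to its partition keys. As a rough check of those rows, the following Python sketch (not part of this commit) formats the point's timestamp with the time part and joins it with the tag values. The `partition_key` helper is hypothetical, Python's `strftime` stands in for the Rust chrono formatter, and the ` | ` separator only mirrors how the table displays keys; none of this is InfluxDB's implementation.

```python
from datetime import datetime, timezone


def partition_key(tags, tag_parts, time_format, timestamp_ns):
    """Build a display-style partition key (hypothetical helper).

    tags:         tag key/value pairs parsed from the line protocol point
    tag_parts:    tag keys used as tag template parts, in template order
    time_format:  strftime string used as the time template part
    timestamp_ns: point timestamp in nanoseconds since the Unix epoch
    """
    # Drop sub-second precision; %Y, %m, and %d do not need it.
    ts = datetime.fromtimestamp(timestamp_ns // 1_000_000_000, tz=timezone.utc)
    parts = [tags[key] for key in tag_parts]
    parts.append(ts.strftime(time_format))
    # Assumption: " | " mirrors the table's rendering of the key delimiter.
    return " | ".join(parts)


# Point from the removed example:
# prod,line=A,station=weld1 temp=81.9,qty=36i 1704067200000000000
tags = {"line": "A", "station": "weld1"}
ts_ns = 1704067200000000000

print(partition_key(tags, [], "%Y-%m-%d", ts_ns))
print(partition_key(tags, ["line"], "%Y-%m", ts_ns))
print(partition_key(tags, ["line", "station"], "%Y", ts_ns))
```

The sketch covers only tag and time parts. Tag bucket parts depend on a hash function that the removed text does not name, so the bucket values `3` and `303` in the second table cannot be reproduced this way.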

content/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates.md

Lines changed: 4 additions & 119 deletions
@@ -8,124 +8,9 @@ menu:
 influxdb_cloud_dedicated:
 parent: Manage data partitioning
 weight: 202
+source: /shared/v3-distributed-admin-custom-partitions/partition-templates.md
 ---
 
-Use partition templates to define the patterns used to generate partition keys.
-A partition key uniquely identifies a partition and is used to name the partition
-Parquet file in the [Object store](/influxdb/cloud-dedicated/reference/internals/storage-engine/#object-store).
-
-A partition template consists of 1-8 _template parts_---dimensions to partition data by.
-Three types of template parts exist:
-
-- **tag**: An [InfluxDB tag](/influxdb/cloud-dedicated/reference/glossary/#tag)
-to partition by.
-- **tag bucket**: An [InfluxDB tag](/influxdb/cloud-dedicated/reference/glossary/#tag)
-and number of "buckets" to group tag values into. Data is partitioned by the
-tag bucket rather than each distinct tag value.
-- {{< req type="key" >}} **time**: A Rust strftime date and time string that specifies the time interval
-to partition data by. The smallest unit of time included in the time part
-template is the interval used to partition data.
-
-{{% note %}}
-A partition template must include 1 [time part](#time-part-templates)
-and can include up to 7 total [tag](#tag-part-templates) and [tag bucket](#tag-bucket-part-templates) parts.
-{{% /note %}}
-
-<!-- TOC -->
-- [Restrictions](#restrictions)
-- [Template part size limit](#template-part-size-limit)
-- [Reserved keywords](#reserved-keywords)
-- [Reserved Characters](#reserved-characters)
-- [Tag part templates](#tag-part-templates)
-- [Tag bucket part templates](#tag-bucket-part-templates)
-- [Time part templates](#time-part-templates)
-<!-- /TOC -->
-
-## Restrictions
-
-### Template part size limit
-
-Each template part is limited to 200 bytes in length.
-Anything longer will be truncated at 200 bytes and appended with `#`.
-
-### Partition key size limit
-
-With the truncation of template parts, the maximum length of a partition key is
-1,607 bytes (1.57 KiB).
-
-### Reserved keywords
-
-The following reserved keywords cannot be used in partition templates:
-
-- `time`
-
-### Reserved Characters
-
-If used in template parts, non-ASCII characters and the following reserved
-characters must be [percent encoded](https://developer.mozilla.org/en-US/docs/Glossary/Percent-encoding):
-
-- `|`: Partition key part delimiter
-- `!`: Null or missing partition key part
-- `^`: Empty string partition key part
-- `#`: Key part truncation marker
-- `%`: Required for unambiguous reversal of percent encoding
-
-## Tag part templates
-
-Tag part templates consist of a _tag key_ to partition by.
-Generated partition keys include the unique _tag value_ specific to each partition.
-
-A partition template may include a given tag key only once in template parts
-that operate on tags (tag value and tag bucket)--for example:
-
-If a template partitions on unique values of `tag_A`, then
-you can't use `tag_A` as a tag bucket part.
-
-## Tag bucket part templates
-
-Tag bucket part templates consist of a _tag key_ to partition by and the
-_number of "buckets" to partition tag values into_--for example:
-
-```
-customerID,500
-```
-
-Values of the `customerID` tag are bucketed into 500 distinct "buckets."
-Each bucket is identified by the remainder of the tag value hashed into a 32bit
-integer divided by the specified number of buckets:
-
-```rust
-hash(tagValue) % N
-```
-
-Generated partition keys include the unique _tag bucket identifier_ specific to
-each partition.
-
-**Supported number of tag buckets**: 1-1,000
-
-{{% note %}}
-Tag buckets should be used to partition by high cardinality tags or tags with an
-unknown number of distinct values.
-{{% /note %}}
-
-A partition template may include a given tag key only once in template parts
-that operate on tags (tag value and tag bucket)--for example:
-
-If a template partitions on unique values of `tag_A`, then
-you can't use `tag_A` as a tag bucket part.
-
-## Time part templates
-
-Time part templates use a limited subset of the
-[Rust strftime date and time formatting syntax](https://docs.rs/chrono/latest/chrono/format/strftime/index.html)
-to specify time format in partition keys.
-InfluxDB uses the smallest unit of time included in the time part template as
-the partition interval.
-
-### Date specifiers
-
-| Variable | Example | Description |
-| :------: | :----------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `%Y` | `2001` | The full proleptic Gregorian year, zero-padded to 4 digits. chrono supports years from -262144 to 262143. Note: years before 1 BCE or after 9999 CE, require an initial sign (+/-). |
-| `%m` | `07` | Month number (01--12), zero-padded to 2 digits. |
-| `%d` | `08` | Day number (01--31), zero-padded to 2 digits. |
+<!--
+The content of this page is at /content/shared/v3-distributed-admin-custom-partitions/_partition-templates.md
+-->
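
Two rules in the removed text of this file lend themselves to a short illustration: percent encoding of reserved characters, and the `hash(tagValue) % N` bucket assignment. The Python sketch below (not part of this commit) is a minimal illustration under stated assumptions. Both function names are hypothetical, and `zlib.crc32` is only a stand-in 32-bit hash, because the text does not say which hash InfluxDB uses. Bucket numbers from this sketch will therefore not match real partition keys.

```python
import zlib

# Reserved characters listed in the removed "Reserved Characters" section.
RESERVED = set("|!^#%")


def percent_encode_part(value: str) -> str:
    """Percent-encode reserved and non-ASCII characters (hypothetical helper)."""
    out = []
    for ch in value:
        if ch in RESERVED or ord(ch) > 127:
            # Encode each UTF-8 byte of the character as %XX.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


def tag_bucket(tag_value: str, num_buckets: int) -> int:
    """Assign a tag value to a bucket using hash(tagValue) % N.

    Assumption: crc32 stands in for the unspecified 32-bit hash.
    """
    if not 1 <= num_buckets <= 1000:
        # The removed text states a supported range of 1-1,000 buckets.
        raise ValueError("num_buckets must be between 1 and 1000")
    return zlib.crc32(tag_value.encode("utf-8")) % num_buckets


print(percent_encode_part("line|A"))   # '|' is the key part delimiter
print(percent_encode_part("100%"))     # '%' must itself be encoded
print(tag_bucket("customer-42", 500))  # some bucket in the range 0..499
```

Because the bucket is a remainder, the same tag value always lands in the same bucket, and the number of partitions per time interval is bounded by `N` rather than by the tag's cardinality.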
