Commit 7f8bde4

fix(v3): Update Cloud Dedicated and Clustered column limit to 1000 and clarify the potential impact of wide schemas.
1 parent 78b9925 commit 7f8bde4

File tree

11 files changed

+382
-140
lines changed

content/influxdb/cloud-dedicated/admin/databases/_index.md

Lines changed: 39 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,14 @@ menu:
1111
parent: Administer InfluxDB Cloud
1212
weight: 101
1313
influxdb/cloud-dedicated/tags: [databases]
14+
related:
15+
- /influxdb/cloud-dedicated/write-data/best-practices/schema-design/
16+
- /influxdb/cloud-dedicated/reference/cli/influxctl/
17+
alt_links:
18+
cloud: /influxdb/cloud/admin/buckets/
19+
cloud_serverless: /influxdb/cloud-serverless/admin/buckets/
20+
clustered: /influxdb/clustered/admin/databases/
21+
oss: /influxdb/v2/admin/buckets/
1422
---
1523

1624
An InfluxDB database is a named location where time series data is stored.
@@ -19,11 +27,13 @@ Each InfluxDB database has a [retention period](#retention-periods).
1927
{{% note %}}
2028
**If coming from InfluxDB v1**, the concepts of databases and retention policies
2129
have been combined into a single concept--database. Retention policies are no
22-
longer part of the InfluxDB data model. However, InfluxDB Cloud Dedicated does
30+
longer part of the InfluxDB data model.
31+
However, {{% product-name %}} does
2332
support InfluxQL, which requires databases and retention policies.
2433
See [InfluxQL DBRP naming convention](/influxdb/cloud-dedicated/admin/databases/create/#influxql-dbrp-naming-convention).
2534

26-
**If coming from InfluxDB v2 or InfluxDB Cloud**, _database_ and _bucket_ are synonymous.
35+
**If coming from InfluxDB v2, InfluxDB Cloud (TSM), or InfluxDB Cloud Serverless**,
36+
_database_ and _bucket_ are synonymous.
2737
{{% /note %}}
2838

2939
## Retention periods
@@ -40,9 +50,10 @@ never be removed by the retention enforcement service.
4050

4151
## Table and column limits
4252

43-
In {{< product-name >}}, table (measurement) and column limits can be
44-
customized when [creating](#create-a-database) or
45-
[updating a database](#update-a-database).
53+
You can customize [table (measurement) limits](#table-limit) and
54+
[table column limits](#column-limit) when you
55+
[create](#create-a-database) or
56+
[update a database](#update-a-database) in {{< product-name >}}.
4657

4758
### Table limit
4859

@@ -72,7 +83,7 @@ data by measurement and time range and stores each partition as a Parquet
7283
file in your cluster's object store. By increasing the number of measurements
7384
(tables) you can store in your database, you also increase the potential for
7485
more `PUT` requests into your object store as InfluxDB creates more partitions.
75-
Each `PUT` request incurs a monetary cost and will increase the operating cost of
86+
Each `PUT` request incurs a monetary cost and increases the operating cost of
7687
your cluster.
7788

7889
{{% /expand %}}
@@ -89,22 +100,33 @@ operating cost of your cluster.
89100

90101
### Column limit
91102

92-
**Default maximum number of columns**: 250
103+
**Default maximum number of columns**: 1000
104+
105+
A table can contain **up to 1000 columns**.
106+
Each row must include a time column, with the remaining columns representing
107+
tags and fields.
108+
As a result, a table can have one time column and up to 999 field and tag columns.
109+
110+
When creating or updating a database, you can configure the table column limit to be
111+
lower than 1000, based on your requirements.
112+
After you update the column limit for a database, the limit applies to newly
113+
created tables; it doesn't override the column limit for existing tables.
114+
115+
If you attempt to write to a table and exceed the column limit, the write
116+
request fails and InfluxDB returns an error.
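
The column arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this section, not InfluxDB's implementation: the names `max_tag_and_field_columns` and `would_exceed_column_limit` are hypothetical, and the check assumes that the time column always counts as exactly one column.

```python
DEFAULT_COLUMN_LIMIT = 1000  # default stated in this section


def max_tag_and_field_columns(column_limit: int = DEFAULT_COLUMN_LIMIT) -> int:
    """Return how many tag and field columns fit once the time column is counted."""
    if column_limit < 1:
        raise ValueError("column_limit must leave room for the time column")
    return column_limit - 1


def would_exceed_column_limit(tag_keys, field_keys, column_limit=DEFAULT_COLUMN_LIMIT):
    """Hypothetical pre-write check: time column plus distinct tag and field keys."""
    total_columns = 1 + len(set(tag_keys) | set(field_keys))
    return total_columns > column_limit
```

With the default limit, `max_tag_and_field_columns()` returns 999, which matches the "one time column and up to 999 field and tag columns" statement. A client could run a check like this before sending a write, but the server remains the authority on whether a write is rejected.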
93117

94-
Time, fields, and tags are each represented by a column in a table.
95118
Increasing your column limit affects your {{% product-name omit=" Clustered" %}}
96119
cluster in the following ways:
97120

98121
{{< expand-wrapper >}}
99-
{{% expand "May adversely affect query performance" %}}
100-
101-
At query time, the InfluxDB query engine identifies what table contains the queried
102-
data and then evaluates each row in the table to match the conditions of the query.
103-
The more columns that are in each row, the longer it takes to evaluate each row.
104-
105-
Through performance testing, InfluxData has identified 250 columns as the
106-
threshold where query performance may be affected
107-
(depending on the shape of and data types in your schema).
122+
{{% expand "May adversely affect system performance" %}}
123+
124+
InfluxData identified 1000 columns as the safe limit for maintaining system
125+
performance and stability.
126+
Exceeding this threshold can result in
127+
[wide schemas](/influxdb/cloud-dedicated/write-data/best-practices/schema-design/#avoid-wide-schemas),
128+
which can negatively impact performance and resource use,
129+
depending on the shape of your schema and data types in the schema.
108130

109131
{{% /expand %}}
110132
{{< /expand-wrapper >}}

content/influxdb/cloud-dedicated/query-data/troubleshoot-and-optimize/optimize-queries.md

Lines changed: 35 additions & 5 deletions
@@ -15,31 +15,31 @@ related:
1515
- /influxdb/cloud-dedicated/query-data/execute-queries/analyze-query-plan/
1616
aliases:
1717
- /influxdb/cloud-dedicated/query-data/execute-queries/optimize-queries/
18+
- /influxdb/cloud-dedicated/query-data/execute-queries/analyze-query-plan/
1819
---
1920

2021
Optimize SQL and InfluxQL queries to improve performance and reduce their memory and compute (CPU) requirements.
2122
Learn how to use observability tools to analyze query execution and view metrics.
2223

2324
- [Why is my query slow?](#why-is-my-query-slow)
2425
- [Strategies for improving query performance](#strategies-for-improving-query-performance)
26+
- [Query only the data you need](#query-only-the-data-you-need)
2527
- [Analyze and troubleshoot queries](#analyze-and-troubleshoot-queries)
2628

2729
## Why is my query slow?
2830

2931
Query performance depends on time range and complexity.
3032
If a query is slower than you expect, it might be due to the following reasons:
3133

32-
- It queries data from a large time range.
34+
- It queries data from a large time range.
3335
- It includes intensive operations, such as querying many string values or `ORDER BY` sorting or re-sorting large amounts of data.
3436

3537
## Strategies for improving query performance
3638

3739
The following design strategies generally improve query performance and resource use:
3840

3941
- Follow [schema design best practices](/influxdb/cloud-dedicated/write-data/best-practices/schema-design/) to make querying easier and more performant.
40-
- Query only the data you need--for example, include a [`WHERE` clause](/influxdb/cloud-dedicated/reference/sql/where/) that filters data by a time range.
41-
InfluxDB v3 stores data in a Parquet file for each measurement and day, and retrieves files from the Object store to answer a query.
42-
The smaller the time range in your query, the fewer files InfluxDB needs to retrieve from the Object store.
42+
- [Query only the data you need](#query-only-the-data-you-need).
4343
- [Downsample data](/influxdb/cloud-dedicated/process-data/downsample/) to reduce the amount of data you need to query.
4444

4545
Some bottlenecks may be out of your control and are the result of a suboptimal execution plan, such as:
@@ -52,9 +52,39 @@ Some bottlenecks may be out of your control and are the result of a suboptimal e
5252
{{% note %}}
5353
#### Analyze query plans to view metrics and recognize bottlenecks
5454

55-
To view runtime metrics for a query, such as the number of files scanned, use the [`EXPLAIN ANALYZE` keywords](/influxdb/cloud-dedicated/reference/sql/explain/#explain-analyze) and learn how to [analyze a query plan](/influxdb/cloud-dedicated/query-data/troubleshoot-and-optimize/analyze-query-plan/).
55+
To view runtime metrics for a query, such as the number of files scanned, use
56+
the [`EXPLAIN ANALYZE` keywords](/influxdb/cloud-dedicated/reference/sql/explain/#explain-analyze)
57+
and learn how to [analyze a query plan](/influxdb/cloud-dedicated/query-data/troubleshoot-and-optimize/analyze-query-plan/).
5658
{{% /note %}}
5759

60+
### Query only the data you need
61+
62+
#### Include a WHERE clause
63+
64+
InfluxDB v3 stores data in a Parquet file for each measurement and day, and
65+
retrieves files from the Object store to answer a query.
66+
To reduce the number of files that a query needs to retrieve from the Object store,
67+
include a [`WHERE` clause](/influxdb/cloud-dedicated/reference/sql/where/) that
68+
filters data by a time range.
69+
70+
#### SELECT only columns you need
71+
72+
Because InfluxDB v3 is a columnar database, it only processes the columns
73+
selected in a query, which can mitigate the query performance impact of
74+
[wide schemas](/influxdb/cloud-dedicated/write-data/best-practices/schema-design/#avoid-wide-schemas).
75+
76+
However, a non-specific query that retrieves a large number of columns from a
77+
wide schema can be slower and less efficient than a more targeted
78+
query--for example, consider the following queries:
79+
80+
- `SELECT time,a,b,c`
81+
- `SELECT *`
82+
83+
If the table contains 10 columns, the difference in performance between the
84+
two queries is minimal.
85+
In a table with over 1000 columns, the `SELECT *` query is slower and
86+
less efficient.
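
The two recommendations in this section (filter by time range and select only the columns you need) can be combined when building a query in client code. The following Python sketch is an assumption-laden illustration: `build_targeted_query` is a hypothetical helper, the table name `home` and column `temp` are placeholders, and the string-literal timestamp comparison is assumed to be accepted by the SQL engine. It uses plain string formatting for readability, so it is not safe for untrusted input.

```python
def build_targeted_query(table, columns, start, stop):
    """Build a SQL query that selects named columns and filters by a time range."""
    if not columns:
        # Refuse to fall back to SELECT *, which reads every column in the table.
        raise ValueError("name the columns you need instead of using SELECT *")
    column_list = ", ".join(f'"{name}"' for name in columns)
    return (
        f'SELECT {column_list} FROM "{table}" '
        f"WHERE time >= '{start}' AND time < '{stop}'"
    )


query = build_targeted_query(
    "home", ["time", "temp"], "2024-01-01T00:00:00Z", "2024-01-02T00:00:00Z"
)
```

Requiring an explicit column list and a bounded time range keeps the query from touching more Parquet files or columns than the caller actually needs.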
87+
5888
## Analyze and troubleshoot queries
5989

6090
Use the following tools to analyze and troubleshoot queries and find performance bottlenecks:

content/influxdb/cloud-dedicated/reference/cli/influxctl/database/create.md

Lines changed: 1 addition & 1 deletion
@@ -104,7 +104,7 @@ influxctl database create [flags] <DATABASE_NAME>
104104
| :--- | :---------------------- | :--------------------------------------------------------------------------------------------------------------------------------------- |
105105
| | `--retention-period` | Database retention period (default is `0s`, infinite) |
106106
| | `--max-tables` | Maximum tables per database (default is 500, `0` uses default) |
107-
| | `--max-columns` | Maximum columns per table (default is 250, `0` uses default) |
107+
| | `--max-columns` | Maximum columns per table (default is 1000, `0` uses default) |
108108
| | `--template-tag` | Tag to add to partition template (can include multiple of this flag) |
109109
| | `--template-tag-bucket` | Tag and number of buckets to partition tag values into separated by a comma--for example: `tag1,100` (can include multiple of this flag) |
110110
| | `--template-timeformat` | Timestamp format for partition template (default is `%Y-%m-%d`) |

content/influxdb/cloud-dedicated/write-data/best-practices/schema-design.md

Lines changed: 72 additions & 31 deletions
@@ -8,6 +8,10 @@ menu:
88
name: Schema design
99
weight: 201
1010
parent: write-best-practices
11+
related:
12+
- /influxdb/cloud-dedicated/admin/databases/
13+
- /influxdb/cloud-dedicated/reference/cli/influxctl/
14+
- /influxdb/cloud-dedicated/query-data/troubleshoot-and-optimize/
1115
---
1216

1317
Use the following guidelines to design your [schema](/influxdb/cloud-dedicated/reference/glossary/#schema)
@@ -18,7 +22,7 @@ for simpler and more performant queries.
1822
- [Tags versus fields](#tags-versus-fields)
1923
- [Schema restrictions](#schema-restrictions)
2024
- [Do not use duplicate names for tags and fields](#do-not-use-duplicate-names-for-tags-and-fields)
21-
- [Tables can contain up to 250 columns](#tables-can-contain-up-to-250-columns)
25+
- [Maximum number of columns per table](#maximum-number-of-columns-per-table)
2226
- [Design for performance](#design-for-performance)
2327
- [Avoid wide schemas](#avoid-wide-schemas)
2428
- [Avoid sparse schemas](#avoid-sparse-schemas)
@@ -37,10 +41,13 @@ Tables contain multiple tags and fields.
3741
<!-- vale InfluxDataDocs.v3Schema = NO -->
3842

3943
- **Database**: A named location where time series data is stored.
40-
In {{% product-name %}}, _database_ is synonymous with _bucket_ in InfluxDB Cloud Serverless and InfluxDB TSM implementations.
44+
In {{% product-name %}}, _database_ is synonymous with _bucket_ in InfluxDB
45+
Cloud Serverless and InfluxDB TSM implementations.
46+
4147
A database can contain multiple _tables_.
4248
- **Table**: A logical grouping for time series data.
43-
In {{% product-name %}}, _table_ is synonymous with _measurement_ in InfluxDB Cloud Serverless and InfluxDB TSM implementations.
49+
In {{% product-name %}}, _table_ is synonymous with _measurement_ in
50+
InfluxDB Cloud Serverless and InfluxDB TSM implementations.
4451
All _points_ in a given table should have the same _tags_.
4552
A table contains multiple _tags_ and _fields_.
4653
- **Tags**: Key-value pairs that store metadata string values for each point--for example,
@@ -52,7 +59,9 @@ Tables contain multiple tags and fields.
5259
Field values may be null, but at least one field value is not null on any given row.
5360
- **Timestamp**: Timestamp associated with the data.
5461
When stored on disk and queried, all data is ordered by time.
55-
In InfluxDB, a timestamp is a nanosecond-scale [Unix timestamp](/influxdb/cloud-dedicated/reference/glossary/#unix-timestamp) in UTC.
62+
In InfluxDB, a timestamp is a nanosecond-scale
63+
[Unix timestamp](/influxdb/cloud-dedicated/reference/glossary/#unix-timestamp)
64+
in UTC.
5665
A timestamp is never null.
5766

5867
{{% note %}}
@@ -91,8 +100,9 @@ question as you design your schema.
91100
- String
92101
- Boolean
93102

94-
{{% product-name %}} doesn't index tag values or field values.
95-
Tag keys, field keys, and other metadata are indexed to optimize performance.
103+
{{% product-name %}} indexes tag keys, field keys, and other metadata
104+
to optimize performance.
105+
It doesn't index tag values or field values.
96106

97107
{{% note %}}
98108
The InfluxDB v3 storage engine supports infinite tag value and series cardinality.
@@ -106,26 +116,37 @@ cardinality doesn't affect the overall performance of your database.
106116

107117
### Do not use duplicate names for tags and fields
108118

109-
Tags and fields within the same table can't be named the same.
110-
All tags and fields are stored as unique columns in a table representing the
111-
table on disk.
119+
Use unique names for tags and fields within the same table.
120+
{{% product-name %}} stores tags and fields as unique columns in a table that
121+
represents the table on disk.
112122
If you attempt to write a table that contains tags or fields with the same name,
113123
the write fails due to a column conflict.
114124

115-
### Tables can contain up to 250 columns
125+
### Maximum number of columns per table
126+
127+
A table has a [maximum number of columns](/influxdb/cloud-dedicated/admin/databases/#column-limit).
128+
Each row must include a time column.
129+
As a result, a table can have the following:
130+
131+
- a time column
132+
- field and tag columns up to the configured maximum.
116133

117-
A table can contain **up to 250 columns**. Each row requires a time column,
118-
but the rest represent tags and fields stored in the table.
119-
Therefore, a table can contain one time column and 249 total field and tag columns.
120-
If you attempt to write to a table and exceed the 250 column limit, the
121-
write request fails and InfluxDB returns an error.
134+
If you attempt to write to a table and exceed the column limit, then the write
135+
request fails and InfluxDB returns an error.
136+
137+
InfluxData identified 1000 columns as the safe limit for maintaining system
138+
performance and stability.
139+
Exceeding this threshold can result in
140+
[wide schemas](#avoid-wide-schemas), which can negatively impact performance
141+
and resource use, depending on the shape and data types in your schema.
122142

123143
---
124144

125145
## Design for performance
126146

127-
How you structure your schema within a table can affect the overall
128-
performance of queries against that table.
147+
How you structure your schema within a table can affect resource use and
148+
the performance of queries against that table.
149+
129150
The following guidelines help to optimize query performance:
130151

131152
- [Avoid wide schemas](#avoid-wide-schemas)
@@ -135,26 +156,45 @@ The following guidelines help to optimize query performance:
135156

136157
### Avoid wide schemas
137158

138-
A wide schema is one with many tags and fields and corresponding columns for each.
139-
With the InfluxDB v3 storage engine, wide schemas don't impact query execution performance.
140-
Because InfluxDB v3 is a columnar database, it executes queries only against columns selected in the query.
159+
A wide schema refers to a schema with a large number of columns (tags and fields).
141160

142-
Although a wide schema won't affect query performance, it can lead to the following:
161+
Wide schemas can lead to the following issues:
143162

144-
- More resources required for persisting and compacting data during ingestion.
145-
- Decreased sorting performance due to complex primary keys with [too many tags](#avoid-too-many-tags).
163+
- Increased resource usage for persisting and compacting data during ingestion.
164+
- Reduced sorting performance due to complex primary keys with [too many tags](#avoid-too-many-tags).
165+
- Reduced query performance when [using non-specific queries](#avoid-non-specific-queries).
146166

147-
The InfluxDB v3 storage engine has a
148-
[limit of 250 columns per table](#tables-can-contain-up-to-250-columns).
167+
To prevent wide schema issues, limit the number of tags and fields stored in a table.
168+
If you need to store more than the [maximum number of columns](/influxdb/cloud-dedicated/admin/databases/),
169+
consider segmenting your fields into separate tables.
149170

150-
To avoid a wide schema, limit the number of tags and fields stored in a table.
151-
If you need to store more than 249 total tags and fields, consider segmenting
152-
your fields into a separate table.
171+
#### Avoid non-specific queries
172+
173+
Because InfluxDB v3 is a columnar database, it only processes the columns
174+
selected in a query, which can mitigate the query performance impact of wide schemas.
175+
If you [query only the data that you need](/influxdb/cloud-dedicated/query-data/troubleshoot-and-optimize/optimize-queries/#strategies-for-improving-query-performance),
176+
then a wide schema might not impact query performance.
177+
178+
However, a non-specific query that retrieves a large number of columns from a
179+
wide schema
180+
is slower and less efficient than a more targeted query--for example, consider
181+
the following queries:
182+
183+
- `SELECT time,a,b,c`
184+
- `SELECT *`
185+
186+
If the table contains 10 columns, the difference in performance between the
187+
two queries is minimal.
188+
In a table with over 1000 columns, the `SELECT *` query is slower and
189+
less efficient.
153190

154191
#### Avoid too many tags
155192

156-
In InfluxDB, the primary key for a row is the combination of the point's timestamp and _tag set_ - the collection of [tag keys](/influxdb/cloud-dedicated/reference/glossary/#tag-key) and [tag values](/influxdb/cloud-dedicated/reference/glossary/#tag-value) on the point.
157-
A point that contains more tags has a more complex primary key, which could impact sorting performance if you sort using all parts of the key.
193+
In InfluxDB, the primary key for a row is the combination of the point's
194+
timestamp and _tag set_ - the collection of [tag keys](/influxdb/cloud-dedicated/reference/glossary/#tag-key)
195+
and [tag values](/influxdb/cloud-dedicated/reference/glossary/#tag-value) on the point.
196+
A point that contains more tags has a more complex primary key, which could
197+
impact sorting performance if you sort using all parts of the key.
158198

159199
### Avoid sparse schemas
160200

@@ -275,7 +315,8 @@ Without regular expressions, your queries will be easier to write and more perfo
275315

276316
#### Not recommended {.orange}
277317

278-
For example, consider the following [line protocol](/influxdb/cloud-dedicated/reference/syntax/line-protocol/) that embeds multiple attributes (location, model, and ID) into a `sensor` tag value:
318+
For example, consider the following [line protocol](/influxdb/cloud-dedicated/reference/syntax/line-protocol/)
319+
that embeds multiple attributes (location, model, and ID) into a `sensor` tag value:
279320

280321
```text
281322
home,sensor=loc-kitchen.model-A612.id-1726ZA temp=72.1

content/influxdb/cloud-serverless/admin/buckets/_index.md

Lines changed: 6 additions & 0 deletions
@@ -12,8 +12,12 @@ weight: 105
1212
influxdb/cloud-serverless/tags: [buckets]
1313
aliases:
1414
- /influxdb/cloud-serverless/organizations/buckets/
15+
- /influxdb/cloud-serverless/admin/databases/
1516
alt_links:
1617
cloud: /influxdb/cloud/admin/buckets/
18+
cloud_dedicated: /influxdb/cloud-dedicated/admin/databases/
19+
clustered: /influxdb/clustered/admin/databases/
20+
oss: /influxdb/v2/admin/buckets/
1721
---
1822

1923
A **bucket** is a named location where time series data is stored.
@@ -30,6 +34,8 @@ support InfluxQL and the InfluxDB v1 API `/write` and `/query` endpoints, which
3034
See how to [map v1 databases and retention policies to buckets](/influxdb/cloud-serverless/guides/api-compatibility/v1/#map-v1-databases-and-retention-policies-to-buckets).
3135

3236
**If coming from InfluxDB v2 or InfluxDB Cloud**, _buckets_ are functionally equivalent.
37+
38+
**If coming from InfluxDB Cloud Dedicated or InfluxDB Clustered**, _database_ and _bucket_ are synonymous.
3339
{{% /note %}}
3440

3541
## Retention period
