Commit 0904ee3

Merge branch 'main' of https://github.com/clickhouse/clickhouse-docs into hyperdx_cloud_docs

2 parents: 6a93bdf + 9d513d0


44 files changed (+1227, -256 lines)

.github/workflows/check-build.yml

Lines changed: 21 additions & 28 deletions
```diff
@@ -16,15 +16,15 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        check_type: [spellcheck, kbcheck, md-lint]
+        check_type: [spellcheck, kbcheck, md-lint, glossary-check]
     steps:
       # Add setup steps per check here
       - uses: actions/checkout@v4
       - name: Install Aspell
         if: matrix.check_type == 'spellcheck'
         run: sudo apt-get update && sudo apt-get install -y aspell aspell-en
       - name: Set up Python
-        if: matrix.check_type == 'kbcheck'
+        if: matrix.check_type == 'kbcheck' || matrix.check_type == 'glossary-check'
         run: |
           curl -Ls https://astral.sh/uv/install.sh | sh
           uv clean
@@ -39,31 +39,25 @@ jobs:
         run: yarn add -D markdownlint-cli2

       # Run the checks here
-      - name: Run checks
-        id: check_step
-        run: |
-          if [[ "${{ matrix.check_type }}" == "spellcheck" ]]; then
-            yarn check-spelling
-            exit_code=$?
-          elif [[ "${{ matrix.check_type }}" == "kbcheck" ]]; then
-            yarn check-kb
-            exit_code=$?
-          elif [[ "${{ matrix.check_type }}" == "md-lint" ]]; then
-            yarn check-markdown
-            exit_code=$?
-          fi
-
-          if [[ $exit_code -ne 0 ]]; then
-            echo "::error::${{ matrix.check_type }} check failed. See logs for details."
-            exit 1
-          fi
+      - name: Run spellcheck
+        if: matrix.check_type == 'spellcheck'
+        run: yarn check-spelling

-      - name: Set check status
-        if: steps.check_step.outcome != 'success'
-        uses: actions/github-script@v6
-        with:
-          script: |
-            core.setFailed('${{ matrix.check_type }} check failed.');
+      - name: Run KB check
+        if: matrix.check_type == 'kbcheck'
+        run: yarn check-kb
+
+      - name: Run markdown lint
+        if: matrix.check_type == 'md-lint'
+        run: yarn check-markdown
+
+      - name: Run glossary check
+        if: matrix.check_type == 'glossary-check'
+        run: |
+          echo "Extracting glossary from markdown..."
+          python3 scripts/glossary/extract-glossary-terms.py
+          echo "Checking glossary coverage..."
+          python3 scripts/glossary/wrap-glossary-terms.py --check || echo "::warning::Glossary check found unwrapped terms (non-blocking)"

   check_overall_status:
     needs: stylecheck
@@ -74,5 +68,4 @@ jobs:
         if: needs.stylecheck.result != 'success'
         run: |
           echo "::error::One or more checks of the style check failed."
-          exit 1
-
+          exit 1
```
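The new glossary step shells out to two repo scripts whose bodies are not part of this diff. As a rough illustration only, here is a minimal sketch of what a `--check`-style pass might do: scan text for glossary terms that are not wrapped in the `^^term^^` markers used elsewhere in this commit. The term set and the `find_unwrapped` helper are hypothetical, not the actual `scripts/glossary/` implementation.

```python
import re

# Hypothetical glossary; the real terms come from extract-glossary-terms.py,
# whose implementation is not shown in this commit.
GLOSSARY = {"partitioning key", "parts", "TTL"}

def find_unwrapped(text: str, terms=GLOSSARY):
    """Return glossary terms that occur in `text` without ^^...^^ wrapping."""
    unwrapped = set()
    for term in terms:
        for m in re.finditer(rf"\b{re.escape(term)}\b", text):
            start, end = m.span()
            # Flag the term if any occurrence lacks the surrounding markers.
            if text[max(0, start - 2):start] != "^^" or text[end:end + 2] != "^^":
                unwrapped.add(term)
                break
    return unwrapped
```

A real checker would walk the docs tree and report per file; the `|| echo "::warning::..."` in the workflow makes the step warning-only rather than blocking.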

docs/best-practices/partitioning_keys.mdx

Lines changed: 12 additions & 12 deletions
````diff
@@ -12,12 +12,12 @@ import partitions from '@site/static/images/bestpractices/partitions.png';
 import merges_with_partitions from '@site/static/images/bestpractices/merges_with_partitions.png';
 
 :::note A data management technique
-Partitioning is primarily a data management technique and not a query optimization tool, and while it can improve performance in specific workloads, it should not be the first mechanism used to accelerate queries; the partitioning key must be chosen carefully, with a clear understanding of its implications, and only applied when it aligns with data life cycle needs or well-understood access patterns.
+Partitioning is primarily a data management technique and not a query optimization tool, and while it can improve performance in specific workloads, it should not be the first mechanism used to accelerate queries; the ^^partitioning key^^ must be chosen carefully, with a clear understanding of its implications, and only applied when it aligns with data life cycle needs or well-understood access patterns.
 :::
 
-In ClickHouse, partitioning organizes data into logical segments based on a specified key. This is defined using the `PARTITION BY` clause at table creation time and is commonly used to group rows by time intervals, categories, or other business-relevant dimensions. Each unique value of the partitioning expression forms its own physical partition on disk, and ClickHouse stores data in separate parts for each of these values. Partitioning improves data management, simplifies retention policies, and can help with certain query patterns.
+In ClickHouse, partitioning organizes data into logical segments based on a specified key. This is defined using the `PARTITION BY` clause at table creation time and is commonly used to group rows by time intervals, categories, or other business-relevant dimensions. Each unique value of the partitioning expression forms its own physical partition on disk, and ClickHouse stores data in separate ^^parts^^ for each of these values. Partitioning improves data management, simplifies retention policies, and can help with certain query patterns.
 
-For example, consider the following UK price paid dataset table with a partitioning key of `toStartOfMonth(date)`.
+For example, consider the following UK price paid dataset table with a ^^partitioning key^^ of `toStartOfMonth(date)`.
 
 ```sql
 CREATE TABLE uk.uk_price_paid_simple_partitioned
@@ -40,28 +40,28 @@ The ClickHouse server first splits the rows from the example insert with 4 rows
 
 For a more detailed explanation of partitioning, we recommend [this guide](/partitions).
 
-With partitioning enabled, ClickHouse only [merges](/merges) data parts within, but not across partitions. We sketch that for our example table from above:
+With partitioning enabled, ClickHouse only [merges](/merges) data ^^parts^^ within, but not across partitions. We sketch that for our example table from above:
 
 <Image img={merges_with_partitions} size="md" alt="Partitions" />
 
 ## Applications of partitioning {#applications-of-partitioning}
 
-Partitioning is a powerful tool for managing large datasets in ClickHouse, especially in observability and analytics use cases. It enables efficient data life cycle operations by allowing entire partitions, often aligned with time or business logic, to be dropped, moved, or archived in a single metadata operation. This is significantly faster and less resource-intensive than row-level delete or copy operations. Partitioning also integrates cleanly with ClickHouse features like TTL and tiered storage, making it possible to implement retention policies or hot/cold storage strategies without custom orchestration. For example, recent data can be kept on fast SSD-backed storage, while older partitions are automatically moved to cheaper object storage.
+Partitioning is a powerful tool for managing large datasets in ClickHouse, especially in observability and analytics use cases. It enables efficient data life cycle operations by allowing entire partitions, often aligned with time or business logic, to be dropped, moved, or archived in a single metadata operation. This is significantly faster and less resource-intensive than row-level delete or copy operations. Partitioning also integrates cleanly with ClickHouse features like ^^TTL^^ and tiered storage, making it possible to implement retention policies or hot/cold storage strategies without custom orchestration. For example, recent data can be kept on fast SSD-backed storage, while older partitions are automatically moved to cheaper object storage.
 
 While partitioning can improve query performance for some workloads, it can also negatively impact response time.
 
-If the partitioning key is not in the primary key and you are filtering by it, users may see an improvement in query performance with partitioning. See [here](/partitions#query-optimization) for an example.
+If the ^^partitioning key^^ is not in the ^^primary key^^ and you are filtering by it, users may see an improvement in query performance with partitioning. See [here](/partitions#query-optimization) for an example.
 
-Conversely, if queries need to query across partitions performance may be negatively impacted due to a higher number of total parts. For this reason, users should understand their access patterns before considering partitioning a a query optimization technique.
+Conversely, if queries need to query across partitions performance may be negatively impacted due to a higher number of total ^^parts^^. For this reason, users should understand their access patterns before considering partitioning as a query optimization technique.
 
 In summary, users should primarily think of partitioning as a data management technique. For an example of managing data, see ["Managing Data"](/observability/managing-data) from the observability use-case guide and ["What are table partitions used for?"](/partitions#data-management) from Core Concepts - Table partitions.
 
-## Choose a low cardinality partitioning key {#choose-a-low-cardinality-partitioning-key}
+## Choose a low cardinality ^^partitioning key^^ {#choose-a-low-cardinality-partitioning-key}
 
-Importantly, a higher number of parts will negatively affect query performance. ClickHouse will therefore respond to inserts with a [“too many parts”](/knowledgebase/exception-too-many-parts) error if the number of parts exceeds specified limits either in [total](/operations/settings/merge-tree-settings#max_parts_in_total) or [per partition](/operations/settings/merge-tree-settings#parts_to_throw_insert).
+Importantly, a higher number of ^^parts^^ will negatively affect query performance. ClickHouse will therefore respond to inserts with a [“too many parts”](/knowledgebase/exception-too-many-parts) error if the number of ^^parts^^ exceeds specified limits either in [total](/operations/settings/merge-tree-settings#max_parts_in_total) or [per partition](/operations/settings/merge-tree-settings#parts_to_throw_insert).
 
-Choosing the right **cardinality** for the partitioning key is critical. A high-cardinality partitioning key - where the number of distinct partition values is large - can lead to a proliferation of data parts. Since ClickHouse does not merge parts across partitions, too many partitions will result in too many unmerged parts, eventually triggering the “Too many parts” error. [Merges are essential](/merges) for reducing storage fragmentation and optimizing query speed, but with high-cardinality partitions, that merge potential is lost.
+Choosing the right **cardinality** for the ^^partitioning key^^ is critical. A high-cardinality ^^partitioning key^^ - where the number of distinct partition values is large - can lead to a proliferation of data ^^parts^^. Since ClickHouse does not merge ^^parts^^ across partitions, too many partitions will result in too many unmerged ^^parts^^, eventually triggering the “Too many ^^parts^^” error. [Merges are essential](/merges) for reducing storage fragmentation and optimizing query speed, but with high-cardinality partitions, that merge potential is lost.
 
-By contrast, a **low-cardinality partitioning key**—with fewer than 100 - 1,000 distinct values - is usually optimal. It enables efficient part merging, keeps metadata overhead low, and avoids excessive object creation in storage. In addition, ClickHouse automatically builds MinMax indexes on partition columns, which can significantly speed up queries that filter on those columns. For example, filtering by month when the table is partitioned by `toStartOfMonth(date)` allows the engine to skip irrelevant partitions and their parts entirely.
+By contrast, a **low-cardinality ^^partitioning key^^**—with fewer than 100 - 1,000 distinct values - is usually optimal. It enables efficient part merging, keeps metadata overhead low, and avoids excessive object creation in storage. In addition, ClickHouse automatically builds MinMax indexes on partition columns, which can significantly speed up queries that filter on those columns. For example, filtering by month when the table is partitioned by `toStartOfMonth(date)` allows the engine to skip irrelevant partitions and their ^^parts^^ entirely.
 
-While partitioning can improve performance in some query patterns, it's primarily a data management feature. In many cases, querying across all partitions can be slower than using a non-partitioned table due to increased data fragmentation and more parts being scanned. Use partitioning judiciously, and always ensure that the chosen key is low-cardinality and aligns with your data life cycle policies (e.g., retention via TTL). If you're unsure whether partitioning is necessary, you may want to start without it and optimize later based on observed access patterns.
+While partitioning can improve performance in some query patterns, it's primarily a data management feature. In many cases, querying across all partitions can be slower than using a non-partitioned table due to increased data fragmentation and more ^^parts^^ being scanned. Use partitioning judiciously, and always ensure that the chosen key is low-cardinality and aligns with your data life cycle policies (e.g., retention via ^^TTL^^). If you're unsure whether partitioning is necessary, you may want to start without it and optimize later based on observed access patterns.
````
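The cardinality guidance in this file can be illustrated outside ClickHouse. A small sketch (plain Python, hypothetical two-year ingest window) comparing how many partitions a month-granularity key like `toStartOfMonth(date)` creates versus a per-day key:

```python
from datetime import date, timedelta

# Hypothetical ingest window: two years of daily rows starting 2023-01-01.
days = [date(2023, 1, 1) + timedelta(days=d) for d in range(730)]

# Distinct partition values under each candidate key.
by_month = {(d.year, d.month) for d in days}  # mirrors toStartOfMonth(date)
by_day = set(days)                            # a per-day partitioning key

print(len(by_month), len(by_day))  # 24 vs. 730 distinct partitions
```

Since parts never merge across partitions, the per-day variant caps how far merges can consolidate data, while 24 monthly partitions keep the part count manageable.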

docs/cloud/manage/backups/export-backups-to-own-cloud-account.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -41,7 +41,7 @@ You will need the following details to export/restore backups to your own CSP st
 2. AWS access key and secret. AWS role based authentication is also supported and can be used in place of AWS access key and secret.
 
 :::note
-In order to use role based authentication, please follow the Secure s3 [setup](https://clickhouse.com/docs/cloud/security/secure-s3). In addition, you will need to add `s3:PutObject`, and `s3:DeleteObject` permissions to the IAM policy decribed [here.](https://clickhouse.com/docs/cloud/security/secure-s3#option-2-manually-create-iam-role)
+In order to use role based authentication, please follow the Secure s3 [setup](https://clickhouse.com/docs/cloud/security/secure-s3). In addition, you will need to add `s3:PutObject`, and `s3:DeleteObject` permissions to the IAM policy described [here.](https://clickhouse.com/docs/cloud/security/secure-s3#option-2-manually-create-iam-role)
 :::
 
 ### Azure {#azure}
```
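The AWS note in this hunk extends the secure-S3 IAM role with two write permissions. As an illustration only, here is a sketch of just the additional policy statement, expressed as a Python dict; the bucket ARN is a placeholder, and the base read-access statements from the secure-S3 guide are omitted.

```python
import json

# Placeholder; substitute the ARN of your backup destination bucket.
BUCKET_ARN = "arn:aws:s3:::my-backup-bucket"

# Extra permissions needed for exporting (writing) and cleaning up backups.
extra_statement = {
    "Effect": "Allow",
    "Action": ["s3:PutObject", "s3:DeleteObject"],
    "Resource": f"{BUCKET_ARN}/*",
}

print(json.dumps(extra_statement, indent=2))
```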

docs/cloud/manage/troubleshooting-billing-issues.md

Lines changed: 6 additions & 6 deletions
```diff
@@ -18,11 +18,11 @@ valid billing method configured. After your 30 day trial ends or your trial
 credits are depleted, whichever occurs first, you have the following billing
 options to continue using ClickHouse Cloud:
 
-| Billing option | Description |
-|-----------------------------------------------------|-----------------------------------------------------------------------------------------|
-| [Direct PAYG](#direct-payg) | Add a valid credit card to your organization to Pay-As-You-Go |
-| [Marketplace PAYG](#cloud-marketplace-payg) | Set up a Pay-As-You-Go subscription via a supported cloud marketplace provider |
-| [Commited spend contract](#commited-spend-contract) | Enter into a committed spend contract directly or through a supported cloud marketplace |
+| Billing option | Description |
+|------------------------------------------------------|-----------------------------------------------------------------------------------------|
+| [Direct PAYG](#direct-payg) | Add a valid credit card to your organization to Pay-As-You-Go |
+| [Marketplace PAYG](#cloud-marketplace-payg) | Set up a Pay-As-You-Go subscription via a supported cloud marketplace provider |
+| [Committed spend contract](#committed-spend-contract) | Enter into a committed spend contract directly or through a supported cloud marketplace |
 
 If your trial ends and no billing option has been configured for your organization,
 all your services will be stopped. If a billing method still has not been
@@ -89,7 +89,7 @@ for help. If a valid credit card has not been provided, the same unpaid invoice
 restrictions outlined above for [Direct PAYG](#direct-payg) will apply - this
 includes service suspension and eventual data deletion.
 
-### Committed contract billing {#commited-spend-contract}
+### Committed contract billing {#committed-spend-contract}
 
 You may purchase credits for your organization through a committed contract by:
 
```

docs/cloud/reference/changelog.md

Lines changed: 11 additions & 0 deletions
```diff
@@ -31,6 +31,17 @@ import dashboards from '@site/static/images/cloud/reference/may-30-dashboards.pn
 
 In addition to this ClickHouse Cloud changelog, please see the [Cloud Compatibility](/cloud/reference/cloud-compatibility.md) page.
 
+
+## July 31, 2025 {#july-31-2025}
+
+**Vertical scaling for ClickPipes now available**
+
+[Vertical scaling is now available for streaming ClickPipes](https://clickhouse.com/blog/clickpipes-flexible-scaling-monitoring).
+This feature allows you to control the size of each replica, in addition to the
+number of replicas (horizontal scaling). The details page for each ClickPipe now
+also includes per-replica CPU and memory utilization, which helps you better
+understand your workloads and plan re-sizing operations with confidence.
+
 ## July 24, 2025 {#july-24-2025}
 
 **ClickPipes for MySQL CDC now in public beta**
```
