Commit 1da0c12

Merge remote-tracking branch 'upstream/main' into docs-kafka_table_engine

2 parents: 79a8d51 + 24b3e20

7 files changed: +40 −21 lines changed

docs/guides/developer/deduplicating-inserts-on-retries.md

Lines changed: 12 additions & 1 deletion
```diff
@@ -9,6 +9,16 @@ Insert operations can sometimes fail due to errors such as timeouts. When insert
 
 When an insert is retried, ClickHouse tries to determine whether the data has already been successfully inserted. If the inserted data is marked as a duplicate, ClickHouse does not insert it into the destination table. However, the user will still receive a successful operation status as if the data had been inserted normally.
 
+## Limitations {#limitations}
+
+### Uncertain insert status {#uncertain-insert-status}
+
+The user must retry the insert operation until it succeeds. If all retries fail, it is impossible to determine whether the data was inserted. When materialized views are involved, it is also unclear in which tables the data may have appeared; the materialized views could be out of sync with the source table.
+
+### Deduplication window limit {#deduplication-window-limit}
+
+If more than `*_deduplication_window` other insert operations occur during the retry sequence, deduplication may not work as intended. In this case, the same data can be inserted multiple times.
+
 ## Enabling insert deduplication on retries {#enabling-insert-deduplication-on-retries}
 
 ### Insert deduplication for tables {#insert-deduplication-for-tables}
@@ -45,7 +55,8 @@ You can control this process using the following settings for the source table:
 - [`replicated_deduplication_window_seconds`](/operations/settings/merge-tree-settings#replicated_deduplication_window_seconds)
 - [`non_replicated_deduplication_window`](/operations/settings/merge-tree-settings#non_replicated_deduplication_window)
 
-You can also use the user profile setting [`deduplicate_blocks_in_dependent_materialized_views`](/operations/settings/settings#deduplicate_blocks_in_dependent_materialized_views).
+You must also enable the user profile setting [`deduplicate_blocks_in_dependent_materialized_views`](/operations/settings/settings#deduplicate_blocks_in_dependent_materialized_views).
+With `insert_deduplicate=1` enabled, inserted data is deduplicated in the source table. The setting `deduplicate_blocks_in_dependent_materialized_views=1` additionally enables deduplication in dependent tables. Enable both if full deduplication is desired.
 
 When inserting blocks into tables under materialized views, ClickHouse calculates the `block_id` by hashing a string that combines the `block_id`s from the source table and additional identifiers. This ensures accurate deduplication within materialized views, allowing data to be distinguished based on its original insertion, regardless of any transformations applied before reaching the destination table under the materialized view.
 
```
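Taken together, a minimal sketch of how the two settings added above might be combined. The table and view names are illustrative only, not part of the commit; `non_replicated_deduplication_window` is set because deduplication on non-replicated MergeTree tables is off by default:

```sql
-- Hypothetical source table with a deduplication window enabled.
CREATE TABLE events
(
    id UInt64,
    payload String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS non_replicated_deduplication_window = 100;

-- Hypothetical dependent materialized view, also with a window.
CREATE MATERIALIZED VIEW events_mv
ENGINE = MergeTree
ORDER BY id
SETTINGS non_replicated_deduplication_window = 100
AS SELECT id, length(payload) AS payload_len
FROM events;

-- Retry this exact insert on failure: with both settings on,
-- duplicate blocks are dropped in the source table and the view.
INSERT INTO events
SETTINGS insert_deduplicate = 1,
         deduplicate_blocks_in_dependent_materialized_views = 1
VALUES (1, 'first payload');
```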

docs/integrations/data-ingestion/clickpipes/index.md

Lines changed: 8 additions & 3 deletions
```diff
@@ -82,8 +82,13 @@ Steps:
 <Image img={cp_custom_role} alt="Assign a custom role" size="lg" border/>
 
 ## Error reporting {#error-reporting}
-ClickPipes will create a table next to your destination table with the postfix `<destination_table_name>_clickpipes_error`. This table will contain any errors from the operations of your ClickPipe (network, connectivity, etc.) and also any data that don't conform to the schema. The error table has a [TTL](/engines/table-engines/mergetree-family/mergetree#table_engine-mergetree-ttl) of 7 days.
-If ClickPipes cannot connect to a data source or destination after 15min., ClickPipes instance stops and stores an appropriate message in the error table (providing the ClickHouse instance is available).
+ClickPipes stores errors in two separate tables, depending on the type of error encountered during the ingestion process.
+### Record Errors {#record-errors}
+ClickPipes will create a table next to your destination table with the postfix `<destination_table_name>_clickpipes_error`. This table will contain any errors from malformed data or mismatched schema and will include the entirety of the invalid message. This table has a [TTL](/engines/table-engines/mergetree-family/mergetree#table_engine-mergetree-ttl) of 7 days.
+### System Errors {#system-errors}
+Errors related to the operation of your ClickPipe (network, connectivity, etc.) will be stored in the `system.clickpipes_log` table. This table has a [TTL](/engines/table-engines/mergetree-family/mergetree#table_engine-mergetree-ttl) of 7 days.
+
+If ClickPipes cannot connect to a data source after 15 min or to a destination after 1 hr, the ClickPipes instance stops and stores an appropriate message in the system error table (provided the ClickHouse instance is available).
 
 ## F.A.Q {#faq}
 - **What is ClickPipes?**
@@ -100,4 +105,4 @@ If ClickPipes cannot connect to a data source or destination after 15min., Click
 
 - **Is there a way to handle errors or failures when using ClickPipes for Kafka?**
 
-Yes, ClickPipes for Kafka will automatically retry case of failures when consuming data from Kafka. ClickPipes also supports enabling a dedicated error table that will hold errors and malformed data for 7 days.
+Yes, ClickPipes for Kafka will automatically retry in the event of operational failures (network issues, connectivity issues, etc.) when consuming data from Kafka. In the event of malformed data or an invalid schema, ClickPipes will store the record in the record error table and continue processing.
```
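As a hedged illustration of the two error stores described above: the destination table name `my_table` is hypothetical, and `SELECT *` is used because the exact error-table schema is not spelled out in this commit.

```sql
-- Record errors: malformed rows and schema mismatches land in a table
-- next to the (hypothetical) destination table, suffixed _clickpipes_error.
SELECT * FROM my_table_clickpipes_error LIMIT 10;

-- System errors: operational failures (network, connectivity, etc.).
SELECT * FROM system.clickpipes_log LIMIT 10;
```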

docs/integrations/data-ingestion/kafka/kafka-table-engine.md

Lines changed: 0 additions & 3 deletions
```diff
@@ -6,7 +6,6 @@ description: 'Using the Kafka Table Engine'
 title: 'Using the Kafka table engine'
 ---
 
-import CloudAvailableBadge from '@theme/badges/CloudAvailableBadge';
 import Image from '@theme/IdealImage';
 import kafka_01 from '@site/static/images/integrations/data-ingestion/kafka/kafka_01.png';
 import kafka_02 from '@site/static/images/integrations/data-ingestion/kafka/kafka_02.png';
@@ -15,8 +14,6 @@ import kafka_04 from '@site/static/images/integrations/data-ingestion/kafka/kafk
 
 # Using the Kafka table engine
 
-<CloudAvailableBadge/>
-
 The Kafka table engine can be used to [**read** data from](#kafka-to-clickhouse) and [**write** data to](#clickhouse-to-kafka) Apache Kafka and other Kafka API-compatible brokers (e.g., Redpanda, Amazon MSK).
 
 ### Kafka to ClickHouse {#kafka-to-clickhouse}
```

scripts/badger.sh

Lines changed: 16 additions & 10 deletions
```diff
@@ -1,5 +1,4 @@
 #!/bin/bash
-
 # This script is used to generate a list of feature badges in use in the docs
 # It runs as part of a weekly github action and the result is sent as a notification
 # to the #docs channel in Slack.
@@ -19,33 +18,41 @@ components=(
   "ExperimentalBadge"
   "PrivatePreviewBadge"
   "CloudNotSupportedBadge"
-  "CloudAvailableBadge"
+  "CloudOnlyBadge"
+)
+
+# Custom display names (must match order of components array)
+display_names=(
+  "Beta Features"
+  "Experimental Features"
+  "Private Preview Features"
+  "Cloud Unsupported Features"
+  "Cloud Only Features"
 )
 
 # Function to extract slug from a file's frontmatter
 extract_slug_from_file() {
   local filepath="$1"
   local slug=""
-
   # Look for "slug: some/path/slug" in the file
   slug=$(grep -m 1 "^slug:" "$filepath" 2>/dev/null | sed 's/^slug:[[:space:]]*//' | tr -d '"' | tr -d "'")
-
   # If no slug found, return the filepath as fallback
   if [ -z "$slug" ]; then
     slug="[no slug] $filepath"
   fi
-
   echo "$slug"
 }
 
 # Search for each component and collect all slugs
-for component in "${components[@]}"; do
-  echo "$component:"
-
+for i in "${!components[@]}"; do
+  component="${components[$i]}"
+  display_name="${display_names[$i]}"
+
+  echo "$display_name:"
   # Get unique files containing the component
   files=$(grep -rl --include="*.md" --include="*.mdx" --include="*.jsx" --include="*.tsx" \
     -E "<$component[[:space:]/>]|</$component>" "$DOCS_DIR" 2>/dev/null | sort -u)
-
+
   if [ -z "$files" ]; then
     echo " (none)"
   else
@@ -60,6 +67,5 @@ for component in "${components[@]}"; do
       fi
     done <<< "$files"
   fi
-
   echo
 done
```

scripts/settings/session-settings.sql

Lines changed: 2 additions & 2 deletions
```diff
@@ -58,7 +58,7 @@ WITH
         name,
         ' {#'||name||'} \n\n',
         multiIf(tier == 'Experimental', '<ExperimentalBadge/>\n\n', tier == 'Beta', '<BetaBadge/>\n\n', ''),
-        if(description LIKE '%Only has an effect in ClickHouse Cloud%', '<CloudAvailableBadge/>\n\n', ''),
+        if(description LIKE '%Only has an effect in ClickHouse Cloud%', '<CloudOnlyBadge/>\n\n', ''),
         if(
             type != '' AND default != '',
             format(
@@ -83,7 +83,7 @@ description: ''Settings which are found in the ``system.settings`` table.''
 
 import ExperimentalBadge from \'@theme/badges/ExperimentalBadge\';
 import BetaBadge from \'@theme/badges/BetaBadge\';
-import CloudAvailableBadge from \'@theme/badges/CloudAvailableBadge\';
+import CloudOnlyBadge from \'@theme/badges/CloudOnlyBadge\';
 import SettingsInfoBlock from \'@theme/SettingsInfoBlock/SettingsInfoBlock\';
 import VersionHistory from \'@theme/VersionHistory/VersionHistory\';
```
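For context, a small sketch of how the badge-selection expressions above behave when evaluated standalone; the sample tier and description values are made up:

```sql
-- Evaluate the badge-selection logic against made-up inputs.
SELECT
    multiIf('Beta' == 'Experimental', '<ExperimentalBadge/>\n\n',
            'Beta' == 'Beta', '<BetaBadge/>\n\n',
            '') AS tier_badge,
    if('Only has an effect in ClickHouse Cloud. Sample text.'
           LIKE '%Only has an effect in ClickHouse Cloud%',
       '<CloudOnlyBadge/>\n\n', '') AS cloud_badge;
-- Returns '<BetaBadge/>\n\n' and '<CloudOnlyBadge/>\n\n'.
```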

src/theme/badges/CloudAvailableBadge/index.js renamed to src/theme/badges/CloudOnlyBadge/index.js

Lines changed: 2 additions & 2 deletions
```diff
@@ -12,12 +12,12 @@ const Icon = () => {
   )
 }
 
-const CloudAvailableBadge = () => {
+const CloudOnlyBadge = () => {
   return (
     <div className={styles.cloudBadge}>
       <Icon />{'ClickHouse Cloud only'}
     </div>
   )
 }
 
-export default CloudAvailableBadge
+export default CloudOnlyBadge
```
