kafka table engine, modernize table creation, add tip on limitations,… #3666
Open
randomizedcoder wants to merge 5 commits into ClickHouse:main from randomizedcoder:kafka_table_engine_limitations
Commits
813baab kafka table engine, modernize table creation, add tip on limitations,… (randomizedcoder)
3004bfe Specify codeblock language (Blargian)
85a85db clarify kafka engine docs enhancements (randomizedcoder)
6b9a46a Kafka table engine (randomizedcoder)
010beaa Improve notes on the Kafka engine integration tests (randomizedcoder)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -35,6 +35,12 @@ To persist this data from a read of the table engine, we need a means of capturi | |
|
|
||
| <Image img={kafka_01} size="lg" alt="Kafka table engine architecture diagram" style={{width: '80%'}} /> | ||
|
|
||
| :::tip | ||
| The Kafka Engine does have some limitations: | ||
| - [Kafka message header](https://docs.confluent.io/platform/current/schema-registry/fundamentals/serdes-develop/index.html#wire-format) is not currently supported, so schema registry cannot be used | ||
| - Protobuf, ProtobufSingle are supported, while [ProtobufList](https://github.com/ClickHouse/ClickHouse/issues/78746) is not | ||
| ::: | ||
|
|
||
| #### Steps {#steps} | ||
|
|
||
|
|
||
|
|
@@ -179,10 +185,19 @@ CREATE TABLE github_queue | |
| review_comments UInt32, | ||
| member_login LowCardinality(String) | ||
| ) | ||
| ENGINE = Kafka('kafka_host:9092', 'github', 'clickhouse', | ||
| 'JSONEachRow') settings kafka_thread_per_consumer = 0, kafka_num_consumers = 1; | ||
| ENGINE = Kafka SETTINGS | ||
| kafka_broker_list = 'kafka_host:9092', | ||
| kafka_topic_list = 'github', | ||
| kafka_group_name = 'clickhouse', | ||
| kafka_format = 'JSONEachRow'; | ||
| ``` | ||
|
|
||
| :::tip | ||
| Notes | ||
| [Full Kafka Engine Table creation options](https://clickhouse.com/docs/engines/table-engines/integrations/kafka#creating-a-table) | ||
|
|
||
| [kafka_format must be the last setting](https://github.com/ClickHouse/ClickHouse/issues/37895) | ||
|
||
| ::: | ||
|
|
||
| We discuss engine settings and performance tuning below. At this point, a simple select on the table `github_queue` should read some rows. Note that this will move the consumer offsets forward, preventing these rows from being re-read without a [reset](#common-operations). Note the limit and the required parameter `stream_like_engine_allow_direct_select`. | ||
|
|
||
|
|
@@ -305,6 +320,12 @@ Errors such as authentication issues are not reported in responses to Kafka engi | |
| </kafka> | ||
| ``` | ||
|
|
||
| Another useful source of information is the system.kafka_consumers table: | ||
|
|
||
| ```sql | ||
| SELECT * FROM system.kafka_consumers FORMAT Vertical; | ||
| ``` | ||
|
|
||
| ##### Handling malformed messages {#handling-malformed-messages} | ||
|
|
||
| Kafka is often used as a "dumping ground" for data. This leads to topics containing mixed message formats and inconsistent field names. Avoid this and use Kafka features such as Kafka Streams or ksqlDB to ensure messages are well-formed and consistent before insertion into Kafka. If these options are not possible, ClickHouse has some features that can help. | ||
|
|
@@ -474,7 +495,6 @@ Multiple ClickHouse instances can all be configured to read from a topic using t | |
|
|
||
| Consider the following when looking to increase Kafka Engine table throughput performance: | ||
|
|
||
|
|
||
| * The performance will vary depending on the message size, format, and target table types. 100k rows/sec on a single table engine should be considered obtainable. By default, messages are read in blocks, controlled by the parameter `kafka_max_block_size`. By default, this is set to the [max_insert_block_size](/operations/settings/settings#max_insert_block_size), defaulting to 1,048,576. Unless messages are extremely large, this should nearly always be increased. Values between 500k and 1M are not uncommon. Test and evaluate the effect on throughput performance. | ||
| * The number of consumers for a table engine can be increased using `kafka_num_consumers`. However, by default, inserts will be linearized in a single thread unless `kafka_thread_per_consumer` is changed from the default value of 0. Set this to 1 to ensure flushes are performed in parallel. Note that creating a Kafka engine table with N consumers (and `kafka_thread_per_consumer=1`) is logically equivalent to creating N Kafka engines, each with a materialized view and `kafka_thread_per_consumer=0`. | ||
| * Increasing consumers is not a free operation. Each consumer maintains its own buffers and threads, increasing the overhead on the server. Be conscious of this overhead and, where possible, scale linearly across your cluster first. | ||
|
|
||
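
For reviewers, here is a minimal sketch of how the tuning settings from the last hunk could be combined with the `SETTINGS`-style table creation this PR introduces. The table name `github_queue_tuned`, the trimmed column list, and the numeric values are assumptions for illustration only and are not part of the PR. `kafka_format` is kept as the last setting to stay consistent with the note added above.

```sql
-- Hypothetical example: table name, column subset, and values are
-- illustrative assumptions, not recommendations from the PR.
CREATE TABLE github_queue_tuned
(
    review_comments UInt32,
    member_login LowCardinality(String)
)
ENGINE = Kafka SETTINGS
    kafka_broker_list = 'kafka_host:9092',
    kafka_topic_list = 'github',
    kafka_group_name = 'clickhouse',
    -- Assumed value: more than one consumer on this table.
    kafka_num_consumers = 4,
    -- 1 = each consumer flushes in its own thread (parallel flushes).
    kafka_thread_per_consumer = 1,
    -- Assumed value: set explicitly so the effect on throughput can be tested.
    kafka_max_block_size = 1048576,
    kafka_format = 'JSONEachRow';
```

Any value chosen here should be validated against the actual workload, since each additional consumer adds buffer and thread overhead on the server.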
full stop needed but i believe this is supported @Blargian