[FEAT] buffer implementation#658

Open
gaurav7261 wants to merge 2 commits into ClickHouse:main from gaurav7261:buffer_implementation

Conversation


@gaurav7261 gaurav7261 commented Feb 13, 2026

Summary

• Buffer implementation: Adds internal record buffering to ClickHouseSinkTask so that records from multiple poll() calls are accumulated and flushed as a single large batch, reducing the number of
inserts to ClickHouse.
• Two new config options: bufferCount (number of records to accumulate before flushing) and bufferFlushTime (max time in ms before flushing regardless of count). Both default to 0 (disabled), preserving
backward compatibility.
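The accumulate-then-flush idea described in the summary can be sketched roughly as follows. This is an illustrative, self-contained sketch, not the connector's actual code: the class and field names (`BufferSketch`, `flushes`) and the plain `String` records are hypothetical stand-ins for the `SinkRecord` handling in `ClickHouseSinkTask`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of count- and time-based buffering, as described in the PR summary.
public class BufferSketch {
    private final int bufferCount;        // flush when this many records accumulate (0 = disabled)
    private final long bufferFlushTimeMs; // flush after this much time even if under the count
    private final List<String> buffer = new ArrayList<>();
    private long lastFlushTime = System.currentTimeMillis();
    int flushes = 0;                      // how many flushes have happened (for illustration)

    BufferSketch(int bufferCount, long bufferFlushTimeMs) {
        this.bufferCount = bufferCount;
        this.bufferFlushTimeMs = bufferFlushTimeMs;
    }

    // Called once per poll() batch; only flushes when a threshold is crossed.
    void put(List<String> records) {
        if (bufferCount <= 0) {           // buffering disabled: insert every batch directly
            flush(records);
            return;
        }
        buffer.addAll(records);
        boolean countReached = buffer.size() >= bufferCount;
        boolean timeReached = bufferFlushTimeMs > 0
                && System.currentTimeMillis() - lastFlushTime >= bufferFlushTimeMs;
        if (countReached || timeReached) {
            flush(new ArrayList<>(buffer));
            buffer.clear();
            lastFlushTime = System.currentTimeMillis();
        }
    }

    private void flush(List<String> records) {
        flushes++; // a real task would insert the batch into ClickHouse here
        System.out.println("flush #" + flushes + " with " + records.size() + " records");
    }

    public static void main(String[] args) {
        BufferSketch sink = new BufferSketch(5, 0);
        sink.put(List.of("r1", "r2")); // 2 buffered, no flush
        sink.put(List.of("r3", "r4")); // 4 buffered, no flush
        sink.put(List.of("r5"));       // threshold hit: one flush of 5 records
    }
}
```

With both thresholds at 0 (the defaults), every `put()` goes straight through, which is what preserves backward compatibility.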

Checklist

Delete items not relevant to your PR:

  • Unit and integration tests covering the common scenarios were added
  • A human-readable description of the changes was provided to include in CHANGELOG
  • For significant changes, documentation in https://github.com/ClickHouse/clickhouse-docs was updated with further explanations or tutorials


CLAassistant commented Feb 13, 2026

CLA assistant check
All committers have signed the CLA.

@gaurav7261 gaurav7261 force-pushed the buffer_implementation branch 2 times, most recently from 7df69a7 to 6e823d9 on February 13, 2026 at 13:18
@rschu1ze
Member

@antaljanosbenjamin ^^

@antaljanosbenjamin
Member

I think @chernser is a better person to review this, as I don't have experience with this repo, only the Kafka table engine.

@chernser chernser requested a review from mzitnik February 17, 2026 15:53
@chernser
Contributor

@mzitnik would you please take a look?

@gaurav7261 thank you for the contribution!
However there are some immediate comments:

  • a feature description is missing. We require an open issue and a good explanation of why and how. It does not have to be a complete IP, but something we can use to review the PR
  • the feature requires a good set of tests.

@gaurav7261
Author

Yes, I am working on exactly that.

@gaurav7261 gaurav7261 force-pushed the buffer_implementation branch from 6e823d9 to ee627bd on February 17, 2026 at 17:35
@gaurav7261
Author

@mzitnik can you please review now?

@mzitnik
Collaborator

mzitnik commented Feb 17, 2026

@gaurav7261
thank you for your contribution!

As @chernser asked above: can you explain the purpose of the PR and what you are trying to solve?

@gaurav7261
Author

gaurav7261 commented Feb 17, 2026

@mzitnik Our replica.fetch.max.bytes and fetch.max.bytes are 20 MB and 55 MB on the broker, our Kafka throughput is 15 MB/s, and the average record size is 3 KB, so we cannot pull more than 7-9k records per poll and therefore cannot build large batches. I know the current implementation has a loophole: when buffering is enabled, put() returns successfully while records sit in an in-memory buffer, never written to ClickHouse. Since flush() is currently a no-op, offsets get committed for data that exists only in RAM, and if the process crashes or a rebalance happens those records are permanently lost, which breaks at-least-once delivery. Correct me if I am wrong here. I am working on manual offset commits when batching is enabled, the way the Confluent S3 sink connector does. Do you think that is the correct way forward? I am also implementing preCommit() for the case where buffering is enabled; please take a look.
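For reference, the offset-safety pattern being discussed here (preCommit() handing back only offsets already flushed to the sink, the approach the Confluent S3 sink connector also uses) can be sketched like this. It is a minimal illustration with plain maps standing in for Kafka's TopicPartition and OffsetAndMetadata types; the names are hypothetical, not the PR's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: commit only offsets that were durably flushed, never buffered-only data.
public class PreCommitSketch {
    // partition -> next offset to commit, covering everything durably written to ClickHouse
    final Map<Integer, Long> flushedOffsets = new HashMap<>();

    // Record that a batch up to and including `offset` was flushed for `partition`.
    void markFlushed(int partition, long offset) {
        flushedOffsets.put(partition, offset + 1); // Kafka commits the *next* offset to read
    }

    // Analogue of SinkTask.preCommit(currentOffsets): ignore the framework's view of
    // consumed offsets and hand back only what has actually been flushed. Partitions
    // with nothing flushed yet are omitted, so their committed offsets do not advance.
    Map<Integer, Long> preCommit(Map<Integer, Long> currentOffsets) {
        Map<Integer, Long> safe = new HashMap<>();
        for (Integer partition : currentOffsets.keySet()) {
            Long flushed = flushedOffsets.get(partition);
            if (flushed != null) {
                safe.put(partition, flushed);
            }
        }
        return safe;
    }

    public static void main(String[] args) {
        PreCommitSketch task = new PreCommitSketch();
        task.markFlushed(0, 99);              // partition 0 flushed through offset 99
        Map<Integer, Long> consumed = new HashMap<>();
        consumed.put(0, 150L);                // consumed ahead of what was flushed
        consumed.put(1, 40L);                 // partition 1: nothing flushed yet
        System.out.println(task.preCommit(consumed)); // only partition 0 is committable
    }
}
```

Because partitions with nothing flushed are omitted from the returned map, the framework leaves their committed offsets untouched, so a crash or rebalance replays the buffered-but-unwritten records instead of losing them.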

@gaurav7261 gaurav7261 force-pushed the buffer_implementation branch from 9e115b7 to 6560216 on February 18, 2026 at 18:21
@gaurav7261 gaurav7261 requested a review from mzitnik February 18, 2026 18:21
@gaurav7261
Author

@mzitnik please review again; I have resolved the comments.

@gaurav7261
Author

@mzitnik can you please review it?

Contributor

Copilot AI left a comment


Pull request overview

This PR adds internal record buffering capability to the ClickHouse Kafka Connect sink connector. The implementation allows records from multiple poll() calls to be accumulated in memory and flushed to ClickHouse as a single large batch, reducing the number of insert operations and improving throughput.

Changes:

  • Added two new configuration options: bufferCount (record count threshold) and bufferFlushTime (time-based threshold in milliseconds), both defaulting to 0 (disabled) for backward compatibility
  • Implemented buffering logic in ClickHouseSinkTask with proper offset management to ensure at-least-once delivery semantics
  • Added comprehensive test coverage for buffering behavior, offset management, partition rebalancing, and error handling scenarios

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Files changed:

  • src/main/java/com/clickhouse/kafka/connect/sink/ClickHouseSinkTask.java: Core buffering implementation with put(), flushBuffer(), preCommit(), close(), and stop() logic
  • src/main/java/com/clickhouse/kafka/connect/sink/ClickHouseSinkConfig.java: Added BUFFER_COUNT and BUFFER_FLUSH_TIME configuration options with defaults and validation
  • src/test/java/com/clickhouse/kafka/connect/sink/ClickHouseSinkTaskBufferTest.java: Comprehensive test suite covering buffering, offset management, partition handling, and error scenarios
  • CHANGELOG.md: Added entry for version 1.3.6 documenting the new buffering feature


Comment on lines +68 to +72
this.bufferFlushTime = clickHouseSinkConfig.getBufferFlushTime();
this.bufferingEnabled = this.bufferCount > 0;
this.buffer = this.bufferingEnabled ? new ArrayList<>(this.bufferCount) : new ArrayList<>();
this.lastFlushTime = System.currentTimeMillis();
this.flushedOffsets = new HashMap<>();

Copilot AI Feb 23, 2026


Consider adding validation or a warning when bufferFlushTime > 0 but bufferCount = 0. While the config documentation states that bufferFlushTime is "Only effective when bufferCount > 0", users might miss this and wonder why time-based flushing isn't working. Adding a log warning in the start() method (after line 68) would help users catch this misconfiguration:

if (this.bufferFlushTime > 0 && this.bufferCount == 0) {
    LOGGER.warn("bufferFlushTime is set but will be ignored because bufferCount is 0");
}

Comment on lines 73 to 77

if (bufferingEnabled) {
LOGGER.info("Internal buffering enabled: bufferCount={}, bufferFlushTime={}ms",
bufferCount, bufferFlushTime);
}

Copilot AI Feb 23, 2026


The same log message "Internal buffering enabled: bufferCount={}, bufferFlushTime={}ms" appears in both ClickHouseSinkConfig constructor (line 285) and ClickHouseSinkTask.start() method (line 75). This will result in duplicate log entries when buffering is enabled. Consider removing one of these log statements - the one in ClickHouseSinkConfig seems more appropriate since it logs during configuration parsing, while the one in ClickHouseSinkTask is redundant.

Suggested change
if (bufferingEnabled) {
LOGGER.info("Internal buffering enabled: bufferCount={}, bufferFlushTime={}ms",
bufferCount, bufferFlushTime);
}

Contributor

Copilot AI left a comment


Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.
