Address hydra issue + extend logging #54

BentsiLeviav · 2025-12-24T15:27:39Z

Summary

This PR adds the following:

Use mutations_sync=3 and alter_sync=3 to make sure we execute alter/mutation operations only on available replicas, to avoid getting the "code: 341, message: Mutation is not finished because some replicas are inactive right now" error from ClickHouse. Due to that, we moved GetAllReplicasActiveQuery to warn on bad replica status, and not fail. Closes Support hydra architecture #36.
Add logs to many operations, specifically around the context being cancelled. Closes Extend context cancellation logging #51 and partially addresses Extend logging for troubleshooting #50.
Add stats to all client operations, which are logged every 100 queries.
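
As a rough illustration of the first item, the sketch below appends the two sync settings to a statement as a ClickHouse SETTINGS clause. The helper name withSyncSettings, the syncSettings variable, and the sample statement are hypothetical. The PR may apply the settings another way (for example through per-query client settings), so this is only a sketch under that assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// syncSettings holds the two values named in the PR description.
// Hypothetical variable: the connector may store or apply these differently.
var syncSettings = []string{"alter_sync = 3", "mutations_sync = 3"}

// withSyncSettings appends a SETTINGS clause to an ALTER/mutation statement.
// It assumes the statement does not already carry its own SETTINGS clause.
func withSyncSettings(stmt string) string {
	// Drop surrounding whitespace and any trailing semicolon first.
	stmt = strings.TrimRight(strings.TrimSpace(stmt), "; \t\r\n")
	return stmt + " SETTINGS " + strings.Join(syncSettings, ", ")
}

func main() {
	fmt.Println(withSyncSettings("ALTER TABLE db.t DELETE WHERE id = 1;"))
}
```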



Copilot

Pull request overview

This PR enhances the ClickHouse destination connector's reliability and observability by addressing replica synchronization issues and adding comprehensive logging. The main purpose is to prevent "Mutation is not finished because some replicas are inactive right now" errors by using the mutations_sync=3 and alter_sync=3 settings, which ensure operations execute only on available replicas. Additionally, the PR adds detailed logging throughout all operations and implements connection statistics tracking.

Key changes:

Changed synchronization settings from 2 to 3 for mutations and alter operations to avoid inactive replica errors
Added extensive logging at operation entry/exit points and error conditions, with context cancellation tracking
Implemented connection statistics tracking that logs metrics every 100 queries

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.


File	Description
destination/service/server.go	Added comprehensive logging to all RPC handlers including operation start/completion, error conditions, and file processing progress
destination/db/clickhouse.go	Changed `alter_sync` and `mutations_sync` to 3, demoted replica availability failures from errors to warnings, added connection stats tracking, and enhanced error messages with context state
destination/db/sql/sql.go	Updated `GetAllReplicasActiveQuery` to exclude Hydra read-only instances by checking `disable_insertion_and_mutation` setting
destination/db/sql/sql_test.go	Updated test to match new query structure for replica availability checking
destination/common/log/logger.go	Added caller information to log output for better traceability
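
For the logger.go row, caller information can be obtained in plain Go with runtime.Caller. The sketch below is only an illustration of the idea: callerInfo and logInfo are hypothetical names, and the connector's logger may rely on its logging library's own caller support instead.

```go
package main

import (
	"fmt"
	"path/filepath"
	"runtime"
)

// callerInfo returns "file.go:line" for a frame on the call stack.
// skip=0 reports the function that called callerInfo, skip=1 its caller.
func callerInfo(skip int) string {
	_, file, line, ok := runtime.Caller(skip + 1)
	if !ok {
		return "unknown"
	}
	return fmt.Sprintf("%s:%d", filepath.Base(file), line)
}

// logInfo is a hypothetical stand-in for the connector's log.Info.
// It reports the location of whoever called logInfo.
func logInfo(msg string) string {
	entry := fmt.Sprintf("level=info caller=%s msg=%q", callerInfo(1), msg)
	fmt.Println(entry)
	return entry
}

func main() {
	logInfo("Starting test")
}
```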



mshustov · 2025-12-27T13:46:21Z

destination/service/server.go

 }

 func (s *Server) WriteHistoryBatch(ctx context.Context, in *pb.WriteHistoryBatchRequest) (*pb.WriteBatchResponse, error) {
+	log.Info(fmt.Sprintf("[WriteHistoryBatch] Starting for %s.%s with earliest_start_files=%d, replace_files=%d, update_files=%d, delete_files=%d",


does Fivetran provide any recommendations about log levels? this one might be more appropriate for debug


Yes, they can be found here: https://fivetran.com/docs/connector-sdk/technical-reference/connector-sdk-code/connector-sdk-logs

I updated all the Fivetran-related functions, like WriteHistoryBatch, WriteBatch, processReplaceFiles, and processEarliestStartFilesForHistoryBatch, to be Notice level (debug mode), and kept all ClickHouse-related operations, like DescribeTable, CreateTable, AlterTable, and Truncate, at Info level.

destination/db/clickhouse.go

mshustov · 2025-12-27T14:09:56Z

destination/service/server.go


 func (s *Server) Test(ctx context.Context, in *pb.TestRequest) (*pb.TestResponse, error) {
+
+	log.Info(fmt.Sprintf("[Test_%s] Starting test", in.Name))


do we log a stop reason anywhere?

Yes:

In case of error, we log the error with an indication of which test was run

In case of success, we log this line before returning the success response

log.Info(fmt.Sprintf("[Test_%s] Test completed successfully", in.Name))


mzitnik

I see a lot of error messages that are not capitalized. Is it some convention?

destination/db/clickhouse.go

mzitnik · 2026-01-05T13:43:29Z

destination/db/clickhouse.go

+
+func (conn *ClickHouseConnection) recordQuery(duration time.Duration, success bool) {
+	conn.queryCount++
+	conn.totalDuration += duration


In totalDuration, we also account for queries that encountered an error. When examining conn.totalDuration/time.Duration(conn.queryCount+1), the average can be skewed by queries that failed. I think we should track them separately (totalDuration vs. totalDurationError); a combined average can be problematic.
Let's take an example: q1: 100, q2: 101, q3: 105, q4: 2 (with an error). AVG = (100+101+105+2)/4 = 77.
This is very problematic.
So I would go for percentiles/histograms, one for successful and one for failed queries.

I think that level of sophistication is beyond the scope of this PR. The current stats are meant to be basic connection health indicators rather than comprehensive performance metrics. Implementing proper percentile tracking or histograms would require:

Additional memory overhead for storing duration buckets/samples

More complex aggregation logic

Decisions about retention windows and bucket sizes

Potentially a metrics library to handle this properly

This feels like it should be its own feature request with proper design discussion. Would you mind opening a ticket for enhanced connection metrics?

Can you open an issue for the correct calculations?

mzitnik · 2026-01-05T13:46:51Z

destination/db/clickhouse.go

 	}
+
+	log.Info(fmt.Sprintf("Initializing ClickHouse connection to %s:%s",
+		configuration[config.HostKey], configuration[config.PortKey]))


Is it possible to add the username we connect with, to understand which user was configured in case of permission problems or mistakes?

Before customers set up a ClickHouse connection, 2 tests are run by the connector:

ConnectionTest to check connectivity with the provided user

GrantsTest to check proper permissions are set

It is important to mention that all these logs are not visible to the end users and can be retrieved only by requesting them from the Fivetran team.

mzitnik · 2026-01-05T14:02:01Z

destination/db/clickhouse.go

 		log.Error(err)
 		return err
 	}
+	log.Info(fmt.Sprintf("Successfully executed %s [query_id=%s] in %v", op, queryID, time.Since(startTime)))


Using time.Since on lines 160 and 154 can be problematic (if there is a context switch). Since time.Since essentially does return Now().Sub(t), every call gets a new Now(). I would calculate the elapsed time once, separately, and use it on both lines 160 and 154.

See for reference: https://github.com/golang/go/blob/master/src/time/time.go#L1227

mzitnik · 2026-01-05T14:02:28Z

destination/db/clickhouse.go

+		return conn.Query(ctx, queryWithID)
 	}, ctx, string(op), benchmark)
+
+	conn.recordQuery(time.Since(startTime), err == nil)


mzitnik · 2026-01-05T16:01:45Z

destination/db/clickhouse.go

 	err = conn.WaitAllNodesAvailable(ctx, schemaName, tableName)
 	if err != nil {
-		return false, err
+		log.Warn(fmt.Sprintf("It seems like not all nodes are available: %v. We strongly recommend to check the cluster health and availability to avoid inconsistency between replicas", err))


Previously, we returned with an error. Is this still the case now that we removed return false, err?

This was described in the PR description:

we moved GetAllReplicasActiveQuery to warn on bad replica status, and not fail.


mzitnik

We still have a few places with uncapitalized text in error logging

mzitnik · 2026-01-06T10:40:54Z

destination/db/clickhouse.go

+
+func (conn *ClickHouseConnection) recordQuery(duration time.Duration, success bool) {
+	conn.queryCount++
+	conn.totalDuration += duration


Can you open an issue for the correct calculations?

BentsiLeviav added 8 commits December 15, 2025 13:14

replace GetAllReplicasActiveQuery to exclude read only replicas compl…

7031f8a

…etely

Merge branch 'main' into fix-hydra-checking

ef2b761

add caller to logs

82827f5

add logs to all operations

91fb1ee

add stats and logs to client execution

8d038cb

use the new alter/mutation_sync=3 and move WaitAllNodesAvailable to b…

a39aaad

…e warning only

update TestGetAllReplicasActiveQuery

73af1c1

add context information to logs

69ed25d

BentsiLeviav requested review from Copilot and slvrtrn December 24, 2025 15:47

BentsiLeviav changed the title ~~Fix hydra checking~~ Address hydra issue + extend logging Dec 24, 2025

Copilot AI reviewed Dec 24, 2025


BentsiLeviav and others added 3 commits December 24, 2025 17:50

Update destination/db/clickhouse.go

93b40eb

Co-authored-by: Copilot <[email protected]>

Update destination/db/clickhouse.go

3b82297

Co-authored-by: Copilot <[email protected]>

Update destination/db/clickhouse.go

aa1d776

Co-authored-by: Copilot <[email protected]>

BentsiLeviav requested a review from mzitnik December 25, 2025 12:03

mshustov reviewed Dec 27, 2025


mshustov reviewed Dec 27, 2025


mshustov reviewed Dec 27, 2025


BentsiLeviav added 4 commits December 28, 2025 12:01

remove warning prefix

f5d435f

update info logs to notice (debug)

e63d88e

Merge remote-tracking branch 'origin/fix-hydra-checking' into fix-hyd…

c5afa02

…ra-and-extend-logging # Conflicts: # destination/db/clickhouse.go

add query id to logs, statement and query execution

039df3d

BentsiLeviav requested a review from mshustov December 28, 2025 14:29

mzitnik reviewed Jan 6, 2026


BentsiLeviav and others added 4 commits January 6, 2026 09:26

Update destination/db/clickhouse.go

75e8473

Co-authored-by: Mark Zitnik <[email protected]>

Update destination/db/clickhouse.go

bdefc6f

Co-authored-by: Mark Zitnik <[email protected]>

Update destination/db/clickhouse.go

2b323af

Co-authored-by: Mark Zitnik <[email protected]>

Update destination/db/clickhouse.go

2fa4a42

Co-authored-by: Mark Zitnik <[email protected]>

BentsiLeviav added 2 commits January 6, 2026 09:36

Fix average skew

b8cfae9

cr fixes

af7467d

BentsiLeviav requested a review from mzitnik January 6, 2026 10:30

mzitnik approved these changes Jan 6, 2026


account only for successful queries

6a537ae

BentsiLeviav merged commit e9aef44 into main Jan 6, 2026
3 checks passed

BentsiLeviav deleted the fix-hydra-checking branch January 6, 2026 11:15




4 participants