Description
I've noticed a few times now that we have had a small inconsistency in the data synced from MySQL to ClickHouse. It looks like records are missing, as if they were skipped entirely.
Unfortunately, tracking down when it happens and checking the logs isn't easy. I'm running ~12 instances, each syncing ~3000 tables across 50 databases, on EKS using spot instances, so logs aren't currently persisted, and by the time we notice the issue there is usually nothing left to check.
I suspect what is happening is that ClickHouse and/or Keeper is restarted, and during that window the sink exceeds its retry limit and skips some of the queries.
I think it is probably similar to what is captured in #463: although there are retries, it looks like once the retry count is hit the sink simply moves on. That keeps things moving, but it does so at the risk of data integrity.
For me, I'd rather have it keep retrying and fall behind, requiring manual intervention, than end up with data issues that could be business critical.
It may well be that there is an option to halt on errors, but I can't see one in the docs or from a quick scan of the code. To illustrate the behaviour I'm after, see the sketch below.
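A minimal sketch of the distinction, not the connector's actual code: the names (`Batch`, `flushOrSkip`, `flushOrHalt`) and the retry shape are my own assumptions, purely to show "retry then skip" versus "retry then halt" when flushing a batch to ClickHouse.

```java
import java.time.Duration;

public class FlushPolicy {
    // Hypothetical stand-in for a batch of replicated rows/statements.
    interface Batch { void insertIntoClickHouse() throws Exception; }

    // Assumed current behaviour: give up after maxRetries and move on,
    // silently dropping the batch (risk of data loss).
    static void flushOrSkip(Batch batch, int maxRetries) {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                batch.insertIntoClickHouse();
                return;
            } catch (Exception e) {
                System.err.println("flush failed, attempt " + attempt + ": " + e);
            }
        }
        System.err.println("retries exhausted, batch skipped");
    }

    // Desired behaviour: keep retrying with a backoff (or stop the connector
    // entirely) so the source position is never advanced past an unflushed batch.
    static void flushOrHalt(Batch batch, Duration backoff) throws InterruptedException {
        while (true) {
            try {
                batch.insertIntoClickHouse();
                return;
            } catch (Exception e) {
                System.err.println("flush failed, retrying in " + backoff + ": " + e);
                Thread.sleep(backoff.toMillis());
            }
        }
    }
}
```

Something like the second path, ideally behind a config flag, would let the sync fall behind during a ClickHouse/Keeper restart instead of dropping data.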