Description
Problem
Currently, records ingested by the connector have no dedicated column or field that captures the exact timestamp (in milliseconds) at which the Sink Connector writes the record into BigQuery.
The existing kafkaDataFieldName configuration creates a BigQuery column that contains insertTime, but this is the timestamp at which the connector writes the value into the row, not the time at which BigQuery commits the write.
Objective
Add a feature to support automatic inclusion of a column/field in each BigQuery row (with millisecond precision) representing the timestamp at which the record is committed to the BigQuery table. This field should be:
- Populated using the worker's wall-clock time in milliseconds immediately before or at the moment of actual ingestion.
- Optionally configurable (e.g., field name, enabled/disabled) in the connector configuration.
- Available regardless of whether the built-in kafkaDataFieldName metadata field is enabled.
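The requirements above could be sketched roughly as follows. This is only an illustration, not the connector's actual code: the class IngestTimestampStamper, its enabled and fieldName options, and the use of a plain Map as the row representation are all hypothetical names chosen for this example. Note that it records the worker's wall-clock time just before the write, as the first bullet describes, which is close to but not identical to BigQuery's own commit time.

```java
import java.time.Clock;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of the connector's API. */
final class IngestTimestampStamper {
    private final boolean enabled;   // hypothetical "enabled/disabled" option
    private final String fieldName;  // hypothetical "field name" option
    private final Clock clock;       // injectable so tests can use a fixed clock

    IngestTimestampStamper(boolean enabled, String fieldName, Clock clock) {
        this.enabled = enabled;
        this.fieldName = fieldName;
        this.clock = clock;
    }

    /**
     * Returns a copy of the row. When the feature is enabled, the copy also
     * carries the worker's wall-clock time in epoch milliseconds.
     */
    Map<String, Object> stamp(Map<String, Object> row) {
        Map<String, Object> out = new LinkedHashMap<>(row);
        if (enabled) {
            out.put(fieldName, clock.millis());
        }
        return out;
    }
}
```

In such a design, stamp would be called as late as possible, immediately before the batch is handed to the BigQuery client, and independently of whether kafkaDataFieldName is set.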
Benefits
- Debug data unavailability issues
- Measure end-to-end pipeline latency
- Distinguish between Kafka ingestion delay vs BigQuery write delay
- Perform SLA monitoring more accurately
Workaround
Tables partitioned by ingestion-time have a pseudo column named _PARTITIONTIME that stores metadata about the ingestion time for each row. However, the timestamp is truncated at the partition boundary, and will not give you an exact ingestion time.
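To make the precision loss concrete, the following sketch mimics the truncation in plain Java. It does not query BigQuery; PartitionTruncationDemo and its methods are hypothetical names, and it assumes daily or hourly partitioning with boundaries in UTC.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

/** Hypothetical demo of how a partition boundary discards sub-partition precision. */
final class PartitionTruncationDemo {
    /** Start of the UTC day containing t, as a daily partition boundary would be. */
    static Instant dailyBoundary(Instant t) {
        return t.truncatedTo(ChronoUnit.DAYS);
    }

    /** Start of the UTC hour containing t, as an hourly partition boundary would be. */
    static Instant hourlyBoundary(Instant t) {
        return t.truncatedTo(ChronoUnit.HOURS);
    }
}
```

A row ingested at 2024-05-01T13:47:12.345Z would map to 2024-05-01T00:00:00Z under daily partitioning and to 2024-05-01T13:00:00Z under hourly partitioning, so the minutes, seconds, and milliseconds are lost either way.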
BigQuery’s CURRENT_TIMESTAMP() function returns the current date and time as a timestamp object. However, its value is captured only once, at the very start of the query statement, not individually for each row at the point of writing to BigQuery. It may therefore reflect the commit time quite accurately for single-row inserts or very fast statements, but it becomes increasingly inaccurate as the duration of the INSERT statement grows. In addition, storing this value in a new column means you have to ensure that the table schema and the record schema still match.