Feature request: Add BigQuery Ingestion Timestamp to Sink Records #193

@tanbt

Description

Problem

Currently, records written by the Sink Connector have no dedicated column or field that captures the exact timestamp (in milliseconds) at which the record is ingested into BigQuery.

The existing kafkaDataFieldName configuration creates a BigQuery column containing insertTime, but this is the timestamp at which the connector writes into the row, not the time at which BigQuery commits the write.

Objective

Add a feature to support automatic inclusion of a column/field in each BigQuery row (with millisecond precision) representing the timestamp at which the record is committed to the BigQuery table. This field should be:

  • Populated using the worker's wall-clock time in milliseconds immediately before or at the moment of actual ingestion.
  • Optionally configurable (e.g., field name, enabled/disabled) in the connector configuration.
  • Available regardless of whether the built-in kafkaDataFieldName metadata field is enabled.
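
The behaviour listed above could look roughly like the following. This is a minimal sketch, not connector code: the option names `ingestionTimestampEnabled` and `ingestionTimestampFieldName`, the default column name `ingestion_ts_ms`, and the `stamp_row` helper are all hypothetical, and rows are modelled as plain dictionaries.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical option names; the connector defines no such settings today.
ENABLED_KEY = "ingestionTimestampEnabled"
FIELD_NAME_KEY = "ingestionTimestampFieldName"
DEFAULT_FIELD_NAME = "ingestion_ts_ms"


def now_ms() -> int:
    """Worker wall-clock time in milliseconds since the Unix epoch."""
    return time.time_ns() // 1_000_000


def stamp_row(
    row: Dict[str, Any],
    config: Dict[str, Any],
    clock: Callable[[], int] = now_ms,
) -> Dict[str, Any]:
    """Return a copy of `row`, adding the ingestion timestamp if enabled."""
    stamped = dict(row)
    if not config.get(ENABLED_KEY, False):
        return stamped  # feature disabled: row passes through unchanged
    field = config.get(FIELD_NAME_KEY, DEFAULT_FIELD_NAME)
    if field in stamped:
        # Refuse to silently overwrite a column that the record already has.
        raise ValueError(f"row already contains a field named {field!r}")
    stamped[field] = clock()  # read the clock as late as possible
    return stamped
```

The clock is injected so that the stamping logic can be tested deterministically. Calling `stamp_row` immediately before the write request is sent keeps the value as close as possible to the actual ingestion moment, although it remains the worker's time rather than BigQuery's commit time.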

Benefits

  • Debug data unavailability issues
  • Measure end-to-end pipeline latency
  • Distinguish between Kafka ingestion delay vs BigQuery write delay
  • Perform SLA monitoring more accurately
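
As an illustration of the latency measurements above, the sketch below splits end-to-end latency into a Kafka part and a BigQuery write part. It assumes each row carries three millisecond timestamps: an event time set by the producer, the Kafka record timestamp, and the proposed ingestion timestamp. The field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyBreakdown:
    kafka_delay_ms: int     # producer event time -> Kafka record timestamp
    bq_write_delay_ms: int  # Kafka record timestamp -> BigQuery ingestion
    end_to_end_ms: int      # producer event time -> BigQuery ingestion


def latency_breakdown(
    event_ts_ms: int, kafka_ts_ms: int, ingestion_ts_ms: int
) -> LatencyBreakdown:
    """Split end-to-end latency into its Kafka and BigQuery write parts."""
    return LatencyBreakdown(
        kafka_delay_ms=kafka_ts_ms - event_ts_ms,
        bq_write_delay_ms=ingestion_ts_ms - kafka_ts_ms,
        end_to_end_ms=ingestion_ts_ms - event_ts_ms,
    )
```

Because the three timestamps come from different machines, clock skew can make individual differences slightly negative, so values like these are best read as aggregates over many rows.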

Workaround

Tables partitioned by ingestion time have a pseudocolumn named _PARTITIONTIME that stores the ingestion time for each row. However, the timestamp is truncated to the partition boundary, so it does not give an exact ingestion time.
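
To make the loss of precision concrete, the sketch below models the truncation locally. It does not call BigQuery; `partition_time` is a hypothetical helper that mimics truncation to an hourly or daily boundary in UTC.

```python
from datetime import datetime, timezone


def partition_time(ingested_at: datetime, granularity: str = "DAY") -> datetime:
    """Truncate an ingestion instant to its partition boundary in UTC."""
    if ingested_at.tzinfo is None:
        raise ValueError("ingested_at must be timezone-aware")
    ts = ingested_at.astimezone(timezone.utc)
    if granularity == "HOUR":
        return ts.replace(minute=0, second=0, microsecond=0)
    if granularity == "DAY":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported granularity: {granularity!r}")


# Two rows ingested almost 15 hours apart land on the same daily boundary.
early = datetime(2024, 5, 17, 9, 0, 0, 1000, tzinfo=timezone.utc)
late = datetime(2024, 5, 17, 23, 59, 59, 999000, tzinfo=timezone.utc)
assert partition_time(early) == partition_time(late)
```

With daily partitions, the two rows become indistinguishable by ingestion time, which is why _PARTITIONTIME cannot be used to measure write latency in milliseconds.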

BigQuery’s CURRENT_TIMESTAMP() function returns the current date and time as a TIMESTAMP value. However, this value is captured only once, at the very start of the query statement, not individually for each row at the point of writing to BigQuery. It may therefore reflect the commit time quite accurately for single-row inserts or very fast statements, but it becomes increasingly inaccurate as the duration of the INSERT statement increases. Also, adding this value as a new column means the table schema and the record schema must be kept in sync.
