GCSToBQLoadRunnable does not detect error during load and removes blobs even though they were not loaded #312

@zinok

Description

How was it discovered?

I found that some data was missing from the table, even though the records were present in Kafka.

How does that happen?

GCSToBQLoadRunnable checks whether the job is complete here, but it does not check for a potential error during the load. As a result, the job is treated as successful even though it failed, and the source blobs are deleted.
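To illustrate the gap, here is a minimal, self-contained sketch of the difference between checking only the state and also checking the attached error. `JobState` and `JobInfo` are stand-ins for the BigQuery client's `JobStatus.State` and `JobStatus`, not the real classes:

```java
// Stand-in for JobStatus.State from the BigQuery Java client.
enum JobState { PENDING, RUNNING, DONE }

// Stand-in for JobStatus: a state plus an optional error reason.
class JobInfo {
    final JobState state;
    final String error; // e.g. "quotaExceeded"; null when the job succeeded

    JobInfo(JobState state, String error) {
        this.state = state;
        this.error = error;
    }
}

public class LoadCheck {
    // Buggy behavior: only the state is inspected, so a DONE job that
    // carries an error (like the quotaExceeded case below) passes.
    static boolean isCompleteBuggy(JobInfo job) {
        return job.state == JobState.DONE;
    }

    // Fixed behavior: DONE with a non-null error means the load failed,
    // so the source blobs must not be deleted.
    static boolean isSuccessful(JobInfo job) {
        return job.state == JobState.DONE && job.error == null;
    }

    public static void main(String[] args) {
        JobInfo failed = new JobInfo(JobState.DONE, "quotaExceeded");
        System.out.println(isCompleteBuggy(failed)); // true  -> blobs wrongly deleted
        System.out.println(isSuccessful(failed));    // false -> blobs kept for retry
    }
}
```

With the real client, the equivalent check would look at `job.getStatus().getError()` in addition to the state.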

Evidence

[2022-12-22 21:31:30,663] TRACE Job is marked done: id=JobId{project=archimedes-337602, job=7d28940e-7882-414b-9bb3-c6c8c7e1307c, location=us-west1}, status=JobStatus{state=DONE, error=BigQueryError{reason=quotaExceeded, location=partition_modifications_per_column_partitioned_table.long, message=Quota exceeded: Your table exceeded quota for Number of partition modifications to a column partitioned table. For more information, see https://cloud.google.com/bigquery/docs/troubleshoot-quotas}, executionErrors=[BigQueryError{reason=quotaExceeded, location=partition_modifications_per_column_partitioned_table.long, message=Quota exceeded: Your table exceeded quota for Number of partition modifications to a column partitioned table. For more information, see https://cloud.google.com/bigquery/docs/troubleshoot-quotas}]} (com.wepay.kafka.connect.bigquery.GCSToBQLoadRunnable)

As we can see, the state is DONE, but the status carries a quota-related error, so the load did not actually succeed.
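The consequence is that blob cleanup should be gated on the per-job error, not just on completion. A small sketch (hypothetical structure, not the connector's actual code) where each blob path maps to its load job's error reason, or null on success:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class BlobCleanup {
    // blobErrors maps each GCS blob path to the error reason of the load
    // job that consumed it, or null when that job succeeded. Only blobs
    // whose job finished cleanly are queued for deletion; failed ones are
    // kept so the load can be retried.
    static List<String> deletableBlobs(Map<String, String> blobErrors) {
        return blobErrors.entrySet().stream()
                .filter(e -> e.getValue() == null)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```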
