[REQUIRED] Step 2: Describe your configuration
- Extension name: firestore-bigquery-export
- Extension version: 0.3.1 (regression appears to have been introduced in 0.3.0 / change-tracker 2.x; see analysis below)
- Configuration values (redacted):
  - COLLECTION_PATH: myCollection
  - TIME_PARTITIONING: DAY
  - TIME_PARTITIONING_FIELD: order_week
  - TIME_PARTITIONING_FIELD_TYPE: DATE
  - TIME_PARTITIONING_FIRESTORE_FIELD: order_week
The Firestore field order_week holds an ISO 8601 date string (e.g. "2026-01-01"), and the BigQuery column type is DATE. This worked correctly under 0.2.x because BigQuery streaming insert implicitly casts ISO 8601 date strings to DATE.
[REQUIRED] Step 3: Describe the problem
After upgrading from 0.2.x to 0.3.1, the partition column is written as NULL for every row whose Firestore field is a string (including valid ISO 8601 date strings such as "2026-01-01"). All such rows now end up in the __NULL__ partition, which breaks partition pruning and existing analytics queries.
The change is not mentioned in the CHANGELOG:
- 0.2.11: only documents "chore: bump firestore-bigquery-change-tracker dependency to v2"
- 0.3.0: only documents partitioning config-validation changes and NONE / omit sentinel normalization
The CHANGELOG gives no hint that the runtime contract for partition field values has narrowed.
Steps to reproduce
- Create a Firestore collection where each document has a string ISO 8601 date field, e.g. { order_week: "2026-01-01" }
- Install firestore-bigquery-export 0.3.1 with:
  TIME_PARTITIONING=DAY
  TIME_PARTITIONING_FIELD=order_week
  TIME_PARTITIONING_FIELD_TYPE=DATE
  TIME_PARTITIONING_FIRESTORE_FIELD=order_week
- Write a document to the collection
- Inspect the row in <table>_raw_changelog
Expected result
Top-level order_week column equals 2026-01-01; the row lives in the 2026-01-01 DATE partition. This matches 0.2.x behaviour and is what BigQuery streaming insert supports natively.
Actual result
Top-level order_week column is NULL; the row lives in the __NULL__ partition. The data JSON column still contains the original order_week string, confirming the value was not lost upstream; only the partition column extraction drops it.
SELECT
document_id,
order_week, -- top-level partition column: NULL
JSON_VALUE(data, '$.order_week') AS data_order_week -- still present
FROM `<project>.<dataset>.<table>_raw_changelog`
WHERE order_week IS NULL
ORDER BY timestamp DESC
LIMIT 5;
Possible cause (from reading the source)
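In 0.2.x, Partitioning.getPartitionValue() appears to accept strings as-is via isValidPartitionTypeString, leaving the cast to BigQuery on streaming insert (firestore-bigquery-export/firestore-bigquery-change-tracker/src/bigquery/partitioning.ts, lines 173 to 213 in 7f52a39):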
    /* Return as Datetime value */
    if (timePartitioningFieldType === PartitionFieldType.DATETIME) {
      return BigQuery.datetime(fieldValue.toISOString()).value;
    }

    /* Return as Date value */
    if (timePartitioningFieldType === PartitionFieldType.DATE) {
      return BigQuery.date(fieldValue.toISOString().substring(0, 10)).value;
    }

    /* Return as Timestamp */
    return BigQuery.timestamp(fieldValue).value;
  }

  /*
    Extracts a valid Partition field from the Document Change Event.
    Matches result based on a pre-defined Firestore field matching the event data object.
    Return an empty object if no field name or value provided.
    Returns empty object if not a string or timestamp (or result of serializing a timestamp)
    Logs warning if not a valid datatype
    Delete changes events have no data, return early as cannot partition on empty data.
  **/
  getPartitionValue(event: FirestoreDocumentChangeEvent) {
    // When old data is disabled and the operation is delete
    // the data and old data will be null
    if (event.data == null && event.oldData == null) return {};

    const firestoreFieldName = this.config.timePartitioningFirestoreField;
    const fieldName = this.config.timePartitioningField;
    const fieldValue =
      event.operation === ChangeType.DELETE
        ? event.oldData[firestoreFieldName]
        : event.data[firestoreFieldName];

    if (!fieldName || !fieldValue) {
      return {};
    }

    if (this.isValidPartitionTypeString(fieldValue)) {
      return { [fieldName]: fieldValue };
In 0.3.x, after the partitioning refactor (#2447), PartitionValueConverter.convert() seems to only accept firebase.firestore.Timestamp, { _seconds, _nanoseconds }, or Date, and to return null for any other type, which would include ISO 8601 date strings:
https://github.com/firebase/extensions/blob/master/firestore-bigquery-export/firestore-bigquery-change-tracker/src/bigquery/partitioning/converter.ts
getPartitionValue then omits the column, which would explain the NULL we observe in BigQuery:
https://github.com/firebase/extensions/blob/master/firestore-bigquery-export/firestore-bigquery-change-tracker/src/bigquery/partitioning/index.ts
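To make the suspected difference concrete, here is a minimal TypeScript model of the two behaviours described above, plus one possible string-tolerant variant. None of this is the extension's actual code: legacyDatePartition, strictDatePartition, tolerantDatePartition and the TimestampLike type are hypothetical names written for this report, and the sketch only covers the DATE partition type.

```typescript
// Illustrative model only. These functions are NOT taken from the extension;
// they mirror the behaviour this report describes for DATE partition columns.

type TimestampLike = { _seconds: number; _nanoseconds: number };
type PartitionInput = Date | TimestampLike | string | number | null | undefined;

const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

function isTimestampLike(v: unknown): v is TimestampLike {
  return (
    typeof v === "object" &&
    v !== null &&
    typeof (v as TimestampLike)._seconds === "number" &&
    typeof (v as TimestampLike)._nanoseconds === "number"
  );
}

function toDate(v: Date | TimestampLike): Date {
  return v instanceof Date
    ? v
    : new Date(v._seconds * 1000 + Math.floor(v._nanoseconds / 1e6));
}

// 0.2.x as described: strings pass through untouched and BigQuery does the cast.
function legacyDatePartition(v: PartitionInput): string | null {
  if (typeof v === "string") return v;
  if (v instanceof Date || isTimestampLike(v)) {
    return toDate(v).toISOString().substring(0, 10);
  }
  return null;
}

// 0.3.x as described: only Date / timestamp-like inputs are accepted,
// so an ISO 8601 date string yields null and the column is omitted.
function strictDatePartition(v: PartitionInput): string | null {
  if (v instanceof Date || isTimestampLike(v)) {
    return toDate(v).toISOString().substring(0, 10);
  }
  return null;
}

// One possible middle ground (an assumption, not a proposed patch):
// additionally accept strings that are real YYYY-MM-DD calendar dates.
function tolerantDatePartition(v: PartitionInput): string | null {
  if (typeof v === "string") {
    if (!ISO_DATE.test(v)) return null;
    const parsed = new Date(`${v}T00:00:00Z`);
    // The round-trip check rejects rolled-over dates such as 2026-02-30.
    const valid =
      !Number.isNaN(parsed.getTime()) && parsed.toISOString().startsWith(v);
    return valid ? v : null;
  }
  return strictDatePartition(v);
}

console.log(legacyDatePartition("2026-01-01"));   // "2026-01-01"
console.log(strictDatePartition("2026-01-01"));   // null
console.log(tolerantDatePartition("2026-01-01")); // "2026-01-01"
```

If the real converter behaves like strictDatePartition, that alone would account for every string-valued row landing in the __NULL__ partition while the data column stays intact.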
Related