Skip to content

fix(firestore-bigquery-export): accept ISO 8601 string partition values#2813

Merged
cabljac merged 9 commits into
nextfrom
fix/bq-part
May 11, 2026
Merged

fix(firestore-bigquery-export): accept ISO 8601 string partition values#2813
cabljac merged 9 commits into
nextfrom
fix/bq-part

Conversation

@cabljac
Copy link
Copy Markdown
Contributor

@cabljac cabljac commented May 8, 2026

Summary

  • Restore 0.2.x behavior where Firestore string fields used as time partition values (e.g. order_week: "2026-01-01" with TIME_PARTITIONING_FIELD_TYPE=DATE) are accepted. The 0.3.0 refactor narrowed PartitionValueConverter to Firestore Timestamp / timestamp-like / Date only, silently coercing strings to NULL and landing rows in the __NULL__ partition.
  • PartitionValueConverter.convert now parses strings via new Date(value). Unparseable strings still return null and trigger the existing warning log.
  • Defensive try/catch around the BigQuery formatter switch so any serialization failure degrades to null + warn rather than crashing the row write.

Fixes #2803

Test plan

  • cd firestore-bigquery-export/firestore-bigquery-change-tracker && npm run test:local -- src/__tests__/bigquery/partitioning/converter.test.ts — 29/29 pass
  • Full tracker npm run test:local — converter + config + unit suites pass; the 8 pre-existing failures in e2e.test.ts are present on origin/next baseline (verified) and unrelated to this change
  • npx tsc --noEmit — clean
  • Maintainer to bump extension.yaml version to 0.3.2 as part of release workflow (CHANGELOG entry added)

Restores 0.2.x behavior for Firestore string partition fields. The 0.3.0
refactor of PartitionValueConverter narrowed accepted inputs to Firestore
Timestamp / timestamp-like / Date, silently coercing strings (including
ISO 8601 dates such as "2026-01-01") to NULL and landing rows in the
__NULL__ partition.

PartitionValueConverter.convert now parses strings via new Date(value).
Unparseable strings still return null and trigger the existing warning.
Adds a defensive try/catch around the BigQuery formatter switch so any
serialization failure degrades to null + warn rather than crashing the
row write.

Fixes #2803
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request restores support for ISO 8601 date/datetime strings as partition field values, addressing a regression introduced in version 0.3.0. The changes update the PartitionValueConverter to parse string inputs and include comprehensive unit tests for various date formats and edge cases. Feedback indicates that the DATETIME conversion needs to strip the 'Z' suffix for BigQuery compatibility and suggests adding a default case to the switch statement for improved robustness.

…witch

Adds default: return null to PartitionValueConverter.convert. fieldType is
typed as a strict union, so exhaustiveness is already enforced at compile
time, but the value comes from external config — a runtime mismatch falls
through cleanly to null instead of returning undefined.

Per gemini-code-assist review on #2813.
@cabljac
Copy link
Copy Markdown
Contributor Author

cabljac commented May 8, 2026

Thanks for the review. Addressing each point:

1. Stripping Z from DATETIME: I verified empirically — BigQuery.datetime("2024-01-15T10:30:00.000Z") already normalizes to "2024-01-15 10:30:00.000" (canonical BQ DATETIME with space separator). Stripping Z would actually regress the output to "2024-01-15T10:30:00.000" (T-separator passthrough). This same Z-suffixed code path has been shipping since 0.3.0 for Firestore Timestamp / Date inputs, so we'd see existing breakage if the library rejected it.

2. default: return null: Good catch — applied in d0654fa. TS exhaustiveness covers this at compile time, but fieldType comes from external config so a runtime mismatch is worth handling cleanly.

3. Logging in catch: The caller Partitioning.getPartitionValue() already logs whenever convert() returns null (index.ts L79–L86) — so the existing warn fires for the catch path too. By design, the converter is pure and the caller logs.

@cabljac cabljac marked this pull request as ready for review May 8, 2026 09:44
@cabljac cabljac requested a review from CorieW May 8, 2026 09:49
@CorieW
Copy link
Copy Markdown
Member

CorieW commented May 8, 2026

The overall direction looks right, but I think this needs one more change before merging.

The current implementation uses new Date(value) to validate string partition values. That is risky because JavaScript date parsing is too permissive and can silently normalize invalid or partial dates.

For example:

new Date("2024-02-30").toISOString(); // Wrong: "2024-02-30" is not a real date, but JS normalizes it to "2024-03-01T00:00:00.000Z".
new Date("2024-01").toISOString();    // Wrong: this is a partial date, but JS treats it as "2024-01-01T00:00:00.000Z".
new Date("1").toISOString();          // Wrong: this is not a date format we should accept, but JS parses it as "2001-01-01T00:00:00.000Z".

…ion strings

JavaScript's Date parser is too permissive for partition value validation:

  new Date("2024-02-30")        -> 2024-03-01  (silent month overflow)
  new Date("2024-01")           -> 2024-01-01  (silent partial-date fill)
  new Date("1")                 -> 2001-01-01  (bare numeric as year)
  new Date("2023-02-29")        -> 2023-03-01  (non-leap-year overflow)
  new Date("2024-01-15T10:30")  -> engine-dependent local-time interpretation

Replaces the loose new Date() check with a strict YYYY-MM-DD prefix regex
that requires an explicit timezone designator (Z or +/-HH:MM) when a time
component is present, plus a calendar validator built via setUTCFullYear
that rejects calendar-invalid components like Feb 30, non-leap-year Feb 29,
month 13, and day 32.

Per CorieW review on #2813.
@cabljac
Copy link
Copy Markdown
Contributor Author

cabljac commented May 8, 2026

Good catch, agreed — new Date() alone is too permissive. Verified your three examples plus a couple more silent normalizations on Node 22:

"2024-02-30"        -> 2024-03-01  (month overflow)
"2024-01"           -> 2024-01-01  (partial-date fill)
"1"                 -> 2001-01-01  (bare numeric as year)
"2023-02-29"        -> 2023-03-01  (non-leap-year overflow)
"2024-01-15T10:30"  -> engine-dependent (local-time interpretation)

Tightened in 4392a4f:

  • Strict regex requiring YYYY-MM-DD prefix, with optional T (or space) followed by HH:MM[:SS[.ffffff]] and a required timezone designator (Z or ±HH:MM) when the time portion is present. Rejects partial dates, bare numerics, and implicit-local-time datetimes.
  • Calendar validator built via setUTCFullYear(y, m-1, d) then comparing UTC components. Catches Feb 30, non-leap-year Feb 29, month 13, day 32, etc. Uses setUTCFullYear rather than Date.UTC() to avoid the legacy 2-digit-year quirk for years 1–99.
  • Existing pinned tests (date-only, Z-suffixed datetime, timezone-shifted DATE column) still pass; added 10 new rejection tests.

BigQuery DATE / DATETIME / TIMESTAMP all reject year 0 — the supported
range is 0001-01-01 to 9999-12-31. The previous strict validator passed
"0000-01-01" through (setUTCFullYear(0, 0, 1) yields year 0, matching
input components) but BigQuery rejects it server-side, causing the same
NULL-partition symptom this PR is fixing.

Reject yearN < 1 client-side so the user gets a clear warning log.
@cabljac
Copy link
Copy Markdown
Contributor Author

cabljac commented May 8, 2026

One more gap caught on self-review, fixed in fba4ca3.

"0000-01-01" was passing through all my validation layers:

  • regex matches (\d{4} accepts 0000)
  • calendar validator passes (setUTCFullYear(0, 0, 1) yields year 0, matching the input components)
  • new Date("0000-01-01") parses fine

But BigQuery DATE / DATETIME / TIMESTAMP all reject year 0 (the supported range is 0001-01-01 to 9999-12-31), so the row would have failed server-side and ended up as the same NULL-partition symptom this PR is meant to fix, just on a more obscure input. Added an explicit yearN < 1 guard plus tests covering year 0 (rejected), year 0001 (accepted, BQ minimum), and year 9999 (accepted, BQ maximum).

Other edge cases I probed against the regex + validator + parser chain and confirmed are handled:

  • hour 25, minute 60, offset > 24h: new Date() returns NaN, the isNaN(parsed.getTime()) check catches
  • Feb 30 with a non-UTC offset (e.g. "2024-02-30T22:00:00-08:00"): calendar validator catches the Y/M/D mismatch independent of timezone
  • single-digit month like "2024-1-15": regex \d{2} requires two digits, rejected
  • comma fractional separator ("...,123Z"): regex requires ., rejected

41/41 converter tests passing.

@cabljac
Copy link
Copy Markdown
Contributor Author

cabljac commented May 8, 2026

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the firestore-bigquery-export extension to version 0.3.2, restoring support for ISO 8601 date and datetime strings as partition field values. The implementation introduces strict regex-based parsing and calendar validation within the PartitionValueConverter to ensure data integrity before BigQuery insertion. Comprehensive tests have been added to verify various date formats and boundary conditions. Feedback suggests optimizing performance by moving the regex to a static constant.

Adds 4 cases to PartitionValueConverter TIMESTAMP suite:

- out-of-range hour ("25:00:00Z") -> null
- out-of-range minute ("23:60:00Z") -> null
- timezone offset without colon ("+0800") -> accepted
- fractional seconds beyond millisecond precision -> accepted

The first two pin the implicit time-validation behavior (regex matches
H/M/S as 2-digit pairs, then new Date() rejects out-of-range values).
The last two document accepted RFC 3339 variants.
@cabljac
Copy link
Copy Markdown
Contributor Author

cabljac commented May 11, 2026

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request restores support for ISO 8601 date and datetime strings as partition field values, addressing a regression introduced in version 0.3.0. The changes include strict validation logic using regex and calendar checks to ensure inputs fall within BigQuery's supported ranges (0001-9999) and formats. Extensive unit tests have been added to verify these improvements. Feedback indicates that the 'Z' suffix should be removed when converting to BigQuery DATETIME types to avoid potential serialization errors.

cabljac added 3 commits May 11, 2026 11:16
Adds a strict-equality test that asserts the DATETIME converter outputs
"2024-01-15 10:30:00.000" (space separator, no Z) when fed an ISO 8601
string with a Z suffix. The existing tests only checked that the result
contained "2024-01-15", which left the canonical-form contract unpinned.

@google-cloud/bigquery's BigQuery.datetime() already normalises Z-suffixed
input to BigQuery's canonical DATETIME form, so the production code is
correct. This test guards against future regressions.
The regex in PartitionValueConverter accepts both "T" and a literal space
as the date/time separator (RFC 3339 allows either), but no test covered
the space form. Pinned so the contract is explicit.
ISO 8601 allows "24:00:00" as an alias for next-day midnight, and JS Date
parses it that way. The new string parsing path in 0.3.2 would therefore
silently roll inputs like "2024-01-15T24:00:00Z" forward to the
"2024-01-16" partition. That is a regression vs 0.2.x, where the raw
string was passed straight to BigQuery and BigQuery rejected hour 24
outright, surfacing the bad row instead of misfiling it.

Reject hour 24 explicitly so the caller logs firestoreTimePartitionFieldError
and the row lands in the __NULL__ partition (loud failure, matching the
calendar-validator philosophy already applied to Feb 30 etc.). Minute
and second out-of-range values are still caught by new Date() returning
Invalid Date in the existing check below.

Adds a test pinning the hour 24 rejection.
@cabljac cabljac merged commit 2626667 into next May 11, 2026
7 checks passed
@cabljac cabljac deleted the fix/bq-part branch May 11, 2026 11:22
cabljac added a commit that referenced this pull request May 11, 2026
Bumps 7 extensions to capture security and minor dep bumps that have
accumulated since eea0e40 (the last collective bump on 2026-04-14).
firestore-bigquery-export is already current at 0.3.2 from #2813.

- delete-user-data 0.1.28 -> 0.1.29
- firestore-counter 0.2.15 -> 0.2.16
- firestore-send-email 0.2.8 -> 0.2.9
- firestore-shorten-urls-bitly 0.2.4 -> 0.2.5
- firestore-translate-text 0.1.28 -> 0.1.29
- rtdb-limit-child-nodes 0.1.18 -> 0.1.19
- storage-resize-images 0.3.3 -> 0.3.4
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

🐛 [firestore-bigquery-export] String partition values silently become NULL after 0.3.0 (regression vs 0.2.x)

2 participants