Replication of Null-values in nested JSON records to BigQuery

The current loading of nested `JSON` / `JSONB` data structures to BigQuery (v2) destination is dropping some information as it does not differentiate between a nested object containing only `NULL` values (explicit null) compared to a non-existing object (implicit null).

Let's suppose a source table in e.g. PostgreSQL with three records. The table has, among others, the column `person`. The contents of the `person` column for the three records are:
1. `{"name": "Person A", "job": {"employer": "Best employer", "employed_since_date": "2020-04-01"}}`
2. `{"name": "Person B", "job": {"employer": null, "employed_since_date": null}}` (The person is unemployed, so the job info is an explicit null)
3. `{"name": "Person C"}` (The job information is unknown, i.e. implicit null)

As such the source table in the pgSQL DB, while containing null values for the `job` object's sub-objects in the second record, carries some information in itself: it is known that`Person A` is employed, `Person B` is unemployed, while `Person C`'s job info is unknown.

However, both of these are replicated to BigQuery with the following schema:
- `person` - `RECORD`
    - `person.name` - `STRING`: `Person A` / `Person B` / `Person C`
    - `person.job` - `RECORD`
        - `person.job.employer` - `STRING`: `null` for the two latter records
        - `person.job.employed_since_date` - `DATE`: `null` for the two latter records

In the BigQuery v1 connector where `JSON` / `JSONB` columns were replicated as `STRING`, the information was carried over (the existence or lack of a `job` object in the `JSON` string), but is indistinguishable in the v2 connector.

Sure, this could be fixed on the DB side by adding e.g. an `is_employed`: (`true`/`false`) object to the `person.job` object and populate it for every record, but a version upgrade in the BigQuery destination connector should not cause need for changes on the source side for customers. We need a possibility to differentiate between explicit and implicit nulls, as they may carry some information in and of themselves.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Replication of Null-values in nested JSON records to BigQuery #1098

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Replication of Null-values in nested JSON records to BigQuery #1098

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions