Skip to content

Commit 786142e

Browse files
authored
Improve wording about error cases in VariantShredding (#523)
1 parent 9fd57b5 commit 786142e

File tree

1 file changed

+17
-16
lines changed

1 file changed

+17
-16
lines changed

VariantShredding.md

Lines changed: 17 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -22,7 +22,7 @@
2222
The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous values.
2323
Query engines encode each Variant value in a self-describing format, and store it as a group containing `value` and `metadata` binary fields in Parquet.
2424
Since data is often partially homogeneous, it can be beneficial to extract certain fields into separate Parquet columns to further improve performance.
25-
This process is **shredding**.
25+
This process is called **shredding**.
2626

2727
Shredding enables the use of Parquet's columnar representation for more compact data encoding, column statistics for data skipping, and partial projections.
2828

@@ -202,21 +202,22 @@ As a result, reads when both `value` and `typed_value` are defined may be incons
202202

203203
The table below shows how the series of objects in the first column would be stored:
204204

205-
| Event object | `value` | `typed_value` | `typed_value.event_type.value` | `typed_value.event_type.typed_value` | `typed_value.event_ts.value` | `typed_value.event_ts.typed_value` | Notes |
206-
|------------------------------------------------------------------------------------|-----------------------------------|---------------|--------------------------------|--------------------------------------|------------------------------|------------------------------------|--------------------------------------------------|
207-
| `{"event_type": "noop", "event_ts": 1729794114937}` | null | non-null | null | `noop` | null | 1729794114937 | Fully shredded object |
208-
| `{"event_type": "login", "event_ts": 1729794146402, "email": "user@example.com"}` | `{"email": "user@example.com"}` | non-null | null | `login` | null | 1729794146402 | Partially shredded object |
209-
| `{"error_msg": "malformed: ..."}` | `{"error_msg", "malformed: ..."}` | non-null | null | null | null | null | Object with all shredded fields missing |
210-
| `"malformed: not an object"` | `malformed: not an object` | null | | | | | Not an object (stored as Variant string) |
211-
| `{"event_ts": 1729794240241, "click": "_button"}` | `{"click": "_button"}` | non-null | null | null | null | 1729794240241 | Field `event_type` is missing |
212-
| `{"event_type": null, "event_ts": 1729794954163}` | null | non-null | `00` (field exists, is null) | null | null | 1729794954163 | Field `event_type` is present and is null |
213-
| `{"event_type": "noop", "event_ts": "2024-10-24"}` | null | non-null | null | `noop` | `"2024-10-24"` | null | Field `event_ts` is present but not a timestamp |
214-
| `{ }` | null | non-null | null | null | null | null | Object is present but empty |
215-
| null | `00` (null) | null | | | | | Object/value is null |
216-
| missing | null | null | | | | | Object/value is missing |
217-
| INVALID | `{"event_type": "login"}` | non-null | null | `login` | null | 1729795057774 | INVALID: Shredded field is present in `value` |
218-
| INVALID | `"a"` | non-null | null | null | null | null | INVALID: `typed_value` is present for non-object |
219-
| INVALID | `02 00` (object with 0 fields) | null | | | | | INVALID: `typed_value` is null for object |
205+
| Event object | `value` | `typed_value` | `typed_value.event_type.value` | `typed_value.event_type.typed_value` | `typed_value.event_ts.value` | `typed_value.event_ts.typed_value` | Notes |
206+
|-----------------------------------------------------------------------------------|-----------------------------------|---------------|--------------------------------|--------------------------------------|------------------------------|------------------------------------|----------------------------------------------------------------------------|
207+
| `{"event_type": "noop", "event_ts": 1729794114937}` | null | non-null | null | `noop` | null | 1729794114937 | Fully shredded object |
208+
| `{"event_type": "login", "event_ts": 1729794146402, "email": "user@example.com"}` | `{"email": "user@example.com"}` | non-null | null | `login` | null | 1729794146402 | Partially shredded object |
209+
| `{"error_msg": "malformed: ..."}` | `{"error_msg", "malformed: ..."}` | non-null | null | null | null | null | Object with all shredded fields missing |
210+
| `"malformed: not an object"` | `malformed: not an object` | null | | | | | Not an object (stored as Variant string) |
211+
| `{"event_ts": 1729794240241, "click": "_button"}` | `{"click": "_button"}` | non-null | null | null | null | 1729794240241 | Field `event_type` is missing |
212+
| `{"event_type": null, "event_ts": 1729794954163}` | null | non-null | `00` (field exists, is null) | null | null | 1729794954163 | Field `event_type` is present and is null |
213+
| `{"event_type": "noop", "event_ts": "2024-10-24"}` | null | non-null | null | `noop` | `"2024-10-24"` | null | Field `event_ts` is present but not a timestamp |
214+
| `{ }` | null | non-null | null | null | null | null | Object is present but empty |
215+
| null | `00` (null) | null | | | | | Object/value is null |
216+
| missing | null | null | | | | | Object/value is missing |
217+
| INVALID: `{"event_type": "login", "event_ts": 1729795057774}` | `{"event_type": "login"}` | non-null | null | `login` | null | 1729795057774 | INVALID: Shredded field is present in `value` |
218+
| INVALID: `{"event_type": "login"}` | `{"event_type": "login"}` | null | | | | | INVALID: Shredded field is present in `value`, while `typed_value` is null |
219+
| INVALID: `"a"` | `"a"` | non-null | null | null | null | null | INVALID: `typed_value` is present and `value` is not an object |
220+
| INVALID: `{}` | `02 00` (object with 0 fields) | null | | | | | INVALID: `typed_value` is null for object |
220221

221222
Invalid cases in the table above must not be produced by writers.
222223
Readers must return an object when `typed_value` is non-null containing the shredded fields.

0 commit comments

Comments
 (0)