|
22 | 22 | The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous values. |
23 | 23 | Query engines encode each Variant value in a self-describing format, and store it as a group containing `value` and `metadata` binary fields in Parquet. |
24 | 24 | Since data is often partially homogeneous, it can be beneficial to extract certain fields into separate Parquet columns to further improve performance. |
25 | | -This process is **shredding**. |
| 25 | +This process is called **shredding**. |
26 | 26 |
|
27 | 27 | Shredding enables the use of Parquet's columnar representation for more compact data encoding, column statistics for data skipping, and partial projections. |
28 | 28 |
|
@@ -202,21 +202,22 @@ As a result, reads when both `value` and `typed_value` are defined may be incons |
202 | 202 |
|
203 | 203 | The table below shows how the series of objects in the first column would be stored: |
204 | 204 |
|
205 | | -| Event object | `value` | `typed_value` | `typed_value.event_type.value` | `typed_value.event_type.typed_value` | `typed_value.event_ts.value` | `typed_value.event_ts.typed_value` | Notes | |
206 | | -|------------------------------------------------------------------------------------|-----------------------------------|---------------|--------------------------------|--------------------------------------|------------------------------|------------------------------------|--------------------------------------------------| |
207 | | -| `{"event_type": "noop", "event_ts": 1729794114937}` | null | non-null | null | `noop` | null | 1729794114937 | Fully shredded object | |
208 | | -| `{"event_type": "login", "event_ts": 1729794146402, "email": "user@example.com"}` | `{"email": "user@example.com"}` | non-null | null | `login` | null | 1729794146402 | Partially shredded object | |
209 | | -| `{"error_msg": "malformed: ..."}` | `{"error_msg", "malformed: ..."}` | non-null | null | null | null | null | Object with all shredded fields missing | |
210 | | -| `"malformed: not an object"` | `malformed: not an object` | null | | | | | Not an object (stored as Variant string) | |
211 | | -| `{"event_ts": 1729794240241, "click": "_button"}` | `{"click": "_button"}` | non-null | null | null | null | 1729794240241 | Field `event_type` is missing | |
212 | | -| `{"event_type": null, "event_ts": 1729794954163}` | null | non-null | `00` (field exists, is null) | null | null | 1729794954163 | Field `event_type` is present and is null | |
213 | | -| `{"event_type": "noop", "event_ts": "2024-10-24"}` | null | non-null | null | `noop` | `"2024-10-24"` | null | Field `event_ts` is present but not a timestamp | |
214 | | -| `{ }` | null | non-null | null | null | null | null | Object is present but empty | |
215 | | -| null | `00` (null) | null | | | | | Object/value is null | |
216 | | -| missing | null | null | | | | | Object/value is missing | |
217 | | -| INVALID | `{"event_type": "login"}` | non-null | null | `login` | null | 1729795057774 | INVALID: Shredded field is present in `value` | |
218 | | -| INVALID | `"a"` | non-null | null | null | null | null | INVALID: `typed_value` is present for non-object | |
219 | | -| INVALID | `02 00` (object with 0 fields) | null | | | | | INVALID: `typed_value` is null for object | |
| 205 | +| Event object | `value` | `typed_value` | `typed_value.event_type.value` | `typed_value.event_type.typed_value` | `typed_value.event_ts.value` | `typed_value.event_ts.typed_value` | Notes | |
| 206 | +|-----------------------------------------------------------------------------------|-----------------------------------|---------------|--------------------------------|--------------------------------------|------------------------------|------------------------------------|----------------------------------------------------------------------------| |
| 207 | +| `{"event_type": "noop", "event_ts": 1729794114937}` | null | non-null | null | `noop` | null | 1729794114937 | Fully shredded object | |
| 208 | +| `{"event_type": "login", "event_ts": 1729794146402, "email": "user@example.com"}` | `{"email": "user@example.com"}` | non-null | null | `login` | null | 1729794146402 | Partially shredded object | |
| 209 | +| `{"error_msg": "malformed: ..."}` | `{"error_msg": "malformed: ..."}` | non-null | null | null | null | null | Object with all shredded fields missing | |
| 210 | +| `"malformed: not an object"` | `malformed: not an object` | null | | | | | Not an object (stored as Variant string) | |
| 211 | +| `{"event_ts": 1729794240241, "click": "_button"}` | `{"click": "_button"}` | non-null | null | null | null | 1729794240241 | Field `event_type` is missing | |
| 212 | +| `{"event_type": null, "event_ts": 1729794954163}` | null | non-null | `00` (field exists, is null) | null | null | 1729794954163 | Field `event_type` is present and is null | |
| 213 | +| `{"event_type": "noop", "event_ts": "2024-10-24"}` | null | non-null | null | `noop` | `"2024-10-24"` | null | Field `event_ts` is present but not a timestamp | |
| 214 | +| `{ }` | null | non-null | null | null | null | null | Object is present but empty | |
| 215 | +| null | `00` (null) | null | | | | | Object/value is null | |
| 216 | +| missing | null | null | | | | | Object/value is missing | |
| 217 | +| INVALID: `{"event_type": "login", "event_ts": 1729795057774}` | `{"event_type": "login"}` | non-null | null | `login` | null | 1729795057774 | INVALID: Shredded field is present in `value` | |
| 218 | +| INVALID: `{"event_type": "login"}` | `{"event_type": "login"}` | null | | | | | INVALID: Shredded field is present in `value`, while `typed_value` is null | |
| 219 | +| INVALID: `"a"` | `"a"` | non-null | null | null | null | null | INVALID: `typed_value` is present and `value` is not an object | |
| 220 | +| INVALID: `{}` | `02 00` (object with 0 fields) | null | | | | | INVALID: `typed_value` is null for object | |
220 | 221 |
|
221 | 222 | Invalid cases in the table above must not be produced by writers. |
222 | 223 | When `typed_value` is non-null, readers must return an object containing the shredded fields. |
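
The rows of the table above can be reproduced with a small sketch. This is an illustration under stated assumptions, not the specification's reference implementation: Python values stand in for encoded Variant binaries, the byte string `b"\x00"` stands in for the encoded Variant null shown as `00`, and the names `MISSING`, `SHREDDED_FIELDS`, `shred_field`, `shred`, and `is_valid` are hypothetical. The shredding schema is assumed to contain only `event_type` (string) and `event_ts` (integer timestamp), as in the table.

```python
MISSING = object()        # hypothetical sentinel: the field or value is absent
VARIANT_NULL = b"\x00"    # stand-in for the encoded Variant null (`00` in the table)

# Assumed shredding schema: field name -> Python type accepted by typed_value.
SHREDDED_FIELDS = {"event_type": str, "event_ts": int}


def shred_field(v, expected):
    """Split one shredded field into its value / typed_value pair."""
    if v is MISSING:
        return {"value": None, "typed_value": None}          # field is missing
    if v is None:
        return {"value": VARIANT_NULL, "typed_value": None}  # field exists, is null
    if type(v) is expected:
        return {"value": None, "typed_value": v}             # matches the shredded type
    return {"value": v, "typed_value": None}                 # wrong type: keep as Variant


def shred(event=MISSING):
    """Split a top-level event into the value / typed_value columns."""
    if event is MISSING:
        return {"value": None, "typed_value": None}
    if event is None:
        return {"value": VARIANT_NULL, "typed_value": None}
    if not isinstance(event, dict):
        return {"value": event, "typed_value": None}         # not an object
    # Fields outside the shredding schema stay in `value`; null if none remain.
    residual = {k: v for k, v in event.items() if k not in SHREDDED_FIELDS}
    typed = {
        name: shred_field(event.get(name, MISSING), expected)
        for name, expected in SHREDDED_FIELDS.items()
    }
    return {"value": residual or None, "typed_value": typed}


def is_valid(row):
    """Reject the INVALID combinations listed in the table."""
    value, typed = row["value"], row["typed_value"]
    if isinstance(value, dict):
        if typed is None:
            return False   # typed_value is null for an object
        if any(k in SHREDDED_FIELDS for k in value):
            return False   # shredded field is present in `value`
    elif value is not None and typed is not None:
        return False       # typed_value is present and `value` is not an object
    return True
```

For example, `shred({"event_ts": 1729794240241, "click": "_button"})` leaves `click` in `value` and reports `event_type` as missing, while `is_valid` rejects a row whose `value` is `{"event_type": "login"}` regardless of `typed_value`.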
|