Commit a4d8f3f
fix: bracket access for object keys containing special characters (e.g. periods) (#309)
## Problem
`col['key.with.period']` silently returns `NULL` instead of the key's
value.
This affects any Snowflake VARIANT column whose JSON was produced by a
system that uses dotted strings as flat key names — a common pattern in
tracing and observability platforms (e.g. OpenTelemetry attributes,
Braintrust span metadata).
## Root cause: two separate bugs in `indices_to_json_extract`
### Bug 1 — dot in key name treated as JSONPath path separator
`col['a.b']` was transformed to:
```sql
col -> '$.a.b'
```
DuckDB's JSONPath engine interprets `.` as a path separator, so this
traverses *into* a nested object (`a` → `b`) rather than looking up the
literal key `"a.b"`. If no nested structure exists the result is `NULL`.
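The difference between the two lookups can be illustrated without DuckDB. What follows is a toy sketch in plain Python, not DuckDB's actual JSONPath engine: `traverse_dotted_path` is a hypothetical helper that mimics how `$.a.b` is split on dots, and the sample document is made up for illustration.

```python
import json
from typing import Any, Optional

# Made-up document with a flat, dotted key, as an observability tool might emit.
doc = json.loads('{"a.b": "sess1"}')


def traverse_dotted_path(obj: Any, path: str) -> Optional[Any]:
    """Hypothetical helper: mimic JSONPath-style traversal of '$.a.b'.

    Each dot-separated segment descends one level. A missing segment
    yields None, the analogue of SQL NULL.
    """
    stripped = path[2:] if path.startswith("$.") else path
    current = obj
    for segment in stripped.split("."):
        if not isinstance(current, dict) or segment not in current:
            return None
        current = current[segment]
    return current


# Path traversal looks for doc["a"]["b"], which does not exist here.
nested_result = traverse_dotted_path(doc, "$.a.b")

# A direct key lookup finds the literal key "a.b".
literal_result = doc.get("a.b")
```

Here `nested_result` is `None` while `literal_result` holds the stored value, which mirrors the `NULL` seen before the fix.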
The fix exploits a DuckDB feature: [a path that does not start with `$`
is treated as a direct key
name](https://duckdb.org/docs/extensions/json#json-extraction-functions),
bypassing JSONPath interpretation entirely. Keys that aren't simple
identifiers (`[A-Za-z0-9_]+`) now omit the `$.` prefix:
```
col['simple'] → col -> '$.simple' (unchanged)
col['a.b'] → col -> 'a.b' (direct key lookup, no JSONPath)
```
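The path-selection rule above could be sketched roughly as follows. This is an illustration only: the regex body is assumed from the character class quoted in the description, and `json_path_for_key` is a hypothetical helper name, not a function from `fakesnow`.

```python
import re

# Assumed definition, based on the character class `[A-Za-z0-9_]+` quoted above.
_SIMPLE_JSON_KEY = re.compile(r"[A-Za-z0-9_]+")


def json_path_for_key(key: str) -> str:
    """Hypothetical helper: choose the path string passed to `->` / `->>`."""
    if _SIMPLE_JSON_KEY.fullmatch(key):
        # Simple identifier: keep the JSONPath form (unchanged behaviour).
        return f"$.{key}"
    # Anything else (e.g. a key containing a period) is emitted as-is, so
    # DuckDB treats it as a direct key name rather than a JSONPath.
    return key
```

Using `fullmatch` rather than `match` matters in this sketch: `match` would accept `a.b` because the prefix `a` satisfies the pattern.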
### Bug 2 — `::string` / `::text` not recognised as string casts
In Snowflake, `::varchar`, `::string`, and `::text` are all equivalent
string casts. However, sqlglot parses them to different `DataType`
values:
| Snowflake syntax | sqlglot `DataType.Type` |
|---|---|
| `::varchar` | `VARCHAR` |
| `::string` / `::text` | `TEXT` |
| `::nvarchar` | `NVARCHAR` |
The existing check was `== exp.DataType.Type.VARCHAR` only, so
`::string` fell through to `JSONExtract` (`->`) instead of
`JSONExtractScalar` (`->>`). This returned the JSON-encoded value (e.g.
`'"sess1"'` with surrounding quotes) rather than the plain string
(`'sess1'`).
The fix introduces `_STRING_CAST_TYPES = {VARCHAR, TEXT, NVARCHAR}` and
checks `in _STRING_CAST_TYPES`.
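A minimal sketch of the widened check, using plain strings as stand-ins for the `exp.DataType.Type` members so that it runs without sqlglot installed. `extract_operator` is a hypothetical helper, and the real transform works on sqlglot expression nodes rather than type names.

```python
from typing import Optional

# String names standing in for sqlglot's exp.DataType.Type members.
_STRING_CAST_TYPES = {"VARCHAR", "TEXT", "NVARCHAR"}


def extract_operator(cast_type: Optional[str]) -> str:
    """Hypothetical helper: pick the DuckDB JSON operator for a bracket access.

    `cast_type` is the type the access is cast to, or None when there is
    no cast.
    """
    if cast_type is not None and cast_type.upper() in _STRING_CAST_TYPES:
        # String cast: JSONExtractScalar, returns the plain string ('sess1').
        return "->>"
    # Otherwise: JSONExtract, returns the JSON-encoded value ('"sess1"').
    return "->"
```

With the old `== VARCHAR` comparison, only the first of the three string types would have selected `->>`.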
**Note:** these two bugs interact. In the common real-world pattern
`metadata['request_context.channel_session_id']::string`, both bugs fire
simultaneously — the key returns `NULL` (bug 1), and even if it didn't,
the string value would still be returned JSON-quoted (bug 2).
## Changes
**`fakesnow/transforms/transforms.py`**
- Add `import re`
- Add module-level `_SIMPLE_JSON_KEY` regex and `_STRING_CAST_TYPES` set
- `indices_to_json_extract`: expand string-type check from `== VARCHAR`
to `in _STRING_CAST_TYPES`
- `indices_to_json_extract`: for string keys that don't match
`_SIMPLE_JSON_KEY`, emit the key directly instead of prefixing with `$.`
**`tests/test_semis.py`**
- `test_object_key_with_period`: end-to-end test — inserts a row with a
period-containing key, asserts correct values for bare access
(JSON-typed result with surrounding quotes) and `::string`-cast access
(unquoted plain string), and `NULL` for a missing dotted key
**`tests/test_transforms.py`**
- `test_indices_to_object`: two new assertions — `->` output for a
period key without cast, and `->>` output for a period key with
`::varchar` cast
## Verification
```
310 passed, 1 xfailed in 26.84s
```
No regressions against the existing suite.
Co-authored-by: Andy Mackinlay <andy.mackinlay@xero.com>

1 parent be614cf, commit a4d8f3f
3 files changed (+55, -4): fakesnow/transforms, tests