Skip to content

Commit a4d8f3f

Browse files
admackinAndy Mackinlay
andauthored
fix: bracket access for object keys containing special characters (e.g. periods) (#309)
## Problem `col['key.with.period']` silently returns `NULL` instead of the key's value. This affects any Snowflake VARIANT column whose JSON was produced by a system that uses dotted strings as flat key names — a common pattern in tracing and observability platforms (e.g. OpenTelemetry attributes, Braintrust span metadata). ## Root cause: two separate bugs in `indices_to_json_extract` ### Bug 1 — dot in key name treated as JSONPath path separator `col['a.b']` was transformed to: ```sql col -> '$.a.b' ``` DuckDB's JSONPath engine interprets `.` as a path separator, so this traverses *into* a nested object (`a` → `b`) rather than looking up the literal key `"a.b"`. If no nested structure exists the result is `NULL`. The fix exploits a DuckDB feature: [a path that does not start with `$` is treated as a direct key name](https://duckdb.org/docs/extensions/json#json-extraction-functions), bypassing JSONPath interpretation entirely. Keys that aren't simple identifiers (`[A-Za-z0-9_]+`) now omit the `$.` prefix: ``` col['simple'] → col -> '$.simple' (unchanged) col['a.b'] → col -> 'a.b' (direct key lookup, no JSONPath) ``` ### Bug 2 — `::string` / `::text` not recognised as string casts In Snowflake, `::varchar`, `::string`, and `::text` are all equivalent string casts. However sqlglot parses them to different `DataType` values: | Snowflake syntax | sqlglot `DataType.Type` | |---|---| | `::varchar` | `VARCHAR` | | `::string` / `::text` | `TEXT` | | `::nvarchar` | `NVARCHAR` | The existing check was `== exp.DataType.Type.VARCHAR` only, so `::string` fell through to `JSONExtract` (`->`) instead of `JSONExtractScalar` (`->>`). This returned the JSON-encoded value (e.g. `'"sess1"'` with surrounding quotes) rather than the plain string (`'sess1'`). The fix introduces `_STRING_CAST_TYPES = {VARCHAR, TEXT, NVARCHAR}` and checks `in _STRING_CAST_TYPES`. **Note:** these two bugs interact. In the common real-world pattern `metadata['request_context.channel_session_id']::string`, both bugs fire simultaneously — the key returns `NULL` (bug 1), and even if it didn't, the string value would still be returned JSON-quoted (bug 2). ## Changes **`fakesnow/transforms/transforms.py`** - Add `import re` - Add module-level `_SIMPLE_JSON_KEY` regex and `_STRING_CAST_TYPES` set - `indices_to_json_extract`: expand string-type check from `== VARCHAR` to `in _STRING_CAST_TYPES` - `indices_to_json_extract`: for string keys that don't match `_SIMPLE_JSON_KEY`, emit the key directly instead of prefixing with `$.` **`tests/test_semis.py`** - `test_object_key_with_period`: end-to-end test — inserts a row with a period-containing key, asserts correct values for bare access (JSON-typed result with surrounding quotes) and `::string`-cast access (unquoted plain string), and `NULL` for a missing dotted key **`tests/test_transforms.py`** - `test_indices_to_object`: two new assertions — `->` output for a period key without cast, and `->>` output for a period key with `::varchar` cast ## Verification ``` 310 passed, 1 xfailed in 26.84s ``` No regressions against the existing suite. Co-authored-by: Andy Mackinlay <andy.mackinlay@xero.com>
1 parent be614cf commit a4d8f3f

File tree

3 files changed

+55
-4
lines changed

3 files changed

+55
-4
lines changed

fakesnow/transforms/transforms.py

Lines changed: 22 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,6 @@
11
from __future__ import annotations
22

3+
import re
34
from pathlib import Path
45
from string import Template
56
from typing import ClassVar, cast
@@ -583,6 +584,18 @@ def identifier(expression: exp.Expression, params: MutableParams | None) -> exp.
583584
return expression
584585

585586

587+
_SIMPLE_JSON_KEY = re.compile(r"^[A-Za-z0-9_]+$")
588+
589+
# Snowflake string-like types that warrant using JSONExtractScalar (->>)
590+
# to return an unquoted string rather than a JSON-encoded value.
591+
# ::varchar -> VARCHAR, ::string / ::text -> TEXT, ::nvarchar -> NVARCHAR
592+
_STRING_CAST_TYPES = {
593+
exp.DataType.Type.VARCHAR,
594+
exp.DataType.Type.TEXT,
595+
exp.DataType.Type.NVARCHAR,
596+
}
597+
598+
586599
def indices_to_json_extract(expression: exp.Expression) -> exp.Expression:
587600
"""Convert indices on objects and arrays to json_extract or json_extract_string
588601
@@ -603,14 +616,19 @@ def indices_to_json_extract(expression: exp.Expression) -> exp.Expression:
603616
and isinstance(index, exp.Literal)
604617
and index.this
605618
):
606-
if isinstance(expression.parent, exp.Cast) and expression.parent.to.this == exp.DataType.Type.VARCHAR:
607-
# If the parent is a cast to varchar, we need to use JSONExtractScalar
608-
# to get the unquoted string value.
619+
if isinstance(expression.parent, exp.Cast) and expression.parent.to.this in _STRING_CAST_TYPES:
620+
# If the parent is a cast to a string type (::varchar, ::string, ::text, ::nvarchar),
621+
# use JSONExtractScalar to return an unquoted string value.
609622
klass = exp.JSONExtractScalar
610623
else:
611624
klass = exp.JSONExtract
612625
if index.is_string:
613-
return klass(this=expression.this, expression=exp.Literal(this=f"$.{index.this}", is_string=True))
626+
key = index.this
627+
# Simple identifiers use standard JSONPath dot notation ($.key).
628+
# Keys with special characters (e.g. periods) are passed without the $. prefix so
629+
# DuckDB treats the value as a direct key lookup, bypassing JSONPath path separation.
630+
path = f"$.{key}" if _SIMPLE_JSON_KEY.match(key) else key
631+
return klass(this=expression.this, expression=exp.Literal(this=path, is_string=True))
614632
else:
615633
return klass(this=expression.this, expression=exp.Literal(this=f"$[{index.this}]", is_string=True))
616634

tests/test_semis.py

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -162,6 +162,23 @@ def test_semi_structured_types(cur: snowflake.connector.cursor.SnowflakeCursor):
162162
]
163163

164164

165+
def test_object_key_with_period(cur: snowflake.connector.cursor.SnowflakeCursor):
166+
# Keys that contain periods must be treated as literal key names, not JSONPath expressions.
167+
# e.g. obj['a.b'] is a single key containing a period, not a nested path a -> b.
168+
cur.execute("""create or replace table dotkeys (obj variant)""")
169+
cur.execute("""insert into dotkeys select parse_json('{"a.b": "val1"}')""")
170+
171+
cur.execute("select obj['a.b'] from dotkeys")
172+
assert cur.fetchall() == [('"val1"',)]
173+
174+
cur.execute("select obj['a.b']::string from dotkeys")
175+
assert cur.fetchall() == [("val1",)]
176+
177+
# A missing key should return NULL, not raise an error
178+
cur.execute("select obj['x.y']::string from dotkeys")
179+
assert cur.fetchall() == [(None,)]
180+
181+
165182
def test_try_parse_json(dcur: snowflake.connector.cursor.DictCursor):
166183
dcur.execute("""SELECT TRY_PARSE_JSON('{"first":"foo", "last":"bar"}') AS j""")
167184
assert dindent(dcur.fetchall()) == [{"J": '{\n "first": "foo",\n "last": "bar"\n}'}]

tests/test_transforms.py

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -463,6 +463,22 @@ def test_indices_to_object() -> None:
463463
== "SELECT name -> '$.k' FROM semi"
464464
)
465465

466+
# Keys with special characters (e.g. periods) must use direct key lookup instead of
467+
# JSONPath dot notation, otherwise DuckDB interprets dots as path separators.
468+
assert (
469+
sqlglot.parse_one("SELECT meta['key.with.period'] FROM semi")
470+
.transform(indices_to_json_extract)
471+
.sql(dialect="duckdb")
472+
== "SELECT meta -> 'key.with.period' FROM semi"
473+
)
474+
475+
assert (
476+
sqlglot.parse_one("SELECT meta['key.with.period']::varchar FROM semi", read="snowflake")
477+
.transform(indices_to_json_extract)
478+
.sql(dialect="duckdb")
479+
== "SELECT CAST(meta ->> 'key.with.period' AS TEXT) FROM semi"
480+
)
481+
466482

467483
def test_integer_precision() -> None:
468484
assert (

0 commit comments

Comments
 (0)