Description
Problem
The _transform method in cursor.py (lines 304-355) chains 63 sequential .transform() calls to convert Snowflake SQL to DuckDB. Each call uses sqlglot's default copy=True (see sqlglot source), which triggers a full AST deepcopy.
For simple queries this doesn't matter much, but for complex queries with large ASTs (lots of JOINs, CTEs, nested CASE expressions), the overhead adds up fast. We were seeing ~658K deepcopy operations across our test suite, and switching to copy=False gave us a ~3x speedup.
The copy=True default exists so you don't accidentally mutate an AST that's used elsewhere. But in FakeSnow's case:
- Each query parses fresh SQL - there's no AST reuse between queries
- The transforms run sequentially on a local variable
- The original expression is thrown away after transpilation
- The whole thing is scoped to a single cursor execution
So, the immutability guarantee doesn't seem to buy anything here.
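To make the cost concrete, here is a minimal, stdlib-only sketch of the copy-per-call pattern. The `Node` class, its `transform` method, and the `upper` / `add_suffix` functions are hypothetical stand-ins written for this illustration; they are not sqlglot's or FakeSnow's code. As a simplification, the toy transforms mutate nodes directly rather than returning replacement nodes.

```python
import copy as copy_module
from dataclasses import dataclass, field
from typing import Callable

DEEPCOPY_CALLS = 0  # counts full-tree copies made by the toy transform()


@dataclass
class Node:
    """Toy AST node (hypothetical stand-in for a parsed SQL expression)."""

    name: str
    children: list["Node"] = field(default_factory=list)

    def transform(self, fun: Callable[["Node"], None], copy: bool = True) -> "Node":
        """Apply fun to every node; deep-copy the whole tree first if copy=True."""
        global DEEPCOPY_CALLS
        root = self
        if copy:
            DEEPCOPY_CALLS += 1
            root = copy_module.deepcopy(self)
        root._apply(fun)
        return root

    def _apply(self, fun: Callable[["Node"], None]) -> None:
        fun(self)
        for child in self.children:
            child._apply(fun)


def upper(node: Node) -> None:
    node.name = node.name.upper()


def add_suffix(node: Node) -> None:
    node.name += "_x"


def make_tree() -> Node:
    return Node("select", [Node("a"), Node("b", [Node("c")])])


def run_chain(tree: Node, copy: bool) -> tuple[Node, int]:
    """Run three chained transforms and report how many deep copies were made."""
    global DEEPCOPY_CALLS
    DEEPCOPY_CALLS = 0
    result = tree
    for fun in (upper, add_suffix, upper):
        result = result.transform(fun, copy=copy)
    return result, DEEPCOPY_CALLS


copied_input = make_tree()
copied_result, copied_count = run_chain(copied_input, copy=True)

inplace_input = make_tree()
inplace_result, inplace_count = run_chain(inplace_input, copy=False)

print(copied_count, inplace_count)      # 3 0
print(copied_result == inplace_result)  # True: same output either way
print(copied_input.name)                # select: input left untouched
print(inplace_input.name)               # SELECT_X: input mutated in place
```

In this toy model the two modes produce the same result, and the only observable difference is whether the input tree survives. That is the property the bullets above argue FakeSnow does not need.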
Current workaround
We're monkey-patching Expression.transform in our test fixtures to default copy=False:
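from collections.abc import Callable, Iterator
from typing import Any

import pytest
from sqlglot import exp

@pytest.fixture(scope="session", autouse=True)
def optimize_fakesnow_transforms() -> Iterator[None]:
    original_transform = exp.Expression.transform

    def patched_transform(
        self: exp.Expression,
        fun: Callable[..., Any],
        *args: Any,
        copy: bool = False,
        **kwargs: Any,
    ) -> exp.Expression:
        # Same as the original method, except copy now defaults to False
        return original_transform(self, fun, *args, copy=copy, **kwargs)

    exp.Expression.transform = patched_transform
    yield
    # Restore the original method at the end of the session
    exp.Expression.transform = original_transform
Suggested fix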
Add copy=False to the transform chain in cursor.py. It could go on every call, since the freshly parsed expression isn't used anywhere else. A more conservative option is to leave the first call on the default copy=True, so that it creates a working copy, and pass copy=False on every call after it:
# First transform (default copy=True) creates the working copy
expression = (
    expression.transform(lambda e: transforms.identifier(e, params))
    # Rest operate in-place on that copy
    .transform(transforms.upper_case_unquoted_identifiers, copy=False)
    .transform(transforms.alter_session, copy=False)
    # ...
)
Environment
- fakesnow 0.11.1
- sqlglot 28.5.0
- Python 3.11