
Perf: transform chain repeatedly deepcopies AST on every query #303

@sayeaud-accelins


Problem

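The _transform method in cursor.py (lines 304-355) chains 63 sequential .transform() calls to convert Snowflake SQL to DuckDB SQL. None of these calls passes copy, so each one uses sqlglot's default copy=True (see the sqlglot source), which deep-copies the entire AST before applying the transform. That adds up to 63 full-tree copies per query.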

For simple queries this doesn't matter much, but for complex queries with large ASTs (lots of JOINs, CTEs, nested CASE expressions) the overhead adds up fast, because every call in the chain copies the whole tree. We were seeing ~658K deepcopy operations across our test suite, and switching to copy=False gave us a ~3x speedup.
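To make the scaling concrete, here is a toy model. Node, make_tree and run_chain are hypothetical illustrations, not sqlglot or fakesnow code; the only thing borrowed is the shape of the copy flag, where copy=True clones the whole tree before rewriting it. Under that behaviour, a chain of N transforms over a tree of M nodes performs N × M node copies, while copy=False performs none.

```python
from __future__ import annotations

from copy import deepcopy
from dataclasses import dataclass, field
from typing import Callable

# Counts how many individual nodes get deep-copied (toy instrumentation).
NODE_COPIES = 0


@dataclass
class Node:
    """Hypothetical toy AST node; a stand-in, not sqlglot's Expression."""

    name: str
    children: list[Node] = field(default_factory=list)

    def __deepcopy__(self, memo: dict) -> Node:
        global NODE_COPIES
        NODE_COPIES += 1  # one increment per node cloned
        return Node(self.name, [deepcopy(c, memo) for c in self.children])

    def transform(self, fun: Callable[[Node], Node], copy: bool = True) -> Node:
        # Mirrors the shape of sqlglot's default: copy=True clones the whole tree first.
        root = deepcopy(self) if copy else self
        return root._apply(fun)

    def _apply(self, fun: Callable[[Node], Node]) -> Node:
        # Post-order rewrite: children first, then this node.
        self.children = [c._apply(fun) for c in self.children]
        return fun(self)


def make_tree(depth: int) -> Node:
    """Balanced binary tree with 2**(depth + 1) - 1 nodes."""
    if depth == 0:
        return Node("leaf")
    return Node("join", [make_tree(depth - 1), make_tree(depth - 1)])


def run_chain(tree: Node, n_transforms: int, copy: bool) -> int:
    """Apply n_transforms no-op rewrites in sequence; return the number of node copies."""
    global NODE_COPIES
    NODE_COPIES = 0
    for _ in range(n_transforms):
        tree = tree.transform(lambda n: n, copy=copy)
    return NODE_COPIES
```

For a 31-node tree and a 63-step chain, run_chain(make_tree(4), 63, copy=True) returns 1953 (63 × 31), and run_chain(make_tree(4), 63, copy=False) returns 0.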

The copy=True default exists so you don't accidentally mutate an AST that's used elsewhere. But in FakeSnow's case:

  • Each query parses fresh SQL, so no AST is reused between queries
  • The transforms run sequentially on a local variable
  • The original expression is thrown away after transpilation
  • The whole thing is scoped to a single cursor execution

So, the immutability guarantee doesn't seem to buy anything here.
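The hazard that copy=True protects against, and why it does not arise in a per-query pipeline, can be shown with a toy rewrite step. rename_tables and transpile are hypothetical, and a plain dict stands in for the AST: an in-place rewrite is only visible to code that still holds a reference to the same tree, and a freshly parsed tree that is discarded after transpilation has no other holders.

```python
from copy import deepcopy


def rename_tables(tree: dict, copy: bool = True) -> dict:
    """Hypothetical rewrite step with a sqlglot-style copy flag (toy model)."""
    root = deepcopy(tree) if copy else tree
    root["tables"] = [t.upper() for t in root["tables"]]
    return root


# The hazard copy=True guards against: another reference to the same tree.
shared = {"tables": ["orders", "users"]}
cached = shared  # e.g. an AST someone kept around
rename_tables(shared, copy=False)
print(cached["tables"])  # → ['ORDERS', 'USERS'] (rewritten behind the holder's back)


def transpile(sql: str) -> dict:
    """Toy per-query pipeline: build a fresh tree, rewrite it, hand back the result."""
    tree = {"tables": sql.split()}  # stand-in for parsing; nothing else references it
    return rename_tables(tree, copy=False)


print(transpile("orders users"))  # → {'tables': ['ORDERS', 'USERS']}
```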

Current workaround

We're monkey-patching Expression.transform in our test fixtures to default copy=False:

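  from collections.abc import Callable, Iterator
  from typing import Any

  import pytest
  from sqlglot import exp


  @pytest.fixture(scope="session", autouse=True)
  def optimize_fakesnow_transforms() -> Iterator[None]:
      """Flip Expression.transform's default to copy=False for the whole test session."""
      original_transform = exp.Expression.transform

      def patched_transform(
          self: exp.Expression,
          fun: Callable[..., Any],
          *args: Any,
          copy: bool = False,  # only the default changes; an explicit copy=True still copies
          **kwargs: Any,
      ) -> exp.Expression:
          return original_transform(self, fun, *args, copy=copy, **kwargs)

      exp.Expression.transform = patched_transform  # type: ignore[method-assign]
      yield
      # Restore the original method when the session ends
      exp.Expression.transform = original_transform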
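One property of this workaround worth noting is that it only changes the default: any caller that passes copy=True explicitly still gets a copy. The toy below reproduces the same patching pattern on a hypothetical Expr class (not sqlglot's Expression) to show both behaviours, with the restore step in a finally block.

```python
from copy import deepcopy
from typing import Any, Callable


class Expr:
    """Hypothetical stand-in for sqlglot's Expression, with the same copy=True default."""

    def __init__(self, value: list[int]) -> None:
        self.value = value

    def transform(self, fun: Callable[[list[int]], list[int]], copy: bool = True) -> "Expr":
        root = Expr(deepcopy(self.value)) if copy else self
        root.value = fun(root.value)
        return root


original_transform = Expr.transform


def patched_transform(
    self: Expr, fun: Callable[..., Any], *args: Any, copy: bool = False, **kwargs: Any
) -> Expr:
    # Only the default flips; an explicit copy=True from a caller is passed through unchanged
    return original_transform(self, fun, *args, copy=copy, **kwargs)


Expr.transform = patched_transform  # type: ignore[method-assign]
try:
    e = Expr([1, 2])
    same = e.transform(lambda v: v + [3])              # default is now in place
    fresh = e.transform(lambda v: v + [4], copy=True)  # explicit copy still clones
finally:
    Expr.transform = original_transform  # restore even if something above raises
```

After this runs, same is e with e.value == [1, 2, 3], while fresh is a separate object holding [1, 2, 3, 4].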

Suggested fix

Add copy=False to the transform chain in cursor.py. It could go on every call. Alternatively, the first call can keep the default copy=True so that it produces a private working copy, and every later call passes copy=False, since from then on the chain is already operating on that copy:

  # First transform keeps the default copy=True, creating the working copy
  expression = (
      expression.transform(lambda e: transforms.identifier(e, params))
      # Rest operate in place on that copy
      .transform(transforms.upper_case_unquoted_identifiers, copy=False)
      .transform(transforms.alter_session, copy=False)
      # ...
  )
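A toy version of the copy-once variant, to check that it produces the same output as copying on every step while leaving the caller's tree untouched. apply, upper_case, add_schema and the dict-based tree are hypothetical stand-ins for Expression.transform and fakesnow's transforms, not their real implementations.

```python
from copy import deepcopy
from typing import Callable

Tree = dict[str, list[str]]
Rewrite = Callable[[Tree], Tree]


def apply(tree: Tree, fun: Rewrite, copy: bool = True) -> Tree:
    """Toy stand-in for Expression.transform: optionally clone, then rewrite."""
    return fun(deepcopy(tree) if copy else tree)


def upper_case(tree: Tree) -> Tree:
    # Hypothetical rewrite that mutates its input in place
    tree["idents"] = [i.upper() for i in tree["idents"]]
    return tree


def add_schema(tree: Tree) -> Tree:
    # Hypothetical rewrite that mutates its input in place
    tree["idents"] = ["MAIN." + i for i in tree["idents"]]
    return tree


REWRITES: list[Rewrite] = [upper_case, add_schema]


def chain_copy_every_time(tree: Tree) -> Tree:
    for fun in REWRITES:
        tree = apply(tree, fun)  # default copy=True at every step
    return tree


def chain_copy_once(tree: Tree) -> Tree:
    first, *rest = REWRITES
    tree = apply(tree, first)  # copy=True once: the caller's tree is never mutated
    for fun in rest:
        tree = apply(tree, fun, copy=False)  # later steps rewrite the private copy in place
    return tree


original = {"idents": ["a", "b"]}
print(chain_copy_once(original))  # → {'idents': ['MAIN.A', 'MAIN.B']}
print(original)                   # → {'idents': ['a', 'b']}
```

Given that each query parses fresh SQL and discards the original expression, passing copy=False on the first step as well would also be safe; the copy-once form simply keeps the input tree intact for any caller that might still hold it.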

Environment

  • fakesnow 0.11.1
  • sqlglot 28.5.0
  • Python 3.11
