feat: Snowflake-to-DuckDB data type translation and driver compatibility fixes #46

Open
roend83 wants to merge 15 commits into nnnkkk7:main from roend83:feat/snowflake-to-duckdb-type-translation

roend83 commented Feb 13, 2026

Summary

  • Add Snowflake-to-DuckDB data type translation for DDL statements (CREATE TABLE, ALTER TABLE) and CAST expressions, enabling real Snowflake DDL to work against the emulator
  • Fix INFORMATION_SCHEMA queries by stripping backticks added by vitess-sqlparser
  • Fix DDL/DML response format to include empty rowtype/rowset arrays for .NET driver compatibility
  • Use Snowflake wire protocol type names (FIXED, REAL) instead of SQL-level names (NUMBER, FLOAT) in column metadata

Details

Data type translation (17 mappings)

| Snowflake | DuckDB |
|---|---|
| NUMBER | NUMERIC |
| TEXT, STRING, CHAR, CHARACTER | VARCHAR |
| TIMESTAMP_NTZ | TIMESTAMP |
| TIMESTAMP_LTZ, TIMESTAMP_TZ | TIMESTAMPTZ |
| VARIANT, OBJECT, ARRAY | JSON |
| BINARY, VARBINARY | BLOB |
| FLOAT4 | FLOAT |
| FLOAT8 | DOUBLE |
| BYTEINT | TINYINT |
| DATETIME | TIMESTAMP |

DDL uses word-boundary-aware string replacement with string literal protection. DML targets only convert() expressions to avoid false positives on column names matching type names.
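
The DDL approach described above can be illustrated with a minimal sketch. It is not the code from this PR: the names `duckDBTypes`, `typePattern`, and `translateTypes` are hypothetical, only a subset of the 17 mappings is included, and a single regex with a literal-matching alternative is assumed as the way to protect string literals.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// duckDBTypes holds a subset of the mappings from the table above
// (hypothetical name, not taken from the PR).
var duckDBTypes = map[string]string{
	"NUMBER":        "NUMERIC",
	"TEXT":          "VARCHAR",
	"STRING":        "VARCHAR",
	"TIMESTAMP_NTZ": "TIMESTAMP",
	"VARIANT":       "JSON",
	"BYTEINT":       "TINYINT",
}

// typePattern matches either a single-quoted string literal (first
// alternative) or one of the type names as a whole word (second alternative).
// Because "_" counts as a word character, my_number does not match NUMBER.
var typePattern = regexp.MustCompile(
	`(?i)'(?:[^']|'')*'|\b(?:TIMESTAMP_NTZ|VARIANT|BYTEINT|NUMBER|STRING|TEXT)\b`)

// translateTypes rewrites Snowflake type names to DuckDB names in one pass,
// leaving string literals untouched.
func translateTypes(sql string) string {
	return typePattern.ReplaceAllStringFunc(sql, func(m string) string {
		if strings.HasPrefix(m, "'") {
			return m // string literal: keep as-is
		}
		if to, ok := duckDBTypes[strings.ToUpper(m)]; ok {
			return to
		}
		return m
	})
}

func main() {
	in := "CREATE TABLE t (id NUMBER(10,0), note TEXT DEFAULT 'NUMBER', my_number INT)"
	fmt.Println(translateTypes(in))
}
```

A sketch like this would still rewrite a column that is literally named `text` or `number`, which is presumably why the DML path is restricted to convert() expressions.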

INFORMATION_SCHEMA fix

vitess-sqlparser backtick-quotes MySQL reserved words (e.g., INFORMATION_SCHEMA.`tables`). DuckDB rejects backtick quoting, so they are stripped during post-processing.
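
As a rough illustration of the post-processing step, the sketch below unwraps backtick-quoted identifiers while leaving backticks inside single-quoted string literals alone. The function name `stripIdentifierBackticks` and the regex-based approach are assumptions for illustration, not the PR's actual implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// backtickPattern matches a single-quoted string literal or a
// backtick-quoted identifier, whichever starts first.
var backtickPattern = regexp.MustCompile("'(?:[^']|'')*'|`[^`]*`")

// stripIdentifierBackticks removes the backticks around identifiers and
// returns string literals unchanged.
func stripIdentifierBackticks(sql string) string {
	return backtickPattern.ReplaceAllStringFunc(sql, func(m string) string {
		if strings.HasPrefix(m, "'") {
			return m // string literal: keep any backticks inside it
		}
		return strings.Trim(m, "`")
	})
}

func main() {
	in := "SELECT * FROM INFORMATION_SCHEMA.`tables` WHERE note = 'a `b` c'"
	fmt.Println(stripIdentifierBackticks(in))
}
```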

Driver compatibility fixes

  • Some Snowflake drivers (e.g., the .NET Snowflake.Data driver) expect rowtype and rowset fields in every query response. Removed omitempty from the JSON tags and initialized both fields as empty arrays for DDL/DML responses.
  • The Snowflake wire protocol uses FIXED for numeric types and REAL for floating-point, not NUMBER/FLOAT. Updated the type mapper to match.

Test plan

  • 43 new unit tests covering DDL, CAST, ALTER, INFORMATION_SCHEMA, and edge cases
  • All existing tests pass (go test ./...)
  • End-to-end testing with all three examples (gosnowflake, restapi, docker)

Closes #45

🤖 Generated with Claude Code

roend83 and others added 4 commits February 13, 2026 17:20
DDL statements (CREATE TABLE, ALTER TABLE) and CAST expressions now
translate Snowflake-native type names to DuckDB equivalents, enabling
real Snowflake DDL to work against the emulator without modification.

Supports 17 type mappings including NUMBER→NUMERIC, TEXT/STRING→VARCHAR,
TIMESTAMP_NTZ→TIMESTAMP, VARIANT/OBJECT/ARRAY→JSON, and more. DDL uses
word-boundary-aware replacement with string literal protection; DML
targets only convert() expressions to avoid false positives on column
names that match type names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
vitess-sqlparser adds MySQL-style backticks around reserved words when
serializing the AST back to SQL (e.g., INFORMATION_SCHEMA.`tables`).
DuckDB rejects backtick-quoted identifiers, causing all system table
queries to fail. Strip backticks during post-processing to fix this.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Some Snowflake drivers (e.g. the .NET Snowflake.Data driver) expect
rowtype and rowset fields to always be present in query responses, even
for DDL/DML statements that return no rows. When these fields were
omitted (via omitempty), drivers failed to construct a result set,
causing null reference errors during ExecuteNonQuery calls.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Snowflake driver SDK expects specific wire protocol type names
(FIXED, REAL, TEXT, etc.) in the rowtype metadata, not SQL-level type
names (NUMBER, FLOAT). The driver parses these into an SFDataType enum
and throws "Unknown column type" for unrecognized names.

Changed type mapper: BIGINT/INTEGER/DECIMAL→FIXED, DOUBLE/FLOAT→REAL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
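
A type mapper along the lines of the commit above might look like the following sketch. Only the BIGINT/INTEGER/DECIMAL→FIXED and DOUBLE/FLOAT→REAL cases come from the commit message; the function name `wireTypeName`, the stripping of precision suffixes, and the TEXT fallback are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// wireTypeName maps a DuckDB column type to the name reported in rowtype
// metadata. Hypothetical helper; the TEXT fallback is an assumption.
func wireTypeName(duckDBType string) string {
	base := strings.ToUpper(strings.TrimSpace(duckDBType))
	// Drop any precision/scale suffix such as "(38,0)".
	if i := strings.IndexByte(base, '('); i >= 0 {
		base = base[:i]
	}
	switch base {
	case "BIGINT", "INTEGER", "DECIMAL":
		return "FIXED"
	case "DOUBLE", "FLOAT":
		return "REAL"
	default:
		return "TEXT" // assumed fallback for this sketch
	}
}

func main() {
	for _, t := range []string{"DECIMAL(38,0)", "double", "VARCHAR"} {
		fmt.Println(t, "->", wireTypeName(t))
	}
}
```
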
roend83 force-pushed the feat/snowflake-to-duckdb-type-translation branch from c6cc2c5 to f39319a on February 13, 2026 22:21
nnnkkk7 self-requested a review on February 14, 2026 01:21
roend83 and others added 2 commits February 16, 2026 13:27
Intercept CREATE DATABASE, CREATE DATABASE IF NOT EXISTS, and
CREATE OR REPLACE DATABASE in the executor and route through the
metadata repository instead of passing to DuckDB (which doesn't
support CREATE DATABASE syntax).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ement SQL

When CREATE DATABASE is the first statement in a multi-statement batch
(e.g., creation scripts with CREATE SCHEMA and CREATE TABLE following),
the remaining statements are now passed through to DuckDB for execution
instead of being silently dropped.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
roend83 force-pushed the feat/snowflake-to-duckdb-type-translation branch from 6a2cc12 to 66d2acc on February 16, 2026 18:28

roend83 commented Feb 16, 2026

@nnnkkk7 I ended up putting a few different things in this PR, based on the needs of my current integration test suite:

  1. Support for Snowflake-to-DuckDB type differences
  2. Support for querying the INFORMATION_SCHEMA tables
  3. Fixes for some API issues that caused the .NET Snowflake driver to complain
  4. Support for the Snowflake CREATE DATABASE statement instead of calling the create database API

Claude generated all of this code for me. It looked reasonable to me, but I'm not very proficient in Go, so I'm happy to make any changes you think are necessary.

roend83 and others added 9 commits February 17, 2026 19:26
Replace goto with boolean flag in executeCreateDatabase, use
FindStringSubmatchIndex for precise match positioning, scope backtick
removal to only unwrap identifier quoting (preserving backticks in
string literals), cache Classifier on Executor, and define type
mappings in length-descending order without runtime sort.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CREATE DATABASE in the emulator only created a DuckDB schema (not a
separate database file), so it didn't provide real database isolation.
Scripts using CREATE OR REPLACE DATABASE followed by CREATE SCHEMA
would fail on re-run because the schema created in the remaining SQL
was never cleaned up by the drop. Users should use CREATE OR REPLACE
SCHEMA directly instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The translator was skipping type translation for SQL batches starting
with DROP, TRUNCATE, SHOW, etc. When a multi-statement batch starts
with DROP SCHEMA and contains CREATE TABLE statements with Snowflake
types (NUMBER, TEXT, TIMESTAMP_NTZ), those types were passed through
untranslated to DuckDB, causing "Type with name NUMBER does not exist"
errors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Snowflake .NET driver sends named bindings (e.g., :p0, :p1, :foo)
rather than numeric positional bindings (:1, :2). Use regex word-boundary
matching to replace bind placeholders, which prevents partial matches
like :p1 corrupting :p10 or :foo corrupting :foo_bar.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
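
The partial-match problem described in this commit can be sketched as follows. `replaceNamedBinding` is a hypothetical helper, and the sketch ignores placeholders that appear inside string literals.

```go
package main

import (
	"fmt"
	"regexp"
)

// replaceNamedBinding substitutes one named placeholder (":name") with a
// SQL literal. The trailing \b stops ":p1" from matching inside ":p10" and
// ":foo" from matching inside ":foo_bar", since digits and "_" are word
// characters.
func replaceNamedBinding(sql, name, literal string) string {
	re := regexp.MustCompile(`:` + regexp.QuoteMeta(name) + `\b`)
	return re.ReplaceAllLiteralString(sql, literal)
}

func main() {
	sql := "SELECT :p1, :p10, :foo, :foo_bar"
	sql = replaceNamedBinding(sql, "p1", "'a'")
	sql = replaceNamedBinding(sql, "foo", "'x'")
	fmt.Println(sql)
}
```
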
The gosnowflake wire protocol query handler was ignoring parameter
bindings sent by clients. The .NET Snowflake driver sends named
bindings (e.g., :OperationId) in the request body, but the query
handler was not passing them through to the executor.

Added bindings parameter to ExecuteWithHistory and QueryWithHistory
so both history tracking and binding substitution happen consistently.
Updated QueryRequest.Bindings type from map[string]interface{} to
map[string]*BindingValue to properly parse structured binding values.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Snowflake .NET driver sends null binding values (JSON null) and
timestamps as nanoseconds since epoch (e.g., 1704067200000000000).
Changed BindingValue.Value to *string to distinguish null from empty
string, and added epoch nanosecond parsing for TIMESTAMP_NTZ bindings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Avoid recompiling the static \? regex on every call to
replaceQuestionMarkPlaceholders. Follows the idiomatic Go pattern
of precompiling regexes at package level.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
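
The package-level precompilation pattern looks roughly like this. What replaceQuestionMarkPlaceholders actually substitutes is not shown in this PR page, so the sketch uses a hypothetical `numberPlaceholders` function that rewrites each `?` to `$1`, `$2`, and so on, purely to have something to demonstrate.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// questionMarkRe is compiled once at package initialization instead of on
// every call.
var questionMarkRe = regexp.MustCompile(`\?`)

// numberPlaceholders is a hypothetical stand-in that numbers each "?".
func numberPlaceholders(sql string) string {
	n := 0
	return questionMarkRe.ReplaceAllStringFunc(sql, func(string) string {
		n++
		return "$" + strconv.Itoa(n)
	})
}

func main() {
	fmt.Println(numberPlaceholders("INSERT INTO t VALUES (?, ?)"))
}
```
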
…esponse format

Update statement type IDs to match Snowflake's actual wire protocol values
(e.g. SELECT=0x1000, INSERT=0x3100, UPDATE=0x3200, DELETE=0x3300) and return
affected row counts in the rowset data for DML statements. The Snowflake .NET
driver reads row counts from rowset, not the total field, so EF Core was seeing
0 affected rows and rolling back transactions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
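
As a sketch of the response shape this commit describes, the example below uses the four statement type IDs quoted above and places the affected row count in the rowset. The struct, the JSON field names, and the column label are assumptions, not the emulator's actual definitions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// Statement type IDs as listed in the commit message.
const (
	StatementTypeSelect int64 = 0x1000
	StatementTypeInsert int64 = 0x3100
	StatementTypeUpdate int64 = 0x3200
	StatementTypeDelete int64 = 0x3300
)

type column struct {
	Name string `json:"name"`
	Type string `json:"type"`
}

// dmlResponse is an illustrative shape; field names are assumed.
type dmlResponse struct {
	StatementTypeID int64      `json:"statementTypeId"`
	RowType         []column   `json:"rowtype"`
	RowSet          [][]string `json:"rowset"`
	Total           int64      `json:"total"`
}

// newDMLResponse puts the affected row count in the first rowset cell,
// where the commit says the .NET driver reads it.
func newDMLResponse(typeID, affected int64) dmlResponse {
	return dmlResponse{
		StatementTypeID: typeID,
		RowType:         []column{{Name: "number of rows affected", Type: "FIXED"}},
		RowSet:          [][]string{{strconv.FormatInt(affected, 10)}},
		Total:           1,
	}
}

func main() {
	b, _ := json.Marshal(newDMLResponse(StatementTypeInsert, 3))
	fmt.Println(string(b))
}
```
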
…Bindings

Sort binding keys by length descending and use byte-level word-boundary
checks instead of compiling a new regex for every binding key on each call.
Also add comment explaining ROLLBACK's intentional fallthrough to generic DML.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
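
The regex-free variant described in this commit might be sketched like this: keys are sorted longest-first and each candidate match is accepted only if the byte after it is not a word character. `replaceBindings` and `isWordByte` are hypothetical names, and the single-pass scan is one possible design rather than the PR's code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// isWordByte reports whether c can continue an identifier.
func isWordByte(c byte) bool {
	return c == '_' || (c >= '0' && c <= '9') ||
		(c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

// replaceBindings substitutes ":name" placeholders in a single pass, so
// substituted values are never rescanned.
func replaceBindings(sql string, values map[string]string) string {
	keys := make([]string, 0, len(values))
	for k := range values {
		keys = append(keys, k)
	}
	// Longest keys first; ties broken alphabetically for determinism.
	sort.Slice(keys, func(i, j int) bool {
		if len(keys[i]) != len(keys[j]) {
			return len(keys[i]) > len(keys[j])
		}
		return keys[i] < keys[j]
	})

	var out strings.Builder
	for i := 0; i < len(sql); {
		replaced := false
		if sql[i] == ':' {
			for _, k := range keys {
				end := i + 1 + len(k)
				// Accept only if the key is followed by a non-word byte or EOF.
				if strings.HasPrefix(sql[i+1:], k) &&
					(end == len(sql) || !isWordByte(sql[end])) {
					out.WriteString(values[k])
					i = end
					replaced = true
					break
				}
			}
		}
		if !replaced {
			out.WriteByte(sql[i])
			i++
		}
	}
	return out.String()
}

func main() {
	vals := map[string]string{"p1": "1", "p10": "10", "foo": "'x'"}
	fmt.Println(replaceBindings("WHERE a = :p1 AND b = :p10 AND c = :foo_bar", vals))
}
```
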

roend83 commented Feb 19, 2026

I reverted the CREATE DATABASE support after realizing that this emulator doesn't really support multiple databases (it translates database creation into schema creation).

I also added several more changes to get the emulator working with the .NET Snowflake driver. I think all of them make the emulator's responses match real Snowflake responses more closely. I don't believe I need any other changes to get my integration test suites working.

Development

Successfully merging this pull request may close these issues.

Type Translation
