feat: Snowflake-to-DuckDB data type translation and driver compatibility fixes #46
Open · roend83 wants to merge 15 commits into nnnkkk7:main
Conversation
DDL statements (CREATE TABLE, ALTER TABLE) and CAST expressions now translate Snowflake-native type names to DuckDB equivalents, enabling real Snowflake DDL to work against the emulator without modification. Supports 17 type mappings including NUMBER→NUMERIC, TEXT/STRING→VARCHAR, TIMESTAMP_NTZ→TIMESTAMP, VARIANT/OBJECT/ARRAY→JSON, and more. DDL uses word-boundary-aware replacement with string literal protection; DML targets only convert() expressions to avoid false positives on column names that match type names. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
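The word-boundary replacement with string-literal protection described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the `translateTypes` name and the reduced type map are assumptions, and the real implementation covers all 17 mappings.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Illustrative subset of the Snowflake -> DuckDB type mappings.
var typeMap = map[string]string{
	"NUMBER":        "NUMERIC",
	"TEXT":          "VARCHAR",
	"STRING":        "VARCHAR",
	"TIMESTAMP_NTZ": "TIMESTAMP",
	"VARIANT":       "JSON",
}

var literalRe = regexp.MustCompile(`'[^']*'`)

// translateTypes rewrites Snowflake type names in DDL using word-boundary
// matching, leaving single-quoted string literals untouched.
func translateTypes(sql string) string {
	// Split on single-quoted literals so replacements only apply outside them.
	parts := literalRe.Split(sql, -1)
	literals := literalRe.FindAllString(sql, -1)

	var out strings.Builder
	for i, p := range parts {
		for from, to := range typeMap {
			re := regexp.MustCompile(`(?i)\b` + from + `\b`)
			p = re.ReplaceAllString(p, to)
		}
		out.WriteString(p)
		if i < len(literals) {
			out.WriteString(literals[i])
		}
	}
	return out.String()
}

func main() {
	fmt.Println(translateTypes("CREATE TABLE t (id NUMBER, note TEXT, v VARIANT)"))
	// CREATE TABLE t (id NUMERIC, note VARCHAR, v JSON)
}
```

Because `\b` treats the underscore as a word character, a bare `TIMESTAMP` pattern cannot partially match `TIMESTAMP_NTZ`, which is what makes the boundary-aware approach safe against prefix collisions.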
vitess-sqlparser adds MySQL-style backticks around reserved words when serializing the AST back to SQL (e.g., INFORMATION_SCHEMA.`tables`). DuckDB rejects backtick-quoted identifiers, causing all system table queries to fail. Strip backticks during post-processing to fix this. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
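The backtick stripping can be sketched as a single pass that tracks whether we are inside a single-quoted literal; the `stripBackticks` name is illustrative, not the PR's identifier.

```go
package main

import (
	"fmt"
	"strings"
)

// stripBackticks removes MySQL-style identifier quoting that
// vitess-sqlparser adds (e.g. INFORMATION_SCHEMA.`tables`), while leaving
// backticks inside single-quoted string literals alone.
func stripBackticks(sql string) string {
	var out strings.Builder
	inLiteral := false
	for _, r := range sql {
		switch {
		case r == '\'':
			inLiteral = !inLiteral
			out.WriteRune(r)
		case r == '`' && !inLiteral:
			// Drop identifier quoting that DuckDB rejects.
		default:
			out.WriteRune(r)
		}
	}
	return out.String()
}

func main() {
	fmt.Println(stripBackticks("SELECT * FROM INFORMATION_SCHEMA.`tables`"))
	// SELECT * FROM INFORMATION_SCHEMA.tables
}
```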
Some Snowflake drivers (e.g. the .NET Snowflake.Data driver) expect rowtype and rowset fields to always be present in query responses, even for DDL/DML statements that return no rows. When these fields were omitted (via omitempty), drivers failed to construct a result set, causing null reference errors during ExecuteNonQuery calls. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Snowflake driver SDK expects specific wire protocol type names (FIXED, REAL, TEXT, etc.) in the rowtype metadata, not SQL-level type names (NUMBER, FLOAT). The driver parses these into an SFDataType enum and throws "Unknown column type" for unrecognized names. Changed type mapper: BIGINT/INTEGER/DECIMAL→FIXED, DOUBLE/FLOAT→REAL. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
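The mapper change can be sketched as a switch from DuckDB column types to wire protocol names; the `wireType` function name and the fallback branch are assumptions for illustration.

```go
package main

import "fmt"

// wireType maps a DuckDB column type to the Snowflake wire protocol type
// name that drivers parse into their SFDataType enum.
func wireType(duckdbType string) string {
	switch duckdbType {
	case "BIGINT", "INTEGER", "DECIMAL":
		return "FIXED" // not the SQL-level name "NUMBER"
	case "DOUBLE", "FLOAT":
		return "REAL" // not the SQL-level name "FLOAT"
	default:
		return "TEXT" // conservative fallback for this sketch
	}
}

func main() {
	fmt.Println(wireType("BIGINT"), wireType("DOUBLE")) // FIXED REAL
}
```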
Force-pushed from c6cc2c5 to f39319a
Intercept CREATE DATABASE, CREATE DATABASE IF NOT EXISTS, and CREATE OR REPLACE DATABASE in the executor and route through the metadata repository instead of passing to DuckDB (which doesn't support CREATE DATABASE syntax). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ement SQL When CREATE DATABASE is the first statement in a multi-statement batch (e.g., creation scripts with CREATE SCHEMA and CREATE TABLE following), the remaining statements are now passed through to DuckDB for execution instead of being silently dropped. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed from 6a2cc12 to 66d2acc
Author
@nnnkkk7 I ended up putting a few different things in this PR based on the needs of my current integration testing suite:
Claude generated all of this code for me. It looked reasonable to me, but I'm also not very proficient in Go, so I'm happy to make any changes you think are necessary.
Replace goto with boolean flag in executeCreateDatabase, use FindStringSubmatchIndex for precise match positioning, scope backtick removal to only unwrap identifier quoting (preserving backticks in string literals), cache Classifier on Executor, and define type mappings in length-descending order without runtime sort. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CREATE DATABASE in the emulator only created a DuckDB schema (not a separate database file), so it didn't provide real database isolation. Scripts using CREATE OR REPLACE DATABASE followed by CREATE SCHEMA would fail on re-run because the schema created in the remaining SQL was never cleaned up by the drop. Users should use CREATE OR REPLACE SCHEMA directly instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The translator was skipping type translation for SQL batches starting with DROP, TRUNCATE, SHOW, etc. When a multi-statement batch starts with DROP SCHEMA and contains CREATE TABLE statements with Snowflake types (NUMBER, TEXT, TIMESTAMP_NTZ), those types were passed through untranslated to DuckDB, causing "Type with name NUMBER does not exist" errors. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Snowflake .NET driver sends named bindings (e.g., :p0, :p1, :foo) rather than numeric positional bindings (:1, :2). Use regex word-boundary matching to replace bind placeholders, which prevents partial matches like :p1 corrupting :p10 or :foo corrupting :foo_bar. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
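The word-boundary substitution for a single named binding can be sketched like this; the `bindNamed` name is illustrative, and the real code handles a whole map of bindings at once.

```go
package main

import (
	"fmt"
	"regexp"
)

// bindNamed replaces a named placeholder like :p1 only at word boundaries,
// so substituting :p1 does not corrupt :p10 and :foo does not corrupt
// :foo_bar.
func bindNamed(sql, name, value string) string {
	re := regexp.MustCompile(`:` + regexp.QuoteMeta(name) + `\b`)
	return re.ReplaceAllString(sql, value)
}

func main() {
	sql := "SELECT :p1, :p10 FROM t"
	fmt.Println(bindNamed(sql, "p1", "'a'")) // SELECT 'a', :p10 FROM t
}
```

The trailing `\b` is what prevents the partial match: in `:p10`, the `1` is followed by the word character `0`, so the boundary assertion fails and the longer placeholder is left intact.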
The gosnowflake wire protocol query handler was ignoring parameter bindings sent by clients. The .NET Snowflake driver sends named bindings (e.g., :OperationId) in the request body, but the query handler was not passing them through to the executor. Added a bindings parameter to ExecuteWithHistory and QueryWithHistory so both history tracking and binding substitution happen consistently. Updated the QueryRequest.Bindings type from map[string]interface{} to map[string]*BindingValue to properly parse structured binding values. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Snowflake .NET driver sends null binding values (JSON null) and timestamps as nanoseconds since epoch (e.g., 1704067200000000000). Changed BindingValue.Value to *string to distinguish null from empty string, and added epoch nanosecond parsing for TIMESTAMP_NTZ bindings. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Avoid recompiling the static \? regex on every call to replaceQuestionMarkPlaceholders. Follows the idiomatic Go pattern of precompiling regexes at package level. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
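The pattern is the standard Go idiom of hoisting a static `regexp.MustCompile` to a package-level variable so it compiles once at init. A minimal sketch, with the replacement logic reduced to sequential substitution for illustration:

```go
package main

import (
	"fmt"
	"regexp"
)

// Compiled once at package init instead of on every call.
var questionMarkRe = regexp.MustCompile(`\?`)

// replaceQuestionMarkPlaceholders substitutes each ? with the next value,
// leaving extra placeholders untouched in this sketch.
func replaceQuestionMarkPlaceholders(sql string, values []string) string {
	i := 0
	return questionMarkRe.ReplaceAllStringFunc(sql, func(string) string {
		if i >= len(values) {
			return "?"
		}
		v := values[i]
		i++
		return v
	})
}

func main() {
	fmt.Println(replaceQuestionMarkPlaceholders("SELECT ? + ?", []string{"1", "2"}))
	// SELECT 1 + 2
}
```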
…esponse format Update statement type IDs to match Snowflake's actual wire protocol values (e.g. SELECT=0x1000, INSERT=0x3100, UPDATE=0x3200, DELETE=0x3300) and return affected row counts in the rowset data for DML statements. The Snowflake .NET driver reads row counts from rowset, not the total field, so EF Core was seeing 0 affected rows and rolling back transactions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
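The statement type IDs listed in the commit message, plus the rowset-based row count, can be sketched as follows; the constant and function names are illustrative.

```go
package main

import "fmt"

// Statement type IDs matching the wire protocol values listed in the
// commit message.
const (
	stmtTypeSelect = 0x1000
	stmtTypeInsert = 0x3100
	stmtTypeUpdate = 0x3200
	stmtTypeDelete = 0x3300
)

// dmlRowSet returns the affected-row count as rowset data, since the .NET
// driver reads DML row counts from rowset rather than the total field.
func dmlRowSet(affected int64) [][]string {
	return [][]string{{fmt.Sprintf("%d", affected)}}
}

func main() {
	fmt.Printf("0x%X %v\n", stmtTypeUpdate, dmlRowSet(3)) // 0x3200 [[3]]
}
```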
…Bindings Sort binding keys by length descending and use byte-level word-boundary checks instead of compiling a new regex for every binding key on each call. Also add comment explaining ROLLBACK's intentional fallthrough to generic DML. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
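The regex-free variant can be sketched as a byte-level scan: keys are sorted longest-first, and a match is accepted only when the byte after the placeholder is not a word character. The `replaceBindings` name and helper are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

func isWordByte(b byte) bool {
	return b == '_' || b >= '0' && b <= '9' ||
		b >= 'a' && b <= 'z' || b >= 'A' && b <= 'Z'
}

// replaceBindings substitutes :name placeholders without compiling a regex
// per key: longest keys first, with a byte-level word-boundary check so
// :foo does not match inside :foo_bar.
func replaceBindings(sql string, binds map[string]string) string {
	keys := make([]string, 0, len(binds))
	for k := range binds {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool { return len(keys[i]) > len(keys[j]) })

	for _, k := range keys {
		needle := ":" + k
		var out strings.Builder
		for i := 0; i < len(sql); {
			j := strings.Index(sql[i:], needle)
			if j < 0 {
				out.WriteString(sql[i:])
				break
			}
			j += i
			end := j + len(needle)
			if end < len(sql) && isWordByte(sql[end]) {
				// Partial match like :foo inside :foo_bar; keep scanning.
				out.WriteString(sql[i:end])
				i = end
				continue
			}
			out.WriteString(sql[i:j])
			out.WriteString(binds[k])
			i = end
		}
		sql = out.String()
	}
	return sql
}

func main() {
	fmt.Println(replaceBindings("SELECT :foo, :foo_bar", map[string]string{
		"foo": "'1'", "foo_bar": "'2'",
	}))
	// SELECT '1', '2'
}
```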
Author
I reverted the change to support CREATE DATABASE commands after I realized that this emulator doesn't really support multiple databases (it translates database creation into creating new schemas). I also added several more changes to get the emulator working with the .NET Snowflake driver. I think these are all changes that better emulate realistic Snowflake responses. I don't believe I need any other changes to get my integration test suites working.
Summary

- DDL statements (`CREATE TABLE`, `ALTER TABLE`) and CAST expressions now translate Snowflake-native type names to DuckDB equivalents, enabling real Snowflake DDL to work against the emulator
- Fix `INFORMATION_SCHEMA` queries by stripping backticks added by vitess-sqlparser
- Always include `rowtype`/`rowset` arrays for .NET driver compatibility
- Return wire protocol type names (`FIXED`, `REAL`) instead of SQL-level names (`NUMBER`, `FLOAT`) in column metadata

Details

Data type translation (17 mappings)

- `NUMBER` → `NUMERIC`
- `TEXT`, `STRING`, `CHAR`, `CHARACTER` → `VARCHAR`
- `TIMESTAMP_NTZ` → `TIMESTAMP`
- `TIMESTAMP_LTZ`, `TIMESTAMP_TZ` → `TIMESTAMPTZ`
- `VARIANT`, `OBJECT`, `ARRAY` → `JSON`
- `BINARY`, `VARBINARY` → `BLOB`
- `FLOAT4` → `FLOAT`
- `FLOAT8` → `DOUBLE`
- `BYTEINT` → `TINYINT`
- `DATETIME` → `TIMESTAMP`

DDL uses word-boundary-aware string replacement with string literal protection. DML targets only `convert()` expressions to avoid false positives on column names matching type names.

INFORMATION_SCHEMA fix

vitess-sqlparser backtick-quotes MySQL reserved words (e.g., INFORMATION_SCHEMA.`tables`). DuckDB rejects backtick quoting, so they are stripped during post-processing.

Driver compatibility fixes

- Some Snowflake drivers (e.g. the .NET `Snowflake.Data` driver) expect `rowtype` and `rowset` fields in every query response. Removed `omitempty` from the JSON tags and initialize them as empty arrays for DDL/DML.
- The driver SDK expects `FIXED` for numeric types and `REAL` for floating-point, not `NUMBER`/`FLOAT`. Updated the type mapper to match.

Test plan

- `go test ./...`

Closes #45

🤖 Generated with Claude Code