-
Notifications
You must be signed in to change notification settings - Fork 30
Add Snowflake loader support #380
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Co-authored-by: gkorland <753206+gkorland@users.noreply.github.com>
Co-authored-by: gkorland <753206+gkorland@users.noreply.github.com>
Co-authored-by: gkorland <753206+gkorland@users.noreply.github.com>
|
@CodeRabbit review |
✅ Actions performedReview triggered.
|
…ction vulnerability Co-authored-by: gkorland <753206+gkorland@users.noreply.github.com>
|
Important Review skippedBot user detected. To trigger a single review, invoke the You can disable this status message by setting the Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. Comment |
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.OpenSSF Scorecard
Scanned Files
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds comprehensive support for Snowflake database connections to QueryWeaver, enabling schema extraction and Text2SQL functionality for Snowflake databases. The implementation follows the existing loader pattern established by MySQL and PostgreSQL loaders.
Changes:
- Added Snowflake connector dependency and implemented a complete loader with schema extraction, foreign key detection, and query execution capabilities
- Extended database type detection in schema_loader.py and text2sql.py to recognize Snowflake URLs
- Created comprehensive test suite with 15 unit tests covering URL parsing, value serialization, and schema operations
- Added detailed documentation explaining Snowflake connection format, features, and troubleshooting
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| api/loaders/snowflake_loader.py | Core Snowflake loader implementation with schema extraction, query execution, and connection URL parsing |
| api/core/schema_loader.py | Added Snowflake URL detection and loader registration |
| api/core/text2sql.py | Added Snowflake database type detection for query execution |
| tests/test_snowflake_loader.py | Comprehensive unit tests for Snowflake loader functionality |
| docs/snowflake_loader.md | Complete documentation for Snowflake loader usage and troubleshooting |
| Pipfile | Added snowflake-connector-python dependency |
| cursor.execute(f""" | ||
| SELECT TABLE_NAME, COMMENT | ||
| FROM {db_name}.INFORMATION_SCHEMA.TABLES | ||
| WHERE TABLE_SCHEMA = '{schema_name}' | ||
| AND TABLE_TYPE = 'BASE TABLE' | ||
| ORDER BY TABLE_NAME; | ||
| """) | ||
|
|
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SQL injection vulnerability in query construction. The schema_name parameter is directly interpolated into the SQL query without parameterization. If user input can influence schema_name, this could allow SQL injection. Use parameterized queries or properly escape the schema_name value.
| cursor.execute(f""" | |
| SELECT TABLE_NAME, COMMENT | |
| FROM {db_name}.INFORMATION_SCHEMA.TABLES | |
| WHERE TABLE_SCHEMA = '{schema_name}' | |
| AND TABLE_TYPE = 'BASE TABLE' | |
| ORDER BY TABLE_NAME; | |
| """) | |
| # Validate database and schema names to prevent SQL injection via identifiers | |
| if not re.fullmatch(r"[A-Za-z0-9_]+", db_name): | |
| raise ValueError(f"Invalid Snowflake database name: {db_name!r}") | |
| if not re.fullmatch(r"[A-Za-z0-9_]+", schema_name): | |
| raise ValueError(f"Invalid Snowflake schema name: {schema_name!r}") | |
| query = f""" | |
| SELECT TABLE_NAME, COMMENT | |
| FROM "{db_name}".INFORMATION_SCHEMA.TABLES | |
| WHERE TABLE_SCHEMA = %s | |
| AND TABLE_TYPE = 'BASE TABLE' | |
| ORDER BY TABLE_NAME; | |
| """ | |
| cursor.execute(query, (schema_name,)) |
| SELECT | ||
| COLUMN_NAME, | ||
| DATA_TYPE, | ||
| IS_NULLABLE, | ||
| COLUMN_DEFAULT, | ||
| COMMENT | ||
| FROM {db_name}.INFORMATION_SCHEMA.COLUMNS | ||
| WHERE TABLE_SCHEMA = '{schema_name}' | ||
| AND TABLE_NAME = '{table_name}' | ||
| ORDER BY ORDINAL_POSITION; |
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SQL injection vulnerability in query construction. The db_name, schema_name, and table_name parameters are directly interpolated into the SQL query without parameterization. Use parameterized queries or properly escape these values.
| SELECT COLUMN_NAME | ||
| FROM {db_name}.INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc | ||
| JOIN {db_name}.INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu | ||
| ON tc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME | ||
| AND tc.TABLE_SCHEMA = kcu.TABLE_SCHEMA | ||
| AND tc.TABLE_NAME = kcu.TABLE_NAME | ||
| WHERE tc.TABLE_SCHEMA = '{schema_name}' | ||
| AND tc.TABLE_NAME = '{table_name}' | ||
| AND tc.CONSTRAINT_TYPE = 'PRIMARY KEY'; |
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SQL injection vulnerability in query construction. The db_name, schema_name, and table_name parameters are directly interpolated without parameterization. This pattern appears multiple times in the file and should be addressed consistently.
| query = f""" | ||
| SELECT DISTINCT "{col_name}" | ||
| FROM "{table_name}" | ||
| WHERE "{col_name}" IS NOT NULL | ||
| SAMPLE ({sample_size * 10} ROWS) | ||
| LIMIT {sample_size}; | ||
| """ | ||
| cursor.execute(query) | ||
|
|
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SQL injection vulnerability. The table_name and col_name parameters are directly interpolated into the query without parameterization. Even though they have double quotes, this doesn't fully protect against SQL injection if the values contain double quotes themselves.
| query = f""" | |
| SELECT DISTINCT "{col_name}" | |
| FROM "{table_name}" | |
| WHERE "{col_name}" IS NOT NULL | |
| SAMPLE ({sample_size * 10} ROWS) | |
| LIMIT {sample_size}; | |
| """ | |
| cursor.execute(query) | |
| # Validate identifiers to prevent SQL injection via table/column names. | |
| # Allow only alphanumeric characters, underscore, dollar sign, and dot. | |
| identifier_pattern = re.compile(r'^[A-Za-z0-9_\$\.]+$') | |
| if not identifier_pattern.match(table_name): | |
| raise ValueError(f"Invalid table name: {table_name!r}") | |
| if not identifier_pattern.match(col_name): | |
| raise ValueError(f"Invalid column name: {col_name!r}") | |
| # Ensure sample_size is a positive integer. | |
| if not isinstance(sample_size, int) or sample_size <= 0: | |
| raise ValueError(f"sample_size must be a positive integer, got {sample_size!r}") | |
| sample_rows = sample_size * 10 | |
| query = """ | |
| SELECT DISTINCT "{col_name}" | |
| FROM "{table_name}" | |
| WHERE "{col_name}" IS NOT NULL | |
| SAMPLE (%s ROWS) | |
| LIMIT %s; | |
| """.format(col_name=col_name, table_name=table_name) | |
| cursor.execute(query, (sample_rows, sample_size)) |
|
|
||
| # Execute query | ||
| url = "snowflake://user:pass@account/testdb/PUBLIC?warehouse=COMPUTE_WH" | ||
| result = SnowflakeLoader.execute_sql_query("INSERT INTO users VALUES (1, 'test')", url) |
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test doesn't verify that the connection commits the transaction or that the query was actually executed with the correct parameters. Consider adding assertions to verify mock_cursor.execute was called with the expected SQL query.
| cursor.execute(f""" | ||
| SELECT COLUMN_NAME | ||
| FROM {db_name}.INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc | ||
| JOIN {db_name}.INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu | ||
| ON tc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME | ||
| AND tc.TABLE_SCHEMA = kcu.TABLE_SCHEMA | ||
| AND tc.TABLE_NAME = kcu.TABLE_NAME | ||
| WHERE tc.TABLE_SCHEMA = '{schema_name}' | ||
| AND tc.TABLE_NAME = '{table_name}' | ||
| AND tc.CONSTRAINT_TYPE = 'PRIMARY KEY'; | ||
| """) | ||
| primary_keys = {row['COLUMN_NAME'] for row in cursor.fetchall()} | ||
|
|
||
| # Get foreign key information | ||
| cursor.execute(f""" | ||
| SELECT COLUMN_NAME | ||
| FROM {db_name}.INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc | ||
| JOIN {db_name}.INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu | ||
| ON tc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME | ||
| AND tc.TABLE_SCHEMA = kcu.TABLE_SCHEMA | ||
| AND tc.TABLE_NAME = kcu.TABLE_NAME | ||
| WHERE tc.TABLE_SCHEMA = '{schema_name}' | ||
| AND tc.TABLE_NAME = '{table_name}' | ||
| AND tc.CONSTRAINT_TYPE = 'FOREIGN KEY'; | ||
| """) |
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SQL injection vulnerability in foreign key query. Similar to other queries in this file, the db_name, schema_name, and table_name are interpolated without parameterization.
| cursor.execute(f""" | |
| SELECT COLUMN_NAME | |
| FROM {db_name}.INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc | |
| JOIN {db_name}.INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu | |
| ON tc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME | |
| AND tc.TABLE_SCHEMA = kcu.TABLE_SCHEMA | |
| AND tc.TABLE_NAME = kcu.TABLE_NAME | |
| WHERE tc.TABLE_SCHEMA = '{schema_name}' | |
| AND tc.TABLE_NAME = '{table_name}' | |
| AND tc.CONSTRAINT_TYPE = 'PRIMARY KEY'; | |
| """) | |
| primary_keys = {row['COLUMN_NAME'] for row in cursor.fetchall()} | |
| # Get foreign key information | |
| cursor.execute(f""" | |
| SELECT COLUMN_NAME | |
| FROM {db_name}.INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc | |
| JOIN {db_name}.INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu | |
| ON tc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME | |
| AND tc.TABLE_SCHEMA = kcu.TABLE_SCHEMA | |
| AND tc.TABLE_NAME = kcu.TABLE_NAME | |
| WHERE tc.TABLE_SCHEMA = '{schema_name}' | |
| AND tc.TABLE_NAME = '{table_name}' | |
| AND tc.CONSTRAINT_TYPE = 'FOREIGN KEY'; | |
| """) | |
| safe_db_name = db_name.replace('"', '""') | |
| pk_query = f""" | |
| SELECT COLUMN_NAME | |
| FROM "{safe_db_name}".INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc | |
| JOIN "{safe_db_name}".INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu | |
| ON tc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME | |
| AND tc.TABLE_SCHEMA = kcu.TABLE_SCHEMA | |
| AND tc.TABLE_NAME = kcu.TABLE_NAME | |
| WHERE tc.TABLE_SCHEMA = %s | |
| AND tc.TABLE_NAME = %s | |
| AND tc.CONSTRAINT_TYPE = 'PRIMARY KEY'; | |
| """ | |
| cursor.execute(pk_query, (schema_name, table_name)) | |
| primary_keys = {row['COLUMN_NAME'] for row in cursor.fetchall()} | |
| # Get foreign key information | |
| fk_query = f""" | |
| SELECT COLUMN_NAME | |
| FROM "{safe_db_name}".INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc | |
| JOIN "{safe_db_name}".INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu | |
| ON tc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME | |
| AND tc.TABLE_SCHEMA = kcu.TABLE_SCHEMA | |
| AND tc.TABLE_NAME = kcu.TABLE_NAME | |
| WHERE tc.TABLE_SCHEMA = %s | |
| AND tc.TABLE_NAME = %s | |
| AND tc.CONSTRAINT_TYPE = 'FOREIGN KEY'; | |
| """ | |
| cursor.execute(fk_query, (schema_name, table_name)) |
| SELECT | ||
| tc.CONSTRAINT_NAME, | ||
| kcu.COLUMN_NAME, | ||
| ccu.TABLE_NAME AS REFERENCED_TABLE_NAME, | ||
| ccu.COLUMN_NAME AS REFERENCED_COLUMN_NAME | ||
| FROM {db_name}.INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc | ||
| JOIN {db_name}.INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu | ||
| ON tc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME | ||
| AND tc.TABLE_SCHEMA = kcu.TABLE_SCHEMA | ||
| AND tc.TABLE_NAME = kcu.TABLE_NAME | ||
| JOIN {db_name}.INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE ccu | ||
| ON tc.CONSTRAINT_NAME = ccu.CONSTRAINT_NAME | ||
| AND tc.TABLE_SCHEMA = ccu.TABLE_SCHEMA | ||
| WHERE tc.TABLE_SCHEMA = '{schema_name}' | ||
| AND tc.TABLE_NAME = '{table_name}' | ||
| AND tc.CONSTRAINT_TYPE = 'FOREIGN KEY' | ||
| ORDER BY tc.CONSTRAINT_NAME, kcu.ORDINAL_POSITION; |
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SQL injection vulnerability in extract_foreign_keys method. The db_name, schema_name, and table_name parameters are directly interpolated without parameterization.
| SELECT | ||
| tc.TABLE_NAME, | ||
| tc.CONSTRAINT_NAME, | ||
| kcu.COLUMN_NAME, | ||
| ccu.TABLE_NAME AS REFERENCED_TABLE_NAME, | ||
| ccu.COLUMN_NAME AS REFERENCED_COLUMN_NAME | ||
| FROM {db_name}.INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc | ||
| JOIN {db_name}.INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu | ||
| ON tc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME | ||
| AND tc.TABLE_SCHEMA = kcu.TABLE_SCHEMA | ||
| AND tc.TABLE_NAME = kcu.TABLE_NAME | ||
| JOIN {db_name}.INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE ccu | ||
| ON tc.CONSTRAINT_NAME = ccu.CONSTRAINT_NAME | ||
| AND tc.TABLE_SCHEMA = ccu.TABLE_SCHEMA | ||
| WHERE tc.TABLE_SCHEMA = '{schema_name}' | ||
| AND tc.CONSTRAINT_TYPE = 'FOREIGN KEY' | ||
| ORDER BY tc.TABLE_NAME, tc.CONSTRAINT_NAME; |
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SQL injection vulnerability in extract_relationships method. The db_name and schema_name parameters are directly interpolated without parameterization.
Adds Snowflake as a supported database backend alongside existing MySQL and PostgreSQL loaders.
Implementation
Loader (
api/loaders/snowflake_loader.py): Full schema extraction including tables, columns, foreign keys, and relationships viaINFORMATION_SCHEMAqueries. Uses Snowflake'sSAMPLEclause for random value sampling.Integration: Updated
schema_loader.pyandtext2sql.pyto detect and routesnowflake://URLs to the new loader.Security: Uses
snowflake-connector-python@3.13.2(patched version, avoids CVE inwrite_pandaspresent in ≤3.13.0).Connection Format
Architecture Notes
BaseLoader, async generator for progress streaming)Original prompt
💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.