This repository contains cleaned AdventureWorks CSV exports plus the SQL needed to recreate and load both the OLTP and DW schemas into Snowflake.
```
snowflake/
  oltp/
    create_tables_snowflake.sql  # Snowflake DDL for the OLTP schema
    local-ingestion.sql          # PUT/COPY script for local files
    ingestion.sql                # Same as above, but under a distinct stage name
    data/                        # Cleaned OLTP CSV files (tab-delimited)
  dw/
    create_tables_snowflake.sql  # Snowflake DDL for the DW schema
    local-ingestion.sql          # PUT/COPY script for local files
    ingestion.sql                # Same as above, but under a distinct stage name
    data/                        # Cleaned DW CSV files (pipe-delimited)
```
Both datasets were exported from SQL Server using AdventureWorks’ legacy delimiters and have already been normalized inside this repo:

- `snowflake/oltp/data`: tab-delimited (`\t`), Windows newlines (`\r\n`). Binary columns (e.g., `SPATIALLOCATION`) remain as hex strings; XML columns are raw text.
- `snowflake/dw/data`: pipe-delimited (`|`), Windows newlines. UTF-8, with quoted fields preserved where needed.
If you ever re-export the CSVs from the source scripts (`AdventureWorks-oltp-install-script` or `AdventureWorksDW-data-warehouse-install-script`), run the normalization steps before loading:

- For OLTP files, replace every `+|` with a tab and strip the trailing `&|`.
- For DW files, replace every `+|` with `|` and strip the trailing `&|`.
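Alternatively, Snowflake file formats accept multi-character delimiters, so you may be able to skip the rewrite and point `COPY INTO` at a raw export directly. A minimal, untested sketch, assuming the legacy exports terminate fields with `+|` and records with `&|` plus a Windows newline (the format, stage, and table names here are hypothetical):

```sql
-- Hypothetical: load a raw legacy export without pre-normalizing delimiters.
-- Verify the exact terminators against your export before relying on this.
CREATE OR REPLACE FILE FORMAT legacy_oltp_format
  TYPE = CSV
  FIELD_DELIMITER = '+|'        -- legacy field terminator
  RECORD_DELIMITER = '&|\r\n';  -- legacy row terminator + Windows newline

COPY INTO Person                  -- hypothetical target table
  FROM @oltp_stage/Person.csv.gz  -- hypothetical stage and file
  FILE_FORMAT = (FORMAT_NAME = legacy_oltp_format);
```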
Each schema has its own DDL file:

- `snowflake/oltp/create_tables_snowflake.sql`
- `snowflake/dw/create_tables_snowflake.sql`

They create the database (if needed), switch to the correct schema, and create tables with plain Snowflake data types (no PK/FK constraints).
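The files follow the usual Snowflake bootstrap pattern. A trimmed illustration of that shape, with hypothetical database/schema names and a single abbreviated table (the real files define the full AdventureWorks table set):

```sql
-- Illustrative sketch only; names and columns are abbreviated.
CREATE DATABASE IF NOT EXISTS adventureworks;
USE DATABASE adventureworks;
CREATE SCHEMA IF NOT EXISTS oltp;
USE SCHEMA oltp;

-- Plain types, no PK/FK constraints (Snowflake would not enforce them anyway).
CREATE OR REPLACE TABLE Address (
    AddressID       NUMBER,
    AddressLine1    VARCHAR,
    City            VARCHAR,
    SpatialLocation VARCHAR,      -- hex string preserved from the export
    ModifiedDate    TIMESTAMP_NTZ
);
```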
Use SnowSQL (or any client that supports `PUT` and `COPY INTO`):

```sql
!source snowflake/oltp/local-ingestion.sql
-- or
!source snowflake/dw/local-ingestion.sql
```

The scripts do the following:

1. `USE DATABASE <db>; USE SCHEMA <schema>;`
2. `CREATE OR REPLACE STAGE …`
3. `CREATE OR REPLACE FILE FORMAT …`
4. `PUT file://…/*.csv @<stage> AUTO_COMPRESS = TRUE OVERWRITE = TRUE;`
5. `COPY INTO <table> FROM @<stage>/<File>.csv.gz FILE_FORMAT = (FORMAT_NAME = …);`
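Run by hand, the equivalent sequence for a single OLTP table looks roughly like this (the stage, format, path, and table names below are placeholders, not necessarily what the repo scripts use):

```sql
USE DATABASE adventureworks;   -- placeholder database
USE SCHEMA oltp;               -- placeholder schema

CREATE OR REPLACE STAGE oltp_stage;

-- Upload and gzip the local file into the stage (PUT needs an absolute path).
PUT file:///absolute/path/to/repo/snowflake/oltp/data/Address.csv @oltp_stage
  AUTO_COMPRESS = TRUE OVERWRITE = TRUE;

-- Load it using a named file format (see the key details below).
COPY INTO Address
  FROM @oltp_stage/Address.csv.gz
  FILE_FORMAT = (FORMAT_NAME = oltp_csv_format);
```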
Key file-format details:

- OLTP: `FIELD_DELIMITER = X'09'`, `RECORD_DELIMITER = '\r\n'`
- DW: `FIELD_DELIMITER = '|'`, `RECORD_DELIMITER = '\r\n'`, `ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE`

Adjust the `PUT` paths if you move the repo.
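Spelled out as self-contained definitions, the two formats might look like this (format names are placeholders; `'\t'` is used for the tab, equivalent to the hex value the repo scripts specify):

```sql
-- OLTP data: tab-delimited, Windows newlines.
CREATE OR REPLACE FILE FORMAT oltp_csv_format
  TYPE = CSV
  FIELD_DELIMITER = '\t'
  RECORD_DELIMITER = '\r\n';

-- DW data: pipe-delimited; tolerate rows missing trailing nullable columns.
CREATE OR REPLACE FILE FORMAT dw_csv_format
  TYPE = CSV
  FIELD_DELIMITER = '|'
  RECORD_DELIMITER = '\r\n'
  ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE;
```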
Common loading errors:

- **“Number of columns … does not match …”** — the DW format sets `ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE` to handle rows whose trailing nullable columns are missing. Add the same option to any ad-hoc file format you create (see the inline example below).
- **“Error parsing Geo input / JSON”** — some OLTP tables (e.g., `Store`, `Address`) contain XML or binary data. The DDL already defines those columns as `VARCHAR`/`BINARY`, so no extra parsing is required at load time (see the read-side sketch after this list).
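For the column-count error, the option can also be passed inline on an ad-hoc `COPY` instead of a named format (table and stage names are placeholders):

```sql
COPY INTO DimCustomer                -- placeholder table
  FROM @dw_stage/DimCustomer.csv.gz  -- placeholder stage and file
  FILE_FORMAT = (
    TYPE = CSV
    FIELD_DELIMITER = '|'
    RECORD_DELIMITER = '\r\n'
    ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE
  );
```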
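For the XML and binary columns, the usual pattern is to parse on read rather than at load. A sketch, assuming the standard AdventureWorks column names and that the hex strings were loaded into `VARCHAR` columns:

```sql
-- Parse the Demographics XML (Sales.Store in AdventureWorks) into a VARIANT.
SELECT PARSE_XML(Demographics) AS demographics_xml
FROM Store
LIMIT 10;

-- Decode a hex string back into true binary (Person.Address.SpatialLocation).
SELECT TO_BINARY(SpatialLocation, 'HEX') AS spatial_binary
FROM Address
LIMIT 10;
```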