Docs snowflake 4 detailed migration guide #69728
Open: ian-at-airbyte wants to merge 6 commits into master from docs-snowflake-4-detailed-migration-guide
+119
−5
d83fda1 Add boilerplate migration guide (ian-at-airbyte)
222dd69 Outline new process (ian-at-airbyte)
1c67397 docs: expand Snowflake v4 migration guide with detailed instructions … (devin-ai-integration[bot])
96825e5 A few more edits (ian-at-airbyte)
9607f34 Merge branch 'master' into docs-snowflake-4-detailed-migration-guide (ian-at-airbyte)
6cf894e Minor fix (ian-at-airbyte)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,16 +1,126 @@ | ||
| import MigrationGuide from '@site/static/_migration_guides_upgrade_guide.md'; | ||
|
|
||
| # Snowflake Migration Guide | ||
|
|
||
| ## Upgrading to 4.0.0 | ||
|
|
||
| -This version upgrades Destination Snowflake to the [Direct-Load](/platform/using-airbyte/core-concepts/direct-load-tables) paradigm, which improves performance and reduces warehouse spend. If you have unusual requirements around record visibility or schema evolution, read that document for more information about how direct-load differs from Typing and Deduping. | ||
| This version upgrades the Snowflake destination from [typing and deduping](/platform/using-airbyte/core-concepts/typing-deduping) to [direct loading](/platform/using-airbyte/core-concepts/direct-load-tables). This upgrade improves performance and reduces warehouse spend. If you have unusual requirements around record visibility or schema evolution, read the documentation for both methods to learn how direct loading differs from typing and deduping. | ||
ian-at-airbyte marked this conversation as resolved.
|
||
|
|
||
| This version also adds an option to enable CDC deletions as soft deletes. | ||
|
|
||
| ### Decide how to handle raw tables | ||
|
|
||
| The exact steps to begin the migration depend on whether you interact with Airbyte's raw tables in Snowflake. | ||
|
|
||
| #### If you don't interact with raw tables | ||
|
|
||
| If you don't interact with the raw tables, you can safely upgrade. There's no breaking change for you. | ||
|
|
||
| #### If you only interact with raw tables | ||
|
|
||
| If you _only_ interact with the raw tables, enable the `Disable Final Tables` option before upgrading. After you upgrade, the `Legacy raw tables` option is enabled automatically. | ||
|
|
||
| 1. In the navigation bar, click **Destinations**. | ||
|
|
||
| 2. Open your **Snowflake** connector. | ||
|
|
||
| 3. Open the **Optional fields** section. | ||
|
|
||
| 4. Turn on **Disable Final Tables**. | ||
|
|
||
| 5. Click **Test and save**. | ||
|
|
||
| :::note | ||
| After upgrading to version 4, this setting appears as **Legacy raw tables** and remains enabled. | ||
| ::: | ||
|
|
||
| #### If you interact with both raw and final tables | ||
|
|
||
| If you interact with both the raw _and_ final tables, this use case is no longer supported. Instead, create two connectors: one with `Disable Final Tables` turned on and one with it turned off. From now on, you must run the two connections in parallel. | ||
|
|
||
| 1. In the navigation bar, click **Destinations**. | ||
|
|
||
| 2. For each Snowflake destination you have, add an identical second Snowflake destination. | ||
|
|
||
| 3. Ensure each pair of Snowflake connectors has opposite settings for **Disable Final Tables**. One connector should have this setting turned on, and the other should have it turned off. | ||
|
|
||
| 4. Configure distinct default schemas for each destination to avoid table name collisions: | ||
|
|
||
| - For the destination that creates final tables, set a distinct **Schema** in the Snowflake destination configuration. For example, `ANALYTICS_FINAL_TABLES`. This is where Airbyte writes final tables. | ||
|
|
||
| - For the raw-only destination (with **Disable Final Tables** turned on), set a distinct **Airbyte Internal Table Dataset Name** under the **Advanced** section (for example, `AIRBYTE_INTERNAL_RAW`). This is where Airbyte writes raw tables. | ||
|
|
||
| - Example configuration: | ||
|
|
||
| - Destination A (final tables): Schema = `ANALYTICS_FINAL_TABLES`, Airbyte Internal Table Dataset Name = `AIRBYTE_INTERNAL` | ||
|
|
||
| - Destination B (raw tables): Airbyte Internal Table Dataset Name = `AIRBYTE_INTERNAL_RAW` | ||
|
|
||
| - Using distinct schemas prevents table name collisions when running both destinations in parallel. | ||
|
|
||
| -This version also adds an option to enable CDC deletions as soft-deletes. | ||
| 5. Recreate your connections to point to the appropriate destination. | ||
|
|
||
| -If you do not interact with the raw tables, you can safely upgrade. There is no breakage for this usecase. | ||
| - Connections that need raw tables only should target the destination with **Disable Final Tables** turned on. | ||
|
|
||
| -If you _only_ interact with the raw tables, make sure that you have the `Disable Final Tables` option enabled before upgrading. This will automatically enable the `Legacy raw tables` option after upgrading. | ||
| - Connections that need final tables should target the destination with this setting turned off. | ||
|
|
||
| -If you interact with both the raw _and_ final tables, this usecase will no longer be directly supported. You must create two connectors (one with `Disable Final Tables` enabled, and one with it disabled) and run two connections in parallel. | ||
| 6. Run test syncs on both destinations to verify outputs. | ||
|
|
||
| - The raw tables destination should write only to its internal schema (default: `airbyte_internal`). | ||
|
|
||
| - The standard destination should write only final tables to the target schema. | ||
|
|
||
| 7. After verifying that both destinations work correctly, continue running both connections in parallel going forward. | ||
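The checks in steps 3 and 4 can be sketched as a small pre-flight script. This is only an illustration: the dictionary keys (`disable_final_tables`, `schema`, `internal_schema`) and the function name are hypothetical stand-ins for the UI settings **Disable Final Tables**, **Schema**, and **Airbyte Internal Table Dataset Name**, not Airbyte API fields. Schema names are compared case-insensitively, which is a conservative assumption.

```python
def check_destination_pair(final_dest: dict, raw_dest: dict) -> list[str]:
    """Return a list of problems found in a final/raw destination pair (empty if none).

    Hypothetical helper: the dict keys are illustrative, not real Airbyte field names.
    """
    problems = []

    # Step 3: the two destinations need opposite Disable Final Tables settings.
    if final_dest["disable_final_tables"] or not raw_dest["disable_final_tables"]:
        problems.append(
            "Disable Final Tables must be off for the final-tables destination "
            "and on for the raw-only destination."
        )

    # Step 4: the raw-only destination's internal schema must not collide with
    # any schema the final-tables destination writes to.
    final_schemas = {
        final_dest["schema"].upper(),
        final_dest["internal_schema"].upper(),
    }
    if raw_dest["internal_schema"].upper() in final_schemas:
        problems.append(
            "The raw-only destination's internal schema collides with a schema "
            "used by the final-tables destination."
        )

    return problems


# Values from the example configuration in step 4.
destination_a = {
    "disable_final_tables": False,
    "schema": "ANALYTICS_FINAL_TABLES",
    "internal_schema": "AIRBYTE_INTERNAL",
}
destination_b = {
    "disable_final_tables": True,
    "internal_schema": "AIRBYTE_INTERNAL_RAW",
}
print(check_destination_pair(destination_a, destination_b))
```

An empty list means the pair matches the layout described in the steps above.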
|
|
||
| ### Do the upgrade | ||
|
|
||
| Follow the standard [connector upgrade steps](#how-to-upgrade) shown below. | ||
|
|
||
| ### Optional: remove legacy raw tables | ||
|
|
||
| The version 4 connector doesn't automatically remove tables created by earlier versions. After upgrading to version 4 and verifying your data, you can optionally remove the old raw tables. | ||
|
|
||
| You can find the raw tables in the schema configured as **Airbyte Internal Table Dataset Name**. This defaults to `airbyte_internal`. If you customized this setting, look in that schema instead. | ||
|
|
||
| The table names match these patterns depending on which version created them: | ||
|
|
||
| - **Before version 4**: `raw_{namespace}__{stream}` (for example, `airbyte_internal.raw_public__users`) | ||
|
|
||
| - **Version 4 with legacy raw tables mode**: `{namespace}_raw__stream_{stream}` (for example, `airbyte_internal.public_raw__stream_users`) | ||
|
|
||
| The number of underscores between `raw` and `stream` may vary depending on the longest underscore sequence in your namespace and stream names. | ||
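As a rough aid for telling the two naming patterns apart, the sketch below classifies a table name with regular expressions derived from the patterns above. The function name and the regular expressions are illustrative assumptions, not part of the connector. Because the underscore count can vary, the expressions accept one or more underscores around `stream`. Matching is case-insensitive. Unusual namespace or stream names can match both patterns, so treat the result as a hint and review it before dropping anything.

```python
import re

# Version 4 legacy raw tables: {namespace}_raw__stream_{stream}
# (one or more underscores are accepted on either side of "stream").
V4_LEGACY_RAW = re.compile(r"^.+?_raw_+stream_+.+$", re.IGNORECASE)

# Before version 4: raw_{namespace}__{stream}
PRE_V4_RAW = re.compile(r"^raw_.+?__.+$", re.IGNORECASE)


def classify_raw_table(table_name: str) -> str:
    """Return 'v4_legacy', 'pre_v4', or 'other' for a table name (hypothetical helper)."""
    # Check the more specific version 4 pattern first.
    if V4_LEGACY_RAW.match(table_name):
        return "v4_legacy"
    if PRE_V4_RAW.match(table_name):
        return "pre_v4"
    return "other"


for name in ("raw_public__users", "public_raw__stream_users", "orders"):
    print(name, "->", classify_raw_table(name))
```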
|
|
||
| :::note | ||
| Version 4 of the Snowflake destination uses the `airbyte_internal` schema for temporary scratch space. For example, Airbyte needs this for streams running in dedup mode, truncate refreshes, and overwrite syncs. Dropping the entire `airbyte_internal` schema can interrupt active syncs and cause data loss. Only drop the specific raw tables you no longer need. | ||
|
Contributor review comment: 🚫 [vale] reported by reviewdog 🐶
||
| ::: | ||
|
|
||
| To remove the old raw tables: | ||
|
|
||
| 1. **Pause or allow active syncs to complete** before dropping any tables to avoid interrupting data transfers. | ||
|
|
||
| 2. **List candidate raw tables** to identify which tables to remove: | ||
|
|
||
| ```sql | ||
| -- For Version 2/3 raw tables: | ||
| SHOW TABLES IN SCHEMA <DATABASE>.<INTERNAL_SCHEMA> LIKE 'RAW\_%'; | ||
|
|
||
| -- For Version 4 legacy raw tables: | ||
| SHOW TABLES IN SCHEMA <DATABASE>.<INTERNAL_SCHEMA> LIKE '%RAW%STREAM%'; | ||
| ``` | ||
|
|
||
| Replace `<DATABASE>` with your Snowflake database name and `<INTERNAL_SCHEMA>` with your internal schema name (default `airbyte_internal`). | ||
|
|
||
| 3. **Drop specific raw tables** you no longer need: | ||
|
|
||
| ```sql | ||
| DROP TABLE IF EXISTS <DATABASE>.<INTERNAL_SCHEMA>.<TABLE_NAME>; | ||
| ``` | ||
|
|
||
| Replace `<TABLE_NAME>` with the specific table name you want to remove. Use fully qualified names (database.schema.table) to avoid ambiguity. | ||
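If you have many tables to remove, you could generate the `DROP TABLE` statements from a reviewed list of names instead of typing each one. The sketch below is a hypothetical helper, not an Airbyte or Snowflake tool. It assumes every name is a simple unquoted identifier, as in the statement above, and it rejects anything else instead of guessing how to quote it. It only builds the statements as text, so you can review them before running them in Snowflake.

```python
import re

# Simple unquoted identifier: starts with a letter or underscore, followed by
# letters, digits, underscores, or dollar signs.
_UNQUOTED_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def build_drop_statements(database: str, schema: str, table_names: list[str]) -> list[str]:
    """Build one fully qualified DROP TABLE statement per table name (hypothetical helper)."""
    for part in (database, schema, *table_names):
        # Refuse names that would need quoting or that could smuggle in extra SQL.
        if not _UNQUOTED_IDENTIFIER.fullmatch(part):
            raise ValueError(f"Not a simple unquoted identifier: {part!r}")

    return [
        f"DROP TABLE IF EXISTS {database}.{schema}.{name};"
        for name in table_names
    ]


# MY_DATABASE is a placeholder database name for illustration.
for statement in build_drop_statements(
    "MY_DATABASE", "AIRBYTE_INTERNAL", ["RAW_PUBLIC__USERS"]
):
    print(statement)
```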
|
|
||
| ### Update downstream pipelines | ||
|
|
||
| If you have downstream apps and resources that interact with raw tables, update them to reference any new schema and table names. | ||
|
|
||
| ## Upgrading to 3.0.0 | ||
|
|
||
|
|
@@ -28,3 +138,7 @@ Learn more about what's new in Destinations V2 [here](/platform/using-airbyte/co | |
| ## Upgrading to 2.0.0 | ||
|
|
||
| Snowflake no longer supports GCS/S3. Please migrate to the Internal Staging option. This is recommended by Snowflake and is cheaper and faster. | ||
|
|
||
| ## How to upgrade {#how-to-upgrade} | ||
|
|
||
| <MigrationGuide /> | ||