
docs: Add Snowflake source connector documentation and align behavior details#986

Open
Copilot wants to merge 7 commits into main from copilot/document-snowflake-source-support

Conversation

Contributor

Copilot AI commented Feb 13, 2026

Description

Documents the Snowflake source connector (adbc_snowflake) introduced in risingwavelabs/risingwave#23988, and aligns the docs with merged implementation behavior after review feedback.

Documentation added and updated

  • Core page: /ingestion/sources/snowflake.mdx

    • ADBC connector syntax with clearer placeholder naming (source_table_in_snowflake)
    • Explicit primary key requirement wording for Snowflake tables in RisingWave
    • Authentication methods: username/password, OAuth, JWT, and other supported auth types
    • Added OAuth token parameter (adbc_snowflake.auth_token) in both parameter table and OAuth example
    • Added JWT key options:
      • adbc_snowflake.jwt_private_key_path
      • adbc_snowflake.jwt_private_key_pkcs8_value
      • adbc_snowflake.jwt_private_key_pkcs8_password
    • Clarified JWT requirement note (exactly one key source for auth_jwt)
    • Clarified refresh_interval_sec as optional and documented manual refresh with REFRESH TABLE when omitted
    • Softened snapshot consistency wording to reflect fallback behavior when time travel is unavailable
    • Data type mappings (Snowflake → RisingWave)
    • End-to-end examples and validation query
  • Navigation and overview updates:

    • Added to docs.json under "Database tables (non-CDC)"
    • Updated /ingestion/overview.mdx support matrix:
      • Snowflake marked as supporting one-time batch + periodic refresh
  • Cross-links:

    • Bidirectional link with Snowflake sink page
    • Reference added to Snowflake-Iceberg integration page
    • Replaced unrelated “Data formats and encoding options” link with REFRESH TABLE in “What’s next?”

Example usage

CREATE TABLE snowflake_orders (
    primary key ("order_id")
) WITH (
    connector = 'adbc_snowflake',
    refresh_mode = 'FULL_RELOAD',
    refresh_interval_sec = '3600',
    adbc_snowflake.account = 'myaccount.us-east-1',
    adbc_snowflake.username = 'analytics_user',
    adbc_snowflake.password = 'secure_password',
    adbc_snowflake.database = 'PRODUCTION',
    adbc_snowflake.schema = 'SALES',
    adbc_snowflake.warehouse = 'ANALYTICS_WH',
    adbc_snowflake.table = 'ORDERS'
);
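Since the docs now mark `refresh_interval_sec` as optional, a companion variant (a sketch, not copied from the PR, reusing the same connection options) would omit the interval and rely on manual refresh:

```sql
CREATE TABLE snowflake_orders (
    primary key ("order_id")
) WITH (
    connector = 'adbc_snowflake',
    refresh_mode = 'FULL_RELOAD',
    -- refresh_interval_sec omitted: no automatic refresh is scheduled
    adbc_snowflake.account = 'myaccount.us-east-1',
    adbc_snowflake.username = 'analytics_user',
    adbc_snowflake.password = 'secure_password',
    adbc_snowflake.database = 'PRODUCTION',
    adbc_snowflake.schema = 'SALES',
    adbc_snowflake.warehouse = 'ANALYTICS_WH',
    adbc_snowflake.table = 'ORDERS'
);

-- Load or refresh on demand, then validate
REFRESH TABLE snowflake_orders;
SELECT count(*) FROM snowflake_orders;
```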

Screenshot

Updated screenshot: https://github.com/user-attachments/assets/74cb6f23-35df-49cb-bf5c-f1d32b13dc7b

Related code PR

risingwavelabs/risingwave#23988

Related doc issue

https://github.com/risingwavelabs/risingwave-docs/issues/[issue_number]

Checklist

  • I have run the documentation build locally to verify the updates are applied correctly.
  • For new pages, I have updated mint.json to include the page in the table of contents.
  • All links and references have been checked and are not broken.
Original prompt

This section details on the original issue you should resolve

<issue_title>Document Snowflake source support</issue_title>
<issue_description>## Context
Slack thread: Request to document this feature

A docs request was raised for the Snowflake source feature added in risingwave#23988.

Problem

Snowflake source support is available in RisingWave, but there is no dedicated documentation explaining how to set it up and use it. This creates a gap for users who need source-side guidance (as opposed to sink-only workflows), including required configuration, setup steps, and known constraints.

Suggested Fix

  • Add a dedicated Snowflake Source documentation page under integrations/sources.
  • Document prerequisites and authentication/connection requirements.
  • Include required connector properties and configuration examples.
  • Provide an end-to-end SQL walkthrough (for example: CREATE CONNECTION and CREATE SOURCE) plus a quick validation query.
  • Clarify current limitations, compatibility notes, and feature availability/licensing expectations.
  • Add cross-links from related Snowflake and connector overview pages to improve discoverability.
Existing Documentation

Related docs PRs:

Codebase References

Related PRs in risingwave:

Source files:

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 3 commits February 13, 2026 20:00
Co-authored-by: kwannoel <47273164+kwannoel@users.noreply.github.com>
Co-authored-by: kwannoel <47273164+kwannoel@users.noreply.github.com>
Co-authored-by: kwannoel <47273164+kwannoel@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add documentation for Snowflake source support" to "docs: Add Snowflake source connector documentation" Feb 13, 2026
Copilot AI requested a review from kwannoel February 13, 2026 20:04
@yuhao-su yuhao-su requested a review from tabVersion February 13, 2026 20:10
Collaborator

@tabVersion tabVersion left a comment


rest LGTM


## Data type mapping

The following table shows the corresponding data types between Snowflake and RisingWave. For details on native RisingWave data types, see [Overview of data types](/sql/data-types/overview).
Collaborator


@copilot you need to handle the doc ref well.

Contributor Author


Could you clarify what needs to be changed with the doc ref on line 130? The current reference [Overview of data types](/sql/data-types/overview) follows the same pattern used in other source/sink documentation (e.g., Snowflake sink line 57, SQL Server CDC line 220). Should I use a different link format or reference a different page?

Copilot AI requested a review from tabVersion February 17, 2026 11:58
@tabVersion tabVersion marked this pull request as ready for review February 20, 2026 09:44
Copilot AI review requested due to automatic review settings February 20, 2026 09:44
Contributor

Copilot AI left a comment


Pull request overview

This PR adds comprehensive documentation for the Snowflake source connector introduced in risingwavelabs/risingwave#23988. The connector enables periodic batch ingestion from Snowflake tables using the ADBC (Arrow Database Connectivity) protocol with automatic schema inference and FULL_RELOAD refresh mode.

Changes:

  • New documentation page for Snowflake source connector with syntax, authentication methods, data type mappings, and examples
  • Updated navigation and support matrix in the ingestion overview
  • Added bidirectional cross-links between Snowflake source and sink documentation
  • Integrated references in the Snowflake-Iceberg documentation

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
ingestion/sources/snowflake.mdx Complete documentation for the ADBC Snowflake source connector including prerequisites, connection syntax, authentication methods (username/password, JWT, OAuth), parameter descriptions, data type mappings, and usage examples
docs.json Added Snowflake source to the "Database tables (non-CDC)" navigation group
ingestion/overview.mdx Added Snowflake to the sources list with "Latest" badge and added entry to the ingestion method support matrix
integrations/destinations/snowflake.mdx Added cross-reference note linking to the new Snowflake source documentation
iceberg/integ-snowflake.mdx Added Note section with bidirectional links to both Snowflake source and sink documentation

adbc_snowflake.auth_type = 'auth_oauth'
);


Copilot AI Feb 20, 2026


The OAuth authentication example sets adbc_snowflake.auth_type = 'auth_oauth' but doesn't include any OAuth-specific parameters such as an OAuth token or token endpoint. OAuth authentication typically requires additional parameters beyond just setting the auth type. The example should either include the required OAuth parameters or add a note explaining what additional configuration is needed for OAuth to work properly.

Suggested change
Note: This example only shows how to set the authentication type to OAuth. You must also configure the required OAuth credentials and related settings for the ADBC Snowflake connector (for example, via its configuration or environment) for authentication to work correctly.

adbc_snowflake.database = 'database_name',
adbc_snowflake.schema = 'schema_name',
adbc_snowflake.warehouse = 'warehouse_name',
adbc_snowflake.table = 'table_name'

Copilot AI Feb 20, 2026


In the syntax example, the parameter adbc_snowflake.table is set to 'table_name', which creates ambiguity since table_name is also used as the CREATE TABLE name on line 25. Consider using a different placeholder like 'source_table_name' or 'snowflake_table_name' to make it clear that these refer to different tables (the Snowflake source table vs. the RisingWave table being created).

Suggested change
adbc_snowflake.table = 'table_name'
adbc_snowflake.table = 'source_table_name'

| **Google Cloud Storage** | ❌ | ✅ | ⚠️ | Batch only; periodic via external tools |
| **Azure Blob** | ❌ | ✅ | ⚠️ | Batch only; periodic via external tools |
| **Apache Iceberg** | ❌ | ✅ | ⚠️ | Batch only; periodic via external tools |
| **Snowflake** | ❌ | ❌ | ✅ | Periodic refresh with `refresh_interval_sec` |

Copilot AI Feb 20, 2026


The support matrix shows "❌" for One-Time Batch ingestion for Snowflake, but the connector supports manual refresh via REFRESH TABLE command (similar to Iceberg). According to the documentation, if refresh_interval_sec is omitted, the table will only refresh when manually triggered. This is effectively one-time batch ingestion. Consider marking this as "✅" to be consistent with Apache Iceberg (line 288) which also supports FULL_RELOAD mode with manual refresh.

Suggested change
| **Snowflake** | ❌ | ❌ | ✅ | Periodic refresh with `refresh_interval_sec` |
| **Snowflake** | ❌ | ✅ | ✅ | Manual `REFRESH TABLE` when `refresh_interval_sec` is omitted; periodic when set |

| :--------------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| connector | Must be `adbc_snowflake`. |
| refresh_mode | Must be `FULL_RELOAD`. The entire table is re-read on each refresh. |
| refresh_interval_sec | The refresh interval in seconds. Determines how frequently data is fetched from Snowflake. |

Copilot AI Feb 20, 2026


The parameter description for refresh_interval_sec states it's required ("All parameters are required unless specified otherwise"), but this conflicts with the behavior described in the Limitations section and similar connectors like Iceberg, where omitting this parameter means manual refresh only. The description should clarify that this parameter is optional, and if omitted, the table will only refresh when manually triggered via REFRESH TABLE.

Suggested change
| refresh_interval_sec | The refresh interval in seconds. Determines how frequently data is fetched from Snowflake. |
| refresh_interval_sec | **Optional.** The refresh interval in seconds. Determines how frequently data is fetched from Snowflake. If omitted, the table is not refreshed automatically and only updates when you run `REFRESH TABLE`. |

| refresh_interval_sec | The refresh interval in seconds. Determines how frequently data is fetched from Snowflake. |
| adbc_snowflake.account | The Snowflake account identifier (e.g., `myaccount.us-east-1` or `myaccount`). |
| adbc_snowflake.username | The Snowflake username for authentication. |
| adbc_snowflake.password | **Optional**. The password for username/password authentication. Required if using the default `auth_snowflake` authentication type. |

Copilot AI Feb 20, 2026


The parameter description states that adbc_snowflake.password is "Required if using the default auth_snowflake authentication type", but the default authentication type is described as auth_snowflake on line 63. There's an inconsistency in naming - the description uses "auth_snowflake" but should verify this matches the actual implementation. Based on the parameter name pattern and auth type examples, this should likely be just the default (username/password) rather than specifically named "auth_snowflake".

Suggested change
| adbc_snowflake.password | **Optional**. The password for username/password authentication. Required if using the default `auth_snowflake` authentication type. |
| adbc_snowflake.password | **Optional**. The password for username/password authentication. Required when using username/password authentication (the default auth type). |

Comment on lines +11 to +17
## Prerequisites

* A Snowflake account with access to the database and tables you want to ingest.
* The Snowflake account identifier (e.g., `myaccount.us-east-1`).
* Valid authentication credentials (username/password, OAuth token, JWT private key, etc.).
* Network access from RisingWave to your Snowflake instance.


Copilot AI Feb 20, 2026


Similar to other new features in the documentation (e.g., MySQL table source has "Added in v2.2.0"), this Snowflake source connector should include a version note indicating when it was added to RisingWave. This helps users understand feature availability across different RisingWave versions. The note should be added after the introductory description, following the pattern seen in other connector documentation.


* [Sink data to Snowflake](/integrations/destinations/snowflake) - Learn how to write data from RisingWave back to Snowflake
* [Work with Snowflake and Iceberg](/iceberg/integ-snowflake) - Integrate Snowflake with Apache Iceberg catalogs
* [Data formats and encoding options](/ingestion/formats-and-encoding-options) - Understand supported data formats

Copilot AI Feb 20, 2026


The "What's next?" section links to "Data formats and encoding options", but the Snowflake ADBC connector doesn't use FORMAT/ENCODE options - it relies on automatic schema inference from the Snowflake table. This link may confuse users as it's not relevant to the Snowflake source connector. Consider replacing this with a more relevant link, such as documentation about refreshable tables (REFRESH TABLE command) or other batch ingestion patterns.

Suggested change
* [Data formats and encoding options](/ingestion/formats-and-encoding-options) - Understand supported data formats

@tabVersion
Collaborator

I checked this docs PR against the merged implementation in risingwave#23988.

Overall this is a great addition, but a few points are not fully aligned with actual behavior yet:

  1. Primary key is effectively required for refreshable Snowflake tables

    • Current wording says PK is needed only "if your table requires one".
    • In practice, refreshable tables require a user-defined PK (otherwise bind error).
    • Suggest updating the wording to make PK requirement explicit.
  2. OAuth example is incomplete (auth_token missing)

    • The OAuth example sets adbc_snowflake.auth_type = auth_oauth but does not include adbc_snowflake.auth_token.
    • Please add adbc_snowflake.auth_token to both parameter table and OAuth example.
  3. JWT requirements are too strict in docs

    • Docs currently say adbc_snowflake.jwt_private_key_path is required for auth_jwt.
    • Implementation also supports adbc_snowflake.jwt_private_key_pkcs8_value (+ optional password).
    • Suggest rewording as: one of the supported JWT key options is required.
  4. refresh_interval_sec should not be documented as always required

    • For FULL_RELOAD, the interval can be omitted and users can trigger refresh manually (REFRESH TABLE).
    • Suggest marking refresh_interval_sec as optional for scheduled refresh.
  5. Time travel consistency wording is currently absolute

    • Docs say snapshots are ensured via time travel.
    • Runtime has fallback when snapshot/time-travel query is unavailable (falls back to current data).
    • Suggest softening wording to reflect fallback behavior.

If helpful, I can also propose concrete wording edits for each section.
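For points 2 and 3, the requested parameter additions might look like the following WITH-clause fragments (parameter names are taken from the review itself; all values are placeholders):

```sql
-- Point 2: OAuth needs the token alongside the auth type
adbc_snowflake.auth_type = 'auth_oauth',
adbc_snowflake.auth_token = '<oauth_access_token>',

-- Point 3: auth_jwt accepts exactly one key source
adbc_snowflake.auth_type = 'auth_jwt',
adbc_snowflake.jwt_private_key_path = '/path/to/rsa_key.p8',
-- or, instead of the path, an inline PKCS8 key (password optional):
-- adbc_snowflake.jwt_private_key_pkcs8_value = '<pkcs8_pem>',
-- adbc_snowflake.jwt_private_key_pkcs8_password = '<passphrase>',
```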

@copilot please fix per requested

Copilot AI and others added 2 commits February 26, 2026 09:43
Co-authored-by: tabVersion <14017336+tabVersion@users.noreply.github.com>
Co-authored-by: tabVersion <14017336+tabVersion@users.noreply.github.com>
Contributor Author

Copilot AI commented Feb 26, 2026

I checked this docs PR against the merged implementation in risingwave#23988.

Overall this is a great addition, but a few points are not fully aligned with actual behavior yet:
...

Addressed in commit c919d54.

I updated the docs to reflect implementation behavior:

  • Primary key requirement is explicit for Snowflake tables in RisingWave.
  • Added adbc_snowflake.auth_token in the parameter table and OAuth example.
  • Documented JWT key options (jwt_private_key_path or jwt_private_key_pkcs8_value) with optional PKCS8 password.
  • Marked refresh_interval_sec optional and documented manual refresh via REFRESH TABLE when omitted.
  • Softened snapshot consistency wording to include fallback behavior.
  • Updated ingestion matrix to mark Snowflake as supporting one-time batch + periodic refresh.

Screenshot: https://github.com/user-attachments/assets/74cb6f23-35df-49cb-bf5c-f1d32b13dc7b

Copilot AI changed the title from "docs: Add Snowflake source connector documentation" to "docs: Add Snowflake source connector documentation and align behavior details" Feb 26, 2026
@tabVersion
Collaborator

Re-review after latest updates: much closer now 👍

I found one remaining behavior mismatch that can mislead users:

  • The docs currently state that when refresh_interval_sec is omitted, data is loaded once at table creation.
    • ingestion/sources/snowflake.mdx (Note under Parameters)
    • ingestion/overview.mdx support matrix note (One-time batch load ...)

In the current implementation, batch refresh loaders run on RefreshStart mutations, i.e. scheduled refresh or manual REFRESH TABLE; there is no implicit initial load at CREATE TABLE.

Code evidence:

  • src/stream/src/executor/source/batch_source/batch_adbc_snowflake_list.rs only starts split generation on Mutation::RefreshStart.
  • Existing refreshable batch test also shows no initial load right after CREATE TABLE (e2e_test/source_inline/refresh/refresh_table.slt: count(*) = 0 before REFRESH TABLE).

Suggested wording:

  • If refresh_interval_sec is omitted, no automatic refresh is scheduled; run REFRESH TABLE to load/refresh data.
  • In the matrix, avoid "One-time batch load" unless it explicitly says manual one-time load via REFRESH TABLE.
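The behavior described here (loads happen only on `RefreshStart`, never implicitly at `CREATE TABLE`) can be illustrated with a short sketch; the `snowflake_orders` name reuses the PR's example and the counts assume an empty table before the first refresh:

```sql
-- Immediately after CREATE TABLE ... WITH (connector = 'adbc_snowflake', ...):
SELECT count(*) FROM snowflake_orders;
-- expected: 0 rows (no implicit initial load)

REFRESH TABLE snowflake_orders;
-- the first load of the Snowflake data happens on this RefreshStart

SELECT count(*) FROM snowflake_orders;
-- now reflects the rows in the source ORDERS table
```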

@copilot code review[agent] please fix per requested

Contributor Author

Copilot AI commented Feb 26, 2026

@tabVersion Unfortunately I hit an unexpected error while processing your comment. I've automatically reported this to GitHub.

You can ask me to try again later by mentioning me in a new comment.

If you want to contact GitHub about this error, please mention the following identifier so they can better serve you: 2ce812e8-7e97-4bcb-b31e-c0916437b3d8

Sorry for the inconvenience!



Successfully merging this pull request may close these issues.

Document Snowflake source support

4 participants