docs: Add Snowflake source connector documentation and align behavior details#986
Conversation
Co-authored-by: kwannoel <47273164+kwannoel@users.noreply.github.com>
> ## Data type mapping
>
> The following table shows the corresponding data types between Snowflake and RisingWave. For details on native RisingWave data types, see [Overview of data types](/sql/data-types/overview).
Could you clarify what needs to be changed with the doc ref on line 130? The current reference [Overview of data types](/sql/data-types/overview) follows the same pattern used in other source/sink documentation (e.g., Snowflake sink line 57, SQL Server CDC line 220). Should I use a different link format or reference a different page?
Pull request overview
This PR adds comprehensive documentation for the Snowflake source connector introduced in risingwavelabs/risingwave#23988. The connector enables periodic batch ingestion from Snowflake tables using the ADBC (Arrow Database Connectivity) protocol with automatic schema inference and FULL_RELOAD refresh mode.
Changes:
- New documentation page for Snowflake source connector with syntax, authentication methods, data type mappings, and examples
- Updated navigation and support matrix in the ingestion overview
- Added bidirectional cross-links between Snowflake source and sink documentation
- Integrated references in the Snowflake-Iceberg documentation
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| ingestion/sources/snowflake.mdx | Complete documentation for the ADBC Snowflake source connector including prerequisites, connection syntax, authentication methods (username/password, JWT, OAuth), parameter descriptions, data type mappings, and usage examples |
| docs.json | Added Snowflake source to the "Database tables (non-CDC)" navigation group |
| ingestion/overview.mdx | Added Snowflake to the sources list with "Latest" badge and added entry to the ingestion method support matrix |
| integrations/destinations/snowflake.mdx | Added cross-reference note linking to the new Snowflake source documentation |
| iceberg/integ-snowflake.mdx | Added Note section with bidirectional links to both Snowflake source and sink documentation |
```sql
    adbc_snowflake.auth_type = 'auth_oauth'
);
```
The OAuth authentication example sets adbc_snowflake.auth_type = 'auth_oauth' but doesn't include any OAuth-specific parameters such as an OAuth token or token endpoint. OAuth authentication typically requires additional parameters beyond just setting the auth type. The example should either include the required OAuth parameters or add a note explaining what additional configuration is needed for OAuth to work properly.
> Note: This example only shows how to set the authentication type to OAuth. You must also configure the required OAuth credentials and related settings for the ADBC Snowflake connector (for example, via its configuration or environment) for authentication to work correctly.
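For illustration, a fuller OAuth example might look like the sketch below. The `adbc_snowflake.auth_token` parameter name is the OAuth token parameter mentioned later in this PR's description; every other value here is a placeholder assumption, not confirmed syntax.

```sql
-- Sketch only: auth_token is the OAuth parameter referenced in the PR
-- description; all other values are placeholders.
CREATE TABLE my_snowflake_table
WITH (
    connector = 'adbc_snowflake',
    refresh_mode = 'FULL_RELOAD',
    adbc_snowflake.account = 'myaccount.us-east-1',
    adbc_snowflake.username = 'rw_user',
    adbc_snowflake.database = 'database_name',
    adbc_snowflake.schema = 'schema_name',
    adbc_snowflake.warehouse = 'warehouse_name',
    adbc_snowflake.table = 'source_table_in_snowflake',
    adbc_snowflake.auth_type = 'auth_oauth',
    adbc_snowflake.auth_token = '<oauth-access-token>'
);
```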
ingestion/sources/snowflake.mdx (outdated)
```sql
    adbc_snowflake.database = 'database_name',
    adbc_snowflake.schema = 'schema_name',
    adbc_snowflake.warehouse = 'warehouse_name',
    adbc_snowflake.table = 'table_name'
```
In the syntax example, the parameter adbc_snowflake.table is set to 'table_name', which creates ambiguity since table_name is also used as the CREATE TABLE name on line 25. Consider using a different placeholder like 'source_table_name' or 'snowflake_table_name' to make it clear that these refer to different tables (the Snowflake source table vs. the RisingWave table being created).
```diff
- adbc_snowflake.table = 'table_name'
+ adbc_snowflake.table = 'source_table_name'
```
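To make the reviewer's point concrete, a sketch that keeps the two names visibly distinct (all names and values here are placeholders, not confirmed syntax):

```sql
-- 'rw_orders' is the RisingWave table being created;
-- 'source_table_name' is the table read from Snowflake.
CREATE TABLE rw_orders
WITH (
    connector = 'adbc_snowflake',
    refresh_mode = 'FULL_RELOAD',
    adbc_snowflake.database = 'database_name',
    adbc_snowflake.schema = 'schema_name',
    adbc_snowflake.warehouse = 'warehouse_name',
    adbc_snowflake.table = 'source_table_name'
);
```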
ingestion/overview.mdx (outdated)
| **Google Cloud Storage** | ❌ | ✅ | ⚠️ | Batch only; periodic via external tools |
| **Azure Blob** | ❌ | ✅ | ⚠️ | Batch only; periodic via external tools |
| **Apache Iceberg** | ❌ | ✅ | ⚠️ | Batch only; periodic via external tools |
| **Snowflake** | ❌ | ❌ | ✅ | Periodic refresh with `refresh_interval_sec` |
The support matrix shows "❌" for One-Time Batch ingestion for Snowflake, but the connector supports manual refresh via REFRESH TABLE command (similar to Iceberg). According to the documentation, if refresh_interval_sec is omitted, the table will only refresh when manually triggered. This is effectively one-time batch ingestion. Consider marking this as "✅" to be consistent with Apache Iceberg (line 288) which also supports FULL_RELOAD mode with manual refresh.
```diff
- | **Snowflake** | ❌ | ❌ | ✅ | Periodic refresh with `refresh_interval_sec` |
+ | **Snowflake** | ❌ | ✅ | ✅ | Manual `REFRESH TABLE` when `refresh_interval_sec` is omitted; periodic when set |
```
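The manual-refresh path this suggestion refers to can be sketched as follows (the table name is a placeholder):

```sql
-- When refresh_interval_sec is omitted, the table does not refresh
-- automatically; trigger a one-time FULL_RELOAD on demand:
REFRESH TABLE my_snowflake_table;
```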
ingestion/sources/snowflake.mdx (outdated)
| connector | Must be `adbc_snowflake`. |
| refresh_mode | Must be `FULL_RELOAD`. The entire table is re-read on each refresh. |
| refresh_interval_sec | The refresh interval in seconds. Determines how frequently data is fetched from Snowflake. |
The parameter description for refresh_interval_sec states it's required ("All parameters are required unless specified otherwise"), but this conflicts with the behavior described in the Limitations section and similar connectors like Iceberg, where omitting this parameter means manual refresh only. The description should clarify that this parameter is optional, and if omitted, the table will only refresh when manually triggered via REFRESH TABLE.
```diff
- | refresh_interval_sec | The refresh interval in seconds. Determines how frequently data is fetched from Snowflake. |
+ | refresh_interval_sec | **Optional.** The refresh interval in seconds. Determines how frequently data is fetched from Snowflake. If omitted, the table is not refreshed automatically and only updates when you run `REFRESH TABLE`. |
```
ingestion/sources/snowflake.mdx (outdated)
| refresh_interval_sec | The refresh interval in seconds. Determines how frequently data is fetched from Snowflake. |
| adbc_snowflake.account | The Snowflake account identifier (e.g., `myaccount.us-east-1` or `myaccount`). |
| adbc_snowflake.username | The Snowflake username for authentication. |
| adbc_snowflake.password | **Optional**. The password for username/password authentication. Required if using the default `auth_snowflake` authentication type. |
The parameter description states that adbc_snowflake.password is "Required if using the default auth_snowflake authentication type", but the default authentication type is described as auth_snowflake on line 63. There's an inconsistency in naming - the description uses "auth_snowflake" but should verify this matches the actual implementation. Based on the parameter name pattern and auth type examples, this should likely be just the default (username/password) rather than specifically named "auth_snowflake".
```diff
- | adbc_snowflake.password | **Optional**. The password for username/password authentication. Required if using the default `auth_snowflake` authentication type. |
+ | adbc_snowflake.password | **Optional**. The password for username/password authentication. Required when using username/password authentication (the default auth type). |
```
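Under the suggested wording, the default username/password flow would look roughly like the sketch below (all values are placeholders; the parameter names follow the syntax excerpt shown earlier in this review):

```sql
-- Default auth type is username/password, so password is
-- effectively required in this configuration.
CREATE TABLE my_snowflake_table
WITH (
    connector = 'adbc_snowflake',
    refresh_mode = 'FULL_RELOAD',
    adbc_snowflake.account = 'myaccount.us-east-1',
    adbc_snowflake.username = 'rw_user',
    adbc_snowflake.password = '<password>',
    adbc_snowflake.database = 'database_name',
    adbc_snowflake.schema = 'schema_name',
    adbc_snowflake.warehouse = 'warehouse_name',
    adbc_snowflake.table = 'source_table_name'
);
```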
> ## Prerequisites
>
> * A Snowflake account with access to the database and tables you want to ingest.
> * The Snowflake account identifier (e.g., `myaccount.us-east-1`).
> * Valid authentication credentials (username/password, OAuth token, JWT private key, etc.).
> * Network access from RisingWave to your Snowflake instance.
Similar to other new features in the documentation (e.g., MySQL table source has "Added in v2.2.0"), this Snowflake source connector should include a version note indicating when it was added to RisingWave. This helps users understand feature availability across different RisingWave versions. The note should be added after the introductory description, following the pattern seen in other connector documentation.
ingestion/sources/snowflake.mdx (outdated)
> * [Sink data to Snowflake](/integrations/destinations/snowflake) - Learn how to write data from RisingWave back to Snowflake
> * [Work with Snowflake and Iceberg](/iceberg/integ-snowflake) - Integrate Snowflake with Apache Iceberg catalogs
> * [Data formats and encoding options](/ingestion/formats-and-encoding-options) - Understand supported data formats
The "What's next?" section links to "Data formats and encoding options", but the Snowflake ADBC connector doesn't use FORMAT/ENCODE options - it relies on automatic schema inference from the Snowflake table. This link may confuse users as it's not relevant to the Snowflake source connector. Consider replacing this with a more relevant link, such as documentation about refreshable tables (REFRESH TABLE command) or other batch ingestion patterns.
```diff
- * [Data formats and encoding options](/ingestion/formats-and-encoding-options) - Understand supported data formats
```
I checked this docs PR against the merged implementation. Overall this is a great addition, but a few points are not fully aligned with actual behavior yet:
If helpful, I can also propose concrete wording edits for each section.
@copilot please fix as requested
Co-authored-by: tabVersion <14017336+tabVersion@users.noreply.github.com>
Addressed in commit. I updated the docs to reflect implementation behavior.
Screenshot: https://github.com/user-attachments/assets/74cb6f23-35df-49cb-bf5c-f1d32b13dc7b
Re-review after latest updates: much closer now 👍 I found one remaining behavior mismatch that can mislead users:
In the current implementation, batch refresh loaders run on
Code evidence:
Suggested wording:
@copilot code review agent, please fix as requested
@tabVersion Unfortunately I hit an unexpected error while processing your comment. I've automatically reported this to GitHub. You can ask me to try again later by mentioning me in a new comment. If you want to contact GitHub about this error, please mention the following identifier so they can better serve you:
Sorry for the inconvenience!
Description
Documents the Snowflake source connector (`adbc_snowflake`) introduced in risingwavelabs/risingwave#23988, and aligns the docs with merged implementation behavior after review feedback.
Documentation added and updated
Core page: `/ingestion/sources/snowflake.mdx`
* Renamed the source table placeholder to `source_table_in_snowflake`
* Added the OAuth token parameter (`adbc_snowflake.auth_token`) in both the parameter table and the OAuth example
* Documented the JWT parameters `adbc_snowflake.jwt_private_key_path`, `adbc_snowflake.jwt_private_key_pkcs8_value`, and `adbc_snowflake.jwt_private_key_pkcs8_password` for the `auth_jwt` authentication type
* Marked `refresh_interval_sec` as optional and documented manual refresh with `REFRESH TABLE` when it is omitted
Navigation and overview updates:
* Added the page to `docs.json` under "Database tables (non-CDC)"
* Updated the `/ingestion/overview.mdx` support matrix
Cross-links:
* Referenced `REFRESH TABLE` in "What's next?"
Example usage
Screenshot
Updated screenshot: https://github.com/user-attachments/assets/74cb6f23-35df-49cb-bf5c-f1d32b13dc7b
Related code PR
risingwavelabs/risingwave#23988
Related doc issue
https://github.com/risingwavelabs/risingwave-docs/issues/[issue_number]
Checklist
* Updated `mint.json` to include the page in the table of contents.