Commit 23b587c

Ingest v2: Snowflake connectors - add example SQL statements and more how-to instructions, and expand details for role and unique record ID (#450)
1 parent fd3727d commit 23b587c

6 files changed

+147
-24
lines changed

snippets/destination_connectors/snowflake.sh.mdx

Lines changed: 6 additions & 5 deletions
@@ -6,8 +6,6 @@
 unstructured \
   local \
     --input-path $LOCAL_FILE_INPUT_DIR \
-    --partition-by-api \
-    --strategy hi_res \
     --chunking-strategy by_title \
     --embedding-provider huggingface \
     --partition-by-api \
@@ -18,10 +16,13 @@ unstructured \
     --account $SNOWFLAKE_ACCOUNT \
     --user $SNOWFLAKE_USER \
     --password $SNOWFLAKE_PASSWORD \
+    --role $SNOWFLAKE_ROLE \
     --host $SNOWFLAKE_HOST \
     --port $SNOWFLAKE_PORT \
     --database $SNOWFLAKE_DATABASE \
-    --schema PUBLIC \
-    --table-name elements \
-    --batch-size 50
+    --schema $SNOWFLAKE_SCHEMA \
+    --batch-size 50 \
+    --table-name $SNOWFLAKE_TABLE \
+    --record-id-key $SNOWFLAKE_RECORD_ID_KEY \
+
 ```

snippets/destination_connectors/snowflake.v2.py.mdx

Lines changed: 7 additions & 3 deletions
@@ -40,18 +40,22 @@ if __name__ == "__main__":
         chunker_config=ChunkerConfig(chunking_strategy="by_title"),
         embedder_config=EmbedderConfig(embedding_provider="huggingface"),
         destination_connection_config=SnowflakeConnectionConfig(
-            access_config=SnowflakeAccessConfig(password=os.getenv("SNOWFLAKE_PASSWORD")),
+            access_config=SnowflakeAccessConfig(
+                password=os.getenv("SNOWFLAKE_PASSWORD")
+            ),
             account=os.getenv("SNOWFLAKE_ACCOUNT"),
             user=os.getenv("SNOWFLAKE_USER"),
             host=os.getenv("SNOWFLAKE_HOST"),
             port=os.getenv("SNOWFLAKE_PORT"),
             database=os.getenv("SNOWFLAKE_DATABASE"),
-            schema="PUBLIC"
+            schema=os.getenv("SNOWFLAKE_SCHEMA"),
+            role=os.getenv("SNOWFLAKE_ROLE")
         ),
         stager_config=SnowflakeUploadStagerConfig(),
         uploader_config=SnowflakeUploaderConfig(
             batch_size=50,
-            table_name="elements"
+            table_name=os.getenv("SNOWFLAKE_TABLE"),
+            record_id_key=os.getenv("SNOWFLAKE_RECORD_ID_KEY")
         )
     ).run()
 ```

snippets/general-shared-text/snowflake-cli-api.mdx

Lines changed: 12 additions & 5 deletions
@@ -10,9 +10,16 @@ import AdditionalIngestDependencies from '/snippets/general-shared-text/ingest-d
1010

1111
These environment variables:
1212

13-
- `SNOWFLAKE_ACCOUNT` - The ID of the Snowflake account, represented by `--account` (CLI) or `account` (Python).
14-
- `SNOWFLAKE_USER` - The name of the Snowflake user, represented by `--user` (CLI) or `user` (Python).
13+
- `SNOWFLAKE_ACCOUNT` - The ID of the target Snowflake account, represented by `--account` (CLI) or `account` (Python).
14+
- `SNOWFLAKE_USER` - The name of the target Snowflake user, represented by `--user` (CLI) or `user` (Python).
1515
- `SNOWFLAKE_PASSWORD` - The user's password, represented by `--password` (CLI) or `password` (Python).
16-
- `SNOWFLAKE_HOST` - The hostname for the Snowflake account, represented by `--host` (CLI) or `host` (Python).
17-
- `SNOWFLAKE_PORT` - The host's port number, represented by `--port` (CLI) or `port` (Python).
18-
- `SNOWFLAKE_DATABASE` - The name of the Snowflake database, represented by `--database` (CLI) or `database` (Python).
16+
- `SNOWFLAKE_ROLE` - The target role for the user, represented by `--role` (CLI) or `role` (Python).
17+
- `SNOWFLAKE_HOST` - The hostname for the target Snowflake warehouse, represented by `--host` (CLI) or `host` (Python).
18+
- `SNOWFLAKE_PORT` - The warehouse's port number, represented by `--port` (CLI) or `port` (Python). The default is `443` if not otherwise specified.
19+
- `SNOWFLAKE_DATABASE` - The name of the target Snowflake database, represented by `--database` (CLI) or `database` (Python).
20+
- `SNOWFLAKE_SCHEMA` - The name of the target schema in the database, represented by `--schema` (CLI) or `schema` (Python).
21+
- `SNOWFLAKE_TABLE` - The name of the target table in the schema, represented by `--table-name` (CLI) or `table_name` (Python). For the destination connector, the default is `elements` if not otherwise specified.
22+
- `SNOWFLAKE_RECORD_ID_KEY` - The name of the column in the table that uniquely identifies each record, represented by:
23+
24+
- For the source connector, `--id-column` (CLI) or `id_column` (Python).
25+
- For the destination connector, `--record-id-key` (CLI) or `record_id_key` (Python). For the destination connector, the default is `record_id` if not otherwise specified.
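
The variables listed above can be checked before a pipeline runs. The following is a minimal sketch, not part of this commit: `load_snowflake_settings` is a hypothetical helper, and treating every variable without a documented default as required is an assumption. The defaults (`443`, `elements`, `record_id`) come from the descriptions above; the last two apply to the destination connector.

```python
import os

# Defaults named in the variable descriptions above.
DEFAULTS = {
    "SNOWFLAKE_PORT": "443",
    "SNOWFLAKE_TABLE": "elements",           # destination connector default
    "SNOWFLAKE_RECORD_ID_KEY": "record_id",  # destination connector default
}

# Assumption: variables without a documented default are treated as required.
REQUIRED = [
    "SNOWFLAKE_ACCOUNT",
    "SNOWFLAKE_USER",
    "SNOWFLAKE_PASSWORD",
    "SNOWFLAKE_ROLE",
    "SNOWFLAKE_HOST",
    "SNOWFLAKE_DATABASE",
    "SNOWFLAKE_SCHEMA",
]


def load_snowflake_settings(env=None):
    """Hypothetical helper: collect settings and fail early on missing variables."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED}
    for name, default in DEFAULTS.items():
        # Fall back to the documented default when the variable is unset or empty.
        settings[name] = env.get(name) or default
    return settings
```

Passing a plain dictionary as `env` makes the helper easy to exercise without touching the real environment.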

snippets/general-shared-text/snowflake.mdx

Lines changed: 110 additions & 3 deletions
@@ -1,4 +1,4 @@
1-
- A Snowflake [account](https://signup.snowflake.com/) and its [identifier](https://docs.snowflake.com/user-guide/admin-account-identifier).
1+
- A Snowflake [account](https://signup.snowflake.com/) and its account identifier.
22

33
<iframe
44
width="560"
@@ -10,7 +10,19 @@
1010
allowfullscreen
1111
></iframe>
1212

13-
- The Snowflake [username and its password](https://docs.snowflake.com/user-guide/admin-user-management#creating-users) in the account.
13+
To get the identifier for the current Snowflake account:
14+
15+
1. Log in to [Snowsight](https://docs.snowflake.com/user-guide/ui-snowsight-homepage) with your Snowflake account.
16+
2. In Snowsight, on the navigation menu, click your username, and then click **Account > View account details**.
17+
3. On the **Account** tab, note the value of the **Account Identifier** field.
18+
19+
Alternatively, the following Snowflake query returns the current account's identifier:
20+
21+
```text
22+
SELECT CURRENT_ORGANIZATION_NAME() || '-' || CURRENT_ACCOUNT_NAME() AS "Account Identifier"
23+
```
24+
25+
- The Snowflake [user's login name (not its username) and its password](https://docs.snowflake.com/user-guide/admin-user-management#creating-users) in the account.
1426

1527
<iframe
1628
width="560"
@@ -22,7 +34,35 @@
2234
allowfullscreen
2335
></iframe>
2436

25-
- The Snowflake [hostname and its port number](https://docs.snowflake.com/sql-reference/functions/system_allowlist) in the account.
37+
To view the login name for a user:
38+
39+
1. Log in to [Snowsight](https://docs.snowflake.com/user-guide/ui-snowsight-homepage) with your Snowflake account.
40+
2. In Snowsight, on the navigation menu, click **Admin > Users & Roles**.
41+
3. On the **Users** tab, in the list of available users, click the name of the target user.
42+
4. In the **About** tile, note the **Login Name** for the user.
43+
44+
Alternatively, the following Snowflake query returns information about the user with the username of `<my-user>`, including their `login_name` value representing their login name:
45+
46+
```text
47+
SHOW USERS LIKE '<my-user>';
48+
```
49+
50+
- The Snowflake warehouse's [hostname and its port number](https://docs.snowflake.com/sql-reference/functions/system_allowlist) in the account.
51+
52+
To view a list of available warehouses in the current Snowflake account:
53+
54+
1. Log in to [Snowsight](https://docs.snowflake.com/user-guide/ui-snowsight-homepage) with your Snowflake account.
55+
2. In Snowsight, on the navigation menu, click **Admin > Warehouses**. This view does not provide access to the warehouses' hostnames or port numbers. To get this information, you must run a Snowflake query.
56+
57+
The following Snowflake query returns a list of available warehouse types, hostnames, and port numbers in the current account. Look for the row with a `type` of `SNOWFLAKE_DEPLOYMENT`:
58+
59+
```text
60+
SELECT t.VALUE:type::VARCHAR as type,
61+
t.VALUE:host::VARCHAR as host,
62+
t.VALUE:port as port
63+
FROM TABLE(FLATTEN(input => PARSE_JSON(SYSTEM$ALLOWLIST()))) AS t;
64+
```
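
As an illustration of what the query above extracts, here is a minimal Python sketch that picks the `SNOWFLAKE_DEPLOYMENT` entry out of the JSON text returned by `SYSTEM$ALLOWLIST()`. The sample data is hypothetical (the hostnames are placeholders, not real endpoints), and `deployment_endpoint` is an illustrative name rather than part of any library.

```python
import json

# Hypothetical sample shaped like SYSTEM$ALLOWLIST() output: a JSON array of
# objects with "type", "host", and "port" fields. Hosts are placeholders.
SAMPLE_ALLOWLIST = json.dumps([
    {"type": "SNOWFLAKE_DEPLOYMENT", "host": "myorg-myaccount.snowflakecomputing.com", "port": 443},
    {"type": "STAGE", "host": "example-stage.s3.amazonaws.com", "port": 443},
    {"type": "OCSP_CACHE", "host": "ocsp.snowflakecomputing.com", "port": 80},
])


def deployment_endpoint(allowlist_json):
    """Return (host, port) for the first SNOWFLAKE_DEPLOYMENT entry."""
    for entry in json.loads(allowlist_json):
        if entry.get("type") == "SNOWFLAKE_DEPLOYMENT":
            return entry["host"], int(entry["port"])
    raise LookupError("No SNOWFLAKE_DEPLOYMENT entry found")


host, port = deployment_endpoint(SAMPLE_ALLOWLIST)
print(host, port)
```

The returned pair corresponds to the values expected in `SNOWFLAKE_HOST` and `SNOWFLAKE_PORT`.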
65+
2666
- The name of the Snowflake [database](https://docs.snowflake.com/sql-reference/sql/create-database) in the account.
2767

2868
<iframe
@@ -35,6 +75,17 @@
3575
allowfullscreen
3676
></iframe>
3777

78+
To view a list of available databases in the current Snowflake account:
79+
80+
1. Log in to [Snowsight](https://docs.snowflake.com/user-guide/ui-snowsight-homepage) with your Snowflake account.
81+
2. In Snowsight, on the navigation menu, click **Data > Databases**.
82+
83+
Alternatively, the following Snowflake query returns a list of available databases in the current account:
84+
85+
```text
86+
SHOW DATABASES;
87+
```
88+
3889
- The name of the [schema](https://docs.snowflake.com/sql-reference/sql/create-schema) in the database.
3990

4091
<iframe
@@ -47,6 +98,24 @@
4798
allowfullscreen
4899
></iframe>
49100

101+
To view a list of available schemas for a database in the current Snowflake account:
102+
103+
1. Log in to [Snowsight](https://docs.snowflake.com/user-guide/ui-snowsight-homepage) with your Snowflake account.
104+
2. In Snowsight, on the navigation menu, click **Data > Databases**.
105+
3. Expand the name of the target database.
106+
107+
Alternatively, the following Snowflake query returns a list of available schemas in the current account:
108+
109+
```text
110+
SHOW SCHEMAS;
111+
```
112+
113+
The following Snowflake query returns a list of available schemas for the database named `<database-name>` in the current account:
114+
115+
```text
116+
SHOW SCHEMAS IN DATABASE <database-name>;
117+
```
118+
50119
- The name of the [table](https://docs.snowflake.com/sql-reference/sql/create-table) in the schema.
51120

52121
<iframe
@@ -59,6 +128,21 @@
59128
allowfullscreen
60129
></iframe>
61130

131+
To view a list of available tables for a schema in a database in the current Snowflake account:
132+
133+
1. Log in to [Snowsight](https://docs.snowflake.com/user-guide/ui-snowsight-homepage) with your Snowflake account.
134+
2. In Snowsight, on the navigation menu, click **Data > Databases**.
135+
3. Expand the name of the database that contains the target schema.
136+
4. Expand the name of the target schema.
137+
5. Expand **Tables**.
138+
139+
Alternatively, the following Snowflake query returns a list of available tables for the schema named `<schema-name>` in the database named
140+
`<database-name>` in the current account:
141+
142+
```text
143+
SHOW TABLES IN SCHEMA <database-name>.<schema-name>;
144+
```
145+
62146
Snowflake requires the target table to have a defined schema before Unstructured can write to the table. The recommended table
63147
schema for Unstructured is as follows:
64148

@@ -107,3 +191,26 @@
107191
PRIMARY KEY (ID)
108192
);
109193
```
194+
195+
- The name of the column in the table that uniquely identifies each record (for example, `RECORD_ID`).
196+
- The name of the Snowflake [role](https://docs.snowflake.com/sql-reference/sql/create-role) that the user belongs to and that also has sufficient access to the Snowflake database, schema, table, and host.
197+
198+
- To create a database in Snowflake, the role needs to be granted `CREATE DATABASE` privilege at the current account level; and `USAGE` privilege on the warehouse that is used to create the database.
199+
- To create a schema in a database in Snowflake, the role needs to be granted `USAGE` privilege on the database and the warehouse that is used to create the schema; and `CREATE SCHEMA` on the database.
200+
- To create a table in a schema in Snowflake, the role needs to be granted `USAGE` privilege on the database and schema and the warehouse that is used to create the table; and `CREATE TABLE` on the schema.
201+
- To write to a table in Snowflake, the role needs to be granted `USAGE` privilege on the database and schema and the warehouse that is used to write to the table; and `INSERT` on the table.
202+
- To read from a table in Snowflake, the role needs to be granted `USAGE` privilege on the database and schema and the warehouse that is used to read from the table; and `SELECT` on the table.
203+
204+
To view a list of available roles in the current Snowflake account:
205+
206+
1. Log in to [Snowsight](https://docs.snowflake.com/user-guide/ui-snowsight-homepage) with your Snowflake account.
207+
2. In Snowsight, on the navigation menu, click **Admin > Users & Roles**.
208+
3. Click the **Roles** tab.
209+
210+
Alternatively, the following Snowflake query returns a list of available roles in the current account:
211+
212+
```text
213+
SHOW ROLES;
214+
```
215+
216+
[Grant privileges to a role](https://docs.snowflake.com/sql-reference/sql/grant-privilege). [Learn more](https://docs.snowflake.com/user-guide/security-access-control-privileges).
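
The privileges listed above for writing to a table can be sketched as a set of `GRANT` statements. This is an illustrative example, not part of this commit: `grants_for_write` is a hypothetical helper, and the role, warehouse, database, schema, and table names are placeholders. Identifiers are interpolated without quoting or escaping, so the sketch is suitable only for trusted, simple names.

```python
def grants_for_write(role, database, schema, table, warehouse):
    """Hypothetical helper: build the GRANT statements a role needs to insert rows."""
    return [
        # USAGE on the database, the schema, and the warehouse used for the write.
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role};",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO ROLE {role};",
        f"GRANT USAGE ON WAREHOUSE {warehouse} TO ROLE {role};",
        # INSERT on the target table itself.
        f"GRANT INSERT ON TABLE {database}.{schema}.{table} TO ROLE {role};",
    ]


# Placeholder names for illustration only.
for statement in grants_for_write("MY_ROLE", "MY_DB", "MY_SCHEMA", "ELEMENTS", "MY_WH"):
    print(statement)
```

For read access, the same pattern applies with `SELECT` in place of `INSERT` on the table.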

snippets/source_connectors/snowflake.sh.mdx

Lines changed: 5 additions & 4 deletions
@@ -6,13 +6,14 @@ unstructured \
     --account $SNOWFLAKE_ACCOUNT \
     --user $SNOWFLAKE_USER \
     --password $SNOWFLAKE_PASSWORD \
+    --role $SNOWFLAKE_ROLE \
     --host $SNOWFLAKE_HOST \
     --port $SNOWFLAKE_PORT \
     --database $SNOWFLAKE_DATABASE \
-    --schema PUBLIC \
-    --table-name elements\
-    --id-column id \
-    --batch-size 100 \
+    --schema $SNOWFLAKE_SCHEMA \
+    --batch-size 50 \
+    --table-name $SNOWFLAKE_TABLE \
+    --id-column $SNOWFLAKE_RECORD_ID_KEY \
     --download-dir $LOCAL_FILE_DOWNLOAD_DIR\
     --partition-by-api \
     --api-key $UNSTRUCTURED_API_KEY \

snippets/source_connectors/snowflake.v2.py.mdx

Lines changed: 7 additions & 4 deletions
@@ -21,19 +21,22 @@ if __name__ == "__main__":
     Pipeline.from_configs(
         context=ProcessorConfig(),
         indexer_config=SnowflakeIndexerConfig(
-            table_name="elements",
-            id_column="id",
+            table_name=os.getenv("SNOWFLAKE_TABLE"),
+            id_column=os.getenv("SNOWFLAKE_RECORD_ID_KEY"),
             batch_size=100
         ),
         downloader_config=SnowflakeDownloaderConfig(download_dir=os.getenv("LOCAL_FILE_DOWNLOAD_DIR")),
         source_connection_config=SnowflakeConnectionConfig(
-            access_config=SnowflakeAccessConfig(password=os.getenv("SNOWFLAKE_PASSWORD")),
+            access_config=SnowflakeAccessConfig(
+                password=os.getenv("SNOWFLAKE_PASSWORD")
+            ),
             account=os.getenv("SNOWFLAKE_ACCOUNT"),
             user=os.getenv("SNOWFLAKE_USER"),
             host=os.getenv("SNOWFLAKE_HOST"),
             port=os.getenv("SNOWFLAKE_PORT"),
             database=os.getenv("SNOWFLAKE_DATABASE"),
-            schema="PUBLIC"
+            schema=os.getenv("SNOWFLAKE_SCHEMA"),
+            role=os.getenv("SNOWFLAKE_ROLE")
         ),
         partitioner_config=PartitionerConfig(
             partition_by_api=True,
