nf-snowflake is a [Nextflow](https://www.nextflow.io/docs/latest/overview.html) plugin which allows Nextflow pipeline to be run inside [Snowpark Container Service](https://docs.snowflake.com/en/developer-guide/snowpark-container-services/overview).
3
+
## Overview
4
+
nf-snowflake is a [Nextflow](https://www.nextflow.io/docs/latest/overview.html) plugin that enables Nextflow pipelines to run inside [Snowpark Container Services](https://docs.snowflake.com/en/developer-guide/snowpark-container-services/overview).
5
5
6
-
This plugin requires both Nextflow main process and worker process being run as a container job inside Snowflake. Each process/task in Nextflow will be translated to a [Snowflake Job Service](https://docs.snowflake.com/en/sql-reference/sql/execute-job-service). The main process can be a job service or a long-running service. Intermediate result between different Nextflow processes will be shared via [stage mount](https://docs.snowflake.com/en/developer-guide/snowpark-container-services/snowflake-stage-volume), so the same stage mount configuration needs to be applied to both main process container and worker process container.
6
+
Each Nextflow task is translated to a [Snowflake Job Service](https://docs.snowflake.com/en/sql-reference/sql/execute-job-service) and executed as a Snowpark Container Services (SPCS) job. The Nextflow main/driver program can run in two modes:
7
7
8
-
## QuickStart
8
+
1. **Locally** - Running on your local machine or in a CI/CD environment, connecting to Snowflake via JDBC
9
+
2. **Inside SPCS** - Running as a separate SPCS job within Snowpark Container Services
9
10
10
-
This quick start guide assumes you are familiar with both Nextflow and Snowpark Container Service.
11
+
These two execution modes correspond to the two authentication methods supported by the plugin. When the main/driver program runs inside an SPCS job, Snowflake automatically injects the required environment variables (such as `SNOWFLAKE_ACCOUNT` and `SNOWFLAKE_HOST`) and the session token file (`/snowflake/session/token`). The plugin discovers these credentials and uses them for authentication.
11
12
12
-
1. Create a compute pool
13
+
Intermediate results between different Nextflow processes are shared via [Snowflake stages](https://docs.snowflake.com/en/user-guide/data-load-local-file-system-create-stage), which must be configured as the working directory.
When the Nextflow main/driver program runs inside an SPCS job, Snowflake automatically injects the session token file at `/snowflake/session/token` and the following environment variables:
35
+
36
+
- `SNOWFLAKE_ACCOUNT`
37
+
- `SNOWFLAKE_HOST`
38
+
- `SNOWFLAKE_DATABASE`
39
+
- `SNOWFLAKE_SCHEMA`
40
+
- `SNOWFLAKE_WAREHOUSE` (optional)
41
+
42
+
The plugin automatically discovers and uses these credentials for authentication. No additional configuration is required.
When the Nextflow main/driver program runs locally (on your machine or in CI/CD), the plugin uses the Snowflake [connections.toml](https://docs.snowflake.com/en/developer-guide/jdbc/jdbc-configure#connecting-using-the-connections-toml-file) configuration file for authentication.
If no `connectionName` is specified, the plugin will use:
82
+
1. Connection name from `SNOWFLAKE_DEFAULT_CONNECTION_NAME` environment variable
83
+
2. The `default` connection from connections.toml
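
The lookup order described above can be sketched in Groovy as follows. This is an illustrative sketch only, not the plugin's actual code: `resolveConnectionName` is a hypothetical helper, and the environment is passed in as a plain map so the logic can be tried without changing real settings.

```groovy
// Hypothetical helper illustrating the documented lookup order.
// It is NOT the plugin's real implementation.
String resolveConnectionName(String configured, Map<String, String> env = System.getenv()) {
    // An explicitly configured snowflake.connectionName takes precedence.
    if (configured) {
        return configured
    }
    // 1. Otherwise use the environment variable, if it is set and non-empty.
    String fromEnv = env['SNOWFLAKE_DEFAULT_CONNECTION_NAME']
    if (fromEnv) {
        return fromEnv
    }
    // 2. Finally fall back to the connection named 'default' in connections.toml.
    return 'default'
}

println resolveConnectionName('production', [:])
println resolveConnectionName(null, [SNOWFLAKE_DEFAULT_CONNECTION_NAME: 'ci'])
println resolveConnectionName(null, [:])
```

The three calls print `production`, `ci`, and `default` respectively, one case for each branch of the fallback chain.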
84
+
85
+
## Configuration Reference
86
+
87
+
All plugin configurations are defined under the `snowflake` scope in your `nextflow.config`:
88
+
89
+
### computePool
90
+
91
+
The name of the Snowflake compute pool to use for executing jobs.
92
+
93
+
```groovy
94
+
snowflake {
95
+
    computePool = 'MY_COMPUTE_POOL'
96
+
}
97
+
```
98
+
99
+
### registryMappings
100
+
101
+
Docker registry mappings for container images. Snowflake does not support pulling images directly from arbitrary external registries. Instead, you must first replicate container images from external registries (such as Docker Hub, GitHub Container Registry, etc.) to Snowflake image repositories.
102
+
103
+
The `registryMappings` configuration allows you to automatically replace external registry hostnames with Snowflake image repository names in your pipeline's container specifications.
104
+
105
+
**Format:** Comma-separated list of mappings in the form `external_registry:snowflake_repository`
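
As a rough illustration of this format, a `nextflow.config` entry might look like the fragment below. The repository names `my_images` and `my_ghcr_images` are hypothetical, and whether the right-hand side should be a bare repository name or a more qualified identifier is an assumption to verify against your Snowflake setup.

```groovy
// nextflow.config (illustrative values only)
snowflake {
    // Two comma-separated mappings in the form external_registry:snowflake_repository.
    // 'my_images' and 'my_ghcr_images' are hypothetical Snowflake image repository names.
    registryMappings = 'docker.io:my_images,ghcr.io:my_ghcr_images'
}
```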
1. First, replicate images to your Snowflake image repository:
115
+
```bash
116
+
docker pull docker.io/alpine:latest
117
+
docker tag docker.io/alpine:latest <snowflake_repo_url>/alpine:latest
118
+
docker push <snowflake_repo_url>/alpine:latest
119
+
```
120
+
121
+
2. Then, when your process uses `container 'docker.io/alpine:latest'`, the plugin automatically replaces `docker.io` with your Snowflake image repository URL, resulting in the correct Snowflake image reference.
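
A minimal pipeline that relies on this substitution could look like the sketch below. The process name `sayHello` and its script are hypothetical. The only part taken from the text above is the `container` directive, which keeps the external image name and leaves the rewrite to the configured mapping.

```groovy
// main.nf (illustrative sketch; process name and script body are hypothetical)
process sayHello {
    // External image reference; with a docker.io mapping configured,
    // the plugin is expected to swap in the Snowflake image repository.
    container 'docker.io/alpine:latest'

    output:
    stdout

    script:
    """
    echo 'Hello from a container job'
    """
}

workflow {
    // Run the process and print whatever it wrote to stdout.
    sayHello | view
}
```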
122
+
123
+
### connectionName
124
+
125
+
The name of the connection to use from the connections.toml file. When specified, the JDBC driver will use the connection configuration defined under this name.
126
+
127
+
```groovy
128
+
snowflake {
129
+
    connectionName = 'production'
130
+
}
131
+
```
132
+
133
+
**Note:** This is only used when the session token file is not available (i.e., when running outside Snowpark Container Services).
134
+
135
+
## Quick Start
136
+
137
+
This guide assumes you are familiar with both Nextflow and Snowpark Container Services.
138
+
139
+
### 1. Create a Compute Pool
140
+
141
+
```sql
142
+
CREATE COMPUTE POOL my_compute_pool
15
143
    MIN_NODES = 2
16
-
MAX_NODES = 2
144
+
    MAX_NODES = 5
17
145
    INSTANCE_FAMILY = CPU_X64_M
18
-
auto_suspend_secs=3600
19
-
;
146
+
    AUTO_SUSPEND_SECS = 3600;
20
147
```
21
-
2. Create Snowflake Internal Stage for working directory
22
-
```
23
-
create or replace stage nxf_workdir encryption=(type = 'SNOWFLAKE_SSE');
148
+
149
+
### 2. Create a Snowflake Internal Stage for the Working Directory
150
+
151
+
```sql
152
+
CREATE OR REPLACE STAGE nxf_workdir
153
+
    ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');
24
154
```
25
-
4. Build the container image for each Nextflow [process](https://www.nextflow.io/docs/latest/process.html), upload the image to [Snowflake Image Registry](https://docs.snowflake.com/en/developer-guide/snowpark-container-services/working-with-registry-repository) and update the each process's [container](https://www.nextflow.io/docs/latest/reference/process.html#process-container) field.
26
-
e.g.
155
+
156
+
### 3. Set Up Image Repository
157
+
158
+
```sql
159
+
CREATE IMAGE REPOSITORY IF NOT EXISTS my_images;
27
160
```
161
+
162
+
### 4. Build and Upload Container Images
163
+
164
+
Build the container image for each Nextflow [process](https://www.nextflow.io/docs/latest/process.html), upload the image to [Snowflake Image Registry](https://docs.snowflake.com/en/developer-guide/snowpark-container-services/working-with-registry-repository), and update each process's [container](https://www.nextflow.io/docs/latest/reference/process.html#process-container) field.
**IMPORTANT:** The Nextflow working directory (`workDir`) **must** be a Snowflake stage using the `snowflake://` URI scheme. This is a strict requirement for the plugin to function correctly.
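
Putting this requirement together with the `snowflake` scope, a `nextflow.config` could be sketched as below. `<stage_path>` is a placeholder rather than a real value: the exact path layout that follows the `snowflake://` scheme is not specified here, so substitute the form your plugin version expects for the stage you created (for example, `nxf_workdir`).

```groovy
// nextflow.config (illustrative sketch)
snowflake {
    // Compute pool that runs the task jobs.
    computePool = 'MY_COMPUTE_POOL'
}

// The working directory must point at a Snowflake stage.
// <stage_path> is a placeholder, not a literal value.
workDir = 'snowflake://<stage_path>'
```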