Commit 6d6d2db

Support connectionName config (#45)
1 parent c55874c commit 6d6d2db

File tree: 4 files changed (+295 −29 lines)

README.md

Lines changed: 261 additions & 28 deletions
# nf-snowflake plugin

## Overview

nf-snowflake is a [Nextflow](https://www.nextflow.io/docs/latest/overview.html) plugin that enables Nextflow pipelines to run inside [Snowpark Container Service](https://docs.snowflake.com/en/developer-guide/snowpark-container-services/overview).

Each Nextflow task is translated to a [Snowflake Job Service](https://docs.snowflake.com/en/sql-reference/sql/execute-job-service) and executed as an SPCS job. The Nextflow main/driver program can run in two modes:

1. **Locally** - running on your local machine or in a CI/CD environment, connecting to Snowflake via JDBC
2. **Inside SPCS** - running as a separate SPCS job within Snowpark Container Services

These two execution modes correspond to the two authentication methods supported by the plugin. When the main/driver program runs inside an SPCS job, Snowflake automatically injects the required environment variables (such as `SNOWFLAKE_ACCOUNT` and `SNOWFLAKE_HOST`) and the session token file (`/snowflake/session/token`), and the plugin discovers and uses these credentials automatically.

Intermediate results between different Nextflow processes are shared via [Snowflake stages](https://docs.snowflake.com/en/user-guide/data-load-local-file-system-create-stage), which must be configured as the working directory.

## Prerequisites

Before using this plugin, you should have:

- **Nextflow** (version 23.04.0 or later)
- **Snowflake account** with access to:
  - Snowpark Container Services (Compute Pools/Image Registries)
  - Internal stages
- **Familiarity with**:
  - Nextflow pipelines and configuration
  - Docker/container images
  - Snowflake authentication methods
## Authentication

The plugin supports two authentication methods, corresponding to the two execution modes for the main/driver program:

### 1. Session Token Authentication (Main/Driver Running Inside SPCS)

When the Nextflow main/driver program runs inside an SPCS job, Snowflake automatically injects the session token file at `/snowflake/session/token` and the following environment variables:

- `SNOWFLAKE_ACCOUNT`
- `SNOWFLAKE_HOST`
- `SNOWFLAKE_DATABASE`
- `SNOWFLAKE_SCHEMA`
- `SNOWFLAKE_WAREHOUSE` (optional)

The plugin automatically discovers and uses these credentials for authentication. No additional configuration is required.
### 2. Connections.toml Authentication (Main/Driver Running Locally)

When the Nextflow main/driver program runs locally (on your machine or in CI/CD), the plugin uses the Snowflake [connections.toml](https://docs.snowflake.com/en/developer-guide/jdbc/jdbc-configure#connecting-using-the-connections-toml-file) configuration file for authentication.

**File locations** (searched in order):
1. `~/.snowflake/connections.toml` (if the directory exists)
2. The location specified by the `SNOWFLAKE_HOME` environment variable
3. OS-specific defaults:
   - Linux: `~/.config/snowflake/connections.toml`
   - macOS: `~/Library/Application Support/snowflake/connections.toml`
   - Windows: `%USERPROFILE%\AppData\Local\snowflake\connections.toml`

**Example connections.toml:**
```toml
[default]
account = "myaccount"
user = "myuser"
password = "mypassword"
database = "mydb"
schema = "myschema"
warehouse = "mywh"

[production]
account = "prodaccount"
authenticator = "externalbrowser"
database = "proddb"
schema = "public"
```

**Specify a connection in nextflow.config:**
```groovy
snowflake {
    connectionName = 'production'
    computePool = 'MY_COMPUTE_POOL'
}
```

If no `connectionName` is specified, the plugin falls back to:
1. The connection named by the `SNOWFLAKE_DEFAULT_CONNECTION_NAME` environment variable
2. The `default` connection from connections.toml
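If you prefer not to hard-code `connectionName`, the environment variable takes effect instead — assuming a `production` entry exists in your connections.toml:

```shell
# Select the connection via the environment instead of nextflow.config.
export SNOWFLAKE_DEFAULT_CONNECTION_NAME=production

# Then launch the pipeline as usual, e.g.:
# nextflow run . -profile snowflake -work-dir snowflake://stage/nxf_workdir/
```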

## Configuration Reference

All plugin configuration is defined under the `snowflake` scope in your `nextflow.config`:

### computePool

The name of the Snowflake compute pool to use for executing jobs.

```groovy
snowflake {
    computePool = 'MY_COMPUTE_POOL'
}
```

### registryMappings

Docker registry mappings for container images. Snowflake does not support pulling images directly from arbitrary external registries; you must first replicate container images from external registries (such as Docker Hub or GitHub Container Registry) into Snowflake image repositories.

The `registryMappings` configuration automatically replaces external registry hostnames with Snowflake image repository names in your pipeline's container specifications.

**Format:** comma-separated list of mappings in the form `external_registry:snowflake_repository`

```groovy
snowflake {
    registryMappings = 'docker.io:my_registry,ghcr.io:github_registry'
}
```

**How it works:**
1. First, replicate images to your Snowflake image repository:
```bash
docker pull docker.io/alpine:latest
docker tag docker.io/alpine:latest <snowflake_repo_url>/alpine:latest
docker push <snowflake_repo_url>/alpine:latest
```
2. Then, when your process uses `container 'docker.io/alpine:latest'`, the plugin automatically replaces `docker.io` with your Snowflake image repository URL, resulting in the correct Snowflake image reference.
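The rewrite described above can be sketched in shell — illustrative only; `rewrite_container` is a made-up helper, not the plugin's implementation, and real Snowflake repository references are longer than the bare names used here:

```shell
# Sketch: map the registry hostname of an image through a mapping list
# of the form 'external_registry:snowflake_repository,...'.
rewrite_container() {
  image=$1; mappings=$2
  registry=${image%%/*}     # e.g. docker.io
  rest=${image#*/}          # e.g. alpine:latest
  for pair in $(echo "$mappings" | tr ',' ' '); do
    from=${pair%%:*}; to=${pair#*:}
    if [ "$registry" = "$from" ]; then
      echo "$to/$rest"
      return
    fi
  done
  echo "$image"             # no mapping matched: leave the reference untouched
}
rewrite_container docker.io/alpine:latest 'docker.io:my_images,ghcr.io:github_registry'
```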

### connectionName

The name of the connection to use from the connections.toml file. When specified, the JDBC driver uses the connection configuration defined under this name.

```groovy
snowflake {
    connectionName = 'production'
}
```

**Note:** This is only used when the session token file is not available (i.e., when running outside Snowpark Container Services).

## Quick Start

This guide assumes you are familiar with both Nextflow and Snowpark Container Services.

### 1. Create a Compute Pool

```sql
CREATE COMPUTE POOL my_compute_pool
MIN_NODES = 2
MAX_NODES = 5
INSTANCE_FAMILY = CPU_X64_M
AUTO_SUSPEND_SECS = 3600;
```

### 2. Create a Snowflake Internal Stage for the Working Directory

```sql
CREATE OR REPLACE STAGE nxf_workdir
ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');
```

### 3. Set Up an Image Repository

```sql
CREATE IMAGE REPOSITORY IF NOT EXISTS my_images;
```

### 4. Build and Upload Container Images

Build the container image for each Nextflow [process](https://www.nextflow.io/docs/latest/process.html), upload the image to the [Snowflake Image Registry](https://docs.snowflake.com/en/developer-guide/snowpark-container-services/working-with-registry-repository), and update each process's [container](https://www.nextflow.io/docs/latest/reference/process.html#process-container) field.

**Example process definition:**
```groovy
process INDEX {
    tag "$transcriptome.simpleName"
    container '/mydb/myschema/my_images/salmon:1.10.0'

    input:
    path transcriptome

    // ... output and script blocks omitted in this diff excerpt ...
}
```

### 5. Configure Nextflow

Add a Snowflake profile to your `nextflow.config` file and enable the nf-snowflake plugin:

```groovy
plugins {
    id 'nf-snowflake@1.0.0'
}

profiles {
    snowflake {
        process.executor = 'snowflake'

        snowflake {
            computePool = 'my_compute_pool'
            registryMappings = 'docker.io:my_images'
        }
    }
}
```

### 6. Run Your Pipeline

Execute the Nextflow pipeline with the Snowflake profile:

```bash
nextflow run . -profile snowflake -work-dir snowflake://stage/nxf_workdir/
```

## Snowflake Filesystem and Working Directory

### Snowflake Stage URI

The plugin uses a custom URI scheme to access Snowflake internal stages:

```
snowflake://stage/<stage_name>/<path>
```

**Components:**
- `snowflake://` - URI scheme identifier
- `stage/` - literal prefix indicating a Snowflake stage
- `<stage_name>` - the name of your Snowflake internal stage
- `<path>` - optional path within the stage
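As a quick illustration of the URI anatomy, the pieces can be split apart with shell parameter expansion — the stage and path here are arbitrary examples:

```shell
uri='snowflake://stage/nxf_workdir/workflows/pipeline1/'
path=${uri#snowflake://stage/}   # strip the scheme and literal 'stage/' prefix
stage=${path%%/*}                # first segment: the stage name
subpath=${path#*/}               # remainder: optional path within the stage
echo "stage=$stage subpath=$subpath"
```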

**Examples:**
```groovy
// Access root of a stage
workDir = 'snowflake://stage/my_stage/'

// Access a subdirectory within a stage
workDir = 'snowflake://stage/my_stage/workflows/pipeline1/'
```
238+
239+
### Working Directory Requirement
240+
241+
**IMPORTANT:** The Nextflow working directory (`workDir`) **must** be a Snowflake stage using the `snowflake://` URI scheme. This is a strict requirement for the plugin to function correctly.
242+
243+
The working directory is used to:
244+
- Store intermediate task results
245+
- Share data between pipeline processes
246+
- Store task execution metadata and logs
247+
248+
**Correct configuration:**
249+
```groovy
250+
profiles {
53251
snowflake {
54-
computePool = 'CP'
252+
process.executor = 'snowflake'
253+
workDir = 'snowflake://stage/nxf_workdir/' // ✓ Valid
254+
255+
snowflake {
256+
computePool = 'my_compute_pool'
257+
}
55258
}
56-
}
57-
...
259+
}
58260
```

**Or specify on the command line:**
```bash
nextflow run . -profile snowflake -work-dir snowflake://stage/nxf_workdir/
```

**Invalid configurations:**
```groovy
workDir = 's3://my-bucket/work/'        // ✗ Invalid - not a Snowflake stage
workDir = '/local/path/work/'           // ✗ Invalid - local filesystem
workDir = 'snowflake://my_stage/work/'  // ✗ Invalid - missing 'stage/' prefix
```
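The valid/invalid cases above amount to a prefix check; a minimal sketch (not the plugin's actual validation code):

```shell
# Sketch: a workDir is acceptable only with the 'snowflake://stage/' prefix.
validate_workdir() {
  case "$1" in
    snowflake://stage/*) echo valid ;;
    *)                   echo invalid ;;
  esac
}
validate_workdir 'snowflake://stage/nxf_workdir/'   # -> valid
validate_workdir 's3://my-bucket/work/'             # -> invalid
validate_workdir 'snowflake://my_stage/work/'       # -> invalid (no 'stage/' prefix)
```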

### Stage Setup

Before running your pipeline, ensure your stage is properly configured:

```sql
-- Create an internal stage with encryption
CREATE OR REPLACE STAGE my_workdir
ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');

-- Verify stage exists
SHOW STAGES LIKE 'my_workdir';

-- Optional: Test stage access
LIST @my_workdir;
```

## Additional Resources

- [Nextflow Documentation](https://www.nextflow.io/docs/latest/index.html)
- [Snowpark Container Services Documentation](https://docs.snowflake.com/en/developer-guide/snowpark-container-services/overview)
- [Snowflake JDBC Configuration](https://docs.snowflake.com/en/developer-guide/jdbc/jdbc-configure)
- [Nextflow Plugin Development](https://www.nextflow.io/docs/latest/plugins.html)

src/main/groovy/nextflow/snowflake/SnowflakeConnectionPool.groovy

Lines changed: 20 additions & 1 deletion
New import:

```groovy
import nextflow.snowflake.config.SnowflakeConfig
```

New configuration field and setter on the connection pool singleton:

```groovy
// Configuration
private SnowflakeConfig config

/**
 * Set the Snowflake configuration for connection creation
 */
void setConfig(SnowflakeConfig config) {
    this.config = config
}
```

Updated connections.toml fallback when creating a connection:

```groovy
} else {
    // Fall back to connections.toml via jdbc:snowflake:auto
    // Use connectionName from config if available
    String jdbcUrl = "jdbc:snowflake:auto"
    if (config?.connectionName) {
        jdbcUrl = "jdbc:snowflake:auto?connectionName=${config.connectionName}"
        log.debug("Using connection name from config: ${config.connectionName}")
    } else {
        log.debug("No connection name specified, using jdbc:snowflake:auto (will use default connection)")
    }
    return DriverManager.getConnection(jdbcUrl)
}
```
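For reference, the fallback above produces one of two JDBC URL shapes; a tiny sketch of the construction (the `production` name is an arbitrary example, and `make_jdbc_url` is a made-up helper):

```shell
# Sketch: with a connection name, append it as a query parameter;
# without one, the driver resolves the default connection itself.
make_jdbc_url() {
  if [ -n "$1" ]; then
    echo "jdbc:snowflake:auto?connectionName=$1"
  else
    echo "jdbc:snowflake:auto"
  fi
}
make_jdbc_url production   # -> jdbc:snowflake:auto?connectionName=production
make_jdbc_url ""           # -> jdbc:snowflake:auto
```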
src/main/groovy/nextflow/snowflake/SnowflakeExecutor.groovy

Lines changed: 3 additions & 0 deletions
The executor passes the parsed `snowflake` config to the connection pool before validating the work directory (added lines shown with surrounding context):

```groovy
final Map configMap = session.config.navigate("snowflake") as Map
snowflakeConfig = new SnowflakeConfig(configMap ?: [:])

// Set config on connection pool for connection discovery
SnowflakeConnectionPool.getInstance().setConfig(snowflakeConfig)

// Validate that workDir uses snowflake:// scheme
validateWorkDir()
```