Docker permission error using Slurm on HPC with Nextflow #4646

@jdlamstein

Bug report

Similar issues include:
#2583
https://forums.docker.com/t/got-permission-denied-while-trying-to-connect-to-the-docker-daemon-socket/113292

Expected behavior and actual behavior

I'm submitting a test workflow to an HPC cluster through Slurm with Nextflow. The test runs a hello-world Bash script (test.sh) with Docker enabled. I expected the task to run in the container, but instead it fails with a Docker socket permission error:

docker: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/containers/create?name=nxf-vfPUNFEXBmP4OS8RLgRNOlTh": dial unix /var/run/docker.sock: connect: permission denied.

From what I can see, the GID of /var/run/docker.sock is included in the output of id --groups.
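
For reference, the group check described above could be scripted roughly as follows. This is only a sketch: it assumes GNU `stat` (as shipped with RHEL 8), and the function name `gid_in_groups` and the `DOCKER_SOCK` override variable are hypothetical names introduced here for illustration.

```bash
#!/usr/bin/env bash
# Sketch: test whether a socket's group ID appears in the caller's group list.

# gid_in_groups GID "G1 G2 ..." -> exit 0 if GID is in the list, 1 otherwise
# (hypothetical helper, not part of Docker, Slurm, or Nextflow)
gid_in_groups() {
    local target="$1" g
    for g in $2; do
        if [ "$g" = "$target" ]; then
            return 0
        fi
    done
    return 1
}

# DOCKER_SOCK is a hypothetical override; the default is the path from the error.
sock="${DOCKER_SOCK:-/var/run/docker.sock}"
if [ -S "$sock" ]; then
    sock_gid="$(stat -c %g "$sock")"   # GNU stat: numeric group owner of the socket
    if gid_in_groups "$sock_gid" "$(id -G)"; then
        echo "$(uname -n): gid $sock_gid is in this session's groups"
    else
        echo "$(uname -n): gid $sock_gid is NOT in this session's groups"
    fi
else
    echo "$(uname -n): no socket at $sock"
fi
```

With the slurm executor the task script runs inside the Slurm job rather than in the shell where `nextflow run` is invoked, so it may be worth running the check in both places, for example with `srun --partition=work bash check_sock.sh` (`check_sock.sh` being a hypothetical file name for the script above), to see whether the two environments agree.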

Steps to reproduce the problem

I run:

nextflow run simpletest.nf

My simpletest.nf file is:


#!/usr/bin/env nextflow
/*
 * pipeline input parameters
 */

params.text = 'Hello World!'


text_ch = Channel.of(params.text)



params.outdir = "results"

log.info """\
    B A S H  H E L L O  W O R L D !
    ===================================
    bash input : ${params.text}
    """
    .stripIndent()



process BASHEX {
    tag "Bash Script Test"

    input:
    val x

    output:
    val true
    
    script:
    """
    test.sh
    """
}

workflow {
    BASHEX(text_ch)
    bashresults_ch = BASHEX.out
}

workflow.onComplete {
    log.info ( workflow.success ? "\nDone! Open the following report in your browser --> $params.outdir/pyexample_report.html\n" : "Oops .. something went wrong" )
}

And my nextflow.config file is:

process.container = 'jdlamstein/datastudy'
docker.runOptions = '-u $(id -u):$(id -g)'
docker.enabled = true
process {
    executor = 'slurm'
    queue = 'work'
    memory = '10 GB'
    time = '30 min'
    cpus = 4
}

Program output

Below is the .nextflow.log


Jan-09 14:12:00.547 [main] DEBUG nextflow.cli.Launcher - $> nextflow run simpletest.nf
Jan-09 14:12:00.654 [main] INFO  nextflow.cli.CmdRun - N E X T F L O W  ~  version 23.10.0
Jan-09 14:12:00.674 [main] DEBUG nextflow.plugin.PluginsFacade - Setting up plugin manager > mode=prod; embedded=false; plugins-dir=/finkbeiner/imaging/home/jlamstein/.nextflow/plugins; core-plugins: nf-amazon@2.1.4,nf-azure@1.3.2,nf-cloudcache@0.3.0,nf-codecommit@0.1.5,nf-console@1.0.6,nf-ga4gh@1.1.0,nf-google@1.8.3,nf-tower@1.6.3,nf-wave@1.0.0
Jan-09 14:12:00.689 [main] INFO  o.pf4j.DefaultPluginStatusProvider - Enabled plugins: []
Jan-09 14:12:00.691 [main] INFO  o.pf4j.DefaultPluginStatusProvider - Disabled plugins: []
Jan-09 14:12:00.694 [main] INFO  org.pf4j.DefaultPluginManager - PF4J version 3.4.1 in 'deployment' mode
Jan-09 14:12:00.705 [main] INFO  org.pf4j.AbstractPluginManager - No plugins
Jan-09 14:12:00.723 [main] DEBUG nextflow.config.ConfigBuilder - Found config local: /imaging-work/datastudy/nextflow.config
Jan-09 14:12:00.725 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file:/imaging-work/datastudy/nextflow.config
Jan-09 14:12:00.746 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: `standard`
Jan-09 14:12:01.478 [main] DEBUG nextflow.cli.CmdRun - Applied DSL=2 by global default
Jan-09 14:12:01.497 [main] INFO  nextflow.cli.CmdRun - Launching `simpletest.nf` [infallible_pare] DSL2 - revision: f10ea95ea1
Jan-09 14:12:01.498 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins default=[]
Jan-09 14:12:01.498 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins resolved requirement=[]
Jan-09 14:12:01.509 [main] DEBUG n.secret.LocalSecretsProvider - Secrets store: /finkbeiner/imaging/home/jlamstein/.nextflow/secrets/store.json
Jan-09 14:12:01.513 [main] DEBUG nextflow.secret.SecretsLoader - Discovered secrets providers: [nextflow.secret.LocalSecretsProvider@3b545206] - activable => nextflow.secret.LocalSecretsProvider@3b545206
Jan-09 14:12:01.583 [main] DEBUG nextflow.Session - Session UUID: f8b8bc40-5320-4bc8-ba45-67bd509b415e
Jan-09 14:12:01.583 [main] DEBUG nextflow.Session - Run name: infallible_pare
Jan-09 14:12:01.584 [main] DEBUG nextflow.Session - Executor pool size: 3
Jan-09 14:12:01.592 [main] DEBUG nextflow.file.FilePorter - File porter settings maxRetries=3; maxTransfers=50; pollTimeout=null
Jan-09 14:12:01.597 [main] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'FileTransfer' minSize=10; maxSize=10; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
Jan-09 14:12:01.633 [main] DEBUG nextflow.cli.CmdRun - 
  Version: 23.10.0 build 5889
  Created: 15-10-2023 15:07 UTC (08:07 PDT)
  System: Linux 4.18.0-372.26.1.el8_6.x86_64
  Runtime: Groovy 3.0.19 on OpenJDK 64-Bit Server VM 17.0.6+10
  Encoding: UTF-8 (UTF-8)
  Process: 370367@galaxy.gladstone.internal [10.1.101.49]
  CPUs: 3 - Mem: 5.5 GB (1.9 GB) - Swap: 5.9 GB (5.3 GB)
Jan-09 14:12:01.672 [main] DEBUG nextflow.Session - Work-dir:/imaging-work/datastudy/work [nfs]
Jan-09 14:12:01.707 [main] DEBUG nextflow.executor.ExecutorFactory - Extension executors providers=[]
Jan-09 14:12:01.718 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
Jan-09 14:12:01.766 [main] DEBUG nextflow.cache.CacheFactory - Using Nextflow cache factory: nextflow.cache.DefaultCacheFactory
Jan-09 14:12:01.777 [main] DEBUG nextflow.util.CustomThreadPool - Creating default thread pool > poolSize: 4; maxThreads: 1000
Jan-09 14:12:01.983 [main] DEBUG nextflow.Session - Session start
Jan-09 14:12:02.205 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Jan-09 14:12:02.251 [main] INFO  nextflow.Nextflow - B A S H  H E L L O  W O R L D !
===================================
bash input : Hello World!

Jan-09 14:12:02.317 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
Jan-09 14:12:02.318 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
Jan-09 14:12:02.329 [main] DEBUG nextflow.executor.Executor - [warm up] executor > slurm
Jan-09 14:12:02.335 [main] DEBUG n.processor.TaskPollingMonitor - Creating task monitor for executor 'slurm' > capacity: 100; pollInterval: 5s; dumpInterval: 5m 
Jan-09 14:12:02.337 [main] DEBUG n.processor.TaskPollingMonitor - >>> barrier register (monitor: slurm)
Jan-09 14:12:02.344 [main] DEBUG n.executor.AbstractGridExecutor - Creating executor 'slurm' > queue-stat-interval: 1m
Jan-09 14:12:02.432 [main] DEBUG nextflow.Session - Workflow process names [dsl2]: BASHEX
Jan-09 14:12:02.433 [main] DEBUG nextflow.Session - Igniting dataflow network (2)
Jan-09 14:12:02.433 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > BASHEX
Jan-09 14:12:02.435 [main] DEBUG nextflow.script.ScriptRunner - Parsed script files:
  Script_1670075a3f17f6a4: /imaging-work/datastudy/simpletest.nf
Jan-09 14:12:02.435 [main] DEBUG nextflow.script.ScriptRunner - > Awaiting termination 
Jan-09 14:12:02.435 [main] DEBUG nextflow.Session - Session await
Jan-09 14:12:02.667 [Task submitter] DEBUG nextflow.executor.GridTaskHandler - [SLURM] submitted process BASHEX (Bash Script Test) > jobId: 2188; workDir:/imaging-work/datastudy/work/05/f29fcc996dd802aae2369df6eaae21
Jan-09 14:12:02.668 [Task submitter] INFO  nextflow.Session - [05/f29fcc] Submitted process > BASHEX (Bash Script Test)
Jan-09 14:12:07.382 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[jobId: 2188; id: 1; name: BASHEX (Bash Script Test); status: COMPLETED; exit: 126; error: -; workDir: /imaging-work/datastudy/work/05/f29fcc996dd802aae2369df6eaae21 started: 1704838327354; exited: 2024-01-09T22:12:03.075067Z; ]
Jan-09 14:12:07.390 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=BASHEX (Bash Script Test); work-dir=/imaging-work/datastudy/work/05/f29fcc996dd802aae2369df6eaae21
  error [nextflow.exception.ProcessFailedException]: Process `BASHEX (Bash Script Test)` terminated with an error exit status (126)
Jan-09 14:12:07.444 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'BASHEX (Bash Script Test)'

Caused by:
  Process `BASHEX (Bash Script Test)` terminated with an error exit status (126)

Command executed:

  test.sh

Command exit status:
  126

Command output:
  (empty)

Command error:
  docker: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/containers/create?name=nxf-vfPUNFEXBmP4OS8RLgRNOlTh": dial unix /var/run/docker.sock: connect: permission denied.
  See 'docker run --help'.

Work dir:
  /imaging-work/datastudy/work/05/f29fcc996dd802aae2369df6eaae21

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
Jan-09 14:12:07.464 [main] DEBUG nextflow.Session - Session await > all processes finished
Jan-09 14:12:07.464 [Task monitor] DEBUG nextflow.Session - Session aborted -- Cause: Process `BASHEX (Bash Script Test)` terminated with an error exit status (126)
Jan-09 14:12:07.495 [main] DEBUG nextflow.Session - Session await > all barriers passed
Jan-09 14:12:07.496 [Task monitor] DEBUG n.processor.TaskPollingMonitor - <<< barrier arrives (monitor: slurm) - terminating tasks monitor poll loop
Jan-09 14:12:07.500 [main] INFO  nextflow.Nextflow - Oops .. something went wrong
Jan-09 14:12:07.505 [main] DEBUG n.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=0; failedCount=1; ignoredCount=0; cachedCount=0; pendingCount=0; submittedCount=0; runningCount=0; retriesCount=0; abortedCount=0; succeedDuration=0ms; failedDuration=112ms; cachedDuration=0ms;loadCpus=0; loadMemory=0; peakRunning=1; peakCpus=4; peakMemory=10 GB; ]
Jan-09 14:12:07.722 [main] DEBUG nextflow.cache.CacheDB - Closing CacheDB done
Jan-09 14:12:07.754 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye
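
To follow up on the tip in the log, the task work directory can be inspected directly. The sketch below prints the exit code and captured stderr that Nextflow stores there (`.exitcode` and `.command.err`); `show_task_dir` is a hypothetical helper name, and the default path is simply the work dir reported above.

```bash
#!/usr/bin/env bash
# Sketch: summarise the files Nextflow leaves in a task work directory.

# show_task_dir DIR -> prints the exit code and stderr captured for that task
# (hypothetical helper, not part of Nextflow)
show_task_dir() {
    local dir="$1"
    if [ -f "$dir/.exitcode" ]; then
        echo "exit code: $(cat "$dir/.exitcode")"
    fi
    if [ -f "$dir/.command.err" ]; then
        echo "--- .command.err ---"
        cat "$dir/.command.err"
    fi
}

# Work dir taken from the log above; pass another path as $1 to override.
show_task_dir "${1:-/imaging-work/datastudy/work/05/f29fcc996dd802aae2369df6eaae21}"
```

From the same directory, `bash .command.run` re-runs the task wrapper as the tip suggests; doing so inside a Slurm allocation rather than on the login node should reproduce the conditions of the failed job more closely.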

Environment

  • Nextflow version: 23.10.0.5889
  • Java version: Temurin-17.0.6+10
  • Operating system: RHEL 8
  • Bash version: GNU bash, version 4.4.20(1)-release (x86_64-redhat-linux-gnu)
