Publishing large file (1.9TB) after process takes very long (6 days) with publish mode copy. #2866
Update
I did some more digging into the source code and it seems the earlier mentioned [...]
publishDir "output", mode: "copy"
Since I don't know how to test or improve the copying speed in this case, I decided to create a workaround myself.
Workaround solution
First, let me show the process:
process wrapdemux {
    label "process_medium"
    publishDir "${params.tosenddir}", pattern: "*.{tar,txt}", saveAs: { filename -> params.splitprojectsarchive && params.hassampleproject ? "$project_name/$filename" : filename }, mode: "copy"
    publishDir "${params.logdir}/${task.process}/${task.hash}", pattern: ".*", mode: "copy"

    input:
    tuple val(project_name), val(samples), path(fastq_files)

    output:
    tuple val(project_name), val(samples), path("*.{tar,txt}")
    path(".*")

    script:
    """
    # code to generate .tar and .txt files
    """
}
I decided to mimic the [...]
process {
    withName: wrapdemux {
        afterScript = {
            if (params.hassampleproject && params.splitprojectsarchive) {
                copylocation = params.tosenddir + "/" + project_name
            } else {
                copylocation = params.tosenddir
            }
            """
            mkdir -p ${copylocation}
            find . -name "*.txt" -exec cp -fRL '{}' ${copylocation} \\;
            find . -name "*.tar" -exec cp -fRL '{}' ${copylocation} \\;
            """
        }
    }
}
This [...] We don't want to completely remove the [...] To solve this, we make use of how nextflow prioritizes [...] Since we publish the [...]
process {
    withName: wrapdemux {
        afterScript = {
            if (params.hassampleproject && params.splitprojectsarchive) {
                copylocation = params.tosenddir + "/" + project_name
            } else {
                copylocation = params.tosenddir
            }
            """
            mkdir -p ${copylocation}
            find . -name "*.txt" -exec cp -fRL '{}' ${copylocation} \\;
            find . -name "*.tar" -exec cp -fRL '{}' ${copylocation} \\;
            """
        }
        publishDir = [
            path: { "${params.logdir}/${task.process}/${task.hash}" },
            pattern: ".*",
            mode: "copy"
        ]
    }
}
Now, I added the above code in a separate config file named [...]
profiles {
    local_server {
        includeConfig "conf/send-tar.config"
    }
}
I hope this can be helpful if someone runs into a similar problem.
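The copy step used in the afterScript above can be tried outside Nextflow before wiring it into a config. The following is only a minimal sketch under assumptions: `copy_outputs` is a hypothetical helper, the directories and file names are made up, and the two `find` calls from the workaround are merged into a single one.

```shell
#!/bin/sh
set -eu

# Hypothetical helper mirroring the afterScript body.
# $1 = task work directory, $2 = destination directory
copy_outputs() {
    mkdir -p "$2"
    # -L dereferences symlinks, so linked files are copied as real files.
    find "$1" \( -name "*.txt" -o -name "*.tar" \) -exec cp -fRL '{}' "$2" \;
}

# Demo in a throwaway directory (all paths below are illustrative).
scratch="$(mktemp -d)"
mkdir -p "$scratch/work"
echo "report"  > "$scratch/work/summary.txt"
echo "archive" > "$scratch/work/project.tar"
echo "ignored" > "$scratch/work/notes.log"

copy_outputs "$scratch/work" "$scratch/tosend"
ls "$scratch/tosend"
```

Keeping the destination outside the searched directory matters here, otherwise `find` could pick up files it has just copied.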
Nextflow information
nextflow version 21.10.6.5660
Executor: Local.
Run with docker.
System used:
Problem
We have a process in our pipeline that creates a tar archive. Nextflow takes a very long time (6 days) to publish a large tar file (1.9TB) when using the publishDir directive with publish mode: copy.
The nextflow command terminated with the following warning message:
At this point, we didn’t get our command line prompt back and the file transfer was still ongoing.
Questions
Which command and parameters are used by Nextflow to copy the files when specifying the copy option in the publishDir directive?
From what we saw of the source code and the .command.run file, we assume it's just a cp -fRL command on the host system.
When manually copying the same source file to the same destination, the copy finishes in 3 hours, whereas it took Nextflow several days. What would be the reason for this?
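To make the manual comparison repeatable, the copy can be timed with the same flags the question assumes. This is a sketch only: the source file and destination directory below are temporary stand-ins (assumptions, not the real tar file or remote share), and whether Nextflow's copy mode actually runs `cp -fRL` is exactly what the question leaves open.

```shell
#!/bin/sh
set -eu

# Stand-ins for illustration: replace with the real file and the mounted share.
src_file="$(mktemp)"
dest_dir="$(mktemp -d)"

# Create a 16 MiB dummy file so the sketch runs anywhere.
dd if=/dev/zero of="$src_file" bs=1048576 count=16 2>/dev/null

# Time the copy (whole seconds only; small files will report 0).
start="$(date +%s)"
cp -fRL "$src_file" "$dest_dir/"
end="$(date +%s)"

copied="$dest_dir/$(basename "$src_file")"
echo "copied $(wc -c < "$copied") bytes in $((end - start)) s"

# Check that the copy is byte-identical to the source.
cmp "$src_file" "$copied" && echo "copy verified"
```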
Extra information
Command: cp from inside Nextflow (cp -fRL ?). Source: local (work directory). Destination: remote share (7200 RPM HDD). Bandwidth out: +/- 45 Mb/s
Command: cp -fRL run manually. Source: local (work directory). Destination: remote share (7200 RPM HDD). Bandwidth out: +/- 1.5 Gb/s
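As a rough sanity check, the two measured rates can be converted into expected transfer times for a 1.9TB file. The sketch below assumes decimal units (1 TB = 10^12 bytes) and that Mb/s and Gb/s mean megabits and gigabits per second; `transfer_hours` is a hypothetical helper name.

```shell
#!/bin/sh
set -eu

# hours = bytes * 8 / (bits per second) / 3600
# $1 = file size in bytes, $2 = throughput in bits per second
transfer_hours() {
    awk -v bytes="$1" -v bps="$2" 'BEGIN { printf "%.1f\n", bytes * 8 / bps / 3600 }'
}

transfer_hours 1900000000000 1500000000   # manual cp at about 1.5 Gb/s
transfer_hours 1900000000000 45000000     # publish step at about 45 Mb/s
```

Under these assumptions the manual rate gives about 2.8 hours, which matches the reported 3 hours, and the rate seen during publishing gives about 93.8 hours (roughly 3.9 days), the same order of magnitude as the reported 6 days.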
If you have any questions or need more information from the log files, please ask.