Running into out of memory issues when working with a large number of files. #4710

@karthikram-takara

Bug report

When a large number of files (>5000) need to be moved or copied using publishDir, Nextflow fails with the following error:

error [java.lang.OutOfMemoryError]: unable to create native thread: possibly out of memory or process/resource limits reached
Feb-01 14:38:05.816 [Task monitor] ERROR nextflow.processor.TaskProcessor - Execution aborted due to an unexpected error
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached

From what I can see, the files are created in the workDir, and the failure only happens while the publishDir directive is being applied.

I have already tried changing various parameters through NXF_OPTS and _JAVA_OPTIONS, among others, but none of them fixed the issue.
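Since the message refers to native thread creation and to process/resource limits rather than to heap size, the per-user process limit on the machine may be worth recording alongside the JVM settings. The following is a minimal sketch for doing that on Linux, assuming Python's standard `resource` module is available; the function name `read_process_limits` is hypothetical, and this is only a diagnostic aid, not a confirmed cause of the failure.

```python
import resource


def read_process_limits():
    """Return the soft and hard RLIMIT_NPROC values for the current user.

    On Linux, threads count against this limit. A value equal to
    resource.RLIM_INFINITY means the limit is unlimited.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return {"nproc_soft": soft, "nproc_hard": hard}


def describe(value):
    """Render a limit value in the same style as `ulimit -u`."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    limits = read_process_limits()
    for name, value in limits.items():
        print(f"{name}: {describe(value)}")
```

Running this in the same shell and as the same user that launches Nextflow should report the limits that the Java process inherits.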

Expected behavior and actual behavior

Expected behavior: output files are copied/moved to the publishDir.
Actual behavior: Pipeline fails with

ERROR ~ Execution aborted due to an unexpected error

 -- Check '.nextflow.log' file for details

with the above detailed error message printed in the log file.

Steps to reproduce the problem

To simulate the scenario, I created the following dummy script and dummy pipeline, which just create 20,000 files and copy them to another directory using publishDir. I get the same error as above when running it. The dummy script and the Nextflow pipeline are below:

create_20k_files.py:

# Create 20,000 empty files in the current working directory.
for i in range(20000):
    filename = f"file_{i}.txt"
    with open(filename, "w") as f:
        pass

Nextflow pipeline:

#!/usr/bin/env nextflow

params.output_dir = ''

process create_files {
        publishDir "${params.output_dir}", mode:'copy'


        output:
                path("*.txt")

        script:
                """
                python /path_to_python_script/test_20000_files/create_20k_files.py
                """
}

workflow {
        create_files()
}
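To narrow down the file count at which the failure starts, the dummy script could take the number of files as an argument. This is a minimal sketch of such a variant; the script name `create_n_files.py`, the function `create_files`, and the optional target directory are assumptions for illustration and are not part of the original reproduction.

```python
import sys
from pathlib import Path


def create_files(count, directory="."):
    """Create `count` empty files named file_<i>.txt in `directory`."""
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)
    for i in range(count):
        # touch() creates an empty file, like open(..., "w") in the original script
        (target / f"file_{i}.txt").touch()
    return count


if __name__ == "__main__":
    # Hypothetical usage: python create_n_files.py 5000
    if len(sys.argv) == 2:
        create_files(int(sys.argv[1]))
    else:
        print("usage: create_n_files.py COUNT")
```

Calling it from the process script with different counts (for example 1000, 5000, 20000) would show roughly where publishDir begins to fail.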

Program output

nextflow.log

Environment

  • Nextflow version: 23.10.1.5891
  • Java version: openjdk 21-internal 2023-09-19
  • Operating system: Linux
  • Bash version: GNU bash, version 4.2.46(2)-release (x86_64-redhat-linux-gnu)
