
Panaroo killed due to memory usage #362

@mikhailova-es


Hi,
I am trying to build orthologous gene clusters for ~4000 E. coli genomes. I ran the following command on an HPC cluster node with 126 GB of RAM, using 6 threads:
panaroo -i input_panaroo_file.txt --threads 6 -o results --clean-mode strict --remove-invalid-genes

The job was killed by the system after 2.5 days of runtime. According to sacct, the memory usage reached the node limit:

JobID            MaxRSS     ReqMem    Elapsed
34665.batch   127856448K   126000M   2-12:17:49
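For what it's worth, converting the sacct figures to a common unit (assuming Slurm's `K` and `M` suffixes mean KiB and MiB, and that the node's "126 GB" is a decimal gigabyte figure) suggests the peak RSS did exceed the node's physical RAM:

```python
# Rough unit conversion of the sacct output above (assumptions noted in the text).
KIB_PER_GIB = 2**20

maxrss_gib = 127856448 / KIB_PER_GIB  # MaxRSS reported in KiB
reqmem_gib = 126000 / 1024            # ReqMem reported in MiB
node_gib = 126e9 / 2**30              # 126 GB (decimal) of physical RAM, in GiB

print(round(maxrss_gib, 1))  # 121.9 GiB peak
print(round(reqmem_gib, 1))  # 123.0 GiB requested
print(round(node_gib, 1))    # 117.3 GiB physically available
```

So the peak usage (~121.9 GiB) stayed under the requested 123 GiB but exceeded the ~117.3 GiB the node actually has, which would explain the kill.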

The last lines in the output and log file:

running cmd: cd-hit -T 6 -i results/combined_protein_CDS.fasta -o results/combined_protein_cdhit_out.txt -c 0.98 -s 0.98 -aL 0.0 -AL 99999999 -aS 0.0 -AS 99999999 -M 0 -d 999 -g 1 -n 2 
generating initial network... 
Processing paralogs... 
collapse mistranslations... 
Processing depth: 1 ... 
Processing depth: 2 ... 
Processing depth: 3 ... 
collapse gene families... 
Processing depth: 1 ... 
Processing depth: 2 ... 
Processing depth: 3 ... 
trimming contig ends... 
refinding genes... 
Number of searches to perform: 174718261 
Searching... 
1290it [5:35:30, 14.43s/it]
/var/spool/slurm/d/job34665/slurm_script: line 12: 271468 Killed 

I am trying to figure out whether it is possible to complete this analysis within 126 GB. Would reducing the number of threads or using a different --clean-mode help? If it is possible, how long do you think the analysis might take? I would appreciate any advice.
Thank you!
