Hi,
I am trying to build orthologous gene clusters for ~4,000 E. coli genomes. I ran the following command on an HPC cluster node with 126 GB RAM and 6 threads:
panaroo -i input_panaroo_file.txt --threads 6 -o results --clean-mode strict --remove-invalid-genes
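For context, the job was submitted under Slurm roughly like this (a minimal sketch; the job name, partition, and time limit are placeholders, not taken from my actual submission script):

```shell
#!/bin/bash
#SBATCH --job-name=panaroo_ecoli     # hypothetical job name
#SBATCH --cpus-per-task=6
#SBATCH --mem=126000M                # essentially all memory on the node
#SBATCH --time=7-00:00:00            # placeholder wall time; adjust to your cluster

panaroo -i input_panaroo_file.txt \
        --threads 6 \
        -o results \
        --clean-mode strict \
        --remove-invalid-genes
```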
The job was killed by the system after 2.5 days of runtime. According to sacct, memory usage reached the node's limit:
JobID MaxRSS ReqMem Elapsed
34665.batch 127856448K 126000M 2-12:17:49
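Converting the sacct numbers (Slurm reports the K and M suffixes as binary units, i.e. KiB and MiB) shows the job was running right up against the requested memory. Note that MaxRSS is only sampled periodically, so the true peak that triggered the kill can be slightly higher than the last recorded value:

```python
# Convert sacct's MaxRSS (KiB) and ReqMem (MiB) to GiB and compare them.
max_rss_kib = 127856448   # MaxRSS column from sacct
req_mem_mib = 126000      # ReqMem column from sacct

max_rss_gib = max_rss_kib / 1024**2   # KiB -> GiB
req_mem_gib = req_mem_mib / 1024      # MiB -> GiB

print(f"MaxRSS: {max_rss_gib:.1f} GiB of {req_mem_gib:.1f} GiB requested")
# ~121.9 GiB used of ~123.0 GiB requested; the sampled MaxRSS sits just
# under the limit, consistent with an OOM kill between samples.
```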
The last lines of the output and log file were:
running cmd: cd-hit -T 6 -i results/combined_protein_CDS.fasta -o results/combined_protein_cdhit_out.txt -c 0.98 -s 0.98 -aL 0.0 -AL 99999999 -aS 0.0 -AS 99999999 -M 0 -d 999 -g 1 -n 2
generating initial network...
Processing paralogs...
collapse mistranslations...
Processing depth: 1 ...
Processing depth: 2 ...
Processing depth: 3 ...
collapse gene families...
Processing depth: 1 ...
Processing depth: 2 ...
Processing depth: 3 ...
trimming contig ends...
refinding genes...
Number of searches to perform: 174718261
Searching...
1290it [5:35:30, 14.43s/it]
/var/spool/slurm/d/job34665/slurm_script: line 12: 271468 Killed
I am trying to figure out whether it is possible to complete this analysis within 126 GB. Would reducing the number of threads or using a different --clean-mode help? If it is possible, how long do you think the analysis would take? I would appreciate any advice.
Thank you!