Hi Marti,
Thank you for developing this very helpful tool! I recently set it up on our HPC cluster using the NCBI viral protein database with DIAMOND blastx, and it's running smoothly.
I'm working with 92 FASTQ files (~71 GB total), and the job has been running for about 9 days without completing. Do you have any recommendations for speeding up the analysis, or could you give an estimate of how long a dataset of this size typically takes to process?
Thanks so much for your help!
####Slurm script###
#!/bin/bash
#SBATCH --job-name=marti_viral
#SBATCH --account=loni_virome2025
#SBATCH --partition=single
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --time=7-00:00:00
#SBATCH --output=slurm-%j.out-%N
#SBATCH --error=slurm-%j.err-%N
#SBATCH -D /work/kvigil/marti_out/ONR_viral
# --- environment ---
set -euo pipefail
echo "== SLURM info =="
echo "JobID:
echo "CPUs: $SLURM_CPUS_PER_TASK"
# MARTi & DIAMOND on PATH (adjust if needed)
export PATH="$HOME/MARTi/bin:$PATH"
which diamond && diamond --version || { echo "diamond not found on PATH"; exit 1; }
# Fast node-local temp; DIAMOND uses this
export TMPDIR="/work/kvigil/tmp/${SLURM_JOB_ID}"
mkdir -p "$TMPDIR"
# Config file
CONF="$HOME/marti_viral_diamond_longreads.conf"
# Sanity: threads config should match cpus-per-task (2 jobs x 8 threads = 16)
grep -E 'LocalSchedulerMaxJobs|BlastThreads' "$CONF" || true
# Recommended: resume safely from any partial state
# rm -f /work/kvigil/marti_out/ONR_viral/progress.info  # uncomment for a clean restart
echo "== Running MARTi =="
marti -config "$CONF" -loglevel
####marti_viral_diamond_longreads.conf###
ProcessBarcodes:
Scheduler:local
LocalSchedulerMaxJobs:2
InactivityTimeout:10
StopProcessingAfter:0
TaxonomyDir:/work/kvigil/db/taxdump
LCAMaxHits:20
LCAScorePercent:90
LCAMinIdentity:60
LCAMinQueryCoverage:0
LCAMinCombinedScore:0
LCAMinLength:50
ConvertFastQ
ReadsPerBlast:8000
ReadFilterMinQ:9
ReadFilterMinLength:500
BlastProcess
    Name:diamond-nr
    Program:diamond
    Database:/work/kvigil/db/viral_proteins_tax.dmnd
    MaxE:0.001
    MaxTargetSeqs:100
    BlastThreads:8
    UseToClassify
    Options: --ultra-sensitive --long-reads --frameshift 15 --range-culling --outfmt 6
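The sanity step in the Slurm script only prints `LocalSchedulerMaxJobs` and `BlastThreads`; it does not compare them with the allocation. A minimal sketch of an actual check follows, assuming the config keeps the `Key:Value` layout shown above. The function names `conf_value` and `check_thread_budget` are hypothetical helpers for illustration, not part of MARTi or Slurm.

```bash
#!/bin/bash
# Hypothetical helpers (not part of MARTi): check that
# LocalSchedulerMaxJobs x BlastThreads fits inside the Slurm CPU allocation.
# Assumes the config uses "Key:Value" lines, optionally indented.

# Print the value of the first "Key:Value" line matching the given key.
conf_value() {
    local key="$1" conf="$2"
    awk -F: -v k="$key" '{ gsub(/^[ \t]+/, "", $1) } $1 == k { print $2; exit }' "$conf"
}

# Return 0 if jobs x threads <= cpus, 1 if oversubscribed, 2 if a key is missing.
check_thread_budget() {
    local conf="$1" cpus="$2"
    local max_jobs threads total
    max_jobs=$(conf_value LocalSchedulerMaxJobs "$conf")
    threads=$(conf_value BlastThreads "$conf")
    if [[ -z "$max_jobs" || -z "$threads" ]]; then
        echo "missing LocalSchedulerMaxJobs or BlastThreads in $conf" >&2
        return 2
    fi
    total=$(( max_jobs * threads ))
    echo "jobs=$max_jobs threads=$threads total=$total cpus=$cpus"
    (( total <= cpus ))
}
```

In the job script this could be called as `check_thread_budget "$CONF" "$SLURM_CPUS_PER_TASK" || echo "warning: thread budget exceeds allocation"`, so that a mismatch is reported without aborting the run under `set -e`.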