Add GenoFLU dependencies #243

Merged
joverlee521 merged 1 commit into master from genoflu-deps on Feb 20, 2025

@joverlee521 joverlee521 commented Feb 20, 2025

Description of proposed changes

Add dependencies for GenoFLU to run in avian-flu:

  • ncbi-blast+
  • openpyxl (for pandas.read_excel)
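A quick way to confirm that an image provides both dependencies is a small preflight check. This is a minimal sketch, not part of the PR: the function name `missing_dependencies` is hypothetical, and the executable names `blastn` and `makeblastdb` are assumed to be the BLAST+ tools GenoFLU calls.

```python
import importlib.util
import shutil


def missing_dependencies(executables, modules):
    """Return (missing_executables, missing_modules) for the given names."""
    # shutil.which returns None when an executable is not found on PATH.
    missing_exes = [exe for exe in executables if shutil.which(exe) is None]
    # find_spec returns None when a top-level module cannot be located.
    missing_mods = [
        mod for mod in modules if importlib.util.find_spec(mod) is None
    ]
    return missing_exes, missing_mods


if __name__ == "__main__":
    # Assumed names: blastn and makeblastdb are shipped by ncbi-blast+;
    # openpyxl is the engine pandas.read_excel uses for .xlsx files.
    exes, mods = missing_dependencies(["blastn", "makeblastdb"], ["openpyxl"])
    print("missing executables:", exes)
    print("missing modules:", mods)
```

Run inside the runtime image, both lists should come back empty once the dependencies are installed.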

Related issue(s)

Resolves #242

Checklist

  • Checks pass

@jameshadfield jameshadfield left a comment

Awesome! I haven't tested, but I can if it'd be helpful.

@joverlee521 (Contributor Author) commented

> Awesome! I haven't tested, but I can if it'd be helpful.

Thanks! I left the example command in nextstrain/avian-flu#127 (comment).

less \
libgomp1 \
libsqlite3-0 \
ncbi-blast+ \

@joverlee521 (Contributor Author) commented

Ah, just caught up on conversation in #127.

Worth noting this is installing https://packages.debian.org/bookworm/ncbi-blast+ which is v2.12.0.
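To check which BLAST+ version an image actually ships, the output of `blastn -version` can be parsed. This is a rough sketch under assumptions: the helper names are hypothetical, and the first line of output is assumed to look like `blastn: 2.12.0+`.

```python
import re
import subprocess


def parse_blast_version(version_output):
    """Extract a (major, minor, patch) tuple from `blastn -version` output."""
    # Take the first dotted triple; assumed to be the tool version.
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"no version number found in: {version_output!r}")
    return tuple(int(part) for part in match.groups())


def installed_blast_version(executable="blastn"):
    """Run `<executable> -version` and parse the result."""
    result = subprocess.run(
        [executable, "-version"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return parse_blast_version(result.stdout)


if __name__ == "__main__":
    # Raises FileNotFoundError if blastn is not installed.
    print(installed_blast_version())
```

Comparing the returned tuple against a minimum such as `(2, 12, 0)` would make a version requirement explicit, if one turns out to matter.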

@jameshadfield jameshadfield left a comment

Tested this via the nextstrain/base:branch-genoflu-deps image and all works. As a bonus it's faster than the homebrew MacOS blast!

LGTM

@joverlee521 joverlee521 merged commit 2bea1bf into master Feb 20, 2025
61 checks passed
@joverlee521 joverlee521 deleted the genoflu-deps branch February 20, 2025 21:48
@joverlee521 (Contributor Author) commented

Re: whether this significantly increases the image size:

| os/arch     | latest size | branch size |
|-------------|-------------|-------------|
| linux/amd64 | 704.38 MB   | 728.54 MB   |
| linux/arm64 | 686.68 MB   | 710.64 MB   |
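
For reference, the increase implied by the sizes reported above can be worked out with a few lines of arithmetic. The helper name `size_increase` is only illustrative; the numbers are the ones from the comment.

```python
# Image sizes in MB as reported above: (latest, branch).
sizes_mb = {
    "linux/amd64": (704.38, 728.54),
    "linux/arm64": (686.68, 710.64),
}


def size_increase(latest_mb, branch_mb):
    """Return (absolute increase in MB, relative increase in percent)."""
    delta = branch_mb - latest_mb
    return round(delta, 2), round(100 * delta / latest_mb, 2)


for arch, (latest, branch) in sizes_mb.items():
    delta, percent = size_increase(latest, branch)
    print(f"{arch}: +{delta} MB ({percent}%)")
```

Both architectures grow by roughly 24 MB, which is about 3.4 to 3.5 percent of the image.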

@jameshadfield (Member) commented

Pasting here the error you get if you try to run GenoFLU (which needs the blast tools added in this PR) with an older docker image:

        python ./vendored-GenoFLU-multi/bin/genoflu-multi.py             -f fauna/data/genoflu/             -n 1 > fauna/logs/run_genoflu.txt
        
cat: write error: Broken pipe
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.10/multiprocessing/pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/nextstrain/build/ingest/./vendored-GenoFLU-multi/bin/genoflu-multi.py", line 41, in run_genoflu
    genoflu.blast_hpai_genomes()
  File "/nextstrain/build/ingest/vendored-GenoFLU-multi/bin/genoflu.py", line 202, in blast_hpai_genomes
    blast_hpai_genotyping = Blast_Fasta(
  File "/nextstrain/build/ingest/vendored-GenoFLU-multi/bin/genoflu.py", line 70, in __init__
    with open(blastout_file, 'r') as blast_file:
FileNotFoundError: [Errno 2] No such file or directory: 'fauna/data/genoflu/temp/1/temp_blast_out.txt'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/nextstrain/build/ingest/./vendored-GenoFLU-multi/bin/genoflu-multi.py", line 197, in <module>
    pool_data = pool.starmap(run_genoflu, zip(split_strain_records, range(1,cores+1)))
  File "/usr/local/lib/python3.10/multiprocessing/pool.py", line 375, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/usr/local/lib/python3.10/multiprocessing/pool.py", line 774, in get
    raise self._value
FileNotFoundError: [Errno 2] No such file or directory: 'fauna/data/genoflu/temp/1/temp_blast_out.txt'
[Sun Feb 23 22:23:03 2025]
Error in rule run_genoflu:
    jobid: 20
    input: fauna/data/genoflu/sequences_pb2.fasta, fauna/data/genoflu/sequences_pb1.fasta, fauna/data/genoflu/sequences_pa.fasta, fauna/data/genoflu/sequences_ha.fasta, fauna/data/genoflu/sequences_np.fasta, fauna/data/genoflu/sequences_na.fasta, fauna/data/genoflu/sequences_mp.fasta, fauna/data/genoflu/sequences_ns.fasta
    output: fauna/data/genoflu/results/results.tsv
    log: fauna/logs/run_genoflu.txt (check log file(s) for error details)
    shell:
        
        python ./vendored-GenoFLU-multi/bin/genoflu-multi.py             -f fauna/data/genoflu/             -n 1 > fauna/logs/run_genoflu.txt
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Development

Successfully merging this pull request may close these issues.

Add dependencies for GenoFLU to runtime

3 participants