
Commit 1420679

Merge pull request #1146 from taylorpaisie/tkp-bindashtree
Add bindashtree v0.1.0
2 parents afab061 + 1b21b85 commit 1420679

4 files changed, +142 -0 lines changed

Program_Licenses.md

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@ The licenses of the open-source software that is contained in these Docker image
 | BBTools | non-standard - see `licence.txt` and `legal.txt` that is included in docker image under `/bbmap/docs/`; Also on sourceforge repo for BBTools | https://jgi.doe.gov/disclaimer/ |
 | bcftools | MIT & **GNU GPLv3** | https://github.com/samtools/bcftools/blob/develop/LICENSE |
 | bedtools | MIT | https://github.com/arq5x/bedtools2/blob/master/LICENSE |
+| bindashtree | MIT | https://github.com/jianshu93/bindashtree?tab=MIT-1-ov-file#readme |
 | blast+ | Public Domain | https://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/scripts/projects/blast/LICENSE |
 | bowtie2 | GNU GPLv3 | https://github.com/BenLangmead/bowtie2/blob/master/LICENSE |
 | Bracken | GNU GPLv3 | https://github.com/jenniferlu717/Bracken/blob/master/LICENSE |

README.md

Lines changed: 1 addition & 0 deletions
@@ -129,6 +129,7 @@ To learn more about the docker pull rate limits and the open source software pro
 | [bcftools](https://hub.docker.com/r/staphb/bcftools/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/bcftools)](https://hub.docker.com/r/staphb/bcftools) | <ul><li>[1.10.2](./bcftools/1.10.2/)</li><li>[1.11](./bcftools/1.11/)</li><li>[1.12](./bcftools/1.12/)</li><li>[1.13](./bcftools/1.13/)</li><li>[1.14](./bcftools/1.14/)</li><li>[1.15](./bcftools/1.15/)</li><li>[1.16](./bcftools/1.16/)</li><li>[1.17](./bcftools/1.17/)</li><li>[1.18](bcftools/1.18/)</li><li>[1.19](./bcftools/1.19/)</li><li>[1.20](./bcftools/1.20/)</li><li>[1.20.c](./bcftools/1.20.c/)</li><li>[1.21](./bcftools/1.21/)</li></ul> | https://github.com/samtools/bcftools |
 | [bedtools](https://hub.docker.com/r/staphb/bedtools/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/bedtools)](https://hub.docker.com/r/staphb/bedtools) | <ul><li>2.29.2</li><li>2.30.0</li><li>[2.31.0](bedtools/2.31.0/)</li><li>[2.31.1](bedtools/2.31.1/)</li></ul> | https://bedtools.readthedocs.io/en/latest/ <br/>https://github.com/arq5x/bedtools2 |
 | [berrywood-report-env](https://hub.docker.com/r/staphb/berrywood-report-env/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/berrywood-report-env)](https://hub.docker.com/r/staphb/berrywood-report-env) | <ul><li>1.0</li></ul> | none |
+| [bindashtree](https://hub.docker.com/r/staphb/bindashtree/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/bindashtree)](https://hub.docker.com/r/staphb/bindashtree) | <ul><li>[0.1.0](./build-files/bindashtree/0.1.0/)</li></ul> | https://github.com/jianshu93/bindashtree |
 | [blast+](https://hub.docker.com/r/staphb/blast/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/blast)](https://hub.docker.com/r/staphb/blast) | <ul><li>[2.13.0](blast/2.13.0/)</li><li>[2.14.0](blast/2.14.0/)</li><li>[2.14.1](blast/2.14.1/)</li><li>[2.15.0](blast/2.15.0/)</li><li>[2.16.0](./blast/2.16.0/)</li></ul> | https://www.ncbi.nlm.nih.gov/books/NBK279690/ |
 | [bowtie2](https://hub.docker.com/r/staphb/bowtie2/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/bowtie2)](https://hub.docker.com/r/staphb/bowtie2) | <ul><li>[2.4.4](./bowtie2/2.4.4/)</li><li>[2.4.5](./bowtie2/2.4.5/)</li><li>[2.5.1](./bowtie2/2.5.1/)</li><li>[2.5.3](./bowtie2/2.5.3/)</li><li>[2.5.4](./bowtie2/2.5.4/)</li></ul> | http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml <br/>https://github.com/BenLangmead/bowtie2 |
 | [Bracken](https://hub.docker.com/r/staphb/bracken/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/bracken)](https://hub.docker.com/r/staphb/bracken) | <ul><li>[2.9](./bracken/2.9)</li></ul> | https://ccb.jhu.edu/software/bracken/index.shtml?t=manual <br/>https://github.com/jenniferlu717/Bracken |
Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
+# Stage 1: Build stage
+FROM ubuntu:focal AS builder
+
+# Set global variables
+ARG BINDASHTREE_VER="0.1.0"
+
+# Update package manager and install necessary tools
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    wget \
+    curl \
+    build-essential \
+    gcc \
+    pkg-config \
+    libssl-dev \
+    ca-certificates \
+    && apt-get clean && rm -rf /var/lib/apt/lists/*
+
+# Install Rust and Cargo using rustup
+RUN curl https://sh.rustup.rs -sSf | sh -s -- -y && \
+    export PATH="$HOME/.cargo/bin:$PATH" && \
+    rustup default stable
+
+# Ensure Rust and Cargo are available
+ENV PATH="/root/.cargo/bin:$PATH"
+
+# Download, extract, and build bindashtree
+RUN wget https://github.com/jianshu93/bindashtree/archive/refs/tags/v${BINDASHTREE_VER}.tar.gz && \
+    tar -xzvf v${BINDASHTREE_VER}.tar.gz && \
+    cd bindashtree-${BINDASHTREE_VER} && \
+    /root/.cargo/bin/cargo build --release
+
+# Stage 2: Final image
+FROM ubuntu:focal AS app
+ARG BINDASHTREE_VER="0.1.0"
+
+# Install wget for test stage compatibility
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    wget \
+    ca-certificates \
+    && apt-get clean && rm -rf /var/lib/apt/lists/*
+
+# Labels for metadata
+LABEL base.image="ubuntu:focal" \
+    dockerfile.version="1" \
+    software="bindashtree" \
+    software.version="${BINDASHTREE_VER}" \
+    description="Binwise Densified MinHash and Rapid Neighbor-joining Tree Construction for microbial genomes." \
+    website="https://github.com/jianshu93/bindashtree" \
+    license.url="https://github.com/jianshu93/bindashtree?tab=MIT-1-ov-file#readme" \
+    maintainer="Taylor K. Paisie" \
+    maintainer.email="ltj8@cdc.gov"
+
+# Copy built binaries from the builder stage
+COPY --from=builder /bindashtree-${BINDASHTREE_VER}/target/release/bindashtree /usr/local/bin/
+
+CMD ["bindashtree", "--help"]
+
+WORKDIR /data
+
+# Stage 3: Test stage
+FROM app AS test
+
+# Set working directory
+WORKDIR /data/test
+
+# Install wget if not installed (redundancy for safety)
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    wget \
+    && apt-get clean && rm -rf /var/lib/apt/lists/*
+
+# Download test files
+RUN wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/587/385/GCA_002587385.1_ASM258738v1/GCA_002587385.1_ASM258738v1_genomic.fna.gz && \
+    wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/596/765/GCA_002596765.1_ASM259676v1/GCA_002596765.1_ASM259676v1_genomic.fna.gz && \
+    wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/598/005/GCA_002598005.1_ASM259800v1/GCA_002598005.1_ASM259800v1_genomic.fna.gz
+
+RUN ls /data/test/*.fna.gz > name.txt
+
+#### For highly similar genomes (e.g., > 99.9% ANI), a large sketch size should be used. -s 10240 works well for ANI below 99%.
+RUN bindashtree -i name.txt -k 16 -s 10240 -d 1 -t 8 --output_tree try.nwk
+
+FROM app
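The test stage above builds the genome list with a plain `ls` redirect, which still creates `name.txt` when no genome files were downloaded. A minimal sketch of a more defensive version of that step follows. The helper name `make_genome_list` is hypothetical (it is not part of bindashtree or this commit), and the sketch assumes the inputs end in `.fna.gz`, as in the test stage.

```shell
# Hypothetical helper, not part of bindashtree or of this commit.
# Writes one *.fna.gz path per line into a list file and fails
# if the directory contains no matching genome files.
make_genome_list() {
    dir=$1
    out=$2
    : > "$out"                      # start from an empty list file
    for f in "$dir"/*.fna.gz; do
        # An unmatched glob stays literal, so check that the file exists.
        if [ -e "$f" ]; then
            printf '%s\n' "$f" >> "$out"
        fi
    done
    if [ ! -s "$out" ]; then
        echo "no .fna.gz files found in $dir" >&2
        return 1
    fi
}

# Possible use in place of the `ls` step (same bindashtree flags as the test stage):
#   make_genome_list /data/test name.txt && \
#     bindashtree -i name.txt -k 16 -s 10240 -d 1 -t 8 --output_tree try.nwk
```

Failing early on an empty list keeps a broken download from surfacing later as a less obvious bindashtree error.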
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+# bindashtree container
+
+Main tool: [bindashtree](https://github.com/jianshu93/bindashtree)
+
+Code repository: https://github.com/jianshu93/bindashtree
+
+Basic information on how to use this tool:
+- executable: |
+
+```
+Binwise Densified MinHash and Rapid Neighbor-joining Tree Construction
+
+Usage: bindashtree [OPTIONS] --input <INPUT_LIST_FILE> --output_tree <OUTPUT_TREE_FILE>
+
+Options:
+  -i, --input <INPUT_LIST_FILE>
+          Genome list file (one FASTA/FNA file per line), gz supported
+  -k, --kmer_size <KMER_SIZE>
+          K-mer size [default: 16]
+  -s, --sketch_size <SKETCH_SIZE>
+          MinHash sketch size [default: 10240]
+  -d, --densification <DENS_OPT>
+          Densification strategy: 0=Optimal Densification, 1=Reverse Optimal Densification/faster Densification [default: 0]
+  -t, --threads <THREADS>
+          Number of threads to use in parallel [default: 1]
+      --tree <TREE_METHOD>
+          Tree construction method: naive, rapidnj, hybrid [default: rapidnj]
+      --chunk_size <chunk_size>
+          Chunk size for RapidNJ/Hybrid methods [default: 30]
+      --naive_percentage <naive_percentage>
+          Percentage of steps naive for hybrid method [default: 90]
+      --output_matrix <OUTPUT_MATRIX_FILE>
+          Output the phylip distance matrix to a file
+      --output_tree <OUTPUT_TREE_FILE>
+          Output the resulting tree in Newick format to a file
+  -h, --help
+          Print help
+  -V, --version
+          Print version
+```
+
+Additional information:
+One Permutation Hashing with Optimal Densification can be used for genomic distance estimation (1-ANI), and rapid neighbor-joining can then be performed on the resulting genomic distances. We also provide a new densification strategy called faster densification (or reverse optimal densification), which is more accurate and faster for large sketch sizes.
+
+
+Full documentation: https://github.com/jianshu93/bindashtree
+
+## Testing for bindashtree
+
+```
+# Download test files
+wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/587/385/GCA_002587385.1_ASM258738v1/GCA_002587385.1_ASM258738v1_genomic.fna.gz && \
+wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/596/765/GCA_002596765.1_ASM259676v1/GCA_002596765.1_ASM259676v1_genomic.fna.gz && \
+wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/598/005/GCA_002598005.1_ASM259800v1/GCA_002598005.1_ASM259800v1_genomic.fna.gz
+
+ls /data/test/*.fna.gz > name.txt
+
+bindashtree -i name.txt -k 16 -s 10240 -d 1 -t 8 --output_tree try.nwk
+```
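The test above only confirms that bindashtree exits successfully. It does not inspect `try.nwk`. As a rough sketch, the output could also be checked by comparing the number of leaves in the Newick tree against the number of genomes in `name.txt`. Both function names are hypothetical. The check assumes, without confirmation from the bindashtree documentation, that the file holds a single tree with one leaf per input genome and that no label contains a comma. Under those assumptions a tree with L leaves contains exactly L - 1 commas.

```shell
# Hypothetical helpers, not part of bindashtree.
# Count leaves in a single Newick tree: every internal node with k children
# contributes k - 1 commas, so leaves = commas + 1 (labels must not contain commas).
count_newick_leaves() {
    commas=$(tr -cd ',' < "$1" | wc -c | tr -d ' ')
    echo $((commas + 1))
}

# Succeeds when the tree has as many leaves as the list has non-empty lines.
check_tree_matches_list() {
    expected=$(grep -c . "$1")
    found=$(count_newick_leaves "$2")
    [ "$expected" -eq "$found" ]
}

# Possible use after the test command:
#   check_tree_matches_list name.txt try.nwk
```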
