diff --git a/CONTRIBUTORS.yaml b/CONTRIBUTORS.yaml index aafde063aab68e..bc1fe114274f27 100644 --- a/CONTRIBUTORS.yaml +++ b/CONTRIBUTORS.yaml @@ -3083,6 +3083,14 @@ VerenaMoo: name: Verena Moosmann joined: 2024-12 +vinisalazar: + name: Vini Salazar + joined: 2025-10 + orcid: 0000-0002-8362-3195 + affiliations: + - unimelb + - melbournebioinformatics + vivekbhr: name: Vivek Bhardwaj joined: 2017-09 diff --git a/topics/microbiome/tutorials/metagenomics-binning/comebin_version.md b/topics/microbiome/tutorials/metagenomics-binning/comebin_version.md new file mode 100644 index 00000000000000..910f2b7f66803e --- /dev/null +++ b/topics/microbiome/tutorials/metagenomics-binning/comebin_version.md @@ -0,0 +1,35 @@ +## COMEbin + +COMEbin is a relatively new binner that has shown remarkably strong performance in recent benchmarking studies. +However, it also has several drawbacks. +* Due to its implementation, it cannot operate reliably on small test datasets, and therefore we cannot include it in this tutorial. +* It requires substantial computational resources and long runtimes. +* The tool also suffers from other technical issues that can cause runs to fail. + +These problems cannot be resolved on the Galaxy side, and the tool is currently only lightly maintained upstream. + +Nevertheless, because COMEbin can produce some of the best-performing bins when it runs successfully, we still mention it here. It may yield excellent results on real biological datasets and is available in Galaxy. + +> Do not run COMEBin +> +> As said: Due to its implementation, it cannot operate reliably on small test datasets, and therefore we cannot include it in this tutorial. Do not run it on the tutorial dataset — it will fail. +> +{: .warning} + + +### Bin contigs using COMEbin + +> Individual binning of short reads with COMEbin +> +> 1. 
{% tool [COMEBin](toolshed.g2.bx.psu.edu/repos/iuc/comebin/comebin/1.0.4+galaxy1) %} with the following parameters: +> - {% icon param-collection %} *"Metagenomic assembly file"*: `Contigs` (Input dataset collection) +> - {% icon param-file %} *"Input bam file(s)"*: `Reads` (output of **Samtools sort** {% icon tool %}) +> +> > Parameters +> > +> > The batch size must be smaller than the number of contigs. If your data has fewer contigs than the batch size of 1014, it is likely too small to run with this tool. +> {: .comment} +> +{: .hands_on} + + diff --git a/topics/microbiome/tutorials/metagenomics-binning/concoct_version.md b/topics/microbiome/tutorials/metagenomics-binning/concoct_version.md new file mode 100644 index 00000000000000..be1101a11e8598 --- /dev/null +++ b/topics/microbiome/tutorials/metagenomics-binning/concoct_version.md @@ -0,0 +1,126 @@ +## CONCOCT + +In this tutorial version we will learn how to use **CONCOCT** {%cite Alneberg2014%} through Galaxy. **CONCOCT** is an *unsupervised metagenomic binner* that groups contigs using both **sequence characteristics** and **differential coverage across multiple samples**. In contrast to SemiBin, it does **not** rely on pretrained models or marker-gene constraints; instead, it clusters contig fragments purely based on statistical similarities. + +> CONCOCT jointly models contig abundance profiles from multiple samples using a Gaussian mixture model. By taking advantage of differences in coverage across samples, it can separate genomes with similar sequence composition but distinct abundance patterns. CONCOCT also introduced the now-standard technique of splitting contigs into fixed-length fragments, allowing more consistent and accurate clustering. +> {: .quote author="Alneberg et al., 2014" } + +CONCOCT is widely used in metagenomic binning due to: + +* **Unsupervised probabilistic clustering** + No marker genes, labels, or pretrained models are required. 
+* **Strong performance with multiple samples** + Differential coverage helps disentangle closely related genomes. +* **Reproducible, transparent workflow** + Its stepwise pipeline (fragmentation, coverage estimation, clustering) yields interpretable results. +* **Complementarity to other binners** + Frequently used alongside SemiBin, MetaBAT2, or MaxBin2 in ensemble pipelines (e.g., MetaWRAP, nf-core/mag). + +### Why preprocessing steps (such as cutting contigs) are required + +CONCOCT relies heavily on preprocessing because its Gaussian mixture model treats **each contig fragment** as an individual data point. One key preprocessing step is **cutting contigs into equal-sized fragments**, typically around 10 kb. Fragmenting contigs helps balance the influence of long versus short contigs, generates uniform data points for statistical modeling, detects local variation or potential misassemblies within long contigs, and improves the resolution of abundance differences across genomes. This fragmentation is therefore mandatory for CONCOCT to function correctly. + +After fragmentation, the **coverage of each fragment** is computed across all samples, providing a measure of abundance that CONCOCT uses alongside sequence statistics. These coverage profiles, together with basic sequence features, are then used as input to the **Gaussian mixture model clustering**, which groups fragments into bins. Once fragments are clustered, the results are mapped back to the original contigs to assign each contig to a specific bin. + +Although CONCOCT produces a table assigning contigs to bins, it does not generate FASTA files for each bin by default. To obtain these sequences for downstream analyses, the tool `CONCOCT: Extract a FASTA file` is used. This tool takes the original contig FASTA and CONCOCT’s cluster assignments, extracts all contigs belonging to a chosen bin, and outputs a **FASTA file representing a single MAG**. 
This extraction step is essential to work with reconstructed genomes in subsequent analyses. + +### Bin contigs using CONCOCT + +> Cut up contigs +> +> In this step we fragment the assembled contigs into fixed-length pieces, which CONCOCT requires for stable and consistent clustering. +> +> 1. {% tool [CONCOCT: Cut up contigs](toolshed.g2.bx.psu.edu/repos/iuc/concoct_cut_up_fasta/concoct_cut_up_fasta/1.1.0+galaxy2) %} with the following parameters: +> +> * {% icon param-collection %} *"Fasta contigs file"*: `Contigs` (Input dataset collection) +> +> * *"Concatenate final part to last contig?"*: `Yes` +> +> * *"Output bed file with exact regions of the original contigs corresponding to the newly created contigs?"*: `Yes` +> +> > Why this step? +> > +> > CONCOCT requires contigs to be split into equal-sized fragments. This prevents long contigs from dominating the clustering and increases resolution by allowing variation inside long contigs to be captured. +> > {: .comment} +{: .hands_on} + + +> Generate coverage table +> +> This step computes coverage values for each contig fragment across all samples. CONCOCT uses these differential coverage profiles as one of the main signals for clustering. +> +> 1. {% tool [CONCOCT: Generate the input coverage table](toolshed.g2.bx.psu.edu/repos/iuc/concoct_coverage_table/concoct_coverage_table/1.1.0+galaxy2) %} with the following parameters: +> +> * {% icon param-file %} *"Contigs BEDFile"*: `output_bed` (output of **CONCOCT: Cut up contigs** {% icon tool %}) +> * *"Type of assembly used to generate the contigs"*: `Individual assembly: 1 run per BAM file` +> +> * {% icon param-file %} *"Sorted BAM file"*: `output1` (output of **Samtools sort** {% icon tool %}) +> +> > Why this step? +> > +> > CONCOCT relies on variation in abundance across samples. The coverage table generated here provides this information and is essential for identifying contigs that co-vary in abundance. 
+> > {: .comment} +{: .hands_on} + +> Run CONCOCT +> +> Here we perform the actual CONCOCT clustering. Using both coverage and sequence information, CONCOCT assigns contig fragments to genome bins. +> +> 1. {% tool [CONCOCT](toolshed.g2.bx.psu.edu/repos/iuc/concoct/concoct/1.1.0+galaxy2) %} with the following parameters: +> +> * {% icon param-file %} *"Coverage file"*: `output` (output of **CONCOCT: Generate the input coverage table** {% icon tool %}) +> * {% icon param-file %} *"Composition file with sequences"*: `output_fasta` (output of **CONCOCT: Cut up contigs** {% icon tool %}) +> * In *"Advanced options"*: +> +> * *"Read length for coverage"*: `{'id': 1, 'output_name': 'output'}` +> +> > Why this step? +> > +> > This is the core of the CONCOCT workflow. The Gaussian mixture model groups contig fragments into clusters representing draft genomes (bins). +> > {: .comment} +{: .hands_on} + +> Merge fragment clusters +> +> Since CONCOCT clusters the **fragments**, we must merge them back to produce cluster assignments for the original contigs. +> +> 1. {% tool [CONCOCT: Merge cut clusters](toolshed.g2.bx.psu.edu/repos/iuc/concoct_merge_cut_up_clustering/concoct_merge_cut_up_clustering/1.1.0+galaxy2) %} with the following parameters: +> +> * {% icon param-file %} *"Clusters generated by CONCOCT"*: `output_clustering` (output of **CONCOCT** {% icon tool %}) +> +> > Why this step? +> > +> > This step translates fragment-level cluster assignments into contig-level bin assignments—necessary for producing actual MAGs. +> > {: .comment} +{: .hands_on} + + +> Extract MAG FASTA files +> +> In this final step we extract the contigs belonging to each bin and create FASTA files representing the reconstructed genomes (MAGs). +> +> 1. 
{% tool [CONCOCT: Extract a fasta file](toolshed.g2.bx.psu.edu/repos/iuc/concoct_extract_fasta_bins/concoct_extract_fasta_bins/1.1.0+galaxy2) %} with the following parameters: +> +> * {% icon param-collection %} *"Original contig file"*: `output` (Input dataset collection) +> +> * {% icon param-file %} *"CONCOCT clusters"*: `output` (output of **CONCOCT: Merge cut clusters** {% icon tool %}) +> +> > Why this step? +> > +> > This tool extracts the contigs belonging to each CONCOCT cluster and outputs them as FASTA files. These represent your preliminary MAGs and can now be evaluated and refined. +> > {: .comment} +{: .hands_on} + +> Binning metrics +> +> 1. How many bins were produced by CONCOCT for our sample? +> 2. How many contigs are in the bin with the most contigs? +> > +> > +> > 1. There are 10 bins for this sample. +> > 2. 50, while all other bins contain only one contig each. +> > +> {: .solution} +> +{: .question} \ No newline at end of file diff --git a/topics/microbiome/tutorials/metagenomics-binning/images/Binning_Benchmark.png b/topics/microbiome/tutorials/metagenomics-binning/images/Binning_Benchmark.png new file mode 100644 index 00000000000000..5b04e1601f12b8 Binary files /dev/null and b/topics/microbiome/tutorials/metagenomics-binning/images/Binning_Benchmark.png differ diff --git a/topics/microbiome/tutorials/metagenomics-binning/images/CAMI_Binners.png b/topics/microbiome/tutorials/metagenomics-binning/images/CAMI_Binners.png new file mode 100644 index 00000000000000..64d2f43c6379e9 Binary files /dev/null and b/topics/microbiome/tutorials/metagenomics-binning/images/CAMI_Binners.png differ diff --git a/topics/microbiome/tutorials/metagenomics-binning/binning.png b/topics/microbiome/tutorials/metagenomics-binning/images/binning.png similarity index 100% rename from topics/microbiome/tutorials/metagenomics-binning/binning.png rename to topics/microbiome/tutorials/metagenomics-binning/images/binning.png diff --git 
a/topics/microbiome/tutorials/metagenomics-binning/maxbin2_version.md b/topics/microbiome/tutorials/metagenomics-binning/maxbin2_version.md new file mode 100644 index 00000000000000..5844d48a739257 --- /dev/null +++ b/topics/microbiome/tutorials/metagenomics-binning/maxbin2_version.md @@ -0,0 +1,50 @@ +## MaxBin2 + +In this tutorial version we will learn how to use MaxBin2 {%cite maxbin2015%} through Galaxy. MaxBin2 is an automated metagenomic binning tool that uses an Expectation-Maximization algorithm to group contigs into genome bins based on abundance, tetranucleotide frequency, and single-copy marker genes. + +## Bin contigs using MaxBin2 + +> Calculate contig depths +> +> 1. {% tool [Calculate contig depths](toolshed.g2.bx.psu.edu/repos/iuc/metabat2_jgi_summarize_bam_contig_depths/metabat2_jgi_summarize_bam_contig_depths/2.17+galaxy0) %} with the following parameters: +> - *"Mode to process BAM files"*: `One by one` +> - {% icon param-file %} *"Sorted bam files"*: output of **Samtools sort** {% icon tool %} +> - *"Select a reference genome?"*: `No` +> +> > Why not use bam directly +> > +> > MetaBAT and MaxBin2 only accept per-contig depth tables because that is the specific input format their binning algorithm requires. +> > BAM files contain read-level alignment data. +> > These binners need summarized, contig-level coverage statistics. +> {: .comment} +> +{: .hands_on} + +> Individual binning of short-reads with MaxBin2 +> +> 1. 
{% tool [MaxBin2](toolshed.g2.bx.psu.edu/repos/mbernt/maxbin2/maxbin2/2.2.7+galaxy6) %} with the following parameters: +> - {% icon param-collection %} *"Contig file"*: `Contigs` (Input dataset collection) +> - *"Assembly type used to generate contig(s)"*: `Assembly of sample(s) one by one (individual assembly)` +> - *"Input type"*: `Abundances` +> - {% icon param-file %} *"Abundance file"*: `outputDepth` (output of **Calculate contig depths** {% icon tool %}) +> - In *"Outputs"*: +> - *"Generate visualization of the marker gene presence numbers"*: `Yes` +> - *"Output marker gene presence for bins table"*: `Yes` +> - *"Output marker genes for each bin as fasta"*: `Yes` +> - *"Output log"*: `Yes` +> +> +{: .hands_on} + +> Binning metrics +> +> 1. How many bins were produced by MaxBin2 for our sample? +> 2. How many contigs are in the bin with the most contigs? +> > +> > +> > 1. There are two bins for this sample. +> > 2. 35; the other bin contains 24. +> > +> {: .solution} +> +{: .question} \ No newline at end of file diff --git a/topics/microbiome/tutorials/metagenomics-binning/metabet2_version.md b/topics/microbiome/tutorials/metagenomics-binning/metabet2_version.md new file mode 100644 index 00000000000000..bbda266b10201c --- /dev/null +++ b/topics/microbiome/tutorials/metagenomics-binning/metabet2_version.md @@ -0,0 +1,65 @@ +## MetaBAT 2 + +In this tutorial version we will learn how to use the **MetaBAT 2** {%cite Kang2019%} tool through Galaxy. **MetaBAT** stands for "Metagenome Binning based on Abundance and Tetranucleotide frequency". Its authors describe it as follows: + +> Grouping large fragments assembled from shotgun metagenomic sequences to deconvolute complex microbial communities, or metagenome binning, enables the study of individual organisms and their interactions. Here we developed automated metagenome binning software, called MetaBAT, which integrates empirical probabilistic distances of genome abundance and tetranucleotide frequency. 
On synthetic datasets MetaBAT on average achieves 98% precision and 90% recall at the strain level with 281 near complete unique genomes. Applying MetaBAT to a human gut microbiome data set we recovered 176 genome bins with 92% precision and 80% recall. Further analyses suggest MetaBAT is able to recover genome fragments missed in reference genomes up to 19%, while 53 genome bins are novel. In summary, we believe MetaBAT is a powerful tool to facilitate comprehensive understanding of complex microbial communities. +{: .quote author="Kang et al, 2019" } + +MetaBAT is a popular software tool for metagenomics binning, and there are several reasons why it is often used: +- *High accuracy*: MetaBAT uses a combination of tetranucleotide frequency, coverage depth, and read linkage information to bin contigs, which has been shown to be highly accurate and efficient. +- *Easy to use*: MetaBAT has a user-friendly interface and can be run on a standard desktop computer, making it accessible to a wide range of researchers with varying levels of computational expertise. +- *Flexibility*: MetaBAT can be used with a variety of sequencing technologies, including Illumina, PacBio, and Nanopore, and can be applied to both microbial and viral metagenomes. +- *Scalability*: MetaBAT can handle large-scale datasets, and its performance has been shown to improve with increasing sequencing depth. +- *Compatibility*: MetaBAT outputs MAGs in standard formats that can be easily integrated into downstream analyses and tools, such as taxonomic annotation and functional prediction. + +### Bin contigs using MetaBAT 2 + +> Calculate contig depths +> +> 1. 
{% tool [Calculate contig depths](toolshed.g2.bx.psu.edu/repos/iuc/metabat2_jgi_summarize_bam_contig_depths/metabat2_jgi_summarize_bam_contig_depths/2.17+galaxy0) %} with the following parameters: +> - *"Mode to process BAM files"*: `One by one` +> - {% icon param-file %} *"Sorted bam files"*: output of **Samtools sort** {% icon tool %} +> - *"Select a reference genome?"*: `No` +> +> > Why not use bam directly +> > +> > MetaBAT only accepts per-contig depth tables because that is the specific input format its binning algorithm requires. +> > BAM files contain read-level alignment data. +> > MetaBAT needs summarized, contig-level coverage statistics. This is also the case for MaxBin2. +> {: .comment} +> +{: .hands_on} + +> Individual binning of short-reads with MetaBAT 2 +> 1. {% tool [MetaBAT 2](toolshed.g2.bx.psu.edu/repos/iuc/metabat2/metabat2/2.17+galaxy0) %} with parameters: +> - *"Fasta file containing contigs"*: `Contigs` +> - In **Advanced options**, keep all as **default**. +> - In **Output options:** +> - *"Save cluster memberships as a matrix format?"*: `"Yes"` +> +{: .hands_on} + +The output files generated by MetaBAT 2 include (some of the files below are optional and are produced only when requested by the user): + +1. The final set of genome bins in FASTA format (`.fa`) +2. A summary file with information on each genome bin, including its length, completeness, contamination, and taxonomy classification (`.txt`) +3. A file with the mapping results showing how each contig was assigned to a genome bin (`.bam`) +4. A file containing the abundance estimation of each genome bin (`.txt`) +5. A file with the coverage profile of each genome bin (`.txt`) +6. A file containing the nucleotide composition of each genome bin (`.txt`) +7. 
A file with the predicted gene sequences of each genome bin (`.faa`) + +These output files can be further analyzed and used for downstream applications such as functional annotation, comparative genomics, and phylogenetic analysis. + +> Binning metrics +> +> 1. How many bins were produced by MetaBAT 2 for our sample? +> 2. How many contigs are in the bin with the most contigs? +> > +> > +> > 1. There is only one bin for this sample. +> > 2. 52 (these numbers may differ slightly depending on the version of MetaBAT2). So not all contigs were binned into this bin. +> > +> {: .solution} +> +{: .question} \ No newline at end of file diff --git a/topics/microbiome/tutorials/metagenomics-binning/semibin_version.md b/topics/microbiome/tutorials/metagenomics-binning/semibin_version.md new file mode 100644 index 00000000000000..496293d1a951d1 --- /dev/null +++ b/topics/microbiome/tutorials/metagenomics-binning/semibin_version.md @@ -0,0 +1,70 @@ +## SemiBin + +In this tutorial version we will learn how to use **SemiBin** {%cite Pan2022%} through Galaxy. **SemiBin** is a *semi-supervised deep learning method* for metagenomic binning. It uses both **must-link** and **cannot-link** constraints derived from single-copy marker genes to guide binning, allowing higher accuracy than purely unsupervised methods. + +> Metagenome binning is essential for recovering high-quality metagenome-assembled genomes (MAGs) from environmental samples. SemiBin applies a semi-supervised Siamese neural network that learns from both contig features and automatically generated constraints. It has been shown to recover more high-quality and near-complete genomes than MetaBAT2, MaxBin2, or VAMB across multiple benchmark datasets. SemiBin also supports single-sample, co-assembly, and multi-sample binning workflows, demonstrating excellent scalability and versatility. 
+> {: .quote author="Pan et al., 2022" } + +SemiBin is increasingly popular for metagenomic binning due to: + +* **Higher reconstruction quality** + SemiBin usually recovers **more high-quality and near-complete MAGs** than traditional binners, including MetaBAT2. + +* **Semi-supervised learning** + It combines deep learning with automatically generated constraints to better separate similar genomes. + +* **Flexible binning modes** + Works with: + + * individual samples + * co-assemblies + * multi-sample binning + +* **Support for multiple environments** + SemiBin includes trained models for: + + * human gut + * dog gut + * ocean + * soil + …plus a *generic pretrained model*. + +### Bin contigs using SemiBin + +> Individual binning of short reads with SemiBin +> +> 1. {% tool [SemiBin](toolshed.g2.bx.psu.edu/repos/iuc/semibin/semibin/2.1.0+galaxy1) %} with the following parameters: +> +> * *"Binning mode"*: `Single sample binning (each sample is assembled and binned independently)` +> +> * {% icon param-collection %} *"Contig sequences"*: `Contigs` (Input dataset collection) +> * {% icon param-file %} *"Read mapping to the contigs"*: output of **Samtools sort** {% icon tool %} +> * *"Reference database"*: `Use SemiBin ML function` +> * *"Environment for the built-in model"*: `` +> +> * *"Method to set up the minimal length for contigs in binning"*: `Automatic` +> +> > Environment for the built-in model +> > +> > SemiBin provides several pretrained models. If a model matching your environment is available, selecting it can improve binning performance. +> > +> > If no environment-specific model fits your data, you may choose: +> > +> > * **Global** — a general-purpose pretrained model trained across many environments. +> > * **None** — no pretrained model is used. SemiBin then runs in fully unsupervised mode, which is recommended when your environment differs substantially from all available pretrained models. 
+> {: .comment} +> +{: .hands_on} + +> Binning metrics +> +> 1. How many bins were produced by SemiBin for our sample? +> 2. How many contigs are in the bin with the most contigs? +> > +> > +> > 1. There is only one bin for this sample. +> > 2. 50 (these numbers may differ slightly depending on the version of SemiBin). So not all contigs were binned into this bin. +> > +> {: .solution} +> +{: .question} \ No newline at end of file diff --git a/topics/microbiome/tutorials/metagenomics-binning/tutorial.bib b/topics/microbiome/tutorials/metagenomics-binning/tutorial.bib index 18aac5e68c40ef..218335ca6f765b 100644 --- a/topics/microbiome/tutorials/metagenomics-binning/tutorial.bib +++ b/topics/microbiome/tutorials/metagenomics-binning/tutorial.bib @@ -1,3 +1,89 @@ +@article{nissen2021improved, + title={Improved metagenome binning and assembly using deep variational autoencoders}, + author={Nissen, Jakob Nybo and Johansen, Joachim and Alles{\o}e, Rosa Lundbye and S{\o}nderby, Casper Kaae and Armenteros, Jose Juan Almagro and Gr{\o}nbech, Christopher Heje and Jensen, Lars Juhl and Nielsen, Henrik Bj{\o}rn and Petersen, Thomas Nordahl and Winther, Ole and others}, + journal={Nature biotechnology}, + volume={39}, + number={5}, + pages={555--560}, + year={2021}, + publisher={Nature Publishing Group US New York}, + doi = {10.1038/s41587-020-00777-4}, +} + +@article{NatureBinner2025, + author = {Author, A. and Author, B. and Author, C.}, + title = {Comprehensive benchmarking of metagenomic binners across diverse environments}, + journal = {Nature Communications}, + year = {2025}, + volume = {16}, + pages = {57957}, + doi = {10.1038/s41467-025-57957-6} +} + +@article{Meyer2022, + author = {Meyer, Fernando and Fritz, Adrian and Deng, Zhi‑Luo and Koslicki, David and Lesker, Till Robin and Gurevich, Alexey and Robertson, Gary and Alser, Mohammed and Antipov, Dmitry and Beghini, Francesco and Bertrand, Denis and Brito, Jaqueline J. and Brown, C. 
Titus and Buchmann, Jan and Buluç, Aydin and Chen, Bo and Chikhi, Rayan and Clausen, Philip T.L.C. and Cristian, Alexandru and Dabrowski, Piotr W. and Darling, Aaron E. and Egan, Rob and Eskin, Eleazar and Georganas, Evangelos and Goltsman, Eugene and Gray, Melissa A. and Hansen, Lars Hestbjerg and Hofmeyr, Steven and Huang, Pingqin and Irber, Luiz and Jia, Huijue and Jørgensen, Tue Sparholt and Kieser, Silas D. and Klemetsen, Terje and Kola, Axel and Kolmogorov, Mikhail and Korobeynikov, Anton and Kwan, Jason and LaPierre, Nathan and Lemaitre, Claire and Li, Chenhao and Limasset, Antoine and Malcher‑Miranda, Fabio and Mangul, Serghei and Marcelino, Vanessa R. and Marchet, Camille and Marijon, Pierre and Meleshko, Dmitry and Mende, Daniel R. and Milanese, Alessio and Nagarajan, Niranjan and Nissen, Jakob and Nurk, Sergey and Oliker, Leonid and Paoli, Lucas and Peterlongo, Pierre and Piro, Vitor C. and Porter, Jacob S. and Rasmussen, Simon and Rees, Evan R. and Reinert, Knut and Renard, Bernhard and Robertsen, Espen Mikal and Rosen, Gail L. 
and Ruscheweyh, Hans‑Joachim and Sarwal, Varuni and Segata, Nicola and Seiler, Enrico and Shi, Lizhen and Sun, Fengzhu and Sunagawa, Shinichi and Sørensen, Søren Johannes and Thomas, Ashleigh and Tong, Chengxuan and Trajkovski, Mirko and Tremblay, Julien and Uritskiy, Gherman and Vicedomini, Riccardo and Wang, Zhengyang and Wang, Ziye and Wang, Zhong and Warren, Andrew and Willassen, Nils Peder and Yelick, Katherine and You, Ronghui and Zeller, Georg and Zhao, Zhengqiao and Zhu, Shanfeng and Zhu, Jie and Garrido‑Oter, Ruben and Gastmeier, Petra and Hacquard, Stephane and Häußler, Susanne and Khaledi, Ariane and Maechler, Friederike and Mesny, Fantin and Radutoiu, Simona and Schulze‑Lefert, Paul and Smit, Nathiana and Strowig, Till and Bremges, Andreas and Sczyrba, Alexander and McHardy, Alice Carolyn}, + title = {Critical Assessment of Metagenome Interpretation: the second round of challenges}, + journal = {Nature Methods}, + year = {2022}, + volume = {19}, + number = {4}, + pages = {429–440}, + doi = {10.1038/s41592-022-01431-4} +} + +@article{Wang2024COMEBin, + author = {Wang, Ziye and You, Ronghui and Han, Haitao and Liu, Wei and Sun, Fengzhu and Zhu, Shanfeng}, + title = {Effective binning of metagenomic contigs using contrastive multi‑view representation learning}, + journal = {Nature Communications}, + year = {2024}, + volume = {15}, + article = {585}, + doi = {10.1038/s41467-023-44290-z}, + url = {https://doi.org/10.1038/s41467-023-44290-z} +} + +@article{Chklovski2023CheckM2, + title = {CheckM2: a rapid, scalable and accurate tool for assessing microbial genome quality using machine learning}, + author = {Alex Chklovski and Donovan H. Parks and Ben J. Woodcroft and Gene W. 
Tyson}, + journal = {Nature Methods}, + year = {2023}, + volume = {20}, + number = {8}, + pages = {1203--1212}, + doi = {10.1038/s41592-023-01940-w} +} + +@article{Mainguy2024Binette, + author = {Mainguy, Jean and Hoede, Claire}, + title = {Binette: a fast and accurate bin refinement tool to construct high‐quality Metagenome Assembled Genomes}, + journal = {Journal of Open Source Software}, + year = {2024}, + volume = {9}, + number = {102}, + pages = {6782}, + doi = {10.21105/joss.06782} +} + +@article{Sieber2018DASTool, + author = {Sieber, Christopher M. K. and Probst, Alexander J. and Sharrar, Amanda and Thomas, Benjamin C. and Hess, Michelle and Tringe, Susannah G. and Banfield, Jillian F.}, + title = {Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy}, + journal = {Nature Microbiology}, + year = {2018}, + volume = {3}, + pages = {836--843}, + doi = {10.1038/s41564-018-0171-1} +} + +@article{CAMIChallenge2017, + author = {Sczyrba, A. and Hofmann, P. and Belmann, P. and et al.}, + title = {Critical Assessment of Metagenome Interpretation—A benchmark of metagenomics software}, + journal = {Nature Methods}, + year = {2017}, + volume = {14}, + pages = {1063--1071}, + doi = {10.1038/nmeth.4458} +} + @article{maxbin2015, author = {Wu, Yu-Wei and Simmons, Blake A. 
and Singer, Steven W.}, title = "{MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets}", diff --git a/topics/microbiome/tutorials/metagenomics-binning/tutorial.md b/topics/microbiome/tutorials/metagenomics-binning/tutorial.md index d79801cc5ac05b..d18eb020786eba 100644 --- a/topics/microbiome/tutorials/metagenomics-binning/tutorial.md +++ b/topics/microbiome/tutorials/metagenomics-binning/tutorial.md @@ -1,20 +1,17 @@ --- layout: tutorial_hands_on title: Binning of metagenomic sequencing data -zenodo_link: https://zenodo.org/record/7818827 -extra: - zenodo_link_results: https://zenodo.org/record/7845138 -level: Introductory +zenodo_link: https://zenodo.org/records/17660820 +level: Intermediate questions: - What is metagenomic binning refers to? -- Which tools should be used for metagenomic binning? -- How to assess the quality of metagenomic data binning? +- Which tools may be used for metagenomic binning? +- How to assess the quality of metagenomic binning? objectives: -- Describe what metagenomics binning is -- Describe common problems in metagenomics binning -- What software tools are available for metagenomics binning -- Binning of contigs into metagenome-assembled genomes (MAGs) using MetaBAT 2 software -- Evaluation of MAG quality and completeness using CheckM software +- Describe what metagenomics binning is. +- Describe common challenges in metagenomics binning. +- Perform metagenomic binning using MetaBAT 2 software. +- Evaluate MAG quality and completeness using CheckM software. 
time_estimation: 2H key_points: - Metagenomics binning is a computational approach to grouping together DNA sequences @@ -30,8 +27,15 @@ key_points: of research areas, such as human health, environmental microbiology, and biotechnology contributions: authorship: + - paulzierep - npechl - fpsom + - vinisalazar +requirements: + - type: internal + topic_name: microbiome + tutorials: + - metagenomics-assembly subtopic: metagenomics tags: - binning @@ -56,11 +60,16 @@ recordings: --- - Metagenomics is the study of genetic material recovered directly from environmental samples, such as soil, water, or gut contents, without the need for isolation or cultivation of individual organisms. Metagenomics binning is a process used to classify DNA sequences obtained from metagenomic sequencing into discrete groups, or bins, based on their similarity to each other. The goal of metagenomics binning is to assign the DNA sequences to the organisms or taxonomic groups that they originate from, allowing for a better understanding of the diversity and functions of the microbial communities present in the sample. This is typically achieved through computational methods that include sequence similarity, composition, and other features to group the sequences into bins. +> +> Before starting this tutorial, it is recommended to do the [**Metagenomics Assembly Tutorial**]({% link topics/microbiome/tutorials/metagenomics-assembly/tutorial.md %}) +{: .comment} + +## Binning approaches + There are several approaches to metagenomics binning, including: - **Sequence composition-based binning**: This method is based on the observation that different genomes have distinct sequence composition patterns, such as GC content or codon usage bias. By analyzing these patterns in metagenomic data, sequence fragments can be assigned to individual genomes or groups of genomes. 
@@ -73,10 +82,9 @@ There are several approaches to metagenomics binning, including: - **Supervised machine learning-based binning**: This method uses machine learning algorithms trained on annotated reference genomes to classify metagenomic data into bins. This approach can achieve high accuracy but requires a large number of annotated genomes for training. -Each of these methods has its strengths and limitations, and the choice of binning method depends on the specific characteristics of the metagenomic data set and the research question being addressed. - +## Binning challenges -**Metagenomics binning is a complex process that involves many steps and can be challenging due to several problems that can occur during the process**. Some of the most common problems encountered in metagenomics binning include: +Metagenomic binning is a complex process that involves many steps and can be challenging due to several problems that can occur during the process. Some of the most common problems encountered in metagenomic binning include: - **High complexity**: Metagenomic samples contain DNA from multiple organisms, which can lead to high complexity in the data. - **Fragmented sequences**: Metagenomic sequencing often generates fragmented sequences, which can make it difficult to assign reads to the correct bin. @@ -86,29 +94,60 @@ Each of these methods has its strengths and limitations, and the choice of binni - **Chimeric sequences**: Sequences that are the result of sequencing errors or contamination can lead to chimeric sequences, which can make it difficult to accurately bin reads. - **Strain variation**: Organisms within a species can exhibit significant genetic variation, which can make it difficult to distinguish between different strains in a metagenomic sample. -There are plenty of computational tools to perform metafenomics binning. Some of the most widely used include: +## Common binners + +There are plenty of algorithms that perform metagenomic binning. 
Some of the most widely used include: + +* **MaxBin** ({%cite maxbin2015%}): A popular de novo binning algorithm that uses a combination of sequence features and marker genes to cluster contigs into genome bins. +* **MetaBAT** ({%cite Kang2019%}): Another widely used de novo binning algorithm that employs a hierarchical clustering approach based on tetranucleotide frequency and coverage information. +* **CONCOCT** ({%cite Alneberg2014%}): A de novo binning tool that uses a clustering algorithm based on sequence composition and coverage information to group contigs into genome bins. +* **MyCC** ({%cite Lin2016%}): A reference-based binning tool that uses sequence alignment to identify contigs belonging to the same genome or taxonomic group. +* **GroopM** ({%cite Imelfort2014%}): A hybrid binning tool that combines reference-based and de novo approaches to achieve high binning accuracy. +* **SemiBin** ({%cite Pan2022%}): A command-line tool for metagenomic binning with deep learning; handles both short and long reads. +* **Vamb** ({%cite nissen2021improved%}): An algorithm that uses variational autoencoders (VAEs) to encode sequence composition and coverage information. +* **COMEBin** ({%cite Wang2024COMEBin%}): A metagenomic binning tool that integrates both composition and abundance features with machine learning-based clustering to improve binning accuracy across complex microbial communities. + +## Bin refinement + +There are also bin refinement tools, which can evaluate, combine, and improve the raw bins produced by primary binners such as MetaBAT2, CONCOCT, MaxBin2, or SemiBin. These tools help remove contamination, merge complementary bins, and recover higher-quality MAGs. + +* **MetaWRAP** ({%cite Uritskiy2018%}): + A comprehensive metagenomic analysis pipeline that includes modules for quality control, assembly, binning (wrapping multiple binners), refinement, reassembly, and annotation. 
Provides an easy-to-use framework for producing high-quality MAGs from raw reads. + +* **DAS Tool** ({%cite Sieber2018DASTool%}): + A bin-refinement tool that combines results from multiple binners (e.g., MetaBAT2, MaxBin2, CONCOCT, SemiBin) into a consensus set of optimized, non-redundant bins. DAS Tool improves overall bin quality by integrating the strengths of several algorithms. + +* **Binette** ({%cite Mainguy2024Binette%}): + Binette is a fast and accurate bin refinement tool that constructs high-quality MAGs from the outputs of multiple binning tools. It generates hybrid bins using set operations on overlapping contigs (intersection, difference, and union) and evaluates their quality with CheckM2 to select the best bins. Compared to MetaWRAP, Binette is faster and can process an unlimited number of input bin sets, making it highly scalable for large and complex metagenomic datasets. + +## Anvi’o: Interactive bin refinement + +**Anvi’o** ({%cite Eren2015%}) is a platform for **interactive visualization and manual refinement** of metagenomic bins. While it can run automated binning (defaulting to **CONCOCT**), its main strength lies in allowing users to: + +* Inspect contig-level coverage, GC content, and single-copy gene presence +* Visualize connections between contigs in a network view +* Manually merge, split, or reassign contigs to improve bin completeness and reduce contamination +* Annotate bins and link them to taxonomic or functional information + +This interactive approach is particularly useful when automated binning produces ambiguous or low-quality bins, enabling **high-confidence MAG reconstruction**. + +## So many options: which binner to use? + +Each of these binning methods has its own strengths and limitations, and the choice of a binning tool often depends on the characteristics of the metagenomic dataset and the research question. 
Practical guidance on which binner to use for specific datasets and environments can be drawn from benchmark studies such as {%cite NatureBinner2025%}. + +!["Benchmark for many Binners"](./images/Binning_Benchmark.png "Benchmark of multiple Binners on Activated sludge and Human Gut Microbiome, taken from {%cite NatureBinner2025%}"){:width="60%"} + +Additionally, the CAMI I and II challenges provide standardized simulated datasets that highlight the strengths and weaknesses of different binners, helping researchers select the most appropriate tool for their analysis. -- **MaxBin** ({%cite maxbin2015%}): A popular de novo binning algorithm that uses a combination of sequence features and marker genes to cluster contigs into genome bins. -- **MetaBAT** ({%cite Kang2019%}): Another widely used de novo binning algorithm that employs a hierarchical clustering approach based on tetranucleotide frequency and coverage information. -- **CONCOCT** ({%cite Alneberg2014%}): A de novo binning tool that uses a clustering algorithm based on sequence composition and coverage information to group contigs into genome bins. -- **MyCC** ({%cite Lin2016%}): A reference-based binning tool that uses sequence alignment to identify contigs belonging to the same genome or taxonomic group. -- **GroopM** ({%cite Imelfort2014%}): A hybrid binning tool that combines reference-based and de novo approaches to achieve high binning accuracy. -- **MetaWRAP** ({%cite Uritskiy2018%}): A comprehensive metagenomic analysis pipeline that includes various modules for quality control, assembly, binning, and annotation. -- **Anvi'o** ({%cite Eren2015%}): A platform for visualizing and analyzing metagenomic data, including features for binning, annotation, and comparative genomics. -- **SemiBin** ({%cite Pan2022%}): A command tool for metagenomic binning with deep learning, handles both short and long reads. 
+!["Benchmark for many Binners based on CAMI"](./images/CAMI_Binners.png "Benchmark of multiple Binners in the CAMI challenge, taken from {%cite Meyer2022%}"){:width="60%"} -A benchmark study of metagenomics software can be found at {%cite Sczyrba2017%}. MetaBAT 2 outperforms previous MetaBAT and other alternatives in both accuracy and computational efficiency . All are based on default parameters ({%cite Sczyrba2017%}). +A general approach is to perform binning using multiple binners that have shown good performance for the specific dataset, followed by bin refinement to generate an improved bin set that retains the best bins from the analysis. -**In this tutorial, we will learn how to run metagenomic binning tools and evaluate the quality of the results**. In order to do that, we will use data from the study: [Temporal shotgun metagenomic dissection of the coffee fermentation ecosystem](https://www.ebi.ac.uk/metagenomics/studies/MGYS00005630#overview) and MetaBAT 2 algorithm. MetaBAT is a popular software tool for metagenomics binning, and there are several reasons why it is often used: -- *High accuracy*: MetaBAT uses a combination of tetranucleotide frequency, coverage depth, and read linkage information to bin contigs, which has been shown to be highly accurate and efficient. -- *Easy to use*: MetaBAT has a user-friendly interface and can be run on a standard desktop computer, making it accessible to a wide range of researchers with varying levels of computational expertise. -- *Flexibility*: MetaBAT can be used with a variety of sequencing technologies, including Illumina, PacBio, and Nanopore, and can be applied to both microbial and viral metagenomes. -- *Scalability*: MetaBAT can handle large-scale datasets, and its performance has been shown to improve with increasing sequencing depth. 
-- *Compatibility*: MetaBAT outputs MAGs in standard formats that can be easily integrated into downstream analyses and tools, such as taxonomic annotation and functional prediction. +Does using more binners always improve results? In practice, one must also consider computational resources and time constraints. Running many binners can be very time-consuming and resource-intensive, especially for large studies. In some cases, adding extra binners does not lead to a meaningful increase in bin quality, so the choice of binners should be made carefully. Overall, identifying the optimal combination of binners remains an active area of research, and clear, widely accepted guidelines are still being established. -For an in-depth analysis of the structure and functions of the coffee microbiome, a temporal shotgun metagenomic study (six time points) was performed. The six samples have been sequenced with Illumina MiSeq utilizing whole genome sequencing. +# Mock binning dataset for this training -Based on the 6 original dataset of the coffee fermentation system, we generated mock datasets for this tutorial. +Read mapping and binning of real metagenomic datasets are computationally demanding and time-consuming tasks. To demonstrate the basics of binning, we generated a small mock dataset for this tutorial that is just large enough to produce bins for all binners covered here. The same binners can be applied to any real-life dataset, but plan for long runtimes, up to weeks in some cases. > > @@ -121,13 +160,30 @@ Based on the 6 original dataset of the coffee fermentation system, we generated # Prepare analysis history and data -MetaBAT 2 takes metagenomic sequencing data as input, typically in the form of assembled contigs in fasta format and coverage information in bam format. Specifically, MetaBAT 2 requires two input files: +Metagenomic binners typically take two data types as input: assembled contigs in fasta format and coverage information in bam format. 
- A fasta file containing the assembled contigs, which can be generated from raw metagenomic sequencing reads using an assembler such as MEGAHIT, SPAdes, or IDBA-UD. - A bam file containing the read coverage information for each contig, which can be generated from the same sequencing reads using mapping software such as Bowtie2 or BWA. -MetaBAT 2 also requires a configuration file specifying various parameters and options for the binning process, such as the minimum contig length, the maximum number of clusters to generate, and the maximum expected contamination level. +> Can bins be generated without coverage information? +> +> Not all binners require coverage information: some, like MetaBAT2, can operate using only genomic composition (e.g. tetranucleotide frequencies) when coverage files are not available. This is especially useful for single-sample datasets or legacy data where coverage cannot easily be calculated. +> +> Other tools that support composition-only binning include: +> - **MaxBin 2** (can run with composition alone, but performs better with depth) +> - **SolidBin** (supports single-sample binning based on sequence features) +> - **VAMB** (primarily uses deep learning on composition, coverage optional) +> +> That said, including coverage information generally increases binning accuracy, especially for: +> - Differentiating closely related strains +> - Datasets with uneven abundance +> - Multi-sample metagenomics workflows (e.g. differential coverage binning) +> +> In summary: yes, it’s possible to bin without coverage, but coverage-aware workflows are recommended when available, as they reduce contamination and improve completeness. +> +{: .comment} + To run binning, we first need to get the data into Galaxy. Any analysis should get its own Galaxy history. So let's start by creating a new one: @@ -149,15 +205,10 @@ In case of a not very large dataset it's more convenient to upload data directly > Upload data into Galaxy > -> 2. 
Import the sequence read data (\*.fasta) from [Zenodo]({{ page.zenodo_link }}) or a data library: +> 1. Import the contig file from [Zenodo]({{ page.zenodo_link }}) or a data library: > > ```text -> {{ page.zenodo_link }}/files/contigs_ERR2231567.fasta -> {{ page.zenodo_link }}/files/contigs_ERR2231568.fasta -> {{ page.zenodo_link }}/files/contigs_ERR2231569.fasta -> {{ page.zenodo_link }}/files/contigs_ERR2231570.fasta -> {{ page.zenodo_link }}/files/contigs_ERR2231571.fasta -> {{ page.zenodo_link }}/files/contigs_ERR2231572.fasta +> {{ page.zenodo_link }}/files/MEGAHIT_contigs.fasta > ``` > > {% snippet faqs/galaxy/datasets_import_via_link.md %} @@ -168,12 +219,58 @@ In case of a not very large dataset it's more convenient to upload data directly > > In case of large dataset, we can use FTP server or the [Galaxy Rule-based Uploader]({% link topics/galaxy-interface/tutorials/upload-rules/tutorial.md %}). > {: .comment} > -> 3. Create a collection named `Raw reads`, rename your pairs with the sample name +> 2. Create a collection named `Contigs` > > {% snippet faqs/galaxy/collections_build_list.md %} > +> 3. Also import the raw reads in fastq format (\*.fastqsanger.gz) from [Zenodo]({{ page.zenodo_link }}) or a data library: +> +> ```text +> {{ page.zenodo_link }}/files/reads_forward.fastqsanger.gz +> {{ page.zenodo_link }}/files/reads_reverse.fastqsanger.gz +> ``` +> 4. Create a collection named `Reads` +> +> {% snippet faqs/galaxy/collections_build_list_paired.md %} +{: .hands_on} + +> Why do we use collections here? +> In this tutorial, collections are not strictly necessary because we are working with only one contig file and its paired-end reads. However, in real metagenomic studies, it is common to process many samples, sometimes hundreds or even thousands, and in those cases, collections become essential for managing data efficiently. 
+> +> It is generally good practice to first test a workflow on a small subset of the data (for example, a collection containing only a single sample) to ensure that the tools run correctly and the parameters are appropriate before launching thousands of jobs on Galaxy. +{: .comment} + +# Preparation for binning + +As explained before, the binners used in this tutorial need coverage information in bam format. Some binners need a specific format for the coverage information, but this will be covered in the version specific to the desired binner. For now, we will map the quality-controlled reads to the contigs to get a bam file with the coverage information. This bam file also needs to be sorted for the downstream binners. + +Make sure the reads are quality controlled, e.g. by following the QC tutorial TODO. + +> Map reads to contigs +> +> 1. {% tool [Bowtie2](toolshed.g2.bx.psu.edu/repos/devteam/bowtie2/bowtie2/2.5.4+galaxy0) %} with the following parameters: +> - *"Is this single or paired library"*: `Paired-end` +> - {% icon param-collection %} *"FASTQ Paired Dataset"*: `Reads` (Input dataset collection) +> - *"Do you want to set paired-end options?"*: `No` +> - *"Will you select a reference genome from your history or use a built-in index?"*: `Use a genome from the history and build index` +> - {% icon param-collection %} *"Select reference genome"*: `Contigs` (Input dataset collection) +> - *"Set read groups information?"*: `Do not set` +> - *"Select analysis mode"*: `1: Default setting only` +> - *"Do you want to tweak SAM/BAM Options?"*: `No` +> - *"Save the bowtie2 mapping statistics to the history"*: `Yes` +> {: .hands_on} +> Sort bam files +> +> 1. 
{% tool [Samtools sort](toolshed.g2.bx.psu.edu/repos/devteam/samtools_sort/samtools_sort/2.0.7) %} with the following parameters: +> - {% icon param-file %} *"BAM File"*: output of **Bowtie2** {% icon tool %} +> - *"Primary sort key"*: `coordinate` +> +{: .hands_on} + +The sorted bam file can be used as input for any of the binning tools. + # Binning As explained before, there are many challenges to metagenomics binning. The most common of them are listed below: @@ -186,125 +283,117 @@ As explained before, there are many challenges to metagenomics binning. The most - Chimeric sequences. - Strain variation. -![Image show the binning process where sequences are grouped together based on genome signatures like the kmer profiles of each contig, contig coverage, or GC content](./binning.png "Binning"){:width="60%"} +![Metagenomic binning involves grouping contigs into 'bins' based on sequence composition, coverage, or other properties.](./images/binning.png "Metagenomic binning involves grouping contigs into 'bins' based on sequence composition, coverage, or other properties."){:width="60%"} -In this tutorial we will learn how to use **MetaBAT 2** {%cite Kang2019%} tool through Galaxy. **MetaBAT** stands for "Metagenome Binning based on Abundance and Tetranucleotide frequency". It is: +In this tutorial, we offer dedicated versions, which highlight each of the following binners: -> Grouping large fragments assembled from shotgun metagenomic sequences to deconvolute complex microbial communities, or metagenome binning, enables the study of individual organisms and their interactions. Here we developed automated metagenome binning software, called MetaBAT, which integrates empirical probabilistic distances of genome abundance and tetranucleotide frequency. On synthetic datasets MetaBAT on average achieves 98percent precision and 90% recall at the strain level with 281 near complete unique genomes. 
Applying MetaBAT to a human gut microbiome data set we recovered 176 genome bins with 92% precision and 80% recall. Further analyses suggest MetaBAT is able to recover genome fragments missed in reference genomes up to 19%, while 53 genome bins are novel. In summary, we believe MetaBAT is a powerful tool to facilitate comprehensive understanding of complex microbial communities. -{: .quote author="Kang et al, 2019" } +{% include _includes/cyoa-choices.html option1="MetaBAT2" option2="MaxBin2" option3="SemiBin" option4="CONCOCT" option5="COMEBin" default="MetaBAT2" %} -We will use the uploaded assembled fasta files as input to the algorithm (For simplicity reasons all other parameters will be preserved with their default values). +
+{% include topics/microbiome/tutorials/metagenomics-binning/metabet2_version.md %} +
+
+{% include topics/microbiome/tutorials/metagenomics-binning/maxbin2_version.md %} +
+
+{% include topics/microbiome/tutorials/metagenomics-binning/semibin_version.md %} +
+
+{% include topics/microbiome/tutorials/metagenomics-binning/concoct_version.md %} +
+
+{% include topics/microbiome/tutorials/metagenomics-binning/comebin_version.md %} +
-> Individual binning of short-reads with MetaBAT 2 -> 1. {% tool [MetaBAT 2](toolshed.g2.bx.psu.edu/repos/iuc/megahit/megahit/1.2.9+galaxy0) %} with parameters: -> - *"Fasta file containing contigs"*: `assembly fasta files` -> +# Bin refinement + +Now that you have produced bins with your favorite binning algorithms, you can refine the recovered bins. +To do so, you need to convert the bins into a contig-to-bin mapping table, combine the tables from each binner into one collection and +use Binette to create consensus bins. An alternative tool would be {% tool [DAS Tool](toolshed.g2.bx.psu.edu/repos/iuc/das_tool/das_tool/1.1.7+galaxy1) %}, which is also available in Galaxy. + +For the refinement we will use the bins created by all the binners used before. If you do not want to run them all yourself, +we have provided the results here as well. + +> Get result bins +> +> 1. Import the bin files from [Zenodo]({{ page.zenodo_link }}) or a data library: +> +> ```text +> {{ page.zenodo_link }}/files/semibin_0.fasta +> {{ page.zenodo_link }}/files/maxbin_0.fasta +> {{ page.zenodo_link }}/files/maxbin_1.fasta +> {{ page.zenodo_link }}/files/metabat_0.fasta +> {{ page.zenodo_link }}/files/concoct_1.fasta +> {{ page.zenodo_link }}/files/concoct_2.fasta +> {{ page.zenodo_link }}/files/concoct_3.fasta +> {{ page.zenodo_link }}/files/concoct_4.fasta +> {{ page.zenodo_link }}/files/concoct_5.fasta +> {{ page.zenodo_link }}/files/concoct_6.fasta +> {{ page.zenodo_link }}/files/concoct_7.fasta +> {{ page.zenodo_link }}/files/concoct_8.fasta +> {{ page.zenodo_link }}/files/concoct_9.fasta +> ``` +> +> 2. Create a collection for each bin set called e.g. maxbin, semibin ... 
by selecting only the bins created by this binner and creating a collection: > +> {% snippet faqs/galaxy/collections_build_list.md %} {: .hands_on} -The output files generated by MetaBAT 2 include (some of the files below are optional and not produced unless it is required by the user): +Once each bin set has been gathered into its own collection, it can be converted into a contig-to-bin mapping table. Perform this step for every bin set. -1. The final set of genome bins in FASTA format (`.fa`) -2. A summary file with information on each genome bin, including its length, completeness, contamination, and taxonomy classification (`.txt`) -3. A file with the mapping results showing how each contig was assigned to a genome bin (`.bam`) -4. A file containing the abundance estimation of each genome bin (`.txt`) -5. A file with the coverage profile of each genome bin (`.txt`) -6. A file containing the nucleotide composition of each genome bin (`.txt`) -7. A file with the predicted gene sequences of each genome bin (`.faa`) +> Convert the bins into a contig-to-bin mapping table +> +> 1. {% tool [Converts genome bins in fasta format](toolshed.g2.bx.psu.edu/repos/iuc/fasta_to_contig2bin/Fasta_to_Contig2Bin/1.1.7+galaxy1) %} with the following parameters: +> - {% icon param-file %} *"Bin sequences"*: `bins` (output of any of the binners {% icon tool %}) +> +{: .hands_on} -These output files can be further analyzed and used for downstream applications such as functional annotation, comparative genomics, and phylogenetic analysis. -> +> Build a list of the binning tables > -> Since the binning process would take some we are just going to import the results of the binning previously run. +> 1. 
{% tool [Build list](__BUILD_LIST__) %} with the following parameters: +> - In *"Dataset"*: +> - {% icon param-repeat %} *"Insert Dataset"* +> - {% icon param-file %} *"Input Dataset"*: `contigs2bin` (output of **Converts genome bins in fasta format** of SemiBin {% icon tool %}) +> - *"Label to use"*: `Index` +> - {% icon param-repeat %} *"Insert Dataset"* +> - {% icon param-file %} *"Input Dataset"*: `contigs2bin` (output of **Converts genome bins in fasta format** of MetaBAT2 {% icon tool %}) +> - *"Label to use"*: `Index` +> - {% icon param-repeat %} *"Insert Dataset"* +> - {% icon param-file %} *"Input Dataset"*: `contigs2bin` (output of **Converts genome bins in fasta format** of MaxBin2 {% icon tool %}) +> - *"Label to use"*: `Index` +> - {% icon param-repeat %} *"Insert Dataset"* +> - {% icon param-file %} *"Input Dataset"*: `contigs2bin` (output of **Converts genome bins in fasta format** of CONCOCT {% icon tool %}) +> - *"Label to use"*: `Index` > -> > Import generated assembly files -> > -> > 1. Import the six folders containg binning result files from [Zenodo]({{ page.extra.zenodo_link_results }}) or the Shared Data library: -> > -> > ```text -> > {{ page.extra.zenodo_link_results }}/files/26_%20MetaBAT2%20on%20data%20ERR2231567_%20Bins.zip -> > {{ page.extra.zenodo_link_results }}/files/38_%20MetaBAT2%20on%20data%20ERR2231568_%20Bins.zip -> > {{ page.extra.zenodo_link_results }}/files/47_%20MetaBAT2%20on%20data%20ERR2231569_%20Bins.zip -> > {{ page.extra.zenodo_link_results }}/files/57_%20MetaBAT2%20on%20data%20ERR2231570_%20Bins.zip -> > {{ page.extra.zenodo_link_results }}/files/65_%20MetaBAT2%20on%20data%20ERR2231571_%20Bins.zip -> > {{ page.extra.zenodo_link_results }}/files/74_%20MetaBAT2%20on%20data%20ERR2231572_%20Bins.zip -> > ``` -> > -> > -> > 2. Create a collection named `MEGAHIT Contig`, rename your pairs with the sample name -> > -> {: .hands_on} -{: .comment} +{: .hands_on} -> +> Refine with Binette > -> 1. 
How many bins has been for ERR2231567 sample? -> 2. How many sequences are contained in the second bin? +> 1. {% tool [Binette](toolshed.g2.bx.psu.edu/repos/iuc/binette/binette/1.2.0+galaxy0) %} with the following parameters: +> - {% icon param-file %} *"Input contig table"*: `output` (output of **Build list** {% icon tool %}) +> - {% icon param-collection %} *"Input contig file"*: `output` (Input dataset collection) +> - *"Select if database should be used either via file or cached database"*: `cached database` +> +{: .hands_on} + +> Bin refinement +> +> 1. How many bins are left after refinement? > > > > > -> > 1. There are 6 bins identified -> > 2. 167 sequences are classified into the second bin. +> > 1. Two bins are left. Most contigs from different bins were combined into one bin. One single-contig bin still remains. > > > {: .solution} > {: .question} -# De-replication - -De-replication is the process of identifying sets of genomes that are the "same" in a list of genomes, and removing all but the “best” genome from each redundant set. How similar genomes need to be to be considered “same”, how to determine which genome is “best”, and other important decisions are discussed in [Important Concepts](https://drep.readthedocs.io/en/latest/choosing_parameters.html). - -A common use for genome de-replication is the case of individual assembly of metagenomic data. If metagenomic samples are collected in a series, a common way to assemble the short reads is with a “co-assembly”. That is, combining the reads from all samples and assembling them together. The problem with this is assembling similar strains together can severely fragment assemblies, precluding recovery of a good genome bin. An alternative option is to assemble each sample separately, and then “de-replicate” the bins from each assembly to make a final genome set. 
- -![Image shows the process of individual assembly on two strains and five samples, after individual assembly of samples two samples are chosen for de-replication process. In parallel, co-assembly on all five samples is performed](./individual-assembly.png "Individual assembly followed by de-replication vs co-assembly"){:width="80%"} - -MetaBAT 2 does not explicitly perform dereplication in the sense of identifying groups of identical or highly similar genomes in a given dataset. Instead, MetaBAT 2 focuses on improving the accuracy of binning by leveraging various features such as read coverage, differential coverage across samples, and sequence composition. It aims to distinguish between different genomes present in the metagenomic dataset and assign contigs to the appropriate bins. - -Several tools have been designed for the proccess of de-replication. **`dRep`** is a software tool designed for the dereplication of genomes in metagenomic datasets. The goal is to retain a representative set of genomes to improve downstream analyses, such as taxonomic profiling and functional annotation. - -An typical workflow of how `dRep` works for dereplication in metagenomics includes: - -- *Genome Comparison*: `dRep` uses a pairwise genome comparison approach to assess the similarity between genomes in a given metagenomic dataset. - -- *Clustering*: Based on the genome similarities, `dRep` performs clustering to group similar genomes into "genome clusters." Each cluster represents a group of closely related genomes. - -- *Genome Quality Assessment*: `dRep` evaluates the quality of each genome within a cluster. It considers factors such as completeness, contamination, and strain heterogeneity. - -- *Genome Selection*: Within each genome cluster, `dRep` selects a representative genome based on user-defined criteria. This representative genome is considered as the "dereplicated" version of the cluster. 
- -- *Dereplication Output*: The output of `dRep` includes information about the dereplicated genomes, including their identity, completeness, and contamination. The user can choose a threshold for genome similarity to control the level of dereplication. - -> General list of actions for de-replication -> 1. Create new history -> 2. Assemble each sample separately using your favorite assembler -> 3. Perform a co-assembly to catch low-abundance microbes -> 4. Bin each assembly separately using your favorite binner -> 5. Bin co-assembly using your favorite binner -> 6. Pull the bins from all assemblies together -> 7. rRun **`dRep`** on them -> 8. Perform downstream analysis on the de-replicated genome list -> -{: .hands_on} - - # Checking the quality of the bins Once binning is done, it is important to check its quality. -Binning results can be evaluated with **CheckM** ({%cite Parks2015%}). CheckM is a software tool used in metagenomics binning to assess the completeness and contamination of genome bins. Metagenomics binning is the process of separating DNA fragments from a mixed community of microorganisms into individual bins, each representing a distinct genome. +Binning results can be evaluated with **CheckM** ({%cite Parks2015%}). CheckM is a software tool used in metagenomics binning to assess the completeness and contamination of genome bins. CheckM compares the genome bins to a set of universal single-copy marker genes that are present in nearly all bacterial and archaeal genomes. By identifying the presence or absence of these marker genes in the bins, CheckM can estimate the completeness of each genome bin (i.e., the percentage of the total set of universal single-copy marker genes that are present in the bin) and the degree of contamination (i.e., the percentage of marker genes that are found in more than one bin). 
@@ -324,48 +413,17 @@ Based on the previous analysis we will use **CheckM lineage_wf**: *Assessing the `CheckM lineage_wf` is a specific workflow within the CheckM software tool that is used for taxonomic classification of genome bins based on their marker gene content. This workflow uses a reference database of marker genes and taxonomic information to classify the genome bins at different taxonomic levels, from domain to species. +Now you can investigate the completeness and contamination of any of your previously generated genome bins as well as the refined set. + > Assessing the completeness and contamination of genome bins using lineage-specific marker sets with `CheckM lineage_wf` > 1. {% tool [CheckM lineage_wf](toolshed.g2.bx.psu.edu/repos/iuc/checkm_lineage_wf/checkm_lineage_wf/1.2.0+galaxy0) %} with parameters: > - *"Bins"*: `Folder containing the produced bins` -> -> {: .hands_on} -> -> -> Since the CheckM process would take some time we are just going to import the results: -> -> > Import generated `CheckM lineage_wf` results -> > -> > 1. 
Import the `CheckM lineage_wf` report files from [Zenodo]({{ page.extra.zenodo_link_results }}) or the Shared Data library: -> > -> > ```text -> > {{ page.extra.zenodo_link_results }}/files/CheckM_lineage_wf_on_data_ERR2231567__Bin_statistics.txt -> > {{ page.extra.zenodo_link_results }}/files/CheckM_lineage_wf_on_data_ERR2231568__Bin_statistics.txt -> > {{ page.extra.zenodo_link_results }}/files/CheckM_lineage_wf_on_data_ERR2231569__Bin_statistics.txt -> > {{ page.extra.zenodo_link_results }}/files/CheckM_lineage_wf_on_data_ERR2231570__Bin_statistics.txt -> > {{ page.extra.zenodo_link_results }}/files/CheckM_lineage_wf_on_data_ERR2231571__Bin_statistics.txt -> > {{ page.extra.zenodo_link_results }}/files/CheckM_lineage_wf_on_data_ERR2231572__Bin_statistics.txt -> > ``` -> > -> {: .hands_on} -{: .comment} The output of "CheckM lineage_wf" includes several files and tables that provide information about the taxonomic classification and quality assessment of genome bins. Here are some of the key outputs: -- **CheckM Lineage Workflow Output Report**: This report provides a summary of the quality assessment performed by CheckM. It includes statistics such as the number of genomes analyzed, their completeness, contamination, and other quality metrics. +- **CheckM Lineage Workflow Output Report (Bin statistics)**: This report provides a summary of the quality assessment performed by CheckM. It includes statistics such as the number of genomes analyzed, their completeness, contamination, and other quality metrics. - **Lineage-specific Quality Assessment**: CheckM generates lineage-specific quality assessment files for each analyzed genome. These files contain detailed information about the completeness and contamination of the genome based on its taxonomic lineage. 
@@ -377,13 +435,103 @@ The output of "CheckM lineage_wf" includes several files and tables that provide It should be noted that "CheckM lineage_wf" offers a range of optional outputs that can be generated to provide additional information to the user. - - # Conclusions -In summary, this tutorial shows a step-by-step on how to bin metagenomic contigs using MetaBAT 2. +In summary, this tutorial shows step by step how to bin metagenomic contigs using several binners, followed by bin refinement. It is critical to select the appropriate binning tool for a specific metagenomics study, as different binning methods may have different strengths and limitations depending on the type of metagenomic data being analyzed. By comparing the outcomes of several binning techniques, researchers can increase the precision and accuracy of genome binning. @@ -409,4 +555,4 @@ There are various binning methods available for metagenomic data, including refe Comparing the outcomes of multiple binning methods can help to identify the most accurate and reliable method for a specific study. This can be done by evaluating the quality of the resulting bins in terms of completeness, contamination, and strain heterogeneity, as well as by comparing the composition and functional profiles of the identified genomes. -Overall, by carefully selecting and comparing binning methods, researchers can improve the quality and reliability of genome bins, which can ultimately lead to a better understanding of the functional and ecological roles of microbial communities in various environments. +Overall, by carefully selecting and comparing binning methods, researchers can improve the quality and reliability of genome bins, which can ultimately lead to a better understanding of the functional and ecological roles of microbial communities in various environments. 
\ No newline at end of file diff --git a/topics/microbiome/tutorials/metagenomics-binning/workflows/Metagenomic-Binning-tests.yml b/topics/microbiome/tutorials/metagenomics-binning/workflows/Metagenomic-Binning-tests.yml new file mode 100644 index 00000000000000..09939e694c3f0a --- /dev/null +++ b/topics/microbiome/tutorials/metagenomics-binning/workflows/Metagenomic-Binning-tests.yml @@ -0,0 +1,31 @@ +- doc: Test outline for Metagenomic-Binning + job: + Trimmed reads: + class: Collection + collection_type: list:paired + elements: + - class: Collection + type: paired + identifier: ERR2231567 + elements: + - class: File + identifier: forward + path: https://zenodo.org/records/17660820/files/reads_forward.fastqsanger.gz + - class: File + identifier: reverse + path: https://zenodo.org/records/17660820/files/reads_reverse.fastqsanger.gz + Assemblies: + class: Collection + collection_type: list + elements: + - class: File + identifier: ERR2231567 + path: https://zenodo.org/records/17660820/files/MEGAHIT_contigs.fasta + + outputs: + final: + asserts: + - that: has_text + text: "binette_bin1" + - that: has_text + text: "16.69" diff --git a/topics/microbiome/tutorials/metagenomics-binning/workflows/Metagenomic-Binning.ga b/topics/microbiome/tutorials/metagenomics-binning/workflows/Metagenomic-Binning.ga new file mode 100644 index 00000000000000..268bf7ee1e50a3 --- /dev/null +++ b/topics/microbiome/tutorials/metagenomics-binning/workflows/Metagenomic-Binning.ga @@ -0,0 +1,1111 @@ +{ + "a_galaxy_workflow": "true", + "annotation": "Binning workflow that uses abundance information to bin metagenomic contigs with 4 different binners and then refines the resulting bins.", + "comments": [ + { + "color": "orange", + "data": { + "text": "# Bin refinement" + }, + "id": 5, + "position": [ + 2404.7906885054667, + 1141.6 + ], + "size": [ + 1271, + 1458 + ], + "type": "markdown" + }, + { + "color": "none", + "data": { + "text": "# CONCOCT\n" + }, + "id": 0, + "position": 
[ + 680.4906885054666, + 0 + ], + "size": [ + 1713, + 597 + ], + "type": "markdown" + }, + { + "color": "lime", + "data": { + "text": "# Mapping" + }, + "id": 1, + "position": [ + 11.290688505466562, + 1078.1999999999998 + ], + "size": [ + 692, + 415 + ], + "type": "markdown" + }, + { + "color": "red", + "data": { + "text": "# MetaBAT2\n" + }, + "id": 2, + "position": [ + 1496.8906885054666, + 656.9 + ], + "size": [ + 541, + 404 + ], + "type": "markdown" + }, + { + "color": "pink", + "data": { + "text": "# MaxBin2" + }, + "id": 3, + "position": [ + 1533.5906885054667, + 1150.1999999999998 + ], + "size": [ + 587, + 490 + ], + "type": "markdown" + }, + { + "color": "lime", + "data": { + "text": "# SemiBin" + }, + "id": 4, + "position": [ + 1089.8906885054666, + 1704.6999999999998 + ], + "size": [ + 887, + 532 + ], + "type": "markdown" + } + ], + "creator": [ + { + "class": "Person", + "identifier": "https://orcid.org/0000-0003-2982-388X", + "name": "Paul Zierep" + } + ], + "format-version": "0.1", + "help": "", + "license": "MIT", + "name": "Metagenomic Binning", + "readme": "", + "report": { + "markdown": "\n# Workflow Execution Report\n\n## Workflow Inputs\n```galaxy\ninvocation_inputs()\n```\n\n## Workflow Outputs\n```galaxy\ninvocation_outputs()\n```\n\n## Workflow\n```galaxy\nworkflow_display()\n```\n" + }, + "steps": { + "0": { + "annotation": "Samples grouped for co-assembly. For individual assembly use same reads as `Trimmed reads input`. The tool fastq_groupmerge can be used to perform the grouping.", + "content_id": null, + "errors": null, + "id": 0, + "input_connections": {}, + "inputs": [ + { + "description": "Samples grouped for co-assembly. For individual assembly use same reads as `Trimmed reads input`. 
The tool fastq_groupmerge can be used to perform the grouping.", + "name": "Trimmed reads" + } + ], + "label": "Trimmed reads", + "name": "Input dataset collection", + "outputs": [], + "position": { + "left": 0, + "top": 896.6187286877193 + }, + "tool_id": null, + "tool_state": "{\"optional\": false, \"tag\": null, \"collection_type\": \"list:paired\", \"fields\": null}", + "tool_version": null, + "type": "data_collection_input", + "uuid": "dd8faa6e-3f29-4fa2-befb-38ef4a7832b5", + "when": null, + "workflow_outputs": [] + }, + "1": { + "annotation": "CONCOCT requires the read length for the coverage calculation. Use FastQC to estimate the mean read length.", + "content_id": null, + "errors": null, + "id": 1, + "input_connections": {}, + "inputs": [ + { + "description": "CONCOCT requires the read length for the coverage calculation. Use FastQC to estimate the mean read length.", + "name": "Read length (CONCOCT)" + } + ], + "label": "Read length (CONCOCT)", + "name": "Input parameter", + "outputs": [], + "position": { + "left": 1097.0837390044621, + "top": 41.539782924321685 + }, + "tool_id": null, + "tool_state": "{\"default\": 100, \"validators\": [{\"min\": null, \"max\": null, \"negate\": false, \"type\": \"in_range\"}], \"parameter_type\": \"integer\", \"optional\": false}", + "tool_version": null, + "type": "parameter_input", + "uuid": "30109f1d-816b-4f85-a0b3-e54506ae32ae", + "when": null, + "workflow_outputs": [] + }, + "2": { + "annotation": "This workflow allows using a custom assembly as input. If provided, select `custom assembly` as Assembler.\nProvide one assembly for each group of trimmed input reads.", + "content_id": null, + "errors": null, + "id": 2, + "input_connections": {}, + "inputs": [ + { + "description": "This workflow allows using a custom assembly as input. 
If provided, select `custom assembly` as Assembler.\nProvide one assembly for each group of trimmed input reads.", + "name": "Assemblies" + } + ], + "label": "Assemblies", + "name": "Input dataset collection", + "outputs": [], + "position": { + "left": 248.96247766826204, + "top": 1670.8582741820105 + }, + "tool_id": null, + "tool_state": "{\"optional\": false, \"tag\": null, \"collection_type\": \"list\", \"fields\": null}", + "tool_version": null, + "type": "data_collection_input", + "uuid": "e2f5ad16-674f-4687-94a4-a5e55680440a", + "when": null, + "workflow_outputs": [] + }, + "3": { + "annotation": "Environment for the built-in model (SemiBin), options are: human_gut, dog_gut, ocean, soil, cat_gut, human_oral, mouse_gut, pig_gut, built_environment, wastewater, chicken_caecum, global", + "content_id": null, + "errors": null, + "id": 3, + "input_connections": {}, + "inputs": [ + { + "description": "Environment for the built-in model (SemiBin), options are: human_gut, dog_gut, ocean, soil, cat_gut, human_oral, mouse_gut, pig_gut, built_environment, wastewater, chicken_caecum, global", + "name": "Environment for the built-in model (SemiBin)" + } + ], + "label": "Environment for the built-in model (SemiBin)", + "name": "Input parameter", + "outputs": [], + "position": { + "left": 1139.1165875728577, + "top": 2094.707949827118 + }, + "tool_id": null, + "tool_state": "{\"default\": \"global\", \"multiple\": false, \"validators\": [], \"restrictions\": [\"global\", \"human_gut\", \"dog_gut\", \"ocean\", \"soil\", \"cat_gut\", \"human_oral\", \"mouse_gut\", \"pig_gut\", \"built_environment\", \"wastewater\", \"chicken_caecum\"], \"parameter_type\": \"text\", \"optional\": false}", + "tool_version": null, + "type": "parameter_input", + "uuid": "86fc3246-f756-49ce-b196-ffa358e5ac41", + "when": null, + "workflow_outputs": [] + }, + "4": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/concoct_cut_up_fasta/concoct_cut_up_fasta/1.1.0+galaxy2", + 
"errors": null, + "id": 4, + "input_connections": { + "input_fasta": { + "id": 2, + "output_name": "output" + } + }, + "inputs": [], + "label": null, + "name": "CONCOCT: Cut up contigs", + "outputs": [ + { + "name": "output_fasta", + "type": "fasta" + }, + { + "name": "output_bed", + "type": "bed" + } + ], + "position": { + "left": 749.6942143874219, + "top": 224.77559328535233 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/concoct_cut_up_fasta/concoct_cut_up_fasta/1.1.0+galaxy2", + "tool_shed_repository": { + "changeset_revision": "4d8bc5dd9e95", + "name": "concoct_cut_up_fasta", + "owner": "iuc", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"__input_ext\": \"input\", \"bedfile\": true, \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"chunk_size\": \"10000\", \"input_fasta\": {\"__class__\": \"ConnectedValue\"}, \"input_fasta|__identifier__\": \"ERR2231567.fastqsanger\", \"merge_last\": true, \"overlap_size\": \"0\", \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "1.1.0+galaxy2", + "type": "tool", + "uuid": "3d11cf76-8a00-4aab-8cee-14647ad02165", + "when": null, + "workflow_outputs": [] + }, + "5": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/devteam/bowtie2/bowtie2/2.5.4+galaxy0", + "errors": null, + "id": 5, + "input_connections": { + "library|input_1": { + "id": 0, + "output_name": "output" + }, + "reference_genome|own_file": { + "id": 2, + "output_name": "output" + } + }, + "inputs": [ + { + "description": "runtime parameter for tool Bowtie2", + "name": "library" + }, + { + "description": "runtime parameter for tool Bowtie2", + "name": "reference_genome" + } + ], + "label": null, + "name": "Bowtie2", + "outputs": [ + { + "name": "output", + "type": "bam" + }, + { + "name": "mapping_stats", + "type": "txt" + } + ], + "position": { + "left": 170.4540358517052, + "top": 1189.3571323702458 + }, + "post_job_actions": { + "HideDatasetActionoutput": 
{ + "action_arguments": {}, + "action_type": "HideDatasetAction", + "output_name": "output" + } + }, + "tool_id": "toolshed.g2.bx.psu.edu/repos/devteam/bowtie2/bowtie2/2.5.4+galaxy0", + "tool_shed_repository": { + "changeset_revision": "f76cbb84d67f", + "name": "bowtie2", + "owner": "devteam", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"__input_ext\": \"input\", \"analysis_type\": {\"analysis_type_selector\": \"simple\", \"__current_case__\": 0, \"presets\": \"no_presets\"}, \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"library\": {\"type\": \"paired_collection\", \"__current_case__\": 1, \"input_1\": {\"__class__\": \"ConnectedValue\"}, \"unaligned_file\": false, \"aligned_file\": false, \"paired_options\": {\"paired_options_selector\": \"no\", \"__current_case__\": 1}}, \"own_file|__identifier__\": \"ERR2231567.fastqsanger\", \"reference_genome\": {\"source\": \"history\", \"__current_case__\": 1, \"own_file\": {\"__class__\": \"ConnectedValue\"}}, \"rg\": {\"rg_selector\": \"do_not_set\", \"__current_case__\": 3}, \"sam_options\": {\"sam_options_selector\": \"no\", \"__current_case__\": 1}, \"save_mapping_stats\": true, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "2.5.4+galaxy0", + "type": "tool", + "uuid": "e41a21d2-91c7-42e3-8072-e254280733ab", + "when": null, + "workflow_outputs": [] + }, + "6": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/devteam/samtools_sort/samtools_sort/2.0.7", + "errors": null, + "id": 6, + "input_connections": { + "input1": { + "id": 5, + "output_name": "output" + } + }, + "inputs": [], + "label": null, + "name": "Samtools sort", + "outputs": [ + { + "name": "output1", + "type": "bam" + } + ], + "position": { + "left": 425.0155087709174, + "top": 1226.3643944120972 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/devteam/samtools_sort/samtools_sort/2.0.7", + "tool_shed_repository": { + "changeset_revision": 
"f2f2650aeade", + "name": "samtools_sort", + "owner": "devteam", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"__input_ext\": \"input\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"input1\": {\"__class__\": \"ConnectedValue\"}, \"input1|__identifier__\": \"ERR2231567.fastqsanger\", \"minhash\": false, \"prim_key_cond\": {\"prim_key_select\": \"\", \"__current_case__\": 0}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "2.0.7", + "type": "tool", + "uuid": "863d5ee6-9eb0-4ac7-a634-9e258807f8cb", + "when": null, + "workflow_outputs": [] + }, + "7": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/concoct_coverage_table/concoct_coverage_table/1.1.0+galaxy2", + "errors": null, + "id": 7, + "input_connections": { + "bedfile": { + "id": 4, + "output_name": "output_bed" + }, + "mode|bamfile": { + "id": 6, + "output_name": "output1" + } + }, + "inputs": [ + { + "description": "runtime parameter for tool CONCOCT: Generate the input coverage table", + "name": "mode" + } + ], + "label": null, + "name": "CONCOCT: Generate the input coverage table", + "outputs": [ + { + "name": "output", + "type": "tabular" + } + ], + "position": { + "left": 1135.7679978931717, + "top": 217.73566551102363 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/concoct_coverage_table/concoct_coverage_table/1.1.0+galaxy2", + "tool_shed_repository": { + "changeset_revision": "fd31cd168efc", + "name": "concoct_coverage_table", + "owner": "iuc", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"__input_ext\": \"input\", \"bedfile\": {\"__class__\": \"ConnectedValue\"}, \"bedfile|__identifier__\": \"ERR2231567.fastqsanger\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"mode\": {\"type\": \"individual\", \"__current_case__\": 0, \"bamfile\": {\"__class__\": \"ConnectedValue\"}}, \"mode|bamfile|__identifier__\": \"ERR2231567.fastqsanger\", 
\"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "1.1.0+galaxy2", + "type": "tool", + "uuid": "11dcb0bb-7626-4c97-a15c-0366194f1cea", + "when": null, + "workflow_outputs": [] + }, + "8": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/metabat2_jgi_summarize_bam_contig_depths/metabat2_jgi_summarize_bam_contig_depths/2.17+galaxy0", + "errors": null, + "id": 8, + "input_connections": { + "mode|bam_indiv_input": { + "id": 6, + "output_name": "output1" + } + }, + "inputs": [ + { + "description": "runtime parameter for tool Calculate contig depths", + "name": "mode" + } + ], + "label": null, + "name": "Calculate contig depths", + "outputs": [ + { + "name": "outputDepth", + "type": "tabular" + } + ], + "position": { + "left": 896.6290086604289, + "top": 924.5426879033071 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/metabat2_jgi_summarize_bam_contig_depths/metabat2_jgi_summarize_bam_contig_depths/2.17+galaxy0", + "tool_shed_repository": { + "changeset_revision": "00e3b4ef7e0c", + "name": "metabat2_jgi_summarize_bam_contig_depths", + "owner": "iuc", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"__input_ext\": \"input\", \"advanced\": {\"percentIdentity\": \"97\", \"output_paired_contigs\": false, \"noIntraDepthVariance\": false, \"showDepth\": false, \"minMapQual\": \"0\", \"weightMapQual\": \"0.0\", \"includeEdgeBases\": false, \"maxEdgeBases\": \"75\"}, \"bam_indiv_input|__identifier__\": \"ERR2231567.fastqsanger\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"mode\": {\"type\": \"individual\", \"__current_case__\": 0, \"bam_indiv_input\": {\"__class__\": \"ConnectedValue\"}, \"use_reference_cond\": {\"use_reference\": \"no\", \"__current_case__\": 0}}, \"shredding\": {\"shredLength\": \"16000\", \"shredDepth\": \"5\", \"minContigLength\": \"1\", \"minContigDepth\": \"0.0\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": 
"2.17+galaxy0", + "type": "tool", + "uuid": "99aabfaa-9e98-45e2-83f7-3009b959603d", + "when": null, + "workflow_outputs": [] + }, + "9": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/semibin/semibin/2.1.0+galaxy1", + "errors": null, + "id": 9, + "input_connections": { + "mode|environment": { + "id": 3, + "output_name": "output" + }, + "mode|input_bam": { + "id": 6, + "output_name": "output1" + }, + "mode|input_fasta": { + "id": 2, + "output_name": "output" + } + }, + "inputs": [ + { + "description": "runtime parameter for tool SemiBin", + "name": "mode" + }, + { + "description": "runtime parameter for tool SemiBin", + "name": "mode" + }, + { + "description": "runtime parameter for tool SemiBin", + "name": "mode" + } + ], + "label": null, + "name": "SemiBin", + "outputs": [ + { + "name": "output_bins", + "type": "input" + }, + { + "name": "single_data", + "type": "csv" + }, + { + "name": "single_data_split", + "type": "csv" + }, + { + "name": "single_cov", + "type": "csv" + }, + { + "name": "single_split_cov", + "type": "csv" + } + ], + "position": { + "left": 1644.2381997196558, + "top": 1785.1442996053324 + }, + "post_job_actions": { + "TagDatasetActionoutput_bins": { + "action_arguments": { + "tags": "sample-bins" + }, + "action_type": "TagDatasetAction", + "output_name": "output_bins" + } + }, + "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/semibin/semibin/2.1.0+galaxy1", + "tool_shed_repository": { + "changeset_revision": "afee33334a63", + "name": "semibin", + "owner": "iuc", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"annot\": {\"ml_threshold\": null}, \"bin\": {\"max_node\": \"1.0\", \"max_edges\": \"200\", \"minfasta_kbs\": \"200\"}, \"extra_output\": [\"data\", \"coverage\"], \"min_len\": {\"method\": \"automatic\", \"__current_case__\": 0}, \"mode\": {\"select\": \"single\", \"__current_case__\": 0, \"input_fasta\": {\"__class__\": \"ConnectedValue\"}, \"input_bam\": {\"__class__\": \"ConnectedValue\"}, 
\"ref\": {\"select\": \"ml\", \"__current_case__\": 2}, \"environment\": {\"__class__\": \"ConnectedValue\"}}, \"orf_finder\": \"fast-naive\", \"random_seed\": \"0\", \"training\": {\"epoches\": \"20\", \"batch_size\": \"2048\"}, \"__page__\": 0, \"__rerun_remap_job_id__\": null}", + "tool_version": "2.1.0+galaxy1", + "type": "tool", + "uuid": "6fd5ada8-8884-4988-a895-a6c41172022d", + "when": null, + "workflow_outputs": [] + }, + "10": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/concoct/concoct/1.1.0+galaxy2", + "errors": null, + "id": 10, + "input_connections": { + "advanced|read_length": { + "id": 1, + "output_name": "output" + }, + "composition_file": { + "id": 4, + "output_name": "output_fasta" + }, + "coverage_file": { + "id": 7, + "output_name": "output" + } + }, + "inputs": [ + { + "description": "runtime parameter for tool CONCOCT", + "name": "advanced" + } + ], + "label": null, + "name": "CONCOCT", + "outputs": [ + { + "name": "output_clustering", + "type": "csv" + }, + { + "name": "output_pca_components", + "type": "csv" + }, + { + "name": "output_pca_transformed", + "type": "csv" + } + ], + "position": { + "left": 1516.0490446249091, + "top": 184.3040223201906 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/concoct/concoct/1.1.0+galaxy2", + "tool_shed_repository": { + "changeset_revision": "eae7ee167917", + "name": "concoct", + "owner": "iuc", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"advanced\": {\"clusters\": \"400\", \"kmer_length\": \"4\", \"length_threshold\": \"1000\", \"read_length\": {\"__class__\": \"ConnectedValue\"}, \"total_percentage_pca\": \"90\", \"seed\": \"1\", \"iterations\": \"500\", \"no_cov_normalization\": false}, \"composition_file\": {\"__class__\": \"ConnectedValue\"}, \"coverage_file\": {\"__class__\": \"ConnectedValue\"}, \"output\": {\"no_total_coverage\": false, \"converge_out\": false, \"log\": false}, \"__page__\": 0, 
\"__rerun_remap_job_id__\": null}", + "tool_version": "1.1.0+galaxy2", + "type": "tool", + "uuid": "f4094547-c644-4657-a908-25e789703384", + "when": null, + "workflow_outputs": [] + }, + "11": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/metabat2/metabat2/2.17+galaxy0", + "errors": null, + "id": 11, + "input_connections": { + "advanced|base_coverage_depth_cond|abdFile": { + "id": 8, + "output_name": "outputDepth" + }, + "inFile": { + "id": 2, + "output_name": "output" + } + }, + "inputs": [], + "label": null, + "name": "MetaBAT2", + "outputs": [ + { + "name": "bins", + "type": "input" + }, + { + "name": "lowDepth", + "type": "fasta" + }, + { + "name": "tooShort", + "type": "fasta" + }, + { + "name": "unbinned", + "type": "fasta" + }, + { + "name": "process_log", + "type": "txt" + } + ], + "position": { + "left": 1755.1233348021783, + "top": 667.814925017154 + }, + "post_job_actions": { + "TagDatasetActionbins": { + "action_arguments": { + "tags": "sample-bins" + }, + "action_type": "TagDatasetAction", + "output_name": "bins" + } + }, + "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/metabat2/metabat2/2.17+galaxy0", + "tool_shed_repository": { + "changeset_revision": "f375b4f6ef57", + "name": "metabat2", + "owner": "iuc", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"__input_ext\": \"input\", \"advanced\": {\"base_coverage_depth_cond\": {\"base_coverage_depth\": \"yes\", \"__current_case__\": 1, \"abdFile\": {\"__class__\": \"ConnectedValue\"}, \"cvExt\": null}, \"minContig\": \"1500\", \"maxP\": \"95\", \"minS\": \"60\", \"maxEdges\": \"200\", \"pTNF\": \"0\", \"noAdd\": false, \"minCV\": \"1.0\", \"minCVSum\": \"1.0\", \"seed\": \"0\"}, \"advanced|abdFile|__identifier__\": \"ERR2231567.fastqsanger\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"inFile\": {\"__class__\": \"ConnectedValue\"}, \"inFile|__identifier__\": \"ERR2231567.fastqsanger\", \"out\": {\"minClsSize\": \"200000\", \"onlyLabel\": 
false, \"saveCls\": false, \"extra_outputs\": [\"lowDepth\", \"tooShort\", \"unbinned\", \"log\"]}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "2.17+galaxy0", + "type": "tool", + "uuid": "12715693-dcaa-4097-bd1e-61bfaa0fbe42", + "when": null, + "workflow_outputs": [] + }, + "12": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/mbernt/maxbin2/maxbin2/2.2.7+galaxy6", + "errors": null, + "id": 12, + "input_connections": { + "assembly|inputs|abund": { + "id": 8, + "output_name": "outputDepth" + }, + "contig": { + "id": 2, + "output_name": "output" + } + }, + "inputs": [], + "label": null, + "name": "MaxBin2", + "outputs": [ + { + "name": "bins", + "type": "input" + }, + { + "name": "markers", + "type": "input" + }, + { + "name": "noclass", + "type": "fasta" + }, + { + "name": "toshort", + "type": "fasta" + }, + { + "name": "summary", + "type": "tabular" + }, + { + "name": "log", + "type": "txt" + }, + { + "name": "marker", + "type": "tabular" + }, + { + "name": "plot", + "type": "pdf" + } + ], + "position": { + "left": 1670.4872771406842, + "top": 1205.5486143537003 + }, + "post_job_actions": { + "TagDatasetActionbins": { + "action_arguments": { + "tags": "sample-bins" + }, + "action_type": "TagDatasetAction", + "output_name": "bins" + } + }, + "tool_id": "toolshed.g2.bx.psu.edu/repos/mbernt/maxbin2/maxbin2/2.2.7+galaxy6", + "tool_shed_repository": { + "changeset_revision": "0917b2d6010d", + "name": "maxbin2", + "owner": "mbernt", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"adv\": {\"min_contig_length\": \"1000\", \"max_iteration\": \"50\", \"prob_threshold\": \"0.5\"}, \"assembly\": {\"type\": \"individual\", \"__current_case__\": 0, \"inputs\": {\"type\": \"abund\", \"__current_case__\": 1, \"abund\": {\"__class__\": \"ConnectedValue\"}}}, \"contig\": {\"__class__\": \"ConnectedValue\"}, \"output\": {\"plotmarker\": true, \"marker\": true, \"markers\": true, \"log\": true, \"markerset\": 
\"107\"}, \"__page__\": 0, \"__rerun_remap_job_id__\": null}", + "tool_version": "2.2.7+galaxy6", + "type": "tool", + "uuid": "c3282b9b-30ec-4e23-9859-5931275f7fdf", + "when": null, + "workflow_outputs": [] + }, + "13": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/fasta_to_contig2bin/Fasta_to_Contig2Bin/1.1.7+galaxy1", + "errors": null, + "id": 13, + "input_connections": { + "inputs": { + "id": 9, + "output_name": "output_bins" + } + }, + "inputs": [], + "label": null, + "name": "Converts genome bins in fasta format", + "outputs": [ + { + "name": "contigs2bin", + "type": "tabular" + } + ], + "position": { + "left": 2805.967292429587, + "top": 1922.7528899355289 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/fasta_to_contig2bin/Fasta_to_Contig2Bin/1.1.7+galaxy1", + "tool_shed_repository": { + "changeset_revision": "fb2bed0eb02f", + "name": "fasta_to_contig2bin", + "owner": "iuc", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"inputs\": {\"__class__\": \"ConnectedValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "1.1.7+galaxy1", + "type": "tool", + "uuid": "9a7aa4f6-b60b-4415-a1a4-73cf829ec2c3", + "when": null, + "workflow_outputs": [] + }, + "14": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/concoct_merge_cut_up_clustering/concoct_merge_cut_up_clustering/1.1.0+galaxy2", + "errors": null, + "id": 14, + "input_connections": { + "cutup_clustering_result": { + "id": 10, + "output_name": "output_clustering" + } + }, + "inputs": [], + "label": null, + "name": "CONCOCT: Merge cut clusters", + "outputs": [ + { + "name": "output", + "type": "csv" + } + ], + "position": { + "left": 1812.6886863992684, + "top": 330.66525053302036 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/concoct_merge_cut_up_clustering/concoct_merge_cut_up_clustering/1.1.0+galaxy2", + "tool_shed_repository": { + "changeset_revision": 
"20ccec4a2c38", + "name": "concoct_merge_cut_up_clustering", + "owner": "iuc", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"__input_ext\": \"input\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"cutup_clustering_result\": {\"__class__\": \"ConnectedValue\"}, \"cutup_clustering_result|__identifier__\": \"ERR2231567.fastqsanger\", \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "1.1.0+galaxy2", + "type": "tool", + "uuid": "d7c52e93-16f4-4071-b47a-e6a0c3feaf6e", + "when": null, + "workflow_outputs": [] + }, + "15": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/fasta_to_contig2bin/Fasta_to_Contig2Bin/1.1.7+galaxy1", + "errors": null, + "id": 15, + "input_connections": { + "inputs": { + "id": 11, + "output_name": "bins" + } + }, + "inputs": [], + "label": null, + "name": "Converts genome bins in fasta format", + "outputs": [ + { + "name": "contigs2bin", + "type": "tabular" + } + ], + "position": { + "left": 2792.8906885054666, + "top": 1238.8 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/fasta_to_contig2bin/Fasta_to_Contig2Bin/1.1.7+galaxy1", + "tool_shed_repository": { + "changeset_revision": "fb2bed0eb02f", + "name": "fasta_to_contig2bin", + "owner": "iuc", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"__input_ext\": \"input\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"inputs\": {\"__class__\": \"ConnectedValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "1.1.7+galaxy1", + "type": "tool", + "uuid": "aca0d099-2011-42db-9cac-d0eed8750ec8", + "when": null, + "workflow_outputs": [] + }, + "16": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/fasta_to_contig2bin/Fasta_to_Contig2Bin/1.1.7+galaxy1", + "errors": null, + "id": 16, + "input_connections": { + "inputs": { + "id": 12, + "output_name": "bins" + } + }, + "inputs": [], + "label": null, + 
"name": "Converts genome bins in fasta format", + "outputs": [ + { + "name": "contigs2bin", + "type": "tabular" + } + ], + "position": { + "left": 2766.4841273841676, + "top": 1687.649001081842 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/fasta_to_contig2bin/Fasta_to_Contig2Bin/1.1.7+galaxy1", + "tool_shed_repository": { + "changeset_revision": "fb2bed0eb02f", + "name": "fasta_to_contig2bin", + "owner": "iuc", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"__input_ext\": \"input\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"inputs\": {\"__class__\": \"ConnectedValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "1.1.7+galaxy1", + "type": "tool", + "uuid": "e7e5c640-78f8-4af4-b589-75cea9f5a7fa", + "when": null, + "workflow_outputs": [] + }, + "17": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/concoct_extract_fasta_bins/concoct_extract_fasta_bins/1.1.0+galaxy2", + "errors": null, + "id": 17, + "input_connections": { + "cluster_file": { + "id": 14, + "output_name": "output" + }, + "fasta_file": { + "id": 2, + "output_name": "output" + } + }, + "inputs": [], + "label": null, + "name": "CONCOCT: Extract a fasta file", + "outputs": [ + { + "name": "bins", + "type": "input" + } + ], + "position": { + "left": 2144.664315483395, + "top": 278.4488665876612 + }, + "post_job_actions": { + "TagDatasetActionbins": { + "action_arguments": { + "tags": "sample-bins" + }, + "action_type": "TagDatasetAction", + "output_name": "bins" + } + }, + "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/concoct_extract_fasta_bins/concoct_extract_fasta_bins/1.1.0+galaxy2", + "tool_shed_repository": { + "changeset_revision": "8b1b09fcd8b7", + "name": "concoct_extract_fasta_bins", + "owner": "iuc", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"__input_ext\": \"input\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"cluster_file\": 
{\"__class__\": \"ConnectedValue\"}, \"cluster_file|__identifier__\": \"ERR2231567.fastqsanger\", \"fasta_file\": {\"__class__\": \"ConnectedValue\"}, \"fasta_file|__identifier__\": \"ERR2231567.fastqsanger\", \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "1.1.0+galaxy2", + "type": "tool", + "uuid": "5e80f512-bda6-49a4-9294-d80a12c4209b", + "when": null, + "workflow_outputs": [] + }, + "18": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/fasta_to_contig2bin/Fasta_to_Contig2Bin/1.1.7+galaxy1", + "errors": null, + "id": 18, + "input_connections": { + "inputs": { + "id": 17, + "output_name": "bins" + } + }, + "inputs": [], + "label": null, + "name": "Converts genome bins in fasta format", + "outputs": [ + { + "name": "contigs2bin", + "type": "tabular" + } + ], + "position": { + "left": 2801.2970664511176, + "top": 1485.5000932902894 + }, + "post_job_actions": {}, + "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/fasta_to_contig2bin/Fasta_to_Contig2Bin/1.1.7+galaxy1", + "tool_shed_repository": { + "changeset_revision": "fb2bed0eb02f", + "name": "fasta_to_contig2bin", + "owner": "iuc", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"__input_ext\": \"input\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"inputs\": {\"__class__\": \"ConnectedValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", + "tool_version": "1.1.7+galaxy1", + "type": "tool", + "uuid": "024263f9-774c-46f0-9ed1-7dbc733a3e45", + "when": null, + "workflow_outputs": [] + }, + "19": { + "annotation": "", + "content_id": "__BUILD_LIST__", + "errors": null, + "id": 19, + "input_connections": { + "datasets_0|input": { + "id": 18, + "output_name": "contigs2bin" + }, + "datasets_1|input": { + "id": 15, + "output_name": "contigs2bin" + }, + "datasets_2|input": { + "id": 16, + "output_name": "contigs2bin" + }, + "datasets_3|input": { + "id": 13, + "output_name": "contigs2bin" + } + }, + "inputs": [], + "label": 
null, + "name": "Build list", + "outputs": [ + { + "name": "output", + "type": "input" + } + ], + "position": { + "left": 3142.8906885054666, + "top": 1608.8 + }, + "post_job_actions": {}, + "tool_id": "__BUILD_LIST__", + "tool_state": "{\"datasets\": [{\"__index__\": 0, \"input\": {\"__class__\": \"ConnectedValue\"}, \"id_cond\": {\"id_select\": \"idx\", \"__current_case__\": 0}}, {\"__index__\": 1, \"input\": {\"__class__\": \"ConnectedValue\"}, \"id_cond\": {\"id_select\": \"idx\", \"__current_case__\": 0}}, {\"__index__\": 2, \"input\": {\"__class__\": \"ConnectedValue\"}, \"id_cond\": {\"id_select\": \"idx\", \"__current_case__\": 0}}, {\"__index__\": 3, \"input\": {\"__class__\": \"ConnectedValue\"}, \"id_cond\": {\"id_select\": \"idx\", \"__current_case__\": 0}}], \"__page__\": 0, \"__rerun_remap_job_id__\": null}", + "tool_version": "1.2.0", + "type": "tool", + "uuid": "111a0da7-0ea8-488c-bcc1-cd40bffdd03f", + "when": null, + "workflow_outputs": [] + }, + "20": { + "annotation": "", + "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/binette/binette/1.2.0+galaxy0", + "errors": null, + "id": 20, + "input_connections": { + "contig2bin_tables": { + "id": 19, + "output_name": "output" + }, + "contigs": { + "id": 2, + "output_name": "output" + } + }, + "inputs": [ + { + "description": "runtime parameter for tool Binette", + "name": "proteins" + } + ], + "label": null, + "name": "Binette", + "outputs": [ + { + "name": "bins", + "type": "input" + }, + { + "name": "quality", + "type": "input" + }, + { + "name": "final", + "type": "tabular" + } + ], + "position": { + "left": 3160.031508556423, + "top": 2104.2689073130805 + }, + "post_job_actions": { + "TagDatasetActionbins": { + "action_arguments": { + "tags": "refined-sample-bins" + }, + "action_type": "TagDatasetAction", + "output_name": "bins" + } + }, + "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/binette/binette/1.2.0+galaxy0", + "tool_shed_repository": { + "changeset_revision": "37ab2cfedac4", + "name": 
"binette", + "owner": "iuc", + "tool_shed": "toolshed.g2.bx.psu.edu" + }, + "tool_state": "{\"contamination_weight\": {\"__class__\": \"ConnectedValue\"}, \"contig2bin_tables\": {\"__class__\": \"ConnectedValue\"}, \"contigs\": {\"__class__\": \"ConnectedValue\"}, \"database_type\": {\"is_select\": \"cached\", \"__current_case__\": 1, \"datamanager\": \"1.0.2\"}, \"min_completeness\": {\"__class__\": \"ConnectedValue\"}, \"proteins\": {\"__class__\": \"RuntimeValue\"}, \"__page__\": 0, \"__rerun_remap_job_id__\": null}", + "tool_version": "1.2.0+galaxy0", + "type": "tool", + "uuid": "c2645ba2-7568-43ab-8eb0-4095fb6a4f45", + "when": null, + "workflow_outputs": [] + } + }, + "tags": ["microbiome", "microgalaxy", "binning"], + "uuid": "b949eaf7-bf7c-4284-b0b8-fb7a94737de1", + "version": 6 +} \ No newline at end of file diff --git a/topics/microbiome/tutorials/metagenomics-binning/workflows/index.md b/topics/microbiome/tutorials/metagenomics-binning/workflows/index.md new file mode 100644 index 00000000000000..e092e0ae66ddd4 --- /dev/null +++ b/topics/microbiome/tutorials/metagenomics-binning/workflows/index.md @@ -0,0 +1,3 @@ +--- +layout: workflow-list +---