`
+>
+> * *"Method to set up the minimal length for contigs in binning"*: `Automatic`
+>
+> > Environment for the built-in model
+> >
+> > SemiBin provides several pretrained models. If a model matching your environment is available, selecting it can improve binning performance.
+> >
+> > If no environment-specific model fits your data, you may choose:
+> >
+> > * **Global** — a general-purpose pretrained model trained across many environments.
+> > * **None** — no pretrained model is used. SemiBin then runs in fully unsupervised mode, which is recommended when your environment differs substantially from all available pretrained models.
+> {: .comment}
+>
+{: .hands_on}
+
+> Binning metrics
+>
+> 1. How many bins were produced by SemiBin for our sample?
+> 2. How many contigs are in the bin with most contigs?
+> >
+> >
+> > 1. There is only one bin for this sample.
+> > 2. 50 (these numbers may differ slightly depending on the version of SemiBin). So not all contigs were binned into this bin!
+> >
+> {: .solution}
+>
+{: .question}
\ No newline at end of file
diff --git a/topics/microbiome/tutorials/metagenomics-binning/tutorial.bib b/topics/microbiome/tutorials/metagenomics-binning/tutorial.bib
index 18aac5e68c40ef..218335ca6f765b 100644
--- a/topics/microbiome/tutorials/metagenomics-binning/tutorial.bib
+++ b/topics/microbiome/tutorials/metagenomics-binning/tutorial.bib
@@ -1,3 +1,89 @@
+@article{nissen2021improved,
+ title={Improved metagenome binning and assembly using deep variational autoencoders},
+ author={Nissen, Jakob Nybo and Johansen, Joachim and Alles{\o}e, Rosa Lundbye and S{\o}nderby, Casper Kaae and Armenteros, Jose Juan Almagro and Gr{\o}nbech, Christopher Heje and Jensen, Lars Juhl and Nielsen, Henrik Bj{\o}rn and Petersen, Thomas Nordahl and Winther, Ole and others},
+ journal={Nature biotechnology},
+ volume={39},
+ number={5},
+ pages={555--560},
+ year={2021},
+ publisher={Nature Publishing Group US New York},
+ doi = {10.1038/s41587-020-00777-4},
+}
+
+@article{NatureBinner2025,
+ author = {Author, A. and Author, B. and Author, C.},
+ title = {Comprehensive benchmarking of metagenomic binners across diverse environments},
+ journal = {Nature Communications},
+ year = {2025},
+ volume = {16},
+ pages = {57957},
+ doi = {10.1038/s41467-025-57957-6}
+}
+
+@article{Meyer2022,
+ author = {Meyer, Fernando and Fritz, Adrian and Deng, Zhi‑Luo and Koslicki, David and Lesker, Till Robin and Gurevich, Alexey and Robertson, Gary and Alser, Mohammed and Antipov, Dmitry and Beghini, Francesco and Bertrand, Denis and Brito, Jaqueline J. and Brown, C. Titus and Buchmann, Jan and Buluç, Aydin and Chen, Bo and Chikhi, Rayan and Clausen, Philip T.L.C. and Cristian, Alexandru and Dabrowski, Piotr W. and Darling, Aaron E. and Egan, Rob and Eskin, Eleazar and Georganas, Evangelos and Goltsman, Eugene and Gray, Melissa A. and Hansen, Lars Hestbjerg and Hofmeyr, Steven and Huang, Pingqin and Irber, Luiz and Jia, Huijue and Jørgensen, Tue Sparholt and Kieser, Silas D. and Klemetsen, Terje and Kola, Axel and Kolmogorov, Mikhail and Korobeynikov, Anton and Kwan, Jason and LaPierre, Nathan and Lemaitre, Claire and Li, Chenhao and Limasset, Antoine and Malcher‑Miranda, Fabio and Mangul, Serghei and Marcelino, Vanessa R. and Marchet, Camille and Marijon, Pierre and Meleshko, Dmitry and Mende, Daniel R. and Milanese, Alessio and Nagarajan, Niranjan and Nissen, Jakob and Nurk, Sergey and Oliker, Leonid and Paoli, Lucas and Peterlongo, Pierre and Piro, Vitor C. and Porter, Jacob S. and Rasmussen, Simon and Rees, Evan R. and Reinert, Knut and Renard, Bernhard and Robertsen, Espen Mikal and Rosen, Gail L. 
and Ruscheweyh, Hans‑Joachim and Sarwal, Varuni and Segata, Nicola and Seiler, Enrico and Shi, Lizhen and Sun, Fengzhu and Sunagawa, Shinichi and Sørensen, Søren Johannes and Thomas, Ashleigh and Tong, Chengxuan and Trajkovski, Mirko and Tremblay, Julien and Uritskiy, Gherman and Vicedomini, Riccardo and Wang, Zhengyang and Wang, Ziye and Wang, Zhong and Warren, Andrew and Willassen, Nils Peder and Yelick, Katherine and You, Ronghui and Zeller, Georg and Zhao, Zhengqiao and Zhu, Shanfeng and Zhu, Jie and Garrido‑Oter, Ruben and Gastmeier, Petra and Hacquard, Stephane and Häußler, Susanne and Khaledi, Ariane and Maechler, Friederike and Mesny, Fantin and Radutoiu, Simona and Schulze‑Lefert, Paul and Smit, Nathiana and Strowig, Till and Bremges, Andreas and Sczyrba, Alexander and McHardy, Alice Carolyn},
+ title = {Critical Assessment of Metagenome Interpretation: the second round of challenges},
+ journal = {Nature Methods},
+ year = {2022},
+ volume = {19},
+ number = {4},
+ pages = {429--440},
+ doi = {10.1038/s41592-022-01431-4}
+}
+
+@article{Wang2024COMEBin,
+ author = {Wang, Ziye and You, Ronghui and Han, Haitao and Liu, Wei and Sun, Fengzhu and Zhu, Shanfeng},
+ title = {Effective binning of metagenomic contigs using contrastive multi‑view representation learning},
+ journal = {Nature Communications},
+ year = {2024},
+ volume = {15},
+ article = {585},
+ doi = {10.1038/s41467-023-44290-z},
+ url = {https://doi.org/10.1038/s41467-023-44290-z}
+}
+
+@article{Chklovski2023CheckM2,
+ title = {CheckM2: a rapid, scalable and accurate tool for assessing microbial genome quality using machine learning},
+ author = {Alex Chklovski and Donovan H. Parks and Ben J. Woodcroft and Gene W. Tyson},
+ journal = {Nature Methods},
+ year = {2023},
+ volume = {20},
+ number = {8},
+ pages = {1203--1212},
+ doi = {10.1038/s41592-023-01940-w}
+}
+
+@article{Mainguy2024Binette,
+ author = {Mainguy, Jean and Hoede, Claire},
+ title = {Binette: a fast and accurate bin refinement tool to construct high‐quality Metagenome Assembled Genomes},
+ journal = {Journal of Open Source Software},
+ year = {2024},
+ volume = {9},
+ number = {102},
+ pages = {6782},
+ doi = {10.21105/joss.06782}
+}
+
+@article{Sieber2018DASTool,
+ author = {Sieber, Christopher M. K. and Probst, Alexander J. and Sharrar, Amanda and Thomas, Benjamin C. and Hess, Michelle and Tringe, Susannah G. and Banfield, Jillian F.},
+ title = {Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy},
+ journal = {Nature Microbiology},
+ year = {2018},
+ volume = {3},
+ pages = {836--843},
+ doi = {10.1038/s41564-018-0171-1}
+}
+
+@article{CAMIChallenge2017,
+ author = {Sczyrba, A. and Hofmann, P. and Belmann, P. and others},
+ title = {Critical Assessment of Metagenome Interpretation—A benchmark of metagenomics software},
+ journal = {Nature Methods},
+ year = {2017},
+ volume = {14},
+ pages = {1063--1071},
+ doi = {10.1038/nmeth.4458}
+}
+
@article{maxbin2015,
author = {Wu, Yu-Wei and Simmons, Blake A. and Singer, Steven W.},
title = "{MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets}",
diff --git a/topics/microbiome/tutorials/metagenomics-binning/tutorial.md b/topics/microbiome/tutorials/metagenomics-binning/tutorial.md
index d79801cc5ac05b..d18eb020786eba 100644
--- a/topics/microbiome/tutorials/metagenomics-binning/tutorial.md
+++ b/topics/microbiome/tutorials/metagenomics-binning/tutorial.md
@@ -1,20 +1,17 @@
---
layout: tutorial_hands_on
title: Binning of metagenomic sequencing data
-zenodo_link: https://zenodo.org/record/7818827
-extra:
- zenodo_link_results: https://zenodo.org/record/7845138
-level: Introductory
+zenodo_link: https://zenodo.org/records/17660820
+level: Intermediate
questions:
- What is metagenomic binning refers to?
-- Which tools should be used for metagenomic binning?
-- How to assess the quality of metagenomic data binning?
+- Which tools may be used for metagenomic binning?
+- How to assess the quality of metagenomic binning?
objectives:
-- Describe what metagenomics binning is
-- Describe common problems in metagenomics binning
-- What software tools are available for metagenomics binning
-- Binning of contigs into metagenome-assembled genomes (MAGs) using MetaBAT 2 software
-- Evaluation of MAG quality and completeness using CheckM software
+- Describe what metagenomics binning is.
+- Describe common challenges in metagenomics binning.
+- Perform metagenomic binning using MetaBAT 2 software.
+- Evaluate MAG quality and completeness using CheckM software.
time_estimation: 2H
key_points:
- Metagenomics binning is a computational approach to grouping together DNA sequences
@@ -30,8 +27,15 @@ key_points:
of research areas, such as human health, environmental microbiology, and biotechnology
contributions:
authorship:
+ - paulzierep
- npechl
- fpsom
+ - vinisalazar
+requirements:
+ - type: internal
+ topic_name: microbiome
+ tutorials:
+ - metagenomics-assembly
subtopic: metagenomics
tags:
- binning
@@ -56,11 +60,16 @@ recordings:
---
-
Metagenomics is the study of genetic material recovered directly from environmental samples, such as soil, water, or gut contents, without the need for isolation or cultivation of individual organisms. Metagenomics binning is a process used to classify DNA sequences obtained from metagenomic sequencing into discrete groups, or bins, based on their similarity to each other.
The goal of metagenomics binning is to assign the DNA sequences to the organisms or taxonomic groups that they originate from, allowing for a better understanding of the diversity and functions of the microbial communities present in the sample. This is typically achieved through computational methods that include sequence similarity, composition, and other features to group the sequences into bins.
+>
+> Before starting this tutorial, it is recommended to complete the [**Metagenomics Assembly Tutorial**]({% link topics/microbiome/tutorials/metagenomics-assembly/tutorial.md %}).
+{: .comment}
+
+## Binning approaches
+
There are several approaches to metagenomics binning, including:
- **Sequence composition-based binning**: This method is based on the observation that different genomes have distinct sequence composition patterns, such as GC content or codon usage bias. By analyzing these patterns in metagenomic data, sequence fragments can be assigned to individual genomes or groups of genomes.
@@ -73,10 +82,9 @@ There are several approaches to metagenomics binning, including:
- **Supervised machine learning-based binning**: This method uses machine learning algorithms trained on annotated reference genomes to classify metagenomic data into bins. This approach can achieve high accuracy but requires a large number of annotated genomes for training.
-Each of these methods has its strengths and limitations, and the choice of binning method depends on the specific characteristics of the metagenomic data set and the research question being addressed.
-
+## Binning challenges
-**Metagenomics binning is a complex process that involves many steps and can be challenging due to several problems that can occur during the process**. Some of the most common problems encountered in metagenomics binning include:
+Metagenomic binning is a complex process that involves many steps and can be challenging due to several problems that can occur during the process. Some of the most common problems encountered in metagenomic binning include:
- **High complexity**: Metagenomic samples contain DNA from multiple organisms, which can lead to high complexity in the data.
- **Fragmented sequences**: Metagenomic sequencing often generates fragmented sequences, which can make it difficult to assign reads to the correct bin.
@@ -86,29 +94,60 @@ Each of these methods has its strengths and limitations, and the choice of binni
- **Chimeric sequences**: Sequences that are the result of sequencing errors or contamination can lead to chimeric sequences, which can make it difficult to accurately bin reads.
- **Strain variation**: Organisms within a species can exhibit significant genetic variation, which can make it difficult to distinguish between different strains in a metagenomic sample.
-There are plenty of computational tools to perform metafenomics binning. Some of the most widely used include:
+## Common binners
+
+There are plenty of algorithms that perform metagenomic binning. Some of the most widely used include:
+
+* **MaxBin** ({%cite maxbin2015%}): A popular de novo binning algorithm that uses a combination of sequence features and marker genes to cluster contigs into genome bins.
+* **MetaBAT** ({%cite Kang2019%}): Another widely used de novo binning algorithm that employs a hierarchical clustering approach based on tetranucleotide frequency and coverage information.
+* **CONCOCT** ({%cite Alneberg2014%}): A de novo binning tool that uses a clustering algorithm based on sequence composition and coverage information to group contigs into genome bins.
+* **MyCC** ({%cite Lin2016%}): A reference-based binning tool that uses sequence alignment to identify contigs belonging to the same genome or taxonomic group.
+* **GroopM** ({%cite Imelfort2014%}): A hybrid binning tool that combines reference-based and de novo approaches to achieve high binning accuracy.
+* **SemiBin** ({%cite Pan2022%}): A command-line tool for metagenomic binning with deep learning; handles both short and long reads.
+* **Vamb** ({%cite nissen2021improved%}): An algorithm that uses variational autoencoders (VAEs) to encode sequence composition and coverage information.
+* **COMEBin** ({%cite Wang2024COMEBin%}): A metagenomic binning tool that integrates both composition and abundance features with machine learning-based clustering to improve binning accuracy across complex microbial communities.
+
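Several of the binners above rely on sequence composition, most commonly tetranucleotide frequencies. The following is a minimal sketch of that feature, not the implementation used by any of the tools listed: it counts all 256 possible 4-mers in a contig and compares two contigs by Euclidean distance. The function names and the toy sequences are assumptions made for illustration; real binners often also merge reverse-complement k-mers and combine composition with coverage.

```python
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides over the alphabet A, C, G, T.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(sequence: str) -> dict[str, float]:
    """Return the relative frequency of every 4-mer in a DNA sequence."""
    sequence = sequence.upper()
    # Slide a window of length 4 over the sequence and count each window.
    counts = Counter(sequence[i:i + 4] for i in range(len(sequence) - 3))
    # Only windows made of A, C, G, T are counted; windows with N are ignored.
    total = sum(counts[k] for k in KMERS)
    if total == 0:
        return {k: 0.0 for k in KMERS}
    return {k: counts[k] / total for k in KMERS}


def euclidean_distance(a: dict[str, float], b: dict[str, float]) -> float:
    """Distance between two frequency profiles (hypothetical helper)."""
    return sum((a[k] - b[k]) ** 2 for k in KMERS) ** 0.5


# Toy contigs, invented for this example.
profile_1 = tetranucleotide_frequencies("ACGTACGT")
profile_2 = tetranucleotide_frequencies("GGGGCCCCGGGG")
distance = euclidean_distance(profile_1, profile_2)
```

Contigs from the same genome are expected to have similar profiles, so a small distance is one piece of evidence (among others) that two contigs belong in the same bin.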
+## Bin refinement
+
+There are also bin refinement tools, which can evaluate, combine, and improve the raw bins produced by primary binners such as MetaBAT2, CONCOCT, MaxBin2, or SemiBin. These tools help remove contamination, merge complementary bins, and recover higher-quality MAGs.
+
+* **MetaWRAP** ({%cite Uritskiy2018%}):
+ A comprehensive metagenomic analysis pipeline that includes modules for quality control, assembly, binning (wrapping multiple binners), refinement, reassembly, and annotation. Provides an easy-to-use framework for producing high-quality MAGs from raw reads.
+
+* **DAS Tool** ({%cite Sieber2018DASTool%}):
+ A bin-refinement tool that combines results from multiple binners (e.g., MetaBAT2, MaxBin2, CONCOCT, SemiBin) into a consensus set of optimized, non-redundant bins. DAS Tool improves overall bin quality by integrating the strengths of several algorithms.
+
+* **Binette** ({%cite Mainguy2024Binette%}):
+ Binette is a fast and accurate bin refinement tool that constructs high-quality MAGs from the outputs of multiple binning tools. It generates hybrid bins using set operations on overlapping contigs (intersection, difference, and union) and evaluates their quality with CheckM2 to select the best bins. Compared to MetaWRAP, Binette is faster and can process an unlimited number of input bin sets, making it highly scalable for large and complex metagenomic datasets.
+
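The set operations mentioned for bin refinement can be sketched on plain Python sets of contig names. This is only an illustration of the idea, under the assumption that a bin is represented as a set of contig identifiers; the function name and contig names are hypothetical, and the quality scoring step (done with CheckM2 in the real tool) is not shown.

```python
def candidate_bins(bin_a: set[str], bin_b: set[str]) -> dict[str, set[str]]:
    """Derive hybrid candidate bins from two bins that share contigs."""
    shared = bin_a & bin_b
    # Bins without any contig in common produce no hybrid candidates.
    if not shared:
        return {}
    return {
        "intersection": shared,
        "difference_a_minus_b": bin_a - bin_b,
        "difference_b_minus_a": bin_b - bin_a,
        "union": bin_a | bin_b,
    }


# Two overlapping bins from different binners (invented contig names).
metabat_bin = {"contig_1", "contig_2", "contig_3"}
maxbin_bin = {"contig_2", "contig_3", "contig_4"}
candidates = candidate_bins(metabat_bin, maxbin_bin)
```

A refinement tool would then score each candidate, together with the original bins, and keep the best non-overlapping set.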
+## Anvi’o: Interactive bin refinement
+
+**Anvi’o** ({%cite Eren2015%}) is a platform for **interactive visualization and manual refinement** of metagenomic bins. While it can run automated binning (defaulting to **CONCOCT**), its main strength lies in allowing users to:
+
+* Inspect contig-level coverage, GC content, and single-copy gene presence
+* Visualize connections between contigs in a network view
+* Manually merge, split, or reassign contigs to improve bin completeness and reduce contamination
+* Annotate bins and link them to taxonomic or functional information
+
+This interactive approach is particularly useful when automated binning produces ambiguous or low-quality bins, enabling **high-confidence MAG reconstruction**.
+
+## So many options: which binner should you use?
+
+Each of these binning methods has its own strengths and limitations, and the choice of a binning tool often depends on the characteristics of the metagenomic dataset and the research question. Practical guidance on which binner to use for specific datasets and environments can be drawn from benchmark studies such as {%cite NatureBinner2025%}.
+
+{:width="60%"}
+
+Additionally, the CAMI I and II challenges provide standardized simulated datasets that highlight the strengths and weaknesses of different binners, helping researchers select the most appropriate tool for their analysis.
-- **MaxBin** ({%cite maxbin2015%}): A popular de novo binning algorithm that uses a combination of sequence features and marker genes to cluster contigs into genome bins.
-- **MetaBAT** ({%cite Kang2019%}): Another widely used de novo binning algorithm that employs a hierarchical clustering approach based on tetranucleotide frequency and coverage information.
-- **CONCOCT** ({%cite Alneberg2014%}): A de novo binning tool that uses a clustering algorithm based on sequence composition and coverage information to group contigs into genome bins.
-- **MyCC** ({%cite Lin2016%}): A reference-based binning tool that uses sequence alignment to identify contigs belonging to the same genome or taxonomic group.
-- **GroopM** ({%cite Imelfort2014%}): A hybrid binning tool that combines reference-based and de novo approaches to achieve high binning accuracy.
-- **MetaWRAP** ({%cite Uritskiy2018%}): A comprehensive metagenomic analysis pipeline that includes various modules for quality control, assembly, binning, and annotation.
-- **Anvi'o** ({%cite Eren2015%}): A platform for visualizing and analyzing metagenomic data, including features for binning, annotation, and comparative genomics.
-- **SemiBin** ({%cite Pan2022%}): A command tool for metagenomic binning with deep learning, handles both short and long reads.
+{:width="60%"}
-A benchmark study of metagenomics software can be found at {%cite Sczyrba2017%}. MetaBAT 2 outperforms previous MetaBAT and other alternatives in both accuracy and computational efficiency . All are based on default parameters ({%cite Sczyrba2017%}).
+A general approach is to perform binning using multiple binners that have shown good performance for the specific dataset, followed by bin refinement to generate an improved bin set that retains the best bins from the analysis.
-**In this tutorial, we will learn how to run metagenomic binning tools and evaluate the quality of the results**. In order to do that, we will use data from the study: [Temporal shotgun metagenomic dissection of the coffee fermentation ecosystem](https://www.ebi.ac.uk/metagenomics/studies/MGYS00005630#overview) and MetaBAT 2 algorithm. MetaBAT is a popular software tool for metagenomics binning, and there are several reasons why it is often used:
-- *High accuracy*: MetaBAT uses a combination of tetranucleotide frequency, coverage depth, and read linkage information to bin contigs, which has been shown to be highly accurate and efficient.
-- *Easy to use*: MetaBAT has a user-friendly interface and can be run on a standard desktop computer, making it accessible to a wide range of researchers with varying levels of computational expertise.
-- *Flexibility*: MetaBAT can be used with a variety of sequencing technologies, including Illumina, PacBio, and Nanopore, and can be applied to both microbial and viral metagenomes.
-- *Scalability*: MetaBAT can handle large-scale datasets, and its performance has been shown to improve with increasing sequencing depth.
-- *Compatibility*: MetaBAT outputs MAGs in standard formats that can be easily integrated into downstream analyses and tools, such as taxonomic annotation and functional prediction.
+Does using more binners always improve results? In practice, one must also consider computational resources and time constraints. Running many binners can be very time-consuming and resource-intensive, especially for large studies. In some cases, adding extra binners does not lead to a meaningful increase in bin quality, so the choice of binners should be made carefully. Overall, identifying the optimal combination of binners remains an active area of research, and clear, widely accepted guidelines are still being established.
-For an in-depth analysis of the structure and functions of the coffee microbiome, a temporal shotgun metagenomic study (six time points) was performed. The six samples have been sequenced with Illumina MiSeq utilizing whole genome sequencing.
+# Mock binning dataset for this training
-Based on the 6 original dataset of the coffee fermentation system, we generated mock datasets for this tutorial.
+Read mapping and binning of real metagenomic datasets are computationally demanding and time-consuming tasks. To demonstrate the basics of binning, we generated a small mock dataset for this tutorial that is just large enough to produce bins with all binners covered here. The same binners can be applied to any real-life dataset, but plan for considerably longer runtimes, up to weeks in some cases.
>
>
@@ -121,13 +160,30 @@ Based on the 6 original dataset of the coffee fermentation system, we generated
# Prepare analysis history and data
-MetaBAT 2 takes metagenomic sequencing data as input, typically in the form of assembled contigs in fasta format and coverage information in bam format. Specifically, MetaBAT 2 requires two input files:
+Metagenomic binners typically take two types of data as input: assembled contigs in fasta format and coverage information in bam format.
- A fasta file containing the assembled contigs, which can be generated from raw metagenomic sequencing reads using an assembler such as MEGAHIT, SPAdes, or IDBA-UD.
- A bam file containing the read coverage information for each contig, which can be generated from the same sequencing reads using mapping software such as Bowtie2 or BWA.
-MetaBAT 2 also requires a configuration file specifying various parameters and options for the binning process, such as the minimum contig length, the maximum number of clusters to generate, and the maximum expected contamination level.
+> Can bins be generated without coverage information?
+>
+> Not all binners require coverage information — some, like MetaBAT2, can operate using only genomic composition (e.g. tetranucleotide frequencies) when coverage files are not available. This is especially useful for single-sample datasets or legacy data where coverage cannot easily be calculated.
+>
+> Other tools that support composition-only binning include:
+> - **MaxBin 2** (can run with composition alone, but performs better with depth)
+> - **SolidBin** (supports single-sample binning based on sequence features)
+> - **VAMB** (primarily uses deep learning on composition, coverage optional)
+>
+> That said, including coverage information generally increases binning accuracy, especially for:
+> - Differentiating closely related strains
+> - Datasets with uneven abundance
+> - Multi-sample metagenomics workflows (e.g. differential coverage binning)
+>
+> In summary: yes, it’s possible to bin without coverage, but coverage-aware workflows are recommended when available, as they reduce contamination and improve completeness.
+>
+{: .comment}
+
To run binning, we first need to get the data into Galaxy. Any analysis should get its own Galaxy history. So let's start by creating a new one:
@@ -149,15 +205,10 @@ In case of a not very large dataset it's more convenient to upload data directly
> Upload data into Galaxy
>
-> 2. Import the sequence read data (\*.fasta) from [Zenodo]({{ page.zenodo_link }}) or a data library:
+> 1. Import the contig file from [Zenodo]({{ page.zenodo_link }}) or a data library:
>
> ```text
-> {{ page.zenodo_link }}/files/contigs_ERR2231567.fasta
-> {{ page.zenodo_link }}/files/contigs_ERR2231568.fasta
-> {{ page.zenodo_link }}/files/contigs_ERR2231569.fasta
-> {{ page.zenodo_link }}/files/contigs_ERR2231570.fasta
-> {{ page.zenodo_link }}/files/contigs_ERR2231571.fasta
-> {{ page.zenodo_link }}/files/contigs_ERR2231572.fasta
+> {{ page.zenodo_link }}/files/MEGAHIT_contigs.fasta
> ```
>
> {% snippet faqs/galaxy/datasets_import_via_link.md %}
@@ -168,12 +219,58 @@ In case of a not very large dataset it's more convenient to upload data directly
> > In case of large dataset, we can use FTP server or the [Galaxy Rule-based Uploader]({% link topics/galaxy-interface/tutorials/upload-rules/tutorial.md %}).
> {: .comment}
>
-> 3. Create a collection named `Raw reads`, rename your pairs with the sample name
+> 2. Create a collection named `Contigs`
>
> {% snippet faqs/galaxy/collections_build_list.md %}
>
+> 3. Also import the raw reads in fastq format (\*.fastqsanger.gz) from [Zenodo]({{ page.zenodo_link }}) or a data library:
+>
+> ```text
+> {{ page.zenodo_link }}/files/reads_forward.fastqsanger.gz
+> {{ page.zenodo_link }}/files/reads_reverse.fastqsanger.gz
+> ```
+> 4. Create a collection named `Reads`
+>
+> {% snippet faqs/galaxy/collections_build_list_paired.md %}
+{: .hands_on}
+
+> Why do we use collections here?
+> In this tutorial, collections are not strictly necessary because we are working with only one contig file and its paired-end reads. However, in real metagenomic studies, it is common to process many samples—sometimes hundreds or even thousands—and in those cases, collections become essential for managing data efficiently.
+>
+> It is generally good practice to first test a workflow on a small subset of the data (for example, a collection containing only a single sample) to ensure that the tools run correctly and the parameters are appropriate before launching thousands of jobs on Galaxy.
+{: .comment}
+
+# Preparation for binning
+
+As explained before, the binners need coverage information in bam format. Some binners need a specific format for the coverage information, but this will be covered in the version specific to the desired binner. For now, we will map the quality-controlled reads to the contigs to get a bam file with the coverage information. This bam file also needs to be sorted for the downstream binners.
+
+Make sure the reads are quality controlled, e.g. by following the QC tutorial TODO.
+
+> Map reads to contigs
+>
+> 1. {% tool [Bowtie2](toolshed.g2.bx.psu.edu/repos/devteam/bowtie2/bowtie2/2.5.4+galaxy0) %} with the following parameters:
+> - *"Is this single or paired library"*: `Paired-end`
+> - {% icon param-collection %} *"FASTQ Paired Dataset"*: `Reads` (Input dataset collection)
+> - *"Do you want to set paired-end options?"*: `No`
+> - *"Will you select a reference genome from your history or use a built-in index?"*: `Use a genome from the history and build index`
+> - {% icon param-collection %} *"Select reference genome"*: `Contigs` (Input dataset collection)
+> - *"Set read groups information?"*: `Do not set`
+> - *"Select analysis mode"*: `1: Default setting only`
+> - *"Do you want to tweak SAM/BAM Options?"*: `No`
+> - *"Save the bowtie2 mapping statistics to the history"*: `Yes`
+>
{: .hands_on}
+> Sort bam files
+>
+> 1. {% tool [Samtools sort](toolshed.g2.bx.psu.edu/repos/devteam/samtools_sort/samtools_sort/2.0.7) %} with the following parameters:
+> - {% icon param-file %} *"BAM File"*: output of **Bowtie2** {% icon tool %}
+> - *"Primary sort key"*: `coordinate`
+>
+{: .hands_on}
+
+The sorted bam file can be used as input for any of the binning tools.
+
# Binning
As explained before, there are many challenges to metagenomics binning. The most common of them are listed below:
@@ -186,125 +283,117 @@ As explained before, there are many challenges to metagenomics binning. The most
- Chimeric sequences.
- Strain variation.
-{:width="60%"}
+{:width="60%"}
-In this tutorial we will learn how to use **MetaBAT 2** {%cite Kang2019%} tool through Galaxy. **MetaBAT** stands for "Metagenome Binning based on Abundance and Tetranucleotide frequency". It is:
+In this tutorial, we offer dedicated versions, which highlight each of the following binners:
-> Grouping large fragments assembled from shotgun metagenomic sequences to deconvolute complex microbial communities, or metagenome binning, enables the study of individual organisms and their interactions. Here we developed automated metagenome binning software, called MetaBAT, which integrates empirical probabilistic distances of genome abundance and tetranucleotide frequency. On synthetic datasets MetaBAT on average achieves 98percent precision and 90% recall at the strain level with 281 near complete unique genomes. Applying MetaBAT to a human gut microbiome data set we recovered 176 genome bins with 92% precision and 80% recall. Further analyses suggest MetaBAT is able to recover genome fragments missed in reference genomes up to 19%, while 53 genome bins are novel. In summary, we believe MetaBAT is a powerful tool to facilitate comprehensive understanding of complex microbial communities.
-{: .quote author="Kang et al, 2019" }
+{% include _includes/cyoa-choices.html option1="MetaBAT2" option2="MaxBin2" option3="SemiBin" option4="CONCOCT" option5="COMEBin" default="MetaBAT2" %}
-We will use the uploaded assembled fasta files as input to the algorithm (For simplicity reasons all other parameters will be preserved with their default values).
+
+{% include topics/microbiome/tutorials/metagenomics-binning/metabet2_version.md %}
+
+
+{% include topics/microbiome/tutorials/metagenomics-binning/maxbin2_version.md %}
+
+
+{% include topics/microbiome/tutorials/metagenomics-binning/semibin_version.md %}
+
+
+{% include topics/microbiome/tutorials/metagenomics-binning/concoct_version.md %}
+
+
+{% include topics/microbiome/tutorials/metagenomics-binning/comebin_version.md %}
+
-> Individual binning of short-reads with MetaBAT 2
-> 1. {% tool [MetaBAT 2](toolshed.g2.bx.psu.edu/repos/iuc/megahit/megahit/1.2.9+galaxy0) %} with parameters:
-> - *"Fasta file containing contigs"*: `assembly fasta files`
->
+# Bin refinement
+
+Now that you have produced bins with your favorite binning algorithms, you can refine the recovered bins.
+To do so, you need to convert the bins into a contig-to-bin mapping table, combine the tables from each binner into one collection, and
+use Binette to create consensus bins. An alternative tool would be {% tool [DAS Tool](toolshed.g2.bx.psu.edu/repos/iuc/das_tool/das_tool/1.1.7+galaxy1) %}, which is also available in Galaxy.
+
+For the refinement we will use the bins created by all the binners used before. If you do not want to run them all by yourself,
+we have provided the results here as well.
+
+> Get result bins
+>
+> 1. Import the bin files from [Zenodo]({{ page.zenodo_link }}) or a data library:
+>
+> ```text
+> {{ page.zenodo_link }}/files/semibin_0.fasta
+> {{ page.zenodo_link }}/files/maxbin_0.fasta
+> {{ page.zenodo_link }}/files/maxbin_1.fasta
+> {{ page.zenodo_link }}/files/metabat_0.fasta
+> {{ page.zenodo_link }}/files/concoct_1.fasta
+> {{ page.zenodo_link }}/files/concoct_2.fasta
+> {{ page.zenodo_link }}/files/concoct_3.fasta
+> {{ page.zenodo_link }}/files/concoct_4.fasta
+> {{ page.zenodo_link }}/files/concoct_5.fasta
+> {{ page.zenodo_link }}/files/concoct_6.fasta
+> {{ page.zenodo_link }}/files/concoct_7.fasta
+> {{ page.zenodo_link }}/files/concoct_8.fasta
+> {{ page.zenodo_link }}/files/concoct_9.fasta
+> ```
+>
+> 2. Create one collection per bin set (named e.g. `maxbin`, `semibin`, ...), each containing only the bins created by the corresponding binner:
>
+> {% snippet faqs/galaxy/collections_build_list.md %}
{: .hands_on}
-The output files generated by MetaBAT 2 include (some of the files below are optional and not produced unless it is required by the user):
+Once each bin set has been combined into a collection, it can be converted into a contig-to-bin mapping table. Perform this step for every bin set.
-1. The final set of genome bins in FASTA format (`.fa`)
-2. A summary file with information on each genome bin, including its length, completeness, contamination, and taxonomy classification (`.txt`)
-3. A file with the mapping results showing how each contig was assigned to a genome bin (`.bam`)
-4. A file containing the abundance estimation of each genome bin (`.txt`)
-5. A file with the coverage profile of each genome bin (`.txt`)
-6. A file containing the nucleotide composition of each genome bin (`.txt`)
-7. A file with the predicted gene sequences of each genome bin (`.faa`)
+> Convert the bins into a contig to bin mapping table
+>
+> 1. {% tool [Converts genome bins in fasta format](toolshed.g2.bx.psu.edu/repos/iuc/fasta_to_contig2bin/Fasta_to_Contig2Bin/1.1.7+galaxy1) %} with the following parameters:
+> - {% icon param-file %} *"Bin sequences"*: `bins` (output of any of the binners {% icon tool %})
+>
+{: .hands_on}
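
Conceptually, the conversion walks through every bin FASTA file and records which contig header belongs to which bin. The Python sketch below illustrates that idea only; it is not the implementation of the Galaxy tool, and the function name `fasta_to_contig2bin` and the in-memory input format are assumptions made for this example.

```python
from typing import Dict, Iterable, List, Tuple


def fasta_to_contig2bin(bins: Dict[str, Iterable[str]]) -> List[Tuple[str, str]]:
    """Return (contig_id, bin_name) pairs for every FASTA header found.

    `bins` maps a bin name (e.g. "maxbin_0") to the lines of its FASTA file.
    Hypothetical helper for illustration, not the code of the Galaxy tool.
    """
    rows = []
    for bin_name, lines in bins.items():
        for line in lines:
            if not line.startswith(">"):
                continue  # sequence line, not a header
            parts = line[1:].split()
            if parts:
                # The contig ID is the first whitespace-delimited token.
                rows.append((parts[0], bin_name))
    return rows


# Toy input with two bins; real input would be read from the FASTA files.
example_bins = {
    "maxbin_0": [">contig_1 len=1200", "ACGT", ">contig_2", "GGCC"],
    "maxbin_1": [">contig_3", "TTAA"],
}
table = fasta_to_contig2bin(example_bins)
for contig_id, bin_name in table:
    print(f"{contig_id}\t{bin_name}")
```

Each printed line holds one contig ID and the bin it was assigned to, separated by a tab, which is the shape of table the refinement step consumes.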
-These output files can be further analyzed and used for downstream applications such as functional annotation, comparative genomics, and phylogenetic analysis.
->
+> Build a list of the binning tables
>
-> Since the binning process would take some we are just going to import the results of the binning previously run.
+> 1. {% tool [Build list](__BUILD_LIST__) %} with the following parameters:
+>    - In *"Dataset"*:
+>        - {% icon param-repeat %} *"Insert Dataset"*
+>            - {% icon param-file %} *"Input Dataset"*: `contigs2bin` (output of **Converts genome bins in fasta format** of SemiBin {% icon tool %})
+>            - *"Label to use"*: `Index`
+>        - {% icon param-repeat %} *"Insert Dataset"*
+>            - {% icon param-file %} *"Input Dataset"*: `contigs2bin` (output of **Converts genome bins in fasta format** of MetaBAT2 {% icon tool %})
+>            - *"Label to use"*: `Index`
+>        - {% icon param-repeat %} *"Insert Dataset"*
+>            - {% icon param-file %} *"Input Dataset"*: `contigs2bin` (output of **Converts genome bins in fasta format** of MaxBin2 {% icon tool %})
+>            - *"Label to use"*: `Index`
+>        - {% icon param-repeat %} *"Insert Dataset"*
+>            - {% icon param-file %} *"Input Dataset"*: `contigs2bin` (output of **Converts genome bins in fasta format** of CONCOCT {% icon tool %})
+>            - *"Label to use"*: `Index`
>
-> > Import generated assembly files
-> >
-> > 1. Import the six folders containg binning result files from [Zenodo]({{ page.extra.zenodo_link_results }}) or the Shared Data library:
-> >
-> > ```text
-> > {{ page.extra.zenodo_link_results }}/files/26_%20MetaBAT2%20on%20data%20ERR2231567_%20Bins.zip
-> > {{ page.extra.zenodo_link_results }}/files/38_%20MetaBAT2%20on%20data%20ERR2231568_%20Bins.zip
-> > {{ page.extra.zenodo_link_results }}/files/47_%20MetaBAT2%20on%20data%20ERR2231569_%20Bins.zip
-> > {{ page.extra.zenodo_link_results }}/files/57_%20MetaBAT2%20on%20data%20ERR2231570_%20Bins.zip
-> > {{ page.extra.zenodo_link_results }}/files/65_%20MetaBAT2%20on%20data%20ERR2231571_%20Bins.zip
-> > {{ page.extra.zenodo_link_results }}/files/74_%20MetaBAT2%20on%20data%20ERR2231572_%20Bins.zip
-> > ```
-> >
-> >
-> > 2. Create a collection named `MEGAHIT Contig`, rename your pairs with the sample name
-> >
-> {: .hands_on}
-{: .comment}
+{: .hands_on}
->
+> Refine with Binette
>
-> 1. How many bins has been for ERR2231567 sample?
-> 2. How many sequences are contained in the second bin?
+> 1. {% tool [Binette](toolshed.g2.bx.psu.edu/repos/iuc/binette/binette/1.2.0+galaxy0) %} with the following parameters:
+> - {% icon param-file %} *"Input contig table"*: `output` (output of **Build list** {% icon tool %})
+> - {% icon param-collection %} *"Input contig file"*: `output` (Input dataset collection)
+> - *"Select if database should be used either via file or cached database"*: `cached database`
+>
+{: .hands_on}
+
+> Bin refinement
+>
+> 1. How many bins are left after refinement?
>
> >
> >
-> > 1. There are 6 bins identified
-> > 2. 167 sequences are classified into the second bin.
+> > 1. Two bins are left. Most contigs from the different bins were combined into one bin. One single-contig bin remains.
> >
> {: .solution}
>
{: .question}
-# De-replication
-
-De-replication is the process of identifying sets of genomes that are the "same" in a list of genomes, and removing all but the “best” genome from each redundant set. How similar genomes need to be to be considered “same”, how to determine which genome is “best”, and other important decisions are discussed in [Important Concepts](https://drep.readthedocs.io/en/latest/choosing_parameters.html).
-
-A common use for genome de-replication is the case of individual assembly of metagenomic data. If metagenomic samples are collected in a series, a common way to assemble the short reads is with a “co-assembly”. That is, combining the reads from all samples and assembling them together. The problem with this is assembling similar strains together can severely fragment assemblies, precluding recovery of a good genome bin. An alternative option is to assemble each sample separately, and then “de-replicate” the bins from each assembly to make a final genome set.
-
-{:width="80%"}
-
-MetaBAT 2 does not explicitly perform dereplication in the sense of identifying groups of identical or highly similar genomes in a given dataset. Instead, MetaBAT 2 focuses on improving the accuracy of binning by leveraging various features such as read coverage, differential coverage across samples, and sequence composition. It aims to distinguish between different genomes present in the metagenomic dataset and assign contigs to the appropriate bins.
-
-Several tools have been designed for the proccess of de-replication. **`dRep`** is a software tool designed for the dereplication of genomes in metagenomic datasets. The goal is to retain a representative set of genomes to improve downstream analyses, such as taxonomic profiling and functional annotation.
-
-An typical workflow of how `dRep` works for dereplication in metagenomics includes:
-
-- *Genome Comparison*: `dRep` uses a pairwise genome comparison approach to assess the similarity between genomes in a given metagenomic dataset.
-
-- *Clustering*: Based on the genome similarities, `dRep` performs clustering to group similar genomes into "genome clusters." Each cluster represents a group of closely related genomes.
-
-- *Genome Quality Assessment*: `dRep` evaluates the quality of each genome within a cluster. It considers factors such as completeness, contamination, and strain heterogeneity.
-
-- *Genome Selection*: Within each genome cluster, `dRep` selects a representative genome based on user-defined criteria. This representative genome is considered as the "dereplicated" version of the cluster.
-
-- *Dereplication Output*: The output of `dRep` includes information about the dereplicated genomes, including their identity, completeness, and contamination. The user can choose a threshold for genome similarity to control the level of dereplication.
-
-> General list of actions for de-replication
-> 1. Create new history
-> 2. Assemble each sample separately using your favorite assembler
-> 3. Perform a co-assembly to catch low-abundance microbes
-> 4. Bin each assembly separately using your favorite binner
-> 5. Bin co-assembly using your favorite binner
-> 6. Pull the bins from all assemblies together
-> 7. rRun **`dRep`** on them
-> 8. Perform downstream analysis on the de-replicated genome list
->
-{: .hands_on}
-
-
# Checking the quality of the bins
Once binning is done, it is important to check its quality.
-Binning results can be evaluated with **CheckM** ({%cite Parks2015%}). CheckM is a software tool used in metagenomics binning to assess the completeness and contamination of genome bins. Metagenomics binning is the process of separating DNA fragments from a mixed community of microorganisms into individual bins, each representing a distinct genome.
+Binning results can be evaluated with **CheckM** ({%cite Parks2015%}). CheckM is a software tool used in metagenomics binning to assess the completeness and contamination of genome bins.
CheckM compares the genome bins to a set of universal single-copy marker genes that are present in nearly all bacterial and archaeal genomes. By identifying the presence or absence of these marker genes in the bins, CheckM can estimate the completeness of each genome bin (i.e., the percentage of the total set of universal single-copy marker genes that are present in the bin) and the degree of contamination (i.e., the percentage of marker genes that are found in more than one copy within the bin).
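
The marker-gene logic can be sketched in a few lines of Python. This is a deliberately simplified illustration, not CheckM's algorithm: CheckM works with lineage-specific, collocated marker sets, which are ignored here. The sketch assumes that contamination is measured from single-copy markers that occur in more than one copy within a bin. The function name `estimate_quality` and the toy marker IDs are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def estimate_quality(marker_set: Set[str], hits: Iterable[str]) -> Tuple[float, float]:
    """Return (completeness, contamination) in percent for one bin.

    `marker_set` holds the expected single-copy marker IDs and `hits` lists
    the marker IDs detected in the bin, one entry per detected copy.
    Simplified illustration only (assumption: flat, unweighted marker set).
    """
    if not marker_set:
        raise ValueError("marker_set must not be empty")
    counts = Counter(h for h in hits if h in marker_set)
    present = len(counts)                               # markers seen at least once
    extra_copies = sum(c - 1 for c in counts.values())  # copies beyond the first
    completeness = 100.0 * present / len(marker_set)
    contamination = 100.0 * extra_copies / len(marker_set)
    return completeness, contamination


# Toy bin: four expected markers, one missing (m4) and one duplicated (m2).
markers = {"m1", "m2", "m3", "m4"}
bin_hits = ["m1", "m2", "m2", "m3"]
completeness, contamination = estimate_quality(markers, bin_hits)
print(f"completeness={completeness:.1f}% contamination={contamination:.1f}%")
```

In this toy bin, three of four markers are present (75% completeness) and one marker has an extra copy (25% contamination).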
@@ -324,48 +413,17 @@ Based on the previous analysis we will use **CheckM lineage_wf**: *Assessing the
`CheckM lineage_wf` is a specific workflow within the CheckM software tool that is used for taxonomic classification of genome bins based on their marker gene content. This workflow uses a reference database of marker genes and taxonomic information to classify the genome bins at different taxonomic levels, from domain to species.
+Now you can investigate the completeness and contamination of any of your previously generated genome bins as well as the refined set.
+
> Assessing the completeness and contamination of genome bins using lineage-specific marker sets with `CheckM lineage_wf`
> 1. {% tool [CheckM lineage_wf](toolshed.g2.bx.psu.edu/repos/iuc/checkm_lineage_wf/checkm_lineage_wf/1.2.0+galaxy0) %} with parameters:
> - *"Bins"*: `Folder containing the produced bins`
->
->
{: .hands_on}
->
->
-> Since the CheckM process would take some time we are just going to import the results:
->
-> > Import generated `CheckM lineage_wf` results
-> >
-> > 1. Import the `CheckM lineage_wf` report files from [Zenodo]({{ page.extra.zenodo_link_results }}) or the Shared Data library:
-> >
-> > ```text
-> > {{ page.extra.zenodo_link_results }}/files/CheckM_lineage_wf_on_data_ERR2231567__Bin_statistics.txt
-> > {{ page.extra.zenodo_link_results }}/files/CheckM_lineage_wf_on_data_ERR2231568__Bin_statistics.txt
-> > {{ page.extra.zenodo_link_results }}/files/CheckM_lineage_wf_on_data_ERR2231569__Bin_statistics.txt
-> > {{ page.extra.zenodo_link_results }}/files/CheckM_lineage_wf_on_data_ERR2231570__Bin_statistics.txt
-> > {{ page.extra.zenodo_link_results }}/files/CheckM_lineage_wf_on_data_ERR2231571__Bin_statistics.txt
-> > {{ page.extra.zenodo_link_results }}/files/CheckM_lineage_wf_on_data_ERR2231572__Bin_statistics.txt
-> > ```
-> >
-> {: .hands_on}
-{: .comment}
The output of "CheckM lineage_wf" includes several files and tables that provide information about the taxonomic classification and quality assessment of genome bins. Here are some of the key outputs:
-- **CheckM Lineage Workflow Output Report**: This report provides a summary of the quality assessment performed by CheckM. It includes statistics such as the number of genomes analyzed, their completeness, contamination, and other quality metrics.
+- **CheckM Lineage Workflow Output Report (Bin statistics)**: This report provides a summary of the quality assessment performed by CheckM. It includes statistics such as the number of genomes analyzed, their completeness, contamination, and other quality metrics.
- **Lineage-specific Quality Assessment**: CheckM generates lineage-specific quality assessment files for each analyzed genome. These files contain detailed information about the completeness and contamination of the genome based on its taxonomic lineage.
@@ -377,13 +435,103 @@ The output of "CheckM lineage_wf" includes several files and tables that provide
It should be noted that "CheckM lineage_wf" offers a range of optional outputs that can be generated to provide additional information to the user.
-
-
# Conclusions
-In summary, this tutorial shows a step-by-step on how to bin metagenomic contigs using MetaBAT 2.
+In summary, this tutorial shows step by step how to bin metagenomic contigs using several binners, followed by bin refinement.
It is critical to select the appropriate binning tool for a specific metagenomics study, as different binning methods may have different strengths and limitations depending on the type of metagenomic data being analyzed. By comparing the outcomes of several binning techniques, researchers can increase the precision and accuracy of genome binning.
@@ -409,4 +555,4 @@ There are various binning methods available for metagenomic data, including refe
Comparing the outcomes of multiple binning methods can help to identify the most accurate and reliable method for a specific study. This can be done by evaluating the quality of the resulting bins in terms of completeness, contamination, and strain heterogeneity, as well as by comparing the composition and functional profiles of the identified genomes.
-Overall, by carefully selecting and comparing binning methods, researchers can improve the quality and reliability of genome bins, which can ultimately lead to a better understanding of the functional and ecological roles of microbial communities in various environments.
+Overall, by carefully selecting and comparing binning methods, researchers can improve the quality and reliability of genome bins, which can ultimately lead to a better understanding of the functional and ecological roles of microbial communities in various environments.
\ No newline at end of file
diff --git a/topics/microbiome/tutorials/metagenomics-binning/workflows/Metagenomic-Binning-tests.yml b/topics/microbiome/tutorials/metagenomics-binning/workflows/Metagenomic-Binning-tests.yml
new file mode 100644
index 00000000000000..09939e694c3f0a
--- /dev/null
+++ b/topics/microbiome/tutorials/metagenomics-binning/workflows/Metagenomic-Binning-tests.yml
@@ -0,0 +1,31 @@
+- doc: Test outline for Metagenomic-Binning
+  job:
+    Trimmed reads:
+      class: Collection
+      collection_type: list:paired
+      elements:
+        - class: Collection
+          type: paired
+          identifier: ERR2231567
+          elements:
+            - class: File
+              identifier: forward
+              path: https://zenodo.org/records/17660820/files/reads_forward.fastqsanger.gz
+            - class: File
+              identifier: reverse
+              path: https://zenodo.org/records/17660820/files/reads_reverse.fastqsanger.gz
+    Assemblies:
+      class: Collection
+      collection_type: list
+      elements:
+        - class: File
+          identifier: ERR2231567
+          path: https://zenodo.org/records/17660820/files/MEGAHIT_contigs.fasta
+
+  outputs:
+    final:
+      asserts:
+        - that: has_text
+          text: "binette_bin1"
+        - that: has_text
+          text: "16.69"
diff --git a/topics/microbiome/tutorials/metagenomics-binning/workflows/Metagenomic-Binning.ga b/topics/microbiome/tutorials/metagenomics-binning/workflows/Metagenomic-Binning.ga
new file mode 100644
index 00000000000000..268bf7ee1e50a3
--- /dev/null
+++ b/topics/microbiome/tutorials/metagenomics-binning/workflows/Metagenomic-Binning.ga
@@ -0,0 +1,1111 @@
+{
+ "a_galaxy_workflow": "true",
+ "annotation": "Binning workflows that uses abundance information and performs binning of metagenomic contigs using 4 different binners as well as bin refinement.",
+ "comments": [
+ {
+ "color": "orange",
+ "data": {
+ "text": "# Bin refinement"
+ },
+ "id": 5,
+ "position": [
+ 2404.7906885054667,
+ 1141.6
+ ],
+ "size": [
+ 1271,
+ 1458
+ ],
+ "type": "markdown"
+ },
+ {
+ "color": "none",
+ "data": {
+ "text": "# CONCOCT\n"
+ },
+ "id": 0,
+ "position": [
+ 680.4906885054666,
+ 0
+ ],
+ "size": [
+ 1713,
+ 597
+ ],
+ "type": "markdown"
+ },
+ {
+ "color": "lime",
+ "data": {
+ "text": "# Mapping"
+ },
+ "id": 1,
+ "position": [
+ 11.290688505466562,
+ 1078.1999999999998
+ ],
+ "size": [
+ 692,
+ 415
+ ],
+ "type": "markdown"
+ },
+ {
+ "color": "red",
+ "data": {
+ "text": "# MetaBAT2\n"
+ },
+ "id": 2,
+ "position": [
+ 1496.8906885054666,
+ 656.9
+ ],
+ "size": [
+ 541,
+ 404
+ ],
+ "type": "markdown"
+ },
+ {
+ "color": "pink",
+ "data": {
+ "text": "# MaxBin2"
+ },
+ "id": 3,
+ "position": [
+ 1533.5906885054667,
+ 1150.1999999999998
+ ],
+ "size": [
+ 587,
+ 490
+ ],
+ "type": "markdown"
+ },
+ {
+ "color": "lime",
+ "data": {
+ "text": "# SemiBin"
+ },
+ "id": 4,
+ "position": [
+ 1089.8906885054666,
+ 1704.6999999999998
+ ],
+ "size": [
+ 887,
+ 532
+ ],
+ "type": "markdown"
+ }
+ ],
+ "creator": [
+ {
+ "class": "Person",
+ "identifier": "https://orcid.org/0000-0003-2982-388X",
+ "name": "Paul Zierep"
+ }
+ ],
+ "format-version": "0.1",
+ "help": "",
+ "license": "MIT",
+ "name": "Metagenomic Binning",
+ "readme": "",
+ "report": {
+ "markdown": "\n# Workflow Execution Report\n\n## Workflow Inputs\n```galaxy\ninvocation_inputs()\n```\n\n## Workflow Outputs\n```galaxy\ninvocation_outputs()\n```\n\n## Workflow\n```galaxy\nworkflow_display()\n```\n"
+ },
+ "steps": {
+ "0": {
+ "annotation": "Samples grouped for co-assembly. For individual assembly use same reads as `Trimmed reads input`. The tool fastq_groupmerge can be used to perform the grouping.",
+ "content_id": null,
+ "errors": null,
+ "id": 0,
+ "input_connections": {},
+ "inputs": [
+ {
+ "description": "Samples grouped for co-assembly. For individual assembly use same reads as `Trimmed reads input`. The tool fastq_groupmerge can be used to perform the grouping.",
+ "name": "Trimmed reads"
+ }
+ ],
+ "label": "Trimmed reads",
+ "name": "Input dataset collection",
+ "outputs": [],
+ "position": {
+ "left": 0,
+ "top": 896.6187286877193
+ },
+ "tool_id": null,
+ "tool_state": "{\"optional\": false, \"tag\": null, \"collection_type\": \"list:paired\", \"fields\": null}",
+ "tool_version": null,
+ "type": "data_collection_input",
+ "uuid": "dd8faa6e-3f29-4fa2-befb-38ef4a7832b5",
+ "when": null,
+ "workflow_outputs": []
+ },
+ "1": {
+ "annotation": "CONCOCT requires the read length for coverage. Best use fastQC to estimate the mean value.",
+ "content_id": null,
+ "errors": null,
+ "id": 1,
+ "input_connections": {},
+ "inputs": [
+ {
+ "description": "CONCOCT requires the read length for coverage. Best use fastQC to estimate the mean value.",
+ "name": "Read length (CONCOCT)"
+ }
+ ],
+ "label": "Read length (CONCOCT)",
+ "name": "Input parameter",
+ "outputs": [],
+ "position": {
+ "left": 1097.0837390044621,
+ "top": 41.539782924321685
+ },
+ "tool_id": null,
+ "tool_state": "{\"default\": 100, \"validators\": [{\"min\": null, \"max\": null, \"negate\": false, \"type\": \"in_range\"}], \"parameter_type\": \"integer\", \"optional\": false}",
+ "tool_version": null,
+ "type": "parameter_input",
+ "uuid": "30109f1d-816b-4f85-a0b3-e54506ae32ae",
+ "when": null,
+ "workflow_outputs": []
+ },
+ "2": {
+ "annotation": "This workflow allows using a custom assembly as input. If provided, select `custom assembly` as Assembler.\nProvide one assembly for each group of trimmed input reads.",
+ "content_id": null,
+ "errors": null,
+ "id": 2,
+ "input_connections": {},
+ "inputs": [
+ {
+ "description": "This workflow allows using a custom assembly as input. If provided, select `custom assembly` as Assembler.\nProvide one assembly for each group of trimmed input reads.",
+ "name": "Assemblies"
+ }
+ ],
+ "label": "Assemblies",
+ "name": "Input dataset collection",
+ "outputs": [],
+ "position": {
+ "left": 248.96247766826204,
+ "top": 1670.8582741820105
+ },
+ "tool_id": null,
+ "tool_state": "{\"optional\": false, \"tag\": null, \"collection_type\": \"list\", \"fields\": null}",
+ "tool_version": null,
+ "type": "data_collection_input",
+ "uuid": "e2f5ad16-674f-4687-94a4-a5e55680440a",
+ "when": null,
+ "workflow_outputs": []
+ },
+ "3": {
+ "annotation": "Environment for the built-in model (SemiBin), options are: human_gut, dog_gut, ocean, soil, cat_gut, human_oral, mouse_gut, pig_gut, built_environment, wastewater, chicken_caecum, global",
+ "content_id": null,
+ "errors": null,
+ "id": 3,
+ "input_connections": {},
+ "inputs": [
+ {
+ "description": "Environment for the built-in model (SemiBin), options are: human_gut, dog_gut, ocean, soil, cat_gut, human_oral, mouse_gut, pig_gut, built_environment, wastewater, chicken_caecum, global",
+ "name": "Environment for the built-in model (SemiBin)"
+ }
+ ],
+ "label": "Environment for the built-in model (SemiBin)",
+ "name": "Input parameter",
+ "outputs": [],
+ "position": {
+ "left": 1139.1165875728577,
+ "top": 2094.707949827118
+ },
+ "tool_id": null,
+ "tool_state": "{\"default\": \"global\", \"multiple\": false, \"validators\": [], \"restrictions\": [\"global\", \"human_gut\", \"dog_gut\", \"ocean\", \"soil\", \"cat_gut\", \"human_oral\", \"mouse_gut\", \"pig_gut\", \"built_environment\", \"wastewater\", \"chicken_caecum\"], \"parameter_type\": \"text\", \"optional\": false}",
+ "tool_version": null,
+ "type": "parameter_input",
+ "uuid": "86fc3246-f756-49ce-b196-ffa358e5ac41",
+ "when": null,
+ "workflow_outputs": []
+ },
+ "4": {
+ "annotation": "",
+ "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/concoct_cut_up_fasta/concoct_cut_up_fasta/1.1.0+galaxy2",
+ "errors": null,
+ "id": 4,
+ "input_connections": {
+ "input_fasta": {
+ "id": 2,
+ "output_name": "output"
+ }
+ },
+ "inputs": [],
+ "label": null,
+ "name": "CONCOCT: Cut up contigs",
+ "outputs": [
+ {
+ "name": "output_fasta",
+ "type": "fasta"
+ },
+ {
+ "name": "output_bed",
+ "type": "bed"
+ }
+ ],
+ "position": {
+ "left": 749.6942143874219,
+ "top": 224.77559328535233
+ },
+ "post_job_actions": {},
+ "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/concoct_cut_up_fasta/concoct_cut_up_fasta/1.1.0+galaxy2",
+ "tool_shed_repository": {
+ "changeset_revision": "4d8bc5dd9e95",
+ "name": "concoct_cut_up_fasta",
+ "owner": "iuc",
+ "tool_shed": "toolshed.g2.bx.psu.edu"
+ },
+ "tool_state": "{\"__input_ext\": \"input\", \"bedfile\": true, \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"chunk_size\": \"10000\", \"input_fasta\": {\"__class__\": \"ConnectedValue\"}, \"input_fasta|__identifier__\": \"ERR2231567.fastqsanger\", \"merge_last\": true, \"overlap_size\": \"0\", \"__page__\": null, \"__rerun_remap_job_id__\": null}",
+ "tool_version": "1.1.0+galaxy2",
+ "type": "tool",
+ "uuid": "3d11cf76-8a00-4aab-8cee-14647ad02165",
+ "when": null,
+ "workflow_outputs": []
+ },
+ "5": {
+ "annotation": "",
+ "content_id": "toolshed.g2.bx.psu.edu/repos/devteam/bowtie2/bowtie2/2.5.4+galaxy0",
+ "errors": null,
+ "id": 5,
+ "input_connections": {
+ "library|input_1": {
+ "id": 0,
+ "output_name": "output"
+ },
+ "reference_genome|own_file": {
+ "id": 2,
+ "output_name": "output"
+ }
+ },
+ "inputs": [
+ {
+ "description": "runtime parameter for tool Bowtie2",
+ "name": "library"
+ },
+ {
+ "description": "runtime parameter for tool Bowtie2",
+ "name": "reference_genome"
+ }
+ ],
+ "label": null,
+ "name": "Bowtie2",
+ "outputs": [
+ {
+ "name": "output",
+ "type": "bam"
+ },
+ {
+ "name": "mapping_stats",
+ "type": "txt"
+ }
+ ],
+ "position": {
+ "left": 170.4540358517052,
+ "top": 1189.3571323702458
+ },
+ "post_job_actions": {
+ "HideDatasetActionoutput": {
+ "action_arguments": {},
+ "action_type": "HideDatasetAction",
+ "output_name": "output"
+ }
+ },
+ "tool_id": "toolshed.g2.bx.psu.edu/repos/devteam/bowtie2/bowtie2/2.5.4+galaxy0",
+ "tool_shed_repository": {
+ "changeset_revision": "f76cbb84d67f",
+ "name": "bowtie2",
+ "owner": "devteam",
+ "tool_shed": "toolshed.g2.bx.psu.edu"
+ },
+ "tool_state": "{\"__input_ext\": \"input\", \"analysis_type\": {\"analysis_type_selector\": \"simple\", \"__current_case__\": 0, \"presets\": \"no_presets\"}, \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"library\": {\"type\": \"paired_collection\", \"__current_case__\": 1, \"input_1\": {\"__class__\": \"ConnectedValue\"}, \"unaligned_file\": false, \"aligned_file\": false, \"paired_options\": {\"paired_options_selector\": \"no\", \"__current_case__\": 1}}, \"own_file|__identifier__\": \"ERR2231567.fastqsanger\", \"reference_genome\": {\"source\": \"history\", \"__current_case__\": 1, \"own_file\": {\"__class__\": \"ConnectedValue\"}}, \"rg\": {\"rg_selector\": \"do_not_set\", \"__current_case__\": 3}, \"sam_options\": {\"sam_options_selector\": \"no\", \"__current_case__\": 1}, \"save_mapping_stats\": true, \"__page__\": null, \"__rerun_remap_job_id__\": null}",
+ "tool_version": "2.5.4+galaxy0",
+ "type": "tool",
+ "uuid": "e41a21d2-91c7-42e3-8072-e254280733ab",
+ "when": null,
+ "workflow_outputs": []
+ },
+ "6": {
+ "annotation": "",
+ "content_id": "toolshed.g2.bx.psu.edu/repos/devteam/samtools_sort/samtools_sort/2.0.7",
+ "errors": null,
+ "id": 6,
+ "input_connections": {
+ "input1": {
+ "id": 5,
+ "output_name": "output"
+ }
+ },
+ "inputs": [],
+ "label": null,
+ "name": "Samtools sort",
+ "outputs": [
+ {
+ "name": "output1",
+ "type": "bam"
+ }
+ ],
+ "position": {
+ "left": 425.0155087709174,
+ "top": 1226.3643944120972
+ },
+ "post_job_actions": {},
+ "tool_id": "toolshed.g2.bx.psu.edu/repos/devteam/samtools_sort/samtools_sort/2.0.7",
+ "tool_shed_repository": {
+ "changeset_revision": "f2f2650aeade",
+ "name": "samtools_sort",
+ "owner": "devteam",
+ "tool_shed": "toolshed.g2.bx.psu.edu"
+ },
+ "tool_state": "{\"__input_ext\": \"input\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"input1\": {\"__class__\": \"ConnectedValue\"}, \"input1|__identifier__\": \"ERR2231567.fastqsanger\", \"minhash\": false, \"prim_key_cond\": {\"prim_key_select\": \"\", \"__current_case__\": 0}, \"__page__\": null, \"__rerun_remap_job_id__\": null}",
+ "tool_version": "2.0.7",
+ "type": "tool",
+ "uuid": "863d5ee6-9eb0-4ac7-a634-9e258807f8cb",
+ "when": null,
+ "workflow_outputs": []
+ },
+ "7": {
+ "annotation": "",
+ "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/concoct_coverage_table/concoct_coverage_table/1.1.0+galaxy2",
+ "errors": null,
+ "id": 7,
+ "input_connections": {
+ "bedfile": {
+ "id": 4,
+ "output_name": "output_bed"
+ },
+ "mode|bamfile": {
+ "id": 6,
+ "output_name": "output1"
+ }
+ },
+ "inputs": [
+ {
+ "description": "runtime parameter for tool CONCOCT: Generate the input coverage table",
+ "name": "mode"
+ }
+ ],
+ "label": null,
+ "name": "CONCOCT: Generate the input coverage table",
+ "outputs": [
+ {
+ "name": "output",
+ "type": "tabular"
+ }
+ ],
+ "position": {
+ "left": 1135.7679978931717,
+ "top": 217.73566551102363
+ },
+ "post_job_actions": {},
+ "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/concoct_coverage_table/concoct_coverage_table/1.1.0+galaxy2",
+ "tool_shed_repository": {
+ "changeset_revision": "fd31cd168efc",
+ "name": "concoct_coverage_table",
+ "owner": "iuc",
+ "tool_shed": "toolshed.g2.bx.psu.edu"
+ },
+ "tool_state": "{\"__input_ext\": \"input\", \"bedfile\": {\"__class__\": \"ConnectedValue\"}, \"bedfile|__identifier__\": \"ERR2231567.fastqsanger\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"mode\": {\"type\": \"individual\", \"__current_case__\": 0, \"bamfile\": {\"__class__\": \"ConnectedValue\"}}, \"mode|bamfile|__identifier__\": \"ERR2231567.fastqsanger\", \"__page__\": null, \"__rerun_remap_job_id__\": null}",
+ "tool_version": "1.1.0+galaxy2",
+ "type": "tool",
+ "uuid": "11dcb0bb-7626-4c97-a15c-0366194f1cea",
+ "when": null,
+ "workflow_outputs": []
+ },
+ "8": {
+ "annotation": "",
+ "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/metabat2_jgi_summarize_bam_contig_depths/metabat2_jgi_summarize_bam_contig_depths/2.17+galaxy0",
+ "errors": null,
+ "id": 8,
+ "input_connections": {
+ "mode|bam_indiv_input": {
+ "id": 6,
+ "output_name": "output1"
+ }
+ },
+ "inputs": [
+ {
+ "description": "runtime parameter for tool Calculate contig depths",
+ "name": "mode"
+ }
+ ],
+ "label": null,
+ "name": "Calculate contig depths",
+ "outputs": [
+ {
+ "name": "outputDepth",
+ "type": "tabular"
+ }
+ ],
+ "position": {
+ "left": 896.6290086604289,
+ "top": 924.5426879033071
+ },
+ "post_job_actions": {},
+ "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/metabat2_jgi_summarize_bam_contig_depths/metabat2_jgi_summarize_bam_contig_depths/2.17+galaxy0",
+ "tool_shed_repository": {
+ "changeset_revision": "00e3b4ef7e0c",
+ "name": "metabat2_jgi_summarize_bam_contig_depths",
+ "owner": "iuc",
+ "tool_shed": "toolshed.g2.bx.psu.edu"
+ },
+ "tool_state": "{\"__input_ext\": \"input\", \"advanced\": {\"percentIdentity\": \"97\", \"output_paired_contigs\": false, \"noIntraDepthVariance\": false, \"showDepth\": false, \"minMapQual\": \"0\", \"weightMapQual\": \"0.0\", \"includeEdgeBases\": false, \"maxEdgeBases\": \"75\"}, \"bam_indiv_input|__identifier__\": \"ERR2231567.fastqsanger\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"mode\": {\"type\": \"individual\", \"__current_case__\": 0, \"bam_indiv_input\": {\"__class__\": \"ConnectedValue\"}, \"use_reference_cond\": {\"use_reference\": \"no\", \"__current_case__\": 0}}, \"shredding\": {\"shredLength\": \"16000\", \"shredDepth\": \"5\", \"minContigLength\": \"1\", \"minContigDepth\": \"0.0\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}",
+ "tool_version": "2.17+galaxy0",
+ "type": "tool",
+ "uuid": "99aabfaa-9e98-45e2-83f7-3009b959603d",
+ "when": null,
+ "workflow_outputs": []
+ },
+ "9": {
+ "annotation": "",
+ "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/semibin/semibin/2.1.0+galaxy1",
+ "errors": null,
+ "id": 9,
+ "input_connections": {
+ "mode|environment": {
+ "id": 3,
+ "output_name": "output"
+ },
+ "mode|input_bam": {
+ "id": 6,
+ "output_name": "output1"
+ },
+ "mode|input_fasta": {
+ "id": 2,
+ "output_name": "output"
+ }
+ },
+ "inputs": [
+ {
+ "description": "runtime parameter for tool SemiBin",
+ "name": "mode"
+ },
+ {
+ "description": "runtime parameter for tool SemiBin",
+ "name": "mode"
+ },
+ {
+ "description": "runtime parameter for tool SemiBin",
+ "name": "mode"
+ }
+ ],
+ "label": null,
+ "name": "SemiBin",
+ "outputs": [
+ {
+ "name": "output_bins",
+ "type": "input"
+ },
+ {
+ "name": "single_data",
+ "type": "csv"
+ },
+ {
+ "name": "single_data_split",
+ "type": "csv"
+ },
+ {
+ "name": "single_cov",
+ "type": "csv"
+ },
+ {
+ "name": "single_split_cov",
+ "type": "csv"
+ }
+ ],
+ "position": {
+ "left": 1644.2381997196558,
+ "top": 1785.1442996053324
+ },
+ "post_job_actions": {
+ "TagDatasetActionoutput_bins": {
+ "action_arguments": {
+ "tags": "sample-bins"
+ },
+ "action_type": "TagDatasetAction",
+ "output_name": "output_bins"
+ }
+ },
+ "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/semibin/semibin/2.1.0+galaxy1",
+ "tool_shed_repository": {
+ "changeset_revision": "afee33334a63",
+ "name": "semibin",
+ "owner": "iuc",
+ "tool_shed": "toolshed.g2.bx.psu.edu"
+ },
+ "tool_state": "{\"annot\": {\"ml_threshold\": null}, \"bin\": {\"max_node\": \"1.0\", \"max_edges\": \"200\", \"minfasta_kbs\": \"200\"}, \"extra_output\": [\"data\", \"coverage\"], \"min_len\": {\"method\": \"automatic\", \"__current_case__\": 0}, \"mode\": {\"select\": \"single\", \"__current_case__\": 0, \"input_fasta\": {\"__class__\": \"ConnectedValue\"}, \"input_bam\": {\"__class__\": \"ConnectedValue\"}, \"ref\": {\"select\": \"ml\", \"__current_case__\": 2}, \"environment\": {\"__class__\": \"ConnectedValue\"}}, \"orf_finder\": \"fast-naive\", \"random_seed\": \"0\", \"training\": {\"epoches\": \"20\", \"batch_size\": \"2048\"}, \"__page__\": 0, \"__rerun_remap_job_id__\": null}",
+ "tool_version": "2.1.0+galaxy1",
+ "type": "tool",
+ "uuid": "6fd5ada8-8884-4988-a895-a6c41172022d",
+ "when": null,
+ "workflow_outputs": []
+ },
+ "10": {
+ "annotation": "",
+ "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/concoct/concoct/1.1.0+galaxy2",
+ "errors": null,
+ "id": 10,
+ "input_connections": {
+ "advanced|read_length": {
+ "id": 1,
+ "output_name": "output"
+ },
+ "composition_file": {
+ "id": 4,
+ "output_name": "output_fasta"
+ },
+ "coverage_file": {
+ "id": 7,
+ "output_name": "output"
+ }
+ },
+ "inputs": [
+ {
+ "description": "runtime parameter for tool CONCOCT",
+ "name": "advanced"
+ }
+ ],
+ "label": null,
+ "name": "CONCOCT",
+ "outputs": [
+ {
+ "name": "output_clustering",
+ "type": "csv"
+ },
+ {
+ "name": "output_pca_components",
+ "type": "csv"
+ },
+ {
+ "name": "output_pca_transformed",
+ "type": "csv"
+ }
+ ],
+ "position": {
+ "left": 1516.0490446249091,
+ "top": 184.3040223201906
+ },
+ "post_job_actions": {},
+ "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/concoct/concoct/1.1.0+galaxy2",
+ "tool_shed_repository": {
+ "changeset_revision": "eae7ee167917",
+ "name": "concoct",
+ "owner": "iuc",
+ "tool_shed": "toolshed.g2.bx.psu.edu"
+ },
+ "tool_state": "{\"advanced\": {\"clusters\": \"400\", \"kmer_length\": \"4\", \"length_threshold\": \"1000\", \"read_length\": {\"__class__\": \"ConnectedValue\"}, \"total_percentage_pca\": \"90\", \"seed\": \"1\", \"iterations\": \"500\", \"no_cov_normalization\": false}, \"composition_file\": {\"__class__\": \"ConnectedValue\"}, \"coverage_file\": {\"__class__\": \"ConnectedValue\"}, \"output\": {\"no_total_coverage\": false, \"converge_out\": false, \"log\": false}, \"__page__\": 0, \"__rerun_remap_job_id__\": null}",
+ "tool_version": "1.1.0+galaxy2",
+ "type": "tool",
+ "uuid": "f4094547-c644-4657-a908-25e789703384",
+ "when": null,
+ "workflow_outputs": []
+ },
+ "11": {
+ "annotation": "",
+ "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/metabat2/metabat2/2.17+galaxy0",
+ "errors": null,
+ "id": 11,
+ "input_connections": {
+ "advanced|base_coverage_depth_cond|abdFile": {
+ "id": 8,
+ "output_name": "outputDepth"
+ },
+ "inFile": {
+ "id": 2,
+ "output_name": "output"
+ }
+ },
+ "inputs": [],
+ "label": null,
+ "name": "MetaBAT2",
+ "outputs": [
+ {
+ "name": "bins",
+ "type": "input"
+ },
+ {
+ "name": "lowDepth",
+ "type": "fasta"
+ },
+ {
+ "name": "tooShort",
+ "type": "fasta"
+ },
+ {
+ "name": "unbinned",
+ "type": "fasta"
+ },
+ {
+ "name": "process_log",
+ "type": "txt"
+ }
+ ],
+ "position": {
+ "left": 1755.1233348021783,
+ "top": 667.814925017154
+ },
+ "post_job_actions": {
+ "TagDatasetActionbins": {
+ "action_arguments": {
+ "tags": "sample-bins"
+ },
+ "action_type": "TagDatasetAction",
+ "output_name": "bins"
+ }
+ },
+ "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/metabat2/metabat2/2.17+galaxy0",
+ "tool_shed_repository": {
+ "changeset_revision": "f375b4f6ef57",
+ "name": "metabat2",
+ "owner": "iuc",
+ "tool_shed": "toolshed.g2.bx.psu.edu"
+ },
+ "tool_state": "{\"__input_ext\": \"input\", \"advanced\": {\"base_coverage_depth_cond\": {\"base_coverage_depth\": \"yes\", \"__current_case__\": 1, \"abdFile\": {\"__class__\": \"ConnectedValue\"}, \"cvExt\": null}, \"minContig\": \"1500\", \"maxP\": \"95\", \"minS\": \"60\", \"maxEdges\": \"200\", \"pTNF\": \"0\", \"noAdd\": false, \"minCV\": \"1.0\", \"minCVSum\": \"1.0\", \"seed\": \"0\"}, \"advanced|abdFile|__identifier__\": \"ERR2231567.fastqsanger\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"inFile\": {\"__class__\": \"ConnectedValue\"}, \"inFile|__identifier__\": \"ERR2231567.fastqsanger\", \"out\": {\"minClsSize\": \"200000\", \"onlyLabel\": false, \"saveCls\": false, \"extra_outputs\": [\"lowDepth\", \"tooShort\", \"unbinned\", \"log\"]}, \"__page__\": null, \"__rerun_remap_job_id__\": null}",
+ "tool_version": "2.17+galaxy0",
+ "type": "tool",
+ "uuid": "12715693-dcaa-4097-bd1e-61bfaa0fbe42",
+ "when": null,
+ "workflow_outputs": []
+ },
+ "12": {
+ "annotation": "",
+ "content_id": "toolshed.g2.bx.psu.edu/repos/mbernt/maxbin2/maxbin2/2.2.7+galaxy6",
+ "errors": null,
+ "id": 12,
+ "input_connections": {
+ "assembly|inputs|abund": {
+ "id": 8,
+ "output_name": "outputDepth"
+ },
+ "contig": {
+ "id": 2,
+ "output_name": "output"
+ }
+ },
+ "inputs": [],
+ "label": null,
+ "name": "MaxBin2",
+ "outputs": [
+ {
+ "name": "bins",
+ "type": "input"
+ },
+ {
+ "name": "markers",
+ "type": "input"
+ },
+ {
+ "name": "noclass",
+ "type": "fasta"
+ },
+ {
+ "name": "toshort",
+ "type": "fasta"
+ },
+ {
+ "name": "summary",
+ "type": "tabular"
+ },
+ {
+ "name": "log",
+ "type": "txt"
+ },
+ {
+ "name": "marker",
+ "type": "tabular"
+ },
+ {
+ "name": "plot",
+ "type": "pdf"
+ }
+ ],
+ "position": {
+ "left": 1670.4872771406842,
+ "top": 1205.5486143537003
+ },
+ "post_job_actions": {
+ "TagDatasetActionbins": {
+ "action_arguments": {
+ "tags": "sample-bins"
+ },
+ "action_type": "TagDatasetAction",
+ "output_name": "bins"
+ }
+ },
+ "tool_id": "toolshed.g2.bx.psu.edu/repos/mbernt/maxbin2/maxbin2/2.2.7+galaxy6",
+ "tool_shed_repository": {
+ "changeset_revision": "0917b2d6010d",
+ "name": "maxbin2",
+ "owner": "mbernt",
+ "tool_shed": "toolshed.g2.bx.psu.edu"
+ },
+ "tool_state": "{\"adv\": {\"min_contig_length\": \"1000\", \"max_iteration\": \"50\", \"prob_threshold\": \"0.5\"}, \"assembly\": {\"type\": \"individual\", \"__current_case__\": 0, \"inputs\": {\"type\": \"abund\", \"__current_case__\": 1, \"abund\": {\"__class__\": \"ConnectedValue\"}}}, \"contig\": {\"__class__\": \"ConnectedValue\"}, \"output\": {\"plotmarker\": true, \"marker\": true, \"markers\": true, \"log\": true, \"markerset\": \"107\"}, \"__page__\": 0, \"__rerun_remap_job_id__\": null}",
+ "tool_version": "2.2.7+galaxy6",
+ "type": "tool",
+ "uuid": "c3282b9b-30ec-4e23-9859-5931275f7fdf",
+ "when": null,
+ "workflow_outputs": []
+ },
+ "13": {
+ "annotation": "",
+ "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/fasta_to_contig2bin/Fasta_to_Contig2Bin/1.1.7+galaxy1",
+ "errors": null,
+ "id": 13,
+ "input_connections": {
+ "inputs": {
+ "id": 9,
+ "output_name": "output_bins"
+ }
+ },
+ "inputs": [],
+ "label": null,
+ "name": "Converts genome bins in fasta format",
+ "outputs": [
+ {
+ "name": "contigs2bin",
+ "type": "tabular"
+ }
+ ],
+ "position": {
+ "left": 2805.967292429587,
+ "top": 1922.7528899355289
+ },
+ "post_job_actions": {},
+ "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/fasta_to_contig2bin/Fasta_to_Contig2Bin/1.1.7+galaxy1",
+ "tool_shed_repository": {
+ "changeset_revision": "fb2bed0eb02f",
+ "name": "fasta_to_contig2bin",
+ "owner": "iuc",
+ "tool_shed": "toolshed.g2.bx.psu.edu"
+ },
+ "tool_state": "{\"inputs\": {\"__class__\": \"ConnectedValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}",
+ "tool_version": "1.1.7+galaxy1",
+ "type": "tool",
+ "uuid": "9a7aa4f6-b60b-4415-a1a4-73cf829ec2c3",
+ "when": null,
+ "workflow_outputs": []
+ },
+ "14": {
+ "annotation": "",
+ "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/concoct_merge_cut_up_clustering/concoct_merge_cut_up_clustering/1.1.0+galaxy2",
+ "errors": null,
+ "id": 14,
+ "input_connections": {
+ "cutup_clustering_result": {
+ "id": 10,
+ "output_name": "output_clustering"
+ }
+ },
+ "inputs": [],
+ "label": null,
+ "name": "CONCOCT: Merge cut clusters",
+ "outputs": [
+ {
+ "name": "output",
+ "type": "csv"
+ }
+ ],
+ "position": {
+ "left": 1812.6886863992684,
+ "top": 330.66525053302036
+ },
+ "post_job_actions": {},
+ "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/concoct_merge_cut_up_clustering/concoct_merge_cut_up_clustering/1.1.0+galaxy2",
+ "tool_shed_repository": {
+ "changeset_revision": "20ccec4a2c38",
+ "name": "concoct_merge_cut_up_clustering",
+ "owner": "iuc",
+ "tool_shed": "toolshed.g2.bx.psu.edu"
+ },
+ "tool_state": "{\"__input_ext\": \"input\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"cutup_clustering_result\": {\"__class__\": \"ConnectedValue\"}, \"cutup_clustering_result|__identifier__\": \"ERR2231567.fastqsanger\", \"__page__\": null, \"__rerun_remap_job_id__\": null}",
+ "tool_version": "1.1.0+galaxy2",
+ "type": "tool",
+ "uuid": "d7c52e93-16f4-4071-b47a-e6a0c3feaf6e",
+ "when": null,
+ "workflow_outputs": []
+ },
+ "15": {
+ "annotation": "",
+ "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/fasta_to_contig2bin/Fasta_to_Contig2Bin/1.1.7+galaxy1",
+ "errors": null,
+ "id": 15,
+ "input_connections": {
+ "inputs": {
+ "id": 11,
+ "output_name": "bins"
+ }
+ },
+ "inputs": [],
+ "label": null,
+ "name": "Converts genome bins in fasta format",
+ "outputs": [
+ {
+ "name": "contigs2bin",
+ "type": "tabular"
+ }
+ ],
+ "position": {
+ "left": 2792.8906885054666,
+ "top": 1238.8
+ },
+ "post_job_actions": {},
+ "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/fasta_to_contig2bin/Fasta_to_Contig2Bin/1.1.7+galaxy1",
+ "tool_shed_repository": {
+ "changeset_revision": "fb2bed0eb02f",
+ "name": "fasta_to_contig2bin",
+ "owner": "iuc",
+ "tool_shed": "toolshed.g2.bx.psu.edu"
+ },
+ "tool_state": "{\"__input_ext\": \"input\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"inputs\": {\"__class__\": \"ConnectedValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}",
+ "tool_version": "1.1.7+galaxy1",
+ "type": "tool",
+ "uuid": "aca0d099-2011-42db-9cac-d0eed8750ec8",
+ "when": null,
+ "workflow_outputs": []
+ },
+ "16": {
+ "annotation": "",
+ "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/fasta_to_contig2bin/Fasta_to_Contig2Bin/1.1.7+galaxy1",
+ "errors": null,
+ "id": 16,
+ "input_connections": {
+ "inputs": {
+ "id": 12,
+ "output_name": "bins"
+ }
+ },
+ "inputs": [],
+ "label": null,
+ "name": "Converts genome bins in fasta format",
+ "outputs": [
+ {
+ "name": "contigs2bin",
+ "type": "tabular"
+ }
+ ],
+ "position": {
+ "left": 2766.4841273841676,
+ "top": 1687.649001081842
+ },
+ "post_job_actions": {},
+ "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/fasta_to_contig2bin/Fasta_to_Contig2Bin/1.1.7+galaxy1",
+ "tool_shed_repository": {
+ "changeset_revision": "fb2bed0eb02f",
+ "name": "fasta_to_contig2bin",
+ "owner": "iuc",
+ "tool_shed": "toolshed.g2.bx.psu.edu"
+ },
+ "tool_state": "{\"__input_ext\": \"input\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"inputs\": {\"__class__\": \"ConnectedValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}",
+ "tool_version": "1.1.7+galaxy1",
+ "type": "tool",
+ "uuid": "e7e5c640-78f8-4af4-b589-75cea9f5a7fa",
+ "when": null,
+ "workflow_outputs": []
+ },
+ "17": {
+ "annotation": "",
+ "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/concoct_extract_fasta_bins/concoct_extract_fasta_bins/1.1.0+galaxy2",
+ "errors": null,
+ "id": 17,
+ "input_connections": {
+ "cluster_file": {
+ "id": 14,
+ "output_name": "output"
+ },
+ "fasta_file": {
+ "id": 2,
+ "output_name": "output"
+ }
+ },
+ "inputs": [],
+ "label": null,
+ "name": "CONCOCT: Extract a fasta file",
+ "outputs": [
+ {
+ "name": "bins",
+ "type": "input"
+ }
+ ],
+ "position": {
+ "left": 2144.664315483395,
+ "top": 278.4488665876612
+ },
+ "post_job_actions": {
+ "TagDatasetActionbins": {
+ "action_arguments": {
+ "tags": "sample-bins"
+ },
+ "action_type": "TagDatasetAction",
+ "output_name": "bins"
+ }
+ },
+ "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/concoct_extract_fasta_bins/concoct_extract_fasta_bins/1.1.0+galaxy2",
+ "tool_shed_repository": {
+ "changeset_revision": "8b1b09fcd8b7",
+ "name": "concoct_extract_fasta_bins",
+ "owner": "iuc",
+ "tool_shed": "toolshed.g2.bx.psu.edu"
+ },
+ "tool_state": "{\"__input_ext\": \"input\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"cluster_file\": {\"__class__\": \"ConnectedValue\"}, \"cluster_file|__identifier__\": \"ERR2231567.fastqsanger\", \"fasta_file\": {\"__class__\": \"ConnectedValue\"}, \"fasta_file|__identifier__\": \"ERR2231567.fastqsanger\", \"__page__\": null, \"__rerun_remap_job_id__\": null}",
+ "tool_version": "1.1.0+galaxy2",
+ "type": "tool",
+ "uuid": "5e80f512-bda6-49a4-9294-d80a12c4209b",
+ "when": null,
+ "workflow_outputs": []
+ },
+ "18": {
+ "annotation": "",
+ "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/fasta_to_contig2bin/Fasta_to_Contig2Bin/1.1.7+galaxy1",
+ "errors": null,
+ "id": 18,
+ "input_connections": {
+ "inputs": {
+ "id": 17,
+ "output_name": "bins"
+ }
+ },
+ "inputs": [],
+ "label": null,
+ "name": "Converts genome bins in fasta format",
+ "outputs": [
+ {
+ "name": "contigs2bin",
+ "type": "tabular"
+ }
+ ],
+ "position": {
+ "left": 2801.2970664511176,
+ "top": 1485.5000932902894
+ },
+ "post_job_actions": {},
+ "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/fasta_to_contig2bin/Fasta_to_Contig2Bin/1.1.7+galaxy1",
+ "tool_shed_repository": {
+ "changeset_revision": "fb2bed0eb02f",
+ "name": "fasta_to_contig2bin",
+ "owner": "iuc",
+ "tool_shed": "toolshed.g2.bx.psu.edu"
+ },
+ "tool_state": "{\"__input_ext\": \"input\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"inputs\": {\"__class__\": \"ConnectedValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}",
+ "tool_version": "1.1.7+galaxy1",
+ "type": "tool",
+ "uuid": "024263f9-774c-46f0-9ed1-7dbc733a3e45",
+ "when": null,
+ "workflow_outputs": []
+ },
+ "19": {
+ "annotation": "",
+ "content_id": "__BUILD_LIST__",
+ "errors": null,
+ "id": 19,
+ "input_connections": {
+ "datasets_0|input": {
+ "id": 18,
+ "output_name": "contigs2bin"
+ },
+ "datasets_1|input": {
+ "id": 15,
+ "output_name": "contigs2bin"
+ },
+ "datasets_2|input": {
+ "id": 16,
+ "output_name": "contigs2bin"
+ },
+ "datasets_3|input": {
+ "id": 13,
+ "output_name": "contigs2bin"
+ }
+ },
+ "inputs": [],
+ "label": null,
+ "name": "Build list",
+ "outputs": [
+ {
+ "name": "output",
+ "type": "input"
+ }
+ ],
+ "position": {
+ "left": 3142.8906885054666,
+ "top": 1608.8
+ },
+ "post_job_actions": {},
+ "tool_id": "__BUILD_LIST__",
+ "tool_state": "{\"datasets\": [{\"__index__\": 0, \"input\": {\"__class__\": \"ConnectedValue\"}, \"id_cond\": {\"id_select\": \"idx\", \"__current_case__\": 0}}, {\"__index__\": 1, \"input\": {\"__class__\": \"ConnectedValue\"}, \"id_cond\": {\"id_select\": \"idx\", \"__current_case__\": 0}}, {\"__index__\": 2, \"input\": {\"__class__\": \"ConnectedValue\"}, \"id_cond\": {\"id_select\": \"idx\", \"__current_case__\": 0}}, {\"__index__\": 3, \"input\": {\"__class__\": \"ConnectedValue\"}, \"id_cond\": {\"id_select\": \"idx\", \"__current_case__\": 0}}], \"__page__\": 0, \"__rerun_remap_job_id__\": null}",
+ "tool_version": "1.2.0",
+ "type": "tool",
+ "uuid": "111a0da7-0ea8-488c-bcc1-cd40bffdd03f",
+ "when": null,
+ "workflow_outputs": []
+ },
+ "20": {
+ "annotation": "",
+ "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/binette/binette/1.2.0+galaxy0",
+ "errors": null,
+ "id": 20,
+ "input_connections": {
+ "contig2bin_tables": {
+ "id": 19,
+ "output_name": "output"
+ },
+ "contigs": {
+ "id": 2,
+ "output_name": "output"
+ }
+ },
+ "inputs": [
+ {
+ "description": "runtime parameter for tool Binette",
+ "name": "proteins"
+ }
+ ],
+ "label": null,
+ "name": "Binette",
+ "outputs": [
+ {
+ "name": "bins",
+ "type": "input"
+ },
+ {
+ "name": "quality",
+ "type": "input"
+ },
+ {
+ "name": "final",
+ "type": "tabular"
+ }
+ ],
+ "position": {
+ "left": 3160.031508556423,
+ "top": 2104.2689073130805
+ },
+ "post_job_actions": {
+ "TagDatasetActionbins": {
+ "action_arguments": {
+ "tags": "refined-sample-bins"
+ },
+ "action_type": "TagDatasetAction",
+ "output_name": "bins"
+ }
+ },
+ "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/binette/binette/1.2.0+galaxy0",
+ "tool_shed_repository": {
+ "changeset_revision": "37ab2cfedac4",
+ "name": "binette",
+ "owner": "iuc",
+ "tool_shed": "toolshed.g2.bx.psu.edu"
+ },
+ "tool_state": "{\"contamination_weight\": {\"__class__\": \"ConnectedValue\"}, \"contig2bin_tables\": {\"__class__\": \"ConnectedValue\"}, \"contigs\": {\"__class__\": \"ConnectedValue\"}, \"database_type\": {\"is_select\": \"cached\", \"__current_case__\": 1, \"datamanager\": \"1.0.2\"}, \"min_completeness\": {\"__class__\": \"ConnectedValue\"}, \"proteins\": {\"__class__\": \"RuntimeValue\"}, \"__page__\": 0, \"__rerun_remap_job_id__\": null}",
+ "tool_version": "1.2.0+galaxy0",
+ "type": "tool",
+ "uuid": "c2645ba2-7568-43ab-8eb0-4095fb6a4f45",
+ "when": null,
+ "workflow_outputs": []
+ }
+ },
+ "tags": ["microbiome", "microgalaxy", "binning"],
+ "uuid": "b949eaf7-bf7c-4284-b0b8-fb7a94737de1",
+ "version": 6
+}
\ No newline at end of file
diff --git a/topics/microbiome/tutorials/metagenomics-binning/workflows/index.md b/topics/microbiome/tutorials/metagenomics-binning/workflows/index.md
new file mode 100644
index 00000000000000..e092e0ae66ddd4
--- /dev/null
+++ b/topics/microbiome/tutorials/metagenomics-binning/workflows/index.md
@@ -0,0 +1,3 @@
+---
+layout: workflow-list
+---