From 703df1122671d2d6ca3fdd90ee7ee7e966bbc840 Mon Sep 17 00:00:00 2001 From: Vini Salazar <17276653+vinisalazar@users.noreply.github.com> Date: Thu, 9 Oct 2025 14:29:42 +1100 Subject: [PATCH 1/6] Update CONTRIBUTORS.yaml - Add vinisalazar --- CONTRIBUTORS.yaml | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/CONTRIBUTORS.yaml b/CONTRIBUTORS.yaml index 42cecf54591a0f..8133d37cad160d 100644 --- a/CONTRIBUTORS.yaml +++ b/CONTRIBUTORS.yaml @@ -3055,6 +3055,14 @@ VerenaMoo: name: Verena Moosmann joined: 2024-12 +vinisalazar: + name: Vini Salazar + joined: 2025-10 + orcid: 0000-0002-8362-3195 + affiliations: + - unimelb + - melbournebioinformatics + vivekbhr: name: Vivek Bhardwaj joined: 2017-09 From 7f4c380fd5a6e2f16a63d5581707a56cd8dc6cfe Mon Sep 17 00:00:00 2001 From: Vini Salazar <17276653+vinisalazar@users.noreply.github.com> Date: Thu, 9 Oct 2025 17:43:20 +1100 Subject: [PATCH 2/6] tutorials/metagenomics-binning: small improvements - Create images directory - Point readers to the metagenomics-assembly tutorial as a prerequisite --- .../{ => images}/binning.png | Bin .../metagenomics-binning/tutorial.md | 52 ++++++++---------- 2 files changed, 22 insertions(+), 30 deletions(-) rename topics/microbiome/tutorials/metagenomics-binning/{ => images}/binning.png (100%) diff --git a/topics/microbiome/tutorials/metagenomics-binning/binning.png b/topics/microbiome/tutorials/metagenomics-binning/images/binning.png similarity index 100% rename from topics/microbiome/tutorials/metagenomics-binning/binning.png rename to topics/microbiome/tutorials/metagenomics-binning/images/binning.png diff --git a/topics/microbiome/tutorials/metagenomics-binning/tutorial.md b/topics/microbiome/tutorials/metagenomics-binning/tutorial.md index d79801cc5ac05b..8792894b2b15fb 100644 --- a/topics/microbiome/tutorials/metagenomics-binning/tutorial.md +++ b/topics/microbiome/tutorials/metagenomics-binning/tutorial.md @@ -4,17 +4,16 @@ title: Binning of metagenomic sequencing data zenodo_link: 
https://zenodo.org/record/7818827 extra: zenodo_link_results: https://zenodo.org/record/7845138 -level: Introductory +level: Intermediate questions: - What is metagenomic binning refers to? -- Which tools should be used for metagenomic binning? -- How to assess the quality of metagenomic data binning? +- Which tools can be used for metagenomic binning? +- How can the quality of metagenomic binning be assessed? objectives: -- Describe what metagenomics binning is -- Describe common problems in metagenomics binning -- What software tools are available for metagenomics binning -- Binning of contigs into metagenome-assembled genomes (MAGs) using MetaBAT 2 software -- Evaluation of MAG quality and completeness using CheckM software +- Describe what metagenomics binning is. +- Describe common challenges in metagenomics binning. +- Perform metagenomic binning using MetaBAT 2 software. +- Evaluate MAG quality and completeness using CheckM software. time_estimation: 2H key_points: - Metagenomics binning is a computational approach to grouping together DNA sequences @@ -56,11 +55,14 @@ recordings: --- - Metagenomics is the study of genetic material recovered directly from environmental samples, such as soil, water, or gut contents, without the need for isolation or cultivation of individual organisms. Metagenomics binning is a process used to classify DNA sequences obtained from metagenomic sequencing into discrete groups, or bins, based on their similarity to each other. The goal of metagenomics binning is to assign the DNA sequences to the organisms or taxonomic groups that they originate from, allowing for a better understanding of the diversity and functions of the microbial communities present in the sample. This is typically achieved through computational methods that include sequence similarity, composition, and other features to group the sequences into bins. 
+> +> Before starting this tutorial, it is recommended to do the [**Metagenomics Assembly Tutorial**]({% link topics/microbiome/tutorials/metagenomics-assembly/tutorial.md %}) +{: .comment} + There are several approaches to metagenomics binning, including: - **Sequence composition-based binning**: This method is based on the observation that different genomes have distinct sequence composition patterns, such as GC content or codon usage bias. By analyzing these patterns in metagenomic data, sequence fragments can be assigned to individual genomes or groups of genomes. @@ -76,7 +78,7 @@ There are several approaches to metagenomics binning, including: Each of these methods has its strengths and limitations, and the choice of binning method depends on the specific characteristics of the metagenomic data set and the research question being addressed. -**Metagenomics binning is a complex process that involves many steps and can be challenging due to several problems that can occur during the process**. Some of the most common problems encountered in metagenomics binning include: +**Metagenomic binning is a complex process that involves many steps and can be challenging due to several problems that can occur during the process**. Some of the most common problems encountered in metagenomic binning include: - **High complexity**: Metagenomic samples contain DNA from multiple organisms, which can lead to high complexity in the data. - **Fragmented sequences**: Metagenomic sequencing often generates fragmented sequences, which can make it difficult to assign reads to the correct bin. @@ -99,7 +101,7 @@ There are plenty of computational tools to perform metafenomics binning. Some of A benchmark study of metagenomics software can be found at {%cite Sczyrba2017%}. MetaBAT 2 outperforms previous MetaBAT and other alternatives in both accuracy and computational efficiency . All are based on default parameters ({%cite Sczyrba2017%}). 
-**In this tutorial, we will learn how to run metagenomic binning tools and evaluate the quality of the results**. In order to do that, we will use data from the study: [Temporal shotgun metagenomic dissection of the coffee fermentation ecosystem](https://www.ebi.ac.uk/metagenomics/studies/MGYS00005630#overview) and MetaBAT 2 algorithm. MetaBAT is a popular software tool for metagenomics binning, and there are several reasons why it is often used: +**In this tutorial, we will learn how to run metagenomic binning tools and evaluate the quality of the results**. In order to do that, we will use data from the study: [Temporal shotgun metagenomic dissection of the coffee fermentation ecosystem](https://www.ebi.ac.uk/metagenomics/studies/MGYS00005630#overview) and the MetaBAT 2 algorithm. MetaBAT is a popular software tool for metagenomics binning, and there are several reasons why it is often used: - *High accuracy*: MetaBAT uses a combination of tetranucleotide frequency, coverage depth, and read linkage information to bin contigs, which has been shown to be highly accurate and efficient. - *Easy to use*: MetaBAT has a user-friendly interface and can be run on a standard desktop computer, making it accessible to a wide range of researchers with varying levels of computational expertise. - *Flexibility*: MetaBAT can be used with a variety of sequencing technologies, including Illumina, PacBio, and Nanopore, and can be applied to both microbial and viral metagenomes. @@ -186,7 +188,7 @@ As explained before, there are many challenges to metagenomics binning. The most - Chimeric sequences. - Strain variation. 
-![Image show the binning process where sequences are grouped together based on genome signatures like the kmer profiles of each contig, contig coverage, or GC content](./binning.png "Binning"){:width="60%"} +![Metagenomic binning involves grouping contigs into 'bins' based on sequence composition, coverage, or other properties.](./images/binning.png "Metagenomic binning involves grouping contigs into 'bins' based on sequence composition, coverage, or other properties."){:width="60%"} In this tutorial we will learn how to use **MetaBAT 2** {%cite Kang2019%} tool through Galaxy. **MetaBAT** stands for "Metagenome Binning based on Abundance and Tetranucleotide frequency". It is: @@ -198,19 +200,9 @@ We will use the uploaded assembled fasta files as input to the algorithm (For si > Individual binning of short-reads with MetaBAT 2 > 1. {% tool [MetaBAT 2](toolshed.g2.bx.psu.edu/repos/iuc/megahit/megahit/1.2.9+galaxy0) %} with parameters: > - *"Fasta file containing contigs"*: `assembly fasta files` -> +> - In **Advanced options**, keep all as **default**. +> - In **Output options:** +> - *"Save cluster memberships as a matrix format?"*: `"Yes"` > {: .hands_on} @@ -249,15 +241,15 @@ These output files can be further analyzed and used for downstream applications > {: .hands_on} {: .comment} -> +> Binning metrics > > 1. How many bins has been for ERR2231567 sample? -> 2. How many sequences are contained in the second bin? +> 2. How many contigs are in the bin with most contigs? What about the one with the least? > > > > > -> > 1. There are 6 bins identified -> > 2. 167 sequences are classified into the second bin. +> > 1. There are 6 bins identified. +> > 2. 7170 in the one with the most contigs, and 140 in the one with the least (these numbers may differ slightly depending on the version of MetaBAT2). 
> > > {: .solution} > @@ -269,7 +261,7 @@ De-replication is the process of identifying sets of genomes that are the "same" A common use for genome de-replication is the case of individual assembly of metagenomic data. If metagenomic samples are collected in a series, a common way to assemble the short reads is with a “co-assembly”. That is, combining the reads from all samples and assembling them together. The problem with this is assembling similar strains together can severely fragment assemblies, precluding recovery of a good genome bin. An alternative option is to assemble each sample separately, and then “de-replicate” the bins from each assembly to make a final genome set. -![Image shows the process of individual assembly on two strains and five samples, after individual assembly of samples two samples are chosen for de-replication process. In parallel, co-assembly on all five samples is performed](./individual-assembly.png "Individual assembly followed by de-replication vs co-assembly"){:width="80%"} +![Image shows the process of individual assembly on two strains and five samples, after individual assembly of samples two samples are chosen for de-replication process. In parallel, co-assembly on all five samples is performed](./individual-assembly.png "Individual assembly followed by de-replication vs co-assembly."){:width="80%"} MetaBAT 2 does not explicitly perform dereplication in the sense of identifying groups of identical or highly similar genomes in a given dataset. Instead, MetaBAT 2 focuses on improving the accuracy of binning by leveraging various features such as read coverage, differential coverage across samples, and sequence composition. It aims to distinguish between different genomes present in the metagenomic dataset and assign contigs to the appropriate bins. 
From 4b9afb2b42d6af584f0d843d2e945d39701885e2 Mon Sep 17 00:00:00 2001 From: Vini Salazar <17276653+vinisalazar@users.noreply.github.com> Date: Thu, 9 Oct 2025 17:51:39 +1100 Subject: [PATCH 3/6] Add Vamb to binning tools --- .../tutorials/metagenomics-binning/tutorial.bib | 11 +++++++++++ .../tutorials/metagenomics-binning/tutorial.md | 9 ++++++--- 2 files changed, 17 insertions(+), 3 deletions(-) diff --git a/topics/microbiome/tutorials/metagenomics-binning/tutorial.bib b/topics/microbiome/tutorials/metagenomics-binning/tutorial.bib index 18aac5e68c40ef..a70fc4afc6cd6c 100644 --- a/topics/microbiome/tutorials/metagenomics-binning/tutorial.bib +++ b/topics/microbiome/tutorials/metagenomics-binning/tutorial.bib @@ -1,3 +1,14 @@ +@article{nissen2021improved, + title={Improved metagenome binning and assembly using deep variational autoencoders}, + author={Nissen, Jakob Nybo and Johansen, Joachim and Alles{\o}e, Rosa Lundbye and S{\o}nderby, Casper Kaae and Armenteros, Jose Juan Almagro and Gr{\o}nbech, Christopher Heje and Jensen, Lars Juhl and Nielsen, Henrik Bj{\o}rn and Petersen, Thomas Nordahl and Winther, Ole and others}, + journal={Nature Biotechnology}, + volume={39}, + number={5}, + pages={555--560}, + year={2021}, + publisher={Nature Publishing Group US New York} +} + @article{maxbin2015, author = {Wu, Yu-Wei and Simmons, Blake A. 
and Singer, Steven W.}, title = "{MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets}", diff --git a/topics/microbiome/tutorials/metagenomics-binning/tutorial.md b/topics/microbiome/tutorials/metagenomics-binning/tutorial.md index 8792894b2b15fb..ba9bcbac4560d4 100644 --- a/topics/microbiome/tutorials/metagenomics-binning/tutorial.md +++ b/topics/microbiome/tutorials/metagenomics-binning/tutorial.md @@ -88,16 +88,19 @@ Each of these methods has its strengths and limitations, and the choice of binni - **Chimeric sequences**: Sequences that are the result of sequencing errors or contamination can lead to chimeric sequences, which can make it difficult to accurately bin reads. - **Strain variation**: Organisms within a species can exhibit significant genetic variation, which can make it difficult to distinguish between different strains in a metagenomic sample. -There are plenty of computational tools to perform metafenomics binning. Some of the most widely used include: +There are plenty of algorithms that perform metagenomic binning. Some of the most widely used include: - **MaxBin** ({%cite maxbin2015%}): A popular de novo binning algorithm that uses a combination of sequence features and marker genes to cluster contigs into genome bins. - **MetaBAT** ({%cite Kang2019%}): Another widely used de novo binning algorithm that employs a hierarchical clustering approach based on tetranucleotide frequency and coverage information. - **CONCOCT** ({%cite Alneberg2014%}): A de novo binning tool that uses a clustering algorithm based on sequence composition and coverage information to group contigs into genome bins. - **MyCC** ({%cite Lin2016%}): A reference-based binning tool that uses sequence alignment to identify contigs belonging to the same genome or taxonomic group. - **GroopM** ({%cite Imelfort2014%}): A hybrid binning tool that combines reference-based and de novo approaches to achieve high binning accuracy. 
-- **MetaWRAP** ({%cite Uritskiy2018%}): A comprehensive metagenomic analysis pipeline that includes various modules for quality control, assembly, binning, and annotation. -- **Anvi'o** ({%cite Eren2015%}): A platform for visualizing and analyzing metagenomic data, including features for binning, annotation, and comparative genomics. - **SemiBin** ({%cite Pan2022%}): A command tool for metagenomic binning with deep learning, handles both short and long reads. +- **Vamb** ({%cite nissen2021improved%}): An algorithm that uses variational autoencoders (VAEs) to encode sequence composition and coverage information before clustering contigs into bins. + +Other tools include: +- **MetaWRAP** ({%cite Uritskiy2018%}): A comprehensive metagenomic analysis pipeline that includes various modules for quality control, assembly, binning, and annotation. +- **Anvi'o** ({%cite Eren2015%}): A platform for visualizing and analyzing metagenomic data, including features for binning, annotation, and comparative genomics. It uses CONCOCT as the default binning backend. A benchmark study of metagenomics software can be found at {%cite Sczyrba2017%}. MetaBAT 2 outperforms previous MetaBAT and other alternatives in both accuracy and computational efficiency . All are based on default parameters ({%cite Sczyrba2017%}). 
From 95948671063c854f29805ebe704b58be8ac53d71 Mon Sep 17 00:00:00 2001 From: Vini Salazar <17276653+vinisalazar@users.noreply.github.com> Date: Fri, 10 Oct 2025 14:32:22 +1100 Subject: [PATCH 4/6] Fix references to MetaBAT2 - References were incorrectly pointing to MEGAHIT --- topics/microbiome/tutorials/metagenomics-binning/tutorial.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/topics/microbiome/tutorials/metagenomics-binning/tutorial.md b/topics/microbiome/tutorials/metagenomics-binning/tutorial.md index ba9bcbac4560d4..19a72eccc7be06 100644 --- a/topics/microbiome/tutorials/metagenomics-binning/tutorial.md +++ b/topics/microbiome/tutorials/metagenomics-binning/tutorial.md @@ -201,7 +201,7 @@ In this tutorial we will learn how to use **MetaBAT 2** {%cite Kang2019%} tool t We will use the uploaded assembled fasta files as input to the algorithm (For simplicity reasons all other parameters will be preserved with their default values). > Individual binning of short-reads with MetaBAT 2 -> 1. {% tool [MetaBAT 2](toolshed.g2.bx.psu.edu/repos/iuc/megahit/megahit/1.2.9+galaxy0) %} with parameters: +> 1. {% tool [MetaBAT 2](https://toolshed.g2.bx.psu.edu/view/iuc/metabat2/01f02c5ddff8) %} with parameters: > - *"Fasta file containing contigs"*: `assembly fasta files` > - In **Advanced options**, keep all as **default**. > - In **Output options:** @@ -239,7 +239,7 @@ These output files can be further analyzed and used for downstream applications > > ``` > > > > -> > 2. Create a collection named `MEGAHIT Contig`, rename your pairs with the sample name +> > 2. Create a collection named `MetaBAT2 Bins` and add the zip files to it. 
> > > {: .hands_on} {: .comment} From 3c4c3dbaebc4ca50738678d60a533630fc296692 Mon Sep 17 00:00:00 2001 From: Vini Salazar <17276653+vinisalazar@users.noreply.github.com> Date: Wed, 22 Oct 2025 12:46:26 +1100 Subject: [PATCH 5/6] Add requirements for metagenomics binning tutorial Code review from @shiltemann --- topics/microbiome/tutorials/metagenomics-binning/tutorial.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/topics/microbiome/tutorials/metagenomics-binning/tutorial.md b/topics/microbiome/tutorials/metagenomics-binning/tutorial.md index 19a72eccc7be06..4d82c9a82a5046 100644 --- a/topics/microbiome/tutorials/metagenomics-binning/tutorial.md +++ b/topics/microbiome/tutorials/metagenomics-binning/tutorial.md @@ -31,6 +31,11 @@ contributions: authorship: - npechl - fpsom +requirements: + - type: internal + topic_name: microbiome + tutorials: + - metagenomics-assembly subtopic: metagenomics tags: - binning From 5806cfcf4e116c3570b279d08b6e5fd45e130e69 Mon Sep 17 00:00:00 2001 From: paulzierep Date: Mon, 17 Nov 2025 16:55:13 +0100 Subject: [PATCH 6/6] Update topics/microbiome/tutorials/metagenomics-binning/tutorial.md --- topics/microbiome/tutorials/metagenomics-binning/tutorial.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/topics/microbiome/tutorials/metagenomics-binning/tutorial.md b/topics/microbiome/tutorials/metagenomics-binning/tutorial.md index 4d82c9a82a5046..f80b0b9bf54972 100644 --- a/topics/microbiome/tutorials/metagenomics-binning/tutorial.md +++ b/topics/microbiome/tutorials/metagenomics-binning/tutorial.md @@ -31,6 +31,8 @@ contributions: authorship: - npechl - fpsom + - vinisalazar + - paulzierep requirements: - type: internal topic_name: microbiome