Skip to content

Conversation

@paulzierep
Copy link
Collaborator

@paulzierep paulzierep commented Nov 18, 2025

This will be a major update adding 3 binners, mapping and bin refinement,
I probably also need to update the test data, keeping it smaller.

Takes over this one: #6409
Thanks a lot for the start @vinisalazar.

@paulzierep
Copy link
Collaborator Author

@bebatut this is ready for review ! One thing to add is the note to look into the MAG full tutorial for some stuff, but I will add this once yours is ready !

Copy link
Member

@bebatut bebatut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @paulzierep and @vinisalazar for this update.
I added suggestions, mostly text clarification/correction and sometimes some reorganization between text and hands-on


Each of these methods has its strengths and limitations, and the choice of binning method depends on the specific characteristics of the metagenomic data set and the research question being addressed.

## Binning challanges
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
## Binning challanges
## Binning challenges

Does using more binners always improve results? In practice, one must also consider computational resources and time constraints. Running many binners can be very time-consuming and resource-intensive, especially for large studies. In some cases, adding extra binners does not lead to a meaningful increase in bin quality, so the choice of binners should be made carefully. Overall, identifying the optimal combination of binners remains an active area of research, and clear, widely accepted guidelines are still being established.

For an in-depth analysis of the structure and functions of the coffee microbiome, a temporal shotgun metagenomic study (six time points) was performed. The six samples have been sequenced with Illumina MiSeq utilizing whole genome sequencing.
# Mock binning dataset for this training
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Mock binning dataset for this training
## Mock binning dataset for this training

# Mock binning dataset for this training

Based on the 6 original dataset of the coffee fermentation system, we generated mock datasets for this tutorial.
Read mapping and binning real metagenommic datasets is a computational demanding task and time consuming. To demonstrate the basics of binning in this tutorial we generated a small mock dataset, that is just large enough to produce bins for all binners in this tutorial. The same binners can be applied for any real life datasets, but as said, plan in some time, up to weeks in some cases.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Read mapping and binning real metagenommic datasets is a computational demanding task and time consuming. To demonstrate the basics of binning in this tutorial we generated a small mock dataset, that is just large enough to produce bins for all binners in this tutorial. The same binners can be applied for any real life datasets, but as said, plan in some time, up to weeks in some cases.
Read mapping and binning real metagenomic datasets is a computationally demanding and time-consuming task. To demonstrate the basics of binning in this tutorial, we generated a small mock dataset that is just large enough to produce bins for all binners in this tutorial. The same binners can be applied for any real-life datasets, but as said, plan in some time, up to weeks in some cases.

---


Metagenomics is the study of genetic material recovered directly from environmental samples, such as soil, water, or gut contents, without the need for isolation or cultivation of individual organisms. Metagenomics binning is a process used to classify DNA sequences obtained from metagenomic sequencing into discrete groups, or bins, based on their similarity to each other.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Metagenomics is the study of genetic material recovered directly from environmental samples, such as soil, water, or gut contents, without the need for isolation or cultivation of individual organisms. Metagenomics binning is a process used to classify DNA sequences obtained from metagenomic sequencing into discrete groups, or bins, based on their similarity to each other.
Metagenomics is the study of genetic material recovered directly from environmental samples, such as soil, water, or gut contents, without the need for isolating or cultivating individual organisms. Metagenomics binning is a process used to classify DNA sequences obtained from metagenomic sequencing into discrete groups, or bins, based on their similarity to each other.


Metagenomics is the study of genetic material recovered directly from environmental samples, such as soil, water, or gut contents, without the need for isolation or cultivation of individual organisms. Metagenomics binning is a process used to classify DNA sequences obtained from metagenomic sequencing into discrete groups, or bins, based on their similarity to each other.

The goal of metagenomics binning is to assign the DNA sequences to the organisms or taxonomic groups that they originate from, allowing for a better understanding of the diversity and functions of the microbial communities present in the sample. This is typically achieved through computational methods that include sequence similarity, composition, and other features to group the sequences into bins.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The goal of metagenomics binning is to assign the DNA sequences to the organisms or taxonomic groups that they originate from, allowing for a better understanding of the diversity and functions of the microbial communities present in the sample. This is typically achieved through computational methods that include sequence similarity, composition, and other features to group the sequences into bins.
The goal of metagenomics binning is to assign DNA sequences to the organisms or taxonomic groups from which they originate, allowing for a better understanding of the diversity and functions of the microbial communities present in the sample. This is typically achieved through computational methods that include sequence similarity, composition, and other features to group the sequences into bins.


In this tutorial version we will learn how to use MaxBin2 {%cite maxbin2015%} through Galaxy. MaxBin2 is an automated metagenomic binning tool that uses an Expectation-Maximization algorithm to group contigs into genome bins based on abundance, tetranucleotide frequency, and single-copy marker genes.

## Bin contigs using MaxBin2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
## Bin contigs using MaxBin2

@@ -0,0 +1,50 @@
## MaxBin2

In this tutorial version we will learn how to use MaxBin2 {%cite maxbin2015%} through Galaxy. MaxBin2 is an automated metagenomic binning tool that uses an Expectation-Maximization algorithm to group contigs into genome bins based on abundance, tetranucleotide frequency, and single-copy marker genes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
In this tutorial version we will learn how to use MaxBin2 {%cite maxbin2015%} through Galaxy. MaxBin2 is an automated metagenomic binning tool that uses an Expectation-Maximization algorithm to group contigs into genome bins based on abundance, tetranucleotide frequency, and single-copy marker genes.
**MaxBin2** ({% cite maxbin2015 %}) is an automated metagenomic binning tool that uses an Expectation-Maximization algorithm to group contigs into genome bins based on abundance, tetranucleotide frequency, and single-copy marker genes.

In this tutorial version we will learn how to use MaxBin2 {%cite maxbin2015%} through Galaxy. MaxBin2 is an automated metagenomic binning tool that uses an Expectation-Maximization algorithm to group contigs into genome bins based on abundance, tetranucleotide frequency, and single-copy marker genes.

## Bin contigs using MaxBin2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The first step when using tools like MetaBAT or MaxBin2 is to compute contig depths from the raw alignment data. Both tools require per-contig depth tables as input, as their binning algorithms rely on summarized coverage statistics at the contig level. However, standard BAM files store read-level alignment information, which must first be processed to generate the necessary contig-level coverage data. This preprocessing step ensures compatibility with the input requirements of these binning tools.

Comment on lines +14 to +19
> > <comment-title> Why not use bam directly </comment-title>
> >
> > MetaBAT and MaxBin2 only accept per-contig depth tables because that is the specific input format their binning algorithm requires.
> > BAM files contain read-level alignment data.
> > These binners need summarized, contig-level coverage statistics.
> {: .comment}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
> > <comment-title> Why not use bam directly </comment-title>
> >
> > MetaBAT and MaxBin2 only accept per-contig depth tables because that is the specific input format their binning algorithm requires.
> > BAM files contain read-level alignment data.
> > These binners need summarized, contig-level coverage statistics.
> {: .comment}

> {: .comment}
>
{: .hands_on}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
We can now launch the proper binning with MaxBin2

Copy link
Member

@bebatut bebatut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @paulzierep and @vinisalazar for this update.
I added suggestions, mostly text clarification/correction and sometimes some reorganization between text and hands-on

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants