Binning tutorial update #6486
base: main
- Add vinisalazar
- Create images directory - Point readers to metagenomics-assembly dir as prerequisite
- References were incorrectly pointing to MEGAHIT
Code review from @shiltemann
@bebatut this is ready for review! One thing to add is the note to look into the MAG full tutorial for some stuff, but I will add this once yours is ready!
bebatut
left a comment
Thanks a lot @paulzierep and @vinisalazar for this update.
I added suggestions, mostly text clarification/correction and sometimes some reorganization between text and hands-on
| Each of these methods has its strengths and limitations, and the choice of binning method depends on the specific characteristics of the metagenomic data set and the research question being addressed. | ||
| ## Binning challanges |
| ## Binning challanges | |
| ## Binning challenges |
| Does using more binners always improve results? In practice, one must also consider computational resources and time constraints. Running many binners can be very time-consuming and resource-intensive, especially for large studies. In some cases, adding extra binners does not lead to a meaningful increase in bin quality, so the choice of binners should be made carefully. Overall, identifying the optimal combination of binners remains an active area of research, and clear, widely accepted guidelines are still being established. | ||
| For an in-depth analysis of the structure and functions of the coffee microbiome, a temporal shotgun metagenomic study (six time points) was performed. The six samples have been sequenced with Illumina MiSeq utilizing whole genome sequencing. | ||
| # Mock binning dataset for this training |
| # Mock binning dataset for this training | |
| ## Mock binning dataset for this training |
| # Mock binning dataset for this training | ||
| Based on the 6 original dataset of the coffee fermentation system, we generated mock datasets for this tutorial. | ||
| Read mapping and binning real metagenommic datasets is a computational demanding task and time consuming. To demonstrate the basics of binning in this tutorial we generated a small mock dataset, that is just large enough to produce bins for all binners in this tutorial. The same binners can be applied for any real life datasets, but as said, plan in some time, up to weeks in some cases. |
| Read mapping and binning real metagenommic datasets is a computational demanding task and time consuming. To demonstrate the basics of binning in this tutorial we generated a small mock dataset, that is just large enough to produce bins for all binners in this tutorial. The same binners can be applied for any real life datasets, but as said, plan in some time, up to weeks in some cases. | |
| Read mapping and binning real metagenomic datasets is a computationally demanding and time-consuming task. To demonstrate the basics of binning in this tutorial, we generated a small mock dataset that is just large enough to produce bins for all binners in this tutorial. The same binners can be applied for any real-life datasets, but as said, plan in some time, up to weeks in some cases. |
| --- | ||
| Metagenomics is the study of genetic material recovered directly from environmental samples, such as soil, water, or gut contents, without the need for isolation or cultivation of individual organisms. Metagenomics binning is a process used to classify DNA sequences obtained from metagenomic sequencing into discrete groups, or bins, based on their similarity to each other. |
| Metagenomics is the study of genetic material recovered directly from environmental samples, such as soil, water, or gut contents, without the need for isolation or cultivation of individual organisms. Metagenomics binning is a process used to classify DNA sequences obtained from metagenomic sequencing into discrete groups, or bins, based on their similarity to each other. | |
| Metagenomics is the study of genetic material recovered directly from environmental samples, such as soil, water, or gut contents, without the need for isolating or cultivating individual organisms. Metagenomics binning is a process used to classify DNA sequences obtained from metagenomic sequencing into discrete groups, or bins, based on their similarity to each other. |
| Metagenomics is the study of genetic material recovered directly from environmental samples, such as soil, water, or gut contents, without the need for isolation or cultivation of individual organisms. Metagenomics binning is a process used to classify DNA sequences obtained from metagenomic sequencing into discrete groups, or bins, based on their similarity to each other. | ||
| The goal of metagenomics binning is to assign the DNA sequences to the organisms or taxonomic groups that they originate from, allowing for a better understanding of the diversity and functions of the microbial communities present in the sample. This is typically achieved through computational methods that include sequence similarity, composition, and other features to group the sequences into bins. |
| The goal of metagenomics binning is to assign the DNA sequences to the organisms or taxonomic groups that they originate from, allowing for a better understanding of the diversity and functions of the microbial communities present in the sample. This is typically achieved through computational methods that include sequence similarity, composition, and other features to group the sequences into bins. | |
| The goal of metagenomics binning is to assign DNA sequences to the organisms or taxonomic groups from which they originate, allowing for a better understanding of the diversity and functions of the microbial communities present in the sample. This is typically achieved through computational methods that include sequence similarity, composition, and other features to group the sequences into bins. |
| In this tutorial version we will learn how to use MaxBin2 {%cite maxbin2015%} through Galaxy. MaxBin2 is an automated metagenomic binning tool that uses an Expectation-Maximization algorithm to group contigs into genome bins based on abundance, tetranucleotide frequency, and single-copy marker genes. | ||
| ## Bin contigs using MaxBin2 |
| ## Bin contigs using MaxBin2 |
| @@ -0,0 +1,50 @@ | |||
| ## MaxBin2 | |||
| In this tutorial version we will learn how to use MaxBin2 {%cite maxbin2015%} through Galaxy. MaxBin2 is an automated metagenomic binning tool that uses an Expectation-Maximization algorithm to group contigs into genome bins based on abundance, tetranucleotide frequency, and single-copy marker genes. | |||
| In this tutorial version we will learn how to use MaxBin2 {%cite maxbin2015%} through Galaxy. MaxBin2 is an automated metagenomic binning tool that uses an Expectation-Maximization algorithm to group contigs into genome bins based on abundance, tetranucleotide frequency, and single-copy marker genes. | |
| **MaxBin2** ({% cite maxbin2015 %}) is an automated metagenomic binning tool that uses an Expectation-Maximization algorithm to group contigs into genome bins based on abundance, tetranucleotide frequency, and single-copy marker genes. |
| In this tutorial version we will learn how to use MaxBin2 {%cite maxbin2015%} through Galaxy. MaxBin2 is an automated metagenomic binning tool that uses an Expectation-Maximization algorithm to group contigs into genome bins based on abundance, tetranucleotide frequency, and single-copy marker genes. | ||
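Reviewer note: since the paragraph names tetranucleotide frequency as one of the signals MaxBin2 uses, a small illustration may help readers picture it. The following is a minimal sketch only, not MaxBin2's implementation. The function name `tetranucleotide_frequencies` is hypothetical, and the sketch assumes that windows containing ambiguous bases are simply ignored and that reverse complements are not merged, which real binners may handle differently.

```python
from collections import Counter
from itertools import product


def tetranucleotide_frequencies(sequence: str) -> dict[str, float]:
    """Return the relative frequency of each of the 256 A/C/G/T 4-mers."""
    seq = sequence.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    # Count every overlapping window of length 4 along the sequence.
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Only windows made of unambiguous bases contribute to the total
    # (assumption of this sketch: windows with N or other codes are skipped).
    total = sum(counts[k] for k in kmers)
    if total == 0:
        return {k: 0.0 for k in kmers}
    return {k: counts[k] / total for k in kmers}


# Toy contig: 7 overlapping windows, two of which are "ACGT".
profile = tetranucleotide_frequencies("ACGTACGTAC")
```

Contigs from the same genome tend to have similar profiles of this kind, which is why composition can be combined with abundance when grouping contigs into bins.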
| ## Bin contigs using MaxBin2 | ||
| The first step when using tools like MetaBAT or MaxBin2 is to compute contig depths from the raw alignment data. Both tools require per-contig depth tables as input, as their binning algorithms rely on summarized coverage statistics at the contig level. However, standard BAM files store read-level alignment information, which must first be processed to generate the necessary contig-level coverage data. This preprocessing step ensures compatibility with the input requirements of these binning tools. |
| > > <comment-title> Why not use bam directly </comment-title> | ||
| > > | ||
| > > MetaBAT and MaxBin2 only accept per-contig depth tables because that is the specific input format their binning algorithm requires. | ||
| > > BAM files contain read-level alignment data. | ||
| > > These binners need summarized, contig-level coverage statistics. | ||
| > {: .comment} |
| > > <comment-title> Why not use bam directly </comment-title> | |
| > > | |
| > > MetaBAT and MaxBin2 only accept per-contig depth tables because that is the specific input format their binning algorithm requires. | |
| > > BAM files contain read-level alignment data. | |
| > > These binners need summarized, contig-level coverage statistics. | |
| > {: .comment} |
| > {: .comment} | ||
| > | ||
| {: .hands_on} | ||
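Reviewer note: to make the step from read-level data to contig-level coverage concrete, a short illustration could follow this box. The sketch below is an assumption-laden example, not what the Galaxy depth tool does internally. It assumes per-base depth records in a three-column, tab-separated layout (contig, position, depth), similar to what `samtools depth -a` reports, with every position listed. The function name `mean_depth_per_contig` and the contig names are hypothetical.

```python
from collections import defaultdict


def mean_depth_per_contig(depth_lines):
    """Summarize per-base depth records into one mean depth per contig.

    Each record is expected as 'contig<TAB>position<TAB>depth'.
    """
    totals = defaultdict(int)      # summed depth per contig
    positions = defaultdict(int)   # number of reported positions per contig
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        contig, _pos, depth = line.split("\t")
        totals[contig] += int(depth)
        positions[contig] += 1
    # Mean depth = summed depth / number of reported positions.
    return {contig: totals[contig] / positions[contig] for contig in totals}


# Made-up records for two hypothetical contigs.
example = [
    "contig_1\t1\t10",
    "contig_1\t2\t12",
    "contig_1\t3\t14",
    "contig_2\t1\t3",
    "contig_2\t2\t5",
]
depths = mean_depth_per_contig(example)
```

Dedicated depth tools typically report more than a mean (for example variance) and may filter alignments first, so this only conveys the idea of collapsing many read-level values into one row per contig.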
| We can now launch the proper binning with MaxBin2 |
This will be a major update adding 3 binners, mapping, and bin refinement.
I probably also need to update the test data, keeping it smaller.
Takes over this one: #6409
Thanks a lot for the start @vinisalazar.