Binning tutorial update #6486

Open

paulzierep wants to merge 18 commits into galaxyproject:main from paulzierep:binning-tutorial

+1,910 −179

Collaborator

paulzierep commented Nov 18, 2025

This will be a major update adding 3 binners, mapping, and bin refinement.
I probably also need to update the test data to keep it smaller.

Takes over this one: #6409
Thanks a lot for the start @vinisalazar.

vinisalazar and others added 7 commits

October 9, 2025 16:17


          Update CONTRIBUTORS.md

703df11

  - Add vinisalazar


          tutorials/metagenomic-binning: small improvements

7f4c380

  - Create images directory
  - Point readers to metagenomics-assembly dir as prerequisite


          Add vamb to binning tools

4b9afb2


          Fix references to MetaBAT2

  - References were incorrectly pointing to MEGAHIT


          Add requirements for metagenomics binning tutorial

3c4c3db

Code review from @shiltemann


          add 4 binners, add cyoa, add mappng and bam sorting

b90fd14


          Merge branch 'galaxyproject:main' into binning-tutorial

2cea19e

paulzierep requested review from bebatut and shiltemann as code owners

November 18, 2025 15:11


          add vinisalazar

03b7207

github-actions bot added template-and-tools microbiome labels

paulzierep added 7 commits

November 18, 2025 16:38


          fix linting

1c3203d


          Merge branch 'binning-tutorial' of github.com:paulzierep/training-mat…

c49508a

…erial into binning-tutorial


          major update

816f932


          fix box

06cafc6


          add combine, add workflow, add wf test, add result question

8d7de5c


          linting fix

4c264b5


          add wf tags

e2a9e0b

Collaborator Author

paulzierep commented Nov 21, 2025

@bebatut this is ready for review! One thing still to add is a note pointing readers to the full MAG tutorial for some details, but I will add it once yours is ready!

paulzierep added 3 commits

November 21, 2025 12:55


          Merge branch 'main' into binning-tutorial

fb76650


          add topic

aa5218e


          Merge branch 'binning-tutorial' of github.com:paulzierep/training-mat…

07a655e

…erial into binning-tutorial

bebatut reviewed

Member

bebatut left a comment

Thanks a lot @paulzierep and @vinisalazar for this update.
I added suggestions, mostly text clarifications and corrections, and in some places a reorganization between text and hands-on.

topics/microbiome/tutorials/metagenomics-binning/tutorial.md

    
              Each of these methods has its strengths and limitations, and the choice of binning method depends on the specific characteristics of the metagenomic data set and the research question being addressed.

              ## Binning challanges

Member

bebatut Nov 21, 2025

Suggested change

      
            ## Binning challanges
          
            ## Binning challenges

topics/microbiome/tutorials/metagenomics-binning/tutorial.md

    
              Does using more binners always improve results? In practice, one must also consider computational resources and time constraints. Running many binners can be very time-consuming and resource-intensive, especially for large studies. In some cases, adding extra binners does not lead to a meaningful increase in bin quality, so the choice of binners should be made carefully. Overall, identifying the optimal combination of binners remains an active area of research, and clear, widely accepted guidelines are still being established.

              For an in-depth analysis of the structure and functions of the coffee microbiome, a temporal shotgun metagenomic study (six time points) was performed. The six samples have been sequenced with Illumina MiSeq utilizing whole genome sequencing.

              # Mock binning dataset for this training

Member

bebatut Nov 21, 2025

Suggested change

      
            # Mock binning dataset for this training
          
            ## Mock binning dataset for this training

topics/microbiome/tutorials/metagenomics-binning/tutorial.md

    
              # Mock binning dataset for this training

              Based on the 6 original dataset of the coffee fermentation system, we generated mock datasets for this tutorial.

              Read mapping and binning real metagenommic datasets is a computational demanding task and time consuming. To demonstrate the basics of binning in this tutorial we generated a small mock dataset, that is just large enough to produce bins for all binners in this tutorial. The same binners can be applied for any real life datasets, but as said, plan in some time, up to weeks in some cases.

Member

bebatut Nov 21, 2025

Suggested change

      
            Read mapping and binning real metagenommic datasets is a computational demanding task and time consuming. To demonstrate the basics of binning in this tutorial we generated a small mock dataset, that is just large enough to produce bins for all binners in this tutorial. The same binners can be applied for any real life datasets, but as said, plan in some time, up to weeks in some cases.
          
            Read mapping and binning real metagenomic datasets is a computationally demanding and time-consuming task. To demonstrate the basics of binning in this tutorial, we generated a small mock dataset that is just large enough to produce bins for all binners in this tutorial. The same binners can be applied for any real-life datasets, but as said, plan in some time, up to weeks in some cases.

topics/microbiome/tutorials/metagenomics-binning/tutorial.md

    
              ---

              Metagenomics is the study of genetic material recovered directly from environmental samples, such as soil, water, or gut contents, without the need for isolation or cultivation of individual organisms. Metagenomics binning is a process used to classify DNA sequences obtained from metagenomic sequencing into discrete groups, or bins, based on their similarity to each other.

Member

bebatut Nov 21, 2025

Suggested change

      
            Metagenomics is the study of genetic material recovered directly from environmental samples, such as soil, water, or gut contents, without the need for isolation or cultivation of individual organisms. Metagenomics binning is a process used to classify DNA sequences obtained from metagenomic sequencing into discrete groups, or bins, based on their similarity to each other.
          
            Metagenomics is the study of genetic material recovered directly from environmental samples, such as soil, water, or gut contents, without the need for isolating or cultivating individual organisms. Metagenomics binning is a process used to classify DNA sequences obtained from metagenomic sequencing into discrete groups, or bins, based on their similarity to each other.

topics/microbiome/tutorials/metagenomics-binning/tutorial.md

    
              Metagenomics is the study of genetic material recovered directly from environmental samples, such as soil, water, or gut contents, without the need for isolation or cultivation of individual organisms. Metagenomics binning is a process used to classify DNA sequences obtained from metagenomic sequencing into discrete groups, or bins, based on their similarity to each other.

              The goal of metagenomics binning is to assign the DNA sequences to the organisms or taxonomic groups that they originate from, allowing for a better understanding of the diversity and functions of the microbial communities present in the sample. This is typically achieved through computational methods that include sequence similarity, composition, and other features to group the sequences into bins.

Member

bebatut Nov 21, 2025

Suggested change

      
            The goal of metagenomics binning is to assign the DNA sequences to the organisms or taxonomic groups that they originate from, allowing for a better understanding of the diversity and functions of the microbial communities present in the sample. This is typically achieved through computational methods that include sequence similarity, composition, and other features to group the sequences into bins.
          
            The goal of metagenomics binning is to assign DNA sequences to the organisms or taxonomic groups from which they originate, allowing for a better understanding of the diversity and functions of the microbial communities present in the sample. This is typically achieved through computational methods that include sequence similarity, composition, and other features to group the sequences into bins.

topics/microbiome/tutorials/metagenomics-binning/maxbin2_version.md

    
              In this tutorial version we will learn how to use MaxBin2 {%cite maxbin2015%} through Galaxy. MaxBin2 is an automated metagenomic binning tool that uses an Expectation-Maximization algorithm to group contigs into genome bins based on abundance, tetranucleotide frequency, and single-copy marker genes.

              ## Bin contigs using MaxBin2

Member

bebatut Nov 21, 2025

Suggested change

## Bin contigs using MaxBin2

topics/microbiome/tutorials/metagenomics-binning/maxbin2_version.md

    
            @@ -0,0 +1,50 @@
          
              ## MaxBin2

              In this tutorial version we will learn how to use MaxBin2 {%cite maxbin2015%} through Galaxy. MaxBin2 is an automated metagenomic binning tool that uses an Expectation-Maximization algorithm to group contigs into genome bins based on abundance, tetranucleotide frequency, and single-copy marker genes.

Member

bebatut Nov 21, 2025

Suggested change

      
            In this tutorial version we will learn how to use MaxBin2 {%cite maxbin2015%} through Galaxy. MaxBin2 is an automated metagenomic binning tool that uses an Expectation-Maximization algorithm to group contigs into genome bins based on abundance, tetranucleotide frequency, and single-copy marker genes.
          
            **MaxBin2** ({% cite maxbin2015 %}) is an automated metagenomic binning tool that uses an Expectation-Maximization algorithm to group contigs into genome bins based on abundance, tetranucleotide frequency, and single-copy marker genes.
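
For readers of this thread who are unfamiliar with the term, tetranucleotide frequency is the relative abundance of each of the 256 possible 4-mers in a contig. The following is a minimal Python sketch of that idea only. It is not MaxBin2's implementation, the function name `tetranucleotide_frequencies` is hypothetical, and, unlike many real tools, it does not merge reverse-complement 4-mers.

```python
from collections import Counter
from itertools import product


def tetranucleotide_frequencies(sequence: str) -> dict[str, float]:
    """Return the relative frequency of every 4-mer over A, C, G, T.

    Hypothetical helper for illustration; windows containing any other
    symbol (for example N) are skipped.
    """
    seq = sequence.upper()
    valid = set("ACGT")
    counts = Counter(
        seq[i:i + 4]
        for i in range(len(seq) - 3)
        if set(seq[i:i + 4]) <= valid
    )
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in product("ACGT", repeat=4))
    # Report all 256 possible 4-mers so that contigs are directly comparable.
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


if __name__ == "__main__":
    profile = tetranucleotide_frequencies("ACGTACGT")
    print(profile["ACGT"])
```

Contigs from the same genome tend to have similar profiles, which is why composition can serve as one binning signal alongside abundance.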

topics/microbiome/tutorials/metagenomics-binning/maxbin2_version.md

    
              In this tutorial version we will learn how to use MaxBin2 {%cite maxbin2015%} through Galaxy. MaxBin2 is an automated metagenomic binning tool that uses an Expectation-Maximization algorithm to group contigs into genome bins based on abundance, tetranucleotide frequency, and single-copy marker genes.

              ## Bin contigs using MaxBin2

Member

bebatut Nov 21, 2025

Suggested change

      
            The first step when using tools like MetaBAT or MaxBin2 is to compute contig depths from the raw alignment data. Both tools require per-contig depth tables as input, as their binning algorithms rely on summarized coverage statistics at the contig level. However, standard BAM files store read-level alignment information, which must first be processed to generate the necessary contig-level coverage data. This preprocessing step ensures compatibility with the input requirements of these binning tools.
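
To make the read-level versus contig-level distinction concrete, here is a small Python sketch that collapses per-position depth values into one mean depth per contig. It assumes a three-column, tab-separated per-position layout (contig, position, depth), such as the one `samtools depth` writes. The function name `mean_depth_per_contig` and the `contig_lengths` mapping are hypothetical, and this is not the tool used in the tutorial.

```python
from collections import defaultdict


def mean_depth_per_contig(depth_lines, contig_lengths):
    """Summarize per-position depth into one mean depth per contig.

    depth_lines: iterable of "contig<TAB>position<TAB>depth" strings
                 (assumed layout, see the note above).
    contig_lengths: dict mapping contig name to its length in bases.
    Positions missing from depth_lines count as depth 0.
    """
    totals = defaultdict(int)
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        contig, _position, depth = line.split("\t")
        totals[contig] += int(depth)
    # Divide by the full contig length so uncovered positions lower the mean.
    return {
        name: totals[name] / length
        for name, length in contig_lengths.items()
        if length > 0
    }


if __name__ == "__main__":
    lines = ["c1\t1\t10", "c1\t2\t20", "c2\t1\t4"]
    lengths = {"c1": 4, "c2": 1, "c3": 10}
    for name, depth in mean_depth_per_contig(lines, lengths).items():
        print(name, depth)
```

The result is one row per contig, which is the kind of summarized table the comment above describes, as opposed to the per-read records stored in a BAM file.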

topics/microbiome/tutorials/metagenomics-binning/maxbin2_version.md

Comment on lines +14 to +19

    
              >    > <comment-title> Why not use bam directly </comment-title>

              >    >

              >    > MetaBAT and MaxBin2 only accept per-contig depth tables because that is the specific input format their binning algorithm requires.

              >    > BAM files contain read-level alignment data.

              >    > These binners need summarized, contig-level coverage statistics.

              >    {: .comment}

Member

bebatut Nov 21, 2025

Suggested change

      
            >    > <comment-title> Why not use bam directly </comment-title>
          
            >    >
          
            >    > MetaBAT and MaxBin2 only accept per-contig depth tables because that is the specific input format their binning algorithm requires.
          
            >    > BAM files contain read-level alignment data.
          
            >    > These binners need summarized, contig-level coverage statistics.
          
            >    {: .comment}

topics/microbiome/tutorials/metagenomics-binning/maxbin2_version.md

    
              >    {: .comment}

              >

              {: .hands_on}

Member

bebatut Nov 21, 2025

Suggested change


	We can now launch the proper binning with MaxBin2

Labels

microbiome template-and-tools