Update metagenomics assembly tutorial #6410

vinisalazar · 2025-10-10T01:22:47Z

Supersedes #6408

Work for the FAIRyMAGs 2025 hackathon

Task: update assembly tutorial

Summary of changes:

- Add new figures
- Fix punctuation and phrasing
- Improve explanation of individual vs co-assembly
- Remove mentions of dereplication and refer to the binning tutorial (where it is explained) instead

- Remove 'Describe what de-replication is' from objectives; this is in the scope of the binning tutorial

- Add vinisalazar

- Causing jekyll build to fail

topics/microbiome/tutorials/metagenomics-assembly/tutorial.md

shiltemann

Thanks @vinisalazar!

topics/microbiome/tutorials/metagenomics-assembly/tutorial.md

paulzierep

Very nice, minor updates, will continue review next week

paulzierep · 2025-11-14T13:58:13Z

topics/microbiome/tutorials/metagenomics-assembly/tutorial.md

+- Samples from different patients.
+- Samples from the same site, but over different seasons or under different environmental conditions, eg. a patch of soil before and after a bushfire event, a marine site under upwelling vs. under normal conditions.
+
+If samples differ like described, **individual assembly** is preferred. In the case of individual assembly, if **contigs are binned** after, an extra step of **de-replication** should be used:


One can bin per sample, which is what is mostly done, and then de-replicate later. That avoids chimeric bins, similar to co-assembly. I would rather suggest de-replicating after binning.

paulzierep · 2025-11-14T13:59:16Z

topics/microbiome/tutorials/metagenomics-assembly/tutorial.md

 - Related samples

-If it is not the case, **individual assembly** should be prefered. In this case, an extra step of **de-replication** should be used:
+Examples where co-assembly would be reasonable:


Can you add this FAQ when it's merged: #6474

topics/microbiome/tutorials/metagenomics-assembly/tutorial.md

paulzierep · 2025-11-14T15:18:57Z

topics/microbiome/tutorials/metagenomics-assembly/tutorial.md

 {: .question}

 > <details-title>Co-assembly with MetaSPAdes</details-title>
+> MetaSPAdes supports co-assembly by passing a list of paired-end read files. MEGAHIT, on the other hand, requires concatenating that list of paired-end read files into a single pair of forward and reverse files.


This can now be done with the tool in the FAQ, and MEGAHIT supports it anyway as a tool parameter.

Can you modify the hands-on box below for that? Thanks
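
For context outside Galaxy, the MEGAHIT command line also accepts several paired-end files directly: `-1` and `-2` each take a comma-separated list, so no manual concatenation is needed. A minimal sketch of building such a call, assuming MEGAHIT is installed as `megahit`; the helper name and the FASTQ file names are hypothetical and not taken from the tutorial:

```python
from typing import List, Sequence, Tuple


def build_megahit_coassembly_cmd(
    pairs: Sequence[Tuple[str, str]], out_dir: str
) -> List[str]:
    """Build a MEGAHIT command that co-assembles several paired-end samples.

    `pairs` holds (forward, reverse) file paths, one tuple per sample.
    """
    if not pairs:
        raise ValueError("at least one (forward, reverse) pair is required")
    # MEGAHIT expects the forward and reverse lists in the same sample order.
    forward = ",".join(fwd for fwd, _ in pairs)
    reverse = ",".join(rev for _, rev in pairs)
    return ["megahit", "-1", forward, "-2", reverse, "-o", out_dir]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_megahit_coassembly_cmd(
        [
            ("sampleA_1.fastq.gz", "sampleA_2.fastq.gz"),
            ("sampleB_1.fastq.gz", "sampleB_2.fastq.gz"),
        ],
        "coassembly_out",
    )
    print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the files exist.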

paulzierep · 2025-11-14T15:21:16Z

topics/microbiome/tutorials/metagenomics-assembly/tutorial.md

-  It makes use of succinct de Bruijn graph (SdBG) to achieve low memory assembly.
+  It makes use of the Succinct de Bruijn Graph (SdBG) approach to achieve low memory assembly.

 Both tools are available in Galaxy. But currently, only MEGAHIT can be used in individual mode for several samples.


This is now easy to do with nested collections. Can you add this FAQ once it's merged: #6476

Where should that be added? Could you do it? Thanks a lot

topics/microbiome/tutorials/metagenomics-assembly/tutorial.md

paulzierep · 2025-11-17T15:40:53Z

Thanks a lot for the update. After applying the suggestions and adding the FAQ, it's good from my side!

Co-authored-by: paulzierep <[email protected]>

bebatut

Thanks for the update.
@paulzierep I added some extra suggestions but also comments for you

bebatut · 2025-11-20T15:27:50Z

topics/microbiome/tutorials/metagenomics-assembly/tutorial.md

 - Related samples

-If it is not the case, **individual assembly** should be prefered. In this case, an extra step of **de-replication** should be used:
+Examples where co-assembly would be reasonable:


Suggested change

-Examples where co-assembly would be reasonable:
+{% snippet faqs/galaxy/fastq_groupmerge.md %}
+Examples where co-assembly would be reasonable:

bebatut · 2025-11-20T15:29:51Z

topics/microbiome/tutorials/metagenomics-assembly/tutorial.md

+- Samples from different patients.
+- Samples from the same site, but over different seasons or under different environmental conditions, eg. a patch of soil before and after a bushfire event, a marine site under upwelling vs. under normal conditions.
+
+If samples differ like described, **individual assembly** is preferred. In the case of individual assembly, if **contigs are binned** after, an extra step of **de-replication** should be used:


Suggested change

-If samples differ like described, **individual assembly** is preferred. In the case of individual assembly, if **contigs are binned** after, an extra step of **de-replication** should be used:
+If samples differ as described, **individual assembly** is preferred. In the case of individual assembly, **contigs should be binned** per sample and an extra step of **de-replication** should be used after binning:

bebatut · 2025-11-20T15:30:37Z

topics/microbiome/tutorials/metagenomics-assembly/tutorial.md

 ![Image shows the process of individual assembly on two strains and five samples, after individual assembly of samples two samples are chosen for de-replication process. In parallel, co-assembly on all five samples is performed](./images/individual-assembly.png "Individual assembly followed by de-replication vs co-assembly. Source: dRep documentation"){:width="80%"}

-Co-assembly is more commonly used than individual assembly and then de-replication after binning. But in this tutorial, to show all steps, we will run an **individual assembly**.
+For more information on dereplication, check out the [metagenomic binning tutorial]({% link topics/microbiome/tutorials/metagenomics-binning/tutorial.md %}).


Suggested change

-For more information on dereplication, check out the [metagenomic binning tutorial]({% link topics/microbiome/tutorials/metagenomics-binning/tutorial.md %}).
+> <comment-title></comment-title>
+> For more information on dereplication, check out the [metagenomic binning tutorial]({% link topics/microbiome/tutorials/metagenomics-binning/tutorial.md %}).
+{: .comment}

bebatut · 2025-11-20T15:35:11Z

topics/microbiome/tutorials/metagenomics-assembly/tutorial.md

+>
+> {% snippet faqs/galaxy/datasets_import_via_link.md %}
+>
+{: .hands_on}


Suggested change

-{: .hands_on}
+> <comment-title></comment-title>
+>
+> If the QUAST process takes too much time, we can import the results:
+>
+> > <hands-on-title>Import generated QUAST results</hands-on-title>
+> >
+> > 1. Import the QUAST report file from [Zenodo]({{ page.zenodo_link }}) or the Shared Data library:
+> >
+> >    ```text
+> >    {{ page.zenodo_link }}/files/quast_ERR2231567.html
+> >    {{ page.zenodo_link }}/files/quast_ERR2231568.html
+> >    {{ page.zenodo_link }}/files/quast_ERR2231569.html
+> >    {{ page.zenodo_link }}/files/quast_ERR2231570.html
+> >    {{ page.zenodo_link }}/files/quast_ERR2231571.html
+> >    {{ page.zenodo_link }}/files/quast_ERR2231572.html
+> >    ```
+> >
+> {: .hands_on}
+{: .comment}

bebatut · 2025-11-20T15:41:36Z

topics/microbiome/tutorials/metagenomics-assembly/tutorial.md

+> 1. {% tool [MetaSPAdes](toolshed.g2.bx.psu.edu/repos/nml/metaspades/metaspades/4.2.0+galaxy0) %} with following parameters
+>    - *"Pair-end reads input format"*: `Paired-end: list of dataset pairs`
+>        - {% icon param-collection %} *"FASTQ file(s): collection"*: `Raw reads`
+>     - *"Select k-mer detection option"*: `User specific`
+>        - *"K-mer size values"*: `21,33,55,77`


Suggested change

-> 1. {% tool [MetaSPAdes](toolshed.g2.bx.psu.edu/repos/nml/metaspades/metaspades/4.2.0+galaxy0) %} with following parameters
->    - *"Pair-end reads input format"*: `Paired-end: list of dataset pairs`
->        - {% icon param-collection %} *"FASTQ file(s): collection"*: `Raw reads`
->     - *"Select k-mer detection option"*: `User specific`
->        - *"K-mer size values"*: `21,33,55,77`
+> > <hands-on-title>Assembly with MetaSPAdes</hands-on-title>
+> > 1. {% tool [MetaSPAdes](toolshed.g2.bx.psu.edu/repos/nml/metaspades/metaspades/4.2.0+galaxy0) %} with following parameters
+> >    - *"Pair-end reads input format"*: `Paired-end: list of dataset pairs`
+> >        - {% icon param-collection %} *"FASTQ file(s): collection"*: `Raw reads`
+> >     - *"Select k-mer detection option"*: `User specific`
+> >        - *"K-mer size values"*: `21,33,55,77`
+> >
+> {: .hands_on}
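
As a side note on the `21,33,55,77` value: the SPAdes command line expects k-mer sizes to be odd, smaller than 128, and listed in ascending order. A minimal sketch of a check for such a string follows. The function name is hypothetical, and this is an illustration rather than part of the Galaxy wrapper:

```python
from typing import List


def parse_kmer_sizes(value: str) -> List[int]:
    """Parse a comma-separated k-mer list and check SPAdes-style constraints."""
    sizes = [int(part) for part in value.split(",")]
    for k in sizes:
        # Each k must be a positive odd number below 128.
        if k % 2 == 0 or not 0 < k < 128:
            raise ValueError(f"k-mer size {k} must be odd and between 1 and 127")
    # Values must be strictly increasing.
    if any(a >= b for a, b in zip(sizes, sizes[1:])):
        raise ValueError("k-mer sizes must be listed in ascending order")
    return sizes


if __name__ == "__main__":
    print(parse_kmer_sizes("21,33,55,77"))
```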

bebatut · 2025-11-20T15:42:04Z

topics/microbiome/tutorials/metagenomics-assembly/tutorial.md

 {: .question}

 > <details-title>Co-assembly with MetaSPAdes</details-title>
+> MetaSPAdes supports co-assembly by passing a list of paired-end read files. MEGAHIT, on the other hand, requires concatenating that list of paired-end read files into a single pair of forward and reverse files.


Can you modify the hands-on box below for that? Thanks

bebatut · 2025-11-20T15:42:23Z

topics/microbiome/tutorials/metagenomics-assembly/tutorial.md

-  It makes use of succinct de Bruijn graph (SdBG) to achieve low memory assembly.
+  It makes use of the Succinct de Bruijn Graph (SdBG) approach to achieve low memory assembly.

 Both tools are available in Galaxy. But currently, only MEGAHIT can be used in individual mode for several samples.


Where should that be added? Could you do it? Thanks a lot

bebatut · 2025-11-20T15:43:06Z

topics/microbiome/tutorials/metagenomics-assembly/tutorial.md

      A base in the reference genome is counted as aligned if at least one contig has at least one alignment to this base.

-      We did not provide any reference there, but metaQuast try to identify genome content of the metagenome by aligning contigs to [SILVA](https://www.arb-silva.de/) 16S rRNA database. For each assembly, 50 reference genomes with top scores are chosen. The full reference genomes of the identified organisms are afterwards downloaded from NCBI to map the assemblies on them and compute the genome fractions.
+      We did not provide any reference genome, but metaQUAST tries to identify the genome content of the metagenome by aligning contigs to [SILVA](https://www.arb-silva.de/) 16S rRNA database. For each assembly, 50 reference genomes with top scores are chosen. The full reference genomes of the identified organisms are afterwards downloaded from NCBI to map the assemblies on them and compute the genome fractions.


Suggested change

-      We did not provide any reference genome, but metaQUAST tries to identify the genome content of the metagenome by aligning contigs to [SILVA](https://www.arb-silva.de/) 16S rRNA database. For each assembly, 50 reference genomes with top scores are chosen. The full reference genomes of the identified organisms are afterwards downloaded from NCBI to map the assemblies on them and compute the genome fractions.
+      We did not provide any reference genome, but QUAST tries to identify the genome content of the metagenome by aligning contigs to [SILVA](https://www.arb-silva.de/) 16S rRNA database. For each assembly, 50 reference genomes with top scores are chosen. The full reference genomes of the identified organisms are downloaded from NCBI to map the assemblies on them and compute the genome fractions.
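
To make the "counted as aligned" rule concrete, here is a minimal sketch, assuming the genome fraction is the percentage of reference bases covered by at least one contig alignment. The function name and the 0-based, half-open interval convention are assumptions for illustration, not QUAST's implementation:

```python
from typing import Iterable, Tuple


def genome_fraction(
    alignments: Iterable[Tuple[int, int]], reference_length: int
) -> float:
    """Percentage of reference bases covered by at least one alignment.

    Each alignment is a (start, end) interval on the reference,
    0-based and half-open; overlapping alignments are counted once.
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    covered = 0
    covered_until = 0  # right edge of the region already counted
    for start, end in sorted(alignments):
        start = max(start, covered_until, 0)
        end = min(end, reference_length)
        if end > start:
            covered += end - start
            covered_until = end
    return 100.0 * covered / reference_length


if __name__ == "__main__":
    # Two overlapping alignments and one separate alignment on a 200 bp reference.
    print(genome_fraction([(0, 50), (25, 75), (90, 100)], 200))
```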

bebatut · 2025-11-20T15:43:35Z

topics/microbiome/tutorials/metagenomics-assembly/tutorial.md

+      We did not provide any reference genome, but metaQUAST tries to identify the genome content of the metagenome by aligning contigs to [SILVA](https://www.arb-silva.de/) 16S rRNA database. For each assembly, 50 reference genomes with top scores are chosen. The full reference genomes of the identified organisms are afterwards downloaded from NCBI to map the assemblies on them and compute the genome fractions.
+
+      > <comment-title>Metagenome reference</comment-title>
+      > The alignment to automatically downloaded genomes for metagenomes is rather ambiguous and time-consuming. Most large-scale pipelines skip this step and set the **Maximum number of reference genomes (per each assembly) to download after searching in the SILVA database\*** option to `0`.


Suggested change

-      > The alignment to automatically downloaded genomes for metagenomes is rather ambiguous and time-consuming. Most large-scale pipelines skip this step and set the **Maximum number of reference genomes (per each assembly) to download after searching in the SILVA database\*** option to `0`.
+      > The alignment to automatically downloaded genomes for metagenomes is rather ambiguous and time-consuming. Most large-scale pipelines skip this step and set the **Maximum number of reference genomes (per each assembly) to download after searching in the SILVA database*** option to `0`.

bebatut · 2025-11-20T15:44:10Z

topics/microbiome/tutorials/metagenomics-assembly/tutorial.md

+> <comment-title>Why not both?</comment-title>
+> It is also possible to run both individual assembly and co-assembly, and this approach can recover MAGs effectively. In this case: individual assembly can recover MAGs with a low amount of contamination, while co-assembly also allows for the recovery of low-abundance MAGs, with the downside of potentially more contamination. Although this approach can be effective, it also requires high computational resources and should be considered carefully.
+>
+> > {% snippet faqs/galaxy/fastq_groupmerge.md %}


Suggested change

-> > {% snippet faqs/galaxy/fastq_groupmerge.md %}
+> {% snippet faqs/galaxy/fastq_groupmerge.md %}

vinisalazar added 5 commits October 10, 2025 12:12

Edit text on individual vs co-assembly

bd9e5fc

- Remove 'Describe what de-replication is' from objectives; this is in the scope of the binning tutorial

Fix typos and punctuation

5bfa18a

Update CONTRIBUTORS.md

7066101

- Add vinisalazar

Remove runaway quotes

417a6ba

- Causing jekyll build to fail

Merge branch 'main' into FAIRyMAGs-hackathon-Oct-2025

e2b6fe2

vinisalazar requested review from bebatut, paulzierep and shiltemann as code owners October 10, 2025 01:22

github-actions bot added template-and-tools microbiome labels Oct 10, 2025

vinisalazar mentioned this pull request Oct 10, 2025

Update metagenomics-assembly tutorial #6408

Closed

shiltemann reviewed Oct 28, 2025

Update topics/microbiome/tutorials/metagenomics-assembly/tutorial.md

514726b

shiltemann reviewed Nov 6, 2025

shiltemann added 2 commits November 6, 2025 15:53

Update topics/microbiome/tutorials/metagenomics-assembly/tutorial.md

916b98d

Update topics/microbiome/tutorials/metagenomics-assembly/tutorial.md

8e93ba2

shiltemann reviewed Nov 6, 2025

Update topics/microbiome/tutorials/metagenomics-assembly/tutorial.md

99de97a

shiltemann changed the title ~~Update assembly tutorial~~ Update metagenomics assembly tutorial Nov 6, 2025

paulzierep reviewed Nov 14, 2025

paulzierep added 3 commits November 17, 2025 10:25

fix link, add authors, fix comment box

79e1c23

add faq, fix typos

fc83ee7

Merge branch 'main' into assembly-tutorial

fbea0b9

paulzierep reviewed Nov 17, 2025

paulzierep and others added 2 commits November 17, 2025 16:46

update faq placing

24fb5c9

Apply suggestions from code review

fd862c3

Co-authored-by: paulzierep <[email protected]>

bebatut reviewed Nov 20, 2025
