@vinisalazar

Supersedes #6408

Work for the FAIRyMAGs 2025 hackathon

Task: update assembly tutorial

Summary of changes:

  • Add new figures
  • Fix punctuation and phrasing
  • Improve explanation on individual vs co-assembly
  • Remove mentions of dereplication and reference the binning tutorial (where that is explained) instead

  - Remove 'Describe what de-replication is' from objectives; this is in the scope of the binning tutorial
  - Add vinisalazar
  - Causing jekyll build to fail
Member

@shiltemann left a comment

Thanks @vinisalazar!

@shiltemann changed the title from "Update assembly tutorial" to "Update metagenomics assembly tutorial" on Nov 6, 2025
Collaborator

@paulzierep left a comment

Very nice, minor updates, will continue review next week

- Samples from different patients.
- Samples from the same site, but over different seasons or under different environmental conditions, eg. a patch of soil before and after a bushfire event, a marine site under upwelling vs. under normal conditions.
If samples differ like described, **individual assembly** is preferred. In the case of individual assembly, if **contigs are binned** after, an extra step of **de-replication** should be used:
Collaborator

One can bin per sample, which is what is mostly done, and then de-replicate later. That avoids chimeric bins, similar to co-assembly. I would rather suggest de-replicating after binning.

- Related samples
If it is not the case, **individual assembly** should be prefered. In this case, an extra step of **de-replication** should be used:
Examples where co-assembly would be reasonable:
Collaborator

Can you add this FAQ when it's merged: #6474

{: .question}
> <details-title>Co-assembly with MetaSPAdes</details-title>
> MetaSPAdes supports co-assembly by passing a list of paired-end read files. MEGAHIT, on the other hand, requires concatenating that list of paired-end read files into a single pair of forward and reverse files.
Collaborator

This can now be done with the tool in the FAQ, and MEGAHIT supports it anyway as a tool parameter.

Member

Can you modify the hands-on box below for that? Thanks
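
As an illustration of the point in this thread, here is a minimal sketch of a command-line co-assembly with MEGAHIT. It relies on MEGAHIT's `-1` and `-2` options each accepting a comma-separated list of files, so no concatenation step is needed outside Galaxy. The helper name, file names, and output directory are hypothetical, and the Galaxy tool wrapper may expose this differently.

```python
from typing import List, Sequence, Tuple


def megahit_coassembly_command(
    pairs: Sequence[Tuple[str, str]], out_dir: str
) -> List[str]:
    """Build a MEGAHIT command that co-assembles several paired-end samples.

    Each element of `pairs` is (forward_file, reverse_file). The forward
    and reverse files are joined into comma-separated lists for -1 and -2,
    keeping the same sample order in both lists.
    """
    if not pairs:
        raise ValueError("at least one pair of read files is required")
    forward = ",".join(fwd for fwd, _ in pairs)
    reverse = ",".join(rev for _, rev in pairs)
    return ["megahit", "-1", forward, "-2", reverse, "-o", out_dir]


# Hypothetical file names, for illustration only.
cmd = megahit_coassembly_command(
    [
        ("sampleA_1.fastq.gz", "sampleA_2.fastq.gz"),
        ("sampleB_1.fastq.gz", "sampleB_2.fastq.gz"),
    ],
    out_dir="coassembly_out",
)
print(" ".join(cmd))
```

The command is returned as a list so it can be passed directly to `subprocess.run` without shell quoting issues.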

It makes use of succinct de Bruijn graph (SdBG) to achieve low memory assembly.
It makes use of the Succinct de Bruijn Graph (SdBG) approach to achieve low memory assembly.
Both tools are available in Galaxy. But currently, only MEGAHIT can be used in individual mode for several samples.
Collaborator

This is now easy to do with nested collections. Can you add this FAQ once it's merged: #6476

Member

Where should that be added? Could you do it? Thanks a lot
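
For readers unfamiliar with the de Bruijn graph named in the quoted tutorial line, the sketch below builds a plain one from reads. It is not MEGAHIT's succinct (SdBG) representation, which stores the same graph in a compressed form; it only shows what the graph contains. The function name and the toy reads are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def de_bruijn_graph(reads: Iterable[str], k: int) -> Dict[str, List[str]]:
    """Build a plain (non-succinct) de Bruijn graph from reads.

    Nodes are (k-1)-mers. Every distinct k-mer found in the reads adds one
    edge from its prefix (first k-1 bases) to its suffix (last k-1 bases).
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph: Dict[str, List[str]] = defaultdict(list)
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in seen:
                continue  # each distinct k-mer contributes a single edge
            seen.add(kmer)
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Toy reads, for illustration only.
graph = de_bruijn_graph(["ACGTAC", "GTACG"], k=3)
print(graph)
```

An assembler then walks paths through such a graph to spell out contigs; the succinct variant trades lookup simplicity for a much smaller memory footprint.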

@paulzierep
Collaborator

Thanks a lot for the update. After the suggestions and adding the FAQ, it's good from my side!

Member

@bebatut left a comment

Thanks for the update.
@paulzierep I added some extra suggestions but also comments for you

- Related samples
If it is not the case, **individual assembly** should be prefered. In this case, an extra step of **de-replication** should be used:
Examples where co-assembly would be reasonable:
Member

Suggested change
Examples where co-assembly would be reasonable:
{% snippet faqs/galaxy/fastq_groupmerge.md %}
Examples where co-assembly would be reasonable:

- Samples from different patients.
- Samples from the same site, but over different seasons or under different environmental conditions, eg. a patch of soil before and after a bushfire event, a marine site under upwelling vs. under normal conditions.
If samples differ like described, **individual assembly** is preferred. In the case of individual assembly, if **contigs are binned** after, an extra step of **de-replication** should be used:
Member

Suggested change
If samples differ like described, **individual assembly** is preferred. In the case of individual assembly, if **contigs are binned** after, an extra step of **de-replication** should be used:
If samples differ as described, **individual assembly** is preferred. In the case of individual assembly, **contigs should be binned** per sample and an extra step of **de-replication** should be used after binning:

![Image shows the process of individual assembly on two strains and five samples, after individual assembly of samples two samples are chosen for de-replication process. In parallel, co-assembly on all five samples is performed](./images/individual-assembly.png "Individual assembly followed by de-replication vs co-assembly. Source: dRep documentation"){:width="80%"}
Co-assembly is more commonly used than individual assembly and then de-replication after binning. But in this tutorial, to show all steps, we will run an **individual assembly**.
For more information on dereplication, check out the [metagenomic binning tutorial]({% link topics/microbiome/tutorials/metagenomics-binning/tutorial.md %}).
Member

Suggested change
For more information on dereplication, check out the [metagenomic binning tutorial]({% link topics/microbiome/tutorials/metagenomics-binning/tutorial.md %}).
> <comment-title></comment-title>
> For more information on dereplication, check out the [metagenomic binning tutorial]({% link topics/microbiome/tutorials/metagenomics-binning/tutorial.md %}).
{: .comment}

>
> {% snippet faqs/galaxy/datasets_import_via_link.md %}
>
{: .hands_on}
Member

Suggested change
{: .hands_on}
> <comment-title></comment-title>
>
> If the QUAST process takes too much time, we can import the results:
>
> > <hands-on-title>Import generated QUAST results</hands-on-title>
> >
> > 1. Import the QUAST report file from [Zenodo]({{ page.zenodo_link }}) or the Shared Data library:
> >
> > ```text
> > {{ page.zenodo_link }}/files/quast_ERR2231567.html
> > {{ page.zenodo_link }}/files/quast_ERR2231568.html
> > {{ page.zenodo_link }}/files/quast_ERR2231569.html
> > {{ page.zenodo_link }}/files/quast_ERR2231570.html
> > {{ page.zenodo_link }}/files/quast_ERR2231571.html
> > {{ page.zenodo_link }}/files/quast_ERR2231572.html
> > ```
> >
> {: .hands_on}
{: .comment}
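
The six report links in the suggestion above follow one pattern, so they can be generated rather than typed by hand. This is a minimal sketch; the base URL below is a placeholder standing in for `page.zenodo_link`, and the function name is hypothetical.

```python
from typing import List


def quast_report_urls(
    zenodo_link: str, first_run: int = 2231567, last_run: int = 2231572
) -> List[str]:
    """Return one QUAST report URL per ENA run accession.

    The accession range ERR2231567 to ERR2231572 matches the file names
    listed in the suggested hands-on box.
    """
    base = zenodo_link.rstrip("/")
    return [
        f"{base}/files/quast_ERR{run}.html"
        for run in range(first_run, last_run + 1)
    ]


# Placeholder base URL: substitute the value of page.zenodo_link.
urls = quast_report_urls("https://zenodo.example/record/0000000")
print("\n".join(urls))
```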

Comment on lines +289 to +293
> 1. {% tool [MetaSPAdes](toolshed.g2.bx.psu.edu/repos/nml/metaspades/metaspades/4.2.0+galaxy0) %} with following parameters
> - *"Pair-end reads input format"*: `Paired-end: list of dataset pairs`
> - {% icon param-collection %} *"FASTQ file(s): collection"*: `Raw reads`
> - *"Select k-mer detection option"*: `User specific`
> - *"K-mer size values"*: `21,33,55,77`
Member

Suggested change
> 1. {% tool [MetaSPAdes](toolshed.g2.bx.psu.edu/repos/nml/metaspades/metaspades/4.2.0+galaxy0) %} with following parameters
> - *"Pair-end reads input format"*: `Paired-end: list of dataset pairs`
> - {% icon param-collection %} *"FASTQ file(s): collection"*: `Raw reads`
> - *"Select k-mer detection option"*: `User specific`
> - *"K-mer size values"*: `21,33,55,77`
> > <hands-on-title>Assembly with MetaSPAdes</hands-on-title>
> > 1. {% tool [MetaSPAdes](toolshed.g2.bx.psu.edu/repos/nml/metaspades/metaspades/4.2.0+galaxy0) %} with following parameters
> > - *"Pair-end reads input format"*: `Paired-end: list of dataset pairs`
> > - {% icon param-collection %} *"FASTQ file(s): collection"*: `Raw reads`
> > - *"Select k-mer detection option"*: `User specific`
> > - *"K-mer size values"*: `21,33,55,77`
> >
> {: .hands_on}
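
A small check for the *"K-mer size values"* field can catch typos before a long assembly job is submitted. The sketch below encodes the constraints that the SPAdes manual gives for its `-k` option (odd values, less than 128, ascending order); the function name is hypothetical, and the Galaxy wrapper may apply its own validation.

```python
from typing import List


def parse_kmer_sizes(value: str) -> List[int]:
    """Parse a comma-separated k-mer list such as '21,33,55,77'.

    Raises ValueError if a size is even, outside 1..127, or if the
    list is not strictly ascending.
    """
    sizes = [int(part) for part in value.split(",")]
    for k in sizes:
        if k % 2 == 0:
            raise ValueError(f"k-mer size {k} is even; sizes must be odd")
        if not 0 < k < 128:
            raise ValueError(f"k-mer size {k} must be between 1 and 127")
    if any(a >= b for a, b in zip(sizes, sizes[1:])):
        raise ValueError("k-mer sizes must be listed in ascending order")
    return sizes


print(parse_kmer_sizes("21,33,55,77"))
```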

A base in the reference genome is counted as aligned if at least one contig has at least one alignment to this base.
We did not provide any reference there, but metaQuast try to identify genome content of the metagenome by aligning contigs to [SILVA](https://www.arb-silva.de/) 16S rRNA database. For each assembly, 50 reference genomes with top scores are chosen. The full reference genomes of the identified organisms are afterwards downloaded from NCBI to map the assemblies on them and compute the genome fractions.
We did not provide any reference genome, but metaQUAST tries to identify the genome content of the metagenome by aligning contigs to [SILVA](https://www.arb-silva.de/) 16S rRNA database. For each assembly, 50 reference genomes with top scores are chosen. The full reference genomes of the identified organisms are afterwards downloaded from NCBI to map the assemblies on them and compute the genome fractions.
Member

Suggested change
We did not provide any reference genome, but metaQUAST tries to identify the genome content of the metagenome by aligning contigs to [SILVA](https://www.arb-silva.de/) 16S rRNA database. For each assembly, 50 reference genomes with top scores are chosen. The full reference genomes of the identified organisms are afterwards downloaded from NCBI to map the assemblies on them and compute the genome fractions.
We did not provide any reference genome, but QUAST tries to identify the genome content of the metagenome by aligning contigs to [SILVA](https://www.arb-silva.de/) 16S rRNA database. For each assembly, 50 reference genomes with top scores are chosen. The full reference genomes of the identified organisms are downloaded from NCBI to map the assemblies on them and compute the genome fractions.
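
The genome fraction definition quoted in this thread (a reference base counts as aligned if at least one contig aligns to it) can be sketched as a union of intervals. This is an illustration of the definition only, not QUAST's implementation; the function name and the 0-based, half-open coordinate convention are assumptions.

```python
from typing import Iterable, Tuple


def genome_fraction(
    alignments: Iterable[Tuple[int, int]], genome_length: int
) -> float:
    """Percentage of reference bases covered by at least one alignment.

    `alignments` holds (start, end) positions on the reference, 0-based
    and half-open. Overlapping alignments are counted only once.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    current_end = 0  # rightmost reference position counted so far
    for start, end in sorted(alignments):
        start = max(start, current_end, 0)
        end = min(end, genome_length)
        if end > start:
            covered += end - start
            current_end = end
    return 100.0 * covered / genome_length


# Two overlapping alignments and one separate alignment on a 200 bp reference.
fraction = genome_fraction([(0, 40), (30, 60), (80, 100)], genome_length=200)
print(fraction)
```

Here the alignments cover positions 0 to 60 and 80 to 100, so 80 of 200 bases count as aligned.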

We did not provide any reference genome, but metaQUAST tries to identify the genome content of the metagenome by aligning contigs to [SILVA](https://www.arb-silva.de/) 16S rRNA database. For each assembly, 50 reference genomes with top scores are chosen. The full reference genomes of the identified organisms are afterwards downloaded from NCBI to map the assemblies on them and compute the genome fractions.
> <comment-title>Metagenome reference</comment-title>
> The alignment to automatically downloaded genomes for metagenomes is rather ambiguous and time-consuming. Most large-scale pipelines skip this step and set the **Maximum number of reference genomes (per each assembly) to download after searching in the SILVA database\*** option to `0`.
Member

Suggested change
> The alignment to automatically downloaded genomes for metagenomes is rather ambiguous and time-consuming. Most large-scale pipelines skip this step and set the **Maximum number of reference genomes (per each assembly) to download after searching in the SILVA database\*** option to `0`.
> The alignment to automatically downloaded genomes for metagenomes is rather ambiguous and time-consuming. Most large-scale pipelines skip this step and set the **Maximum number of reference genomes (per each assembly) to download after searching in the SILVA database*** option to `0`.
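
Outside Galaxy, the same setting can be sketched as a metaQUAST command line. This assumes the command-line flag behind the Galaxy option is `--max-ref-number`, as listed in the QUAST manual; check it against `metaquast.py --help` for the installed version. The helper name and file names are hypothetical.

```python
from typing import List, Sequence


def metaquast_command(
    contig_files: Sequence[str], out_dir: str, max_ref_number: int = 0
) -> List[str]:
    """Build a metaQUAST command that limits downloaded reference genomes.

    With max_ref_number=0, no reference genomes are downloaded after the
    SILVA search, which skips the slow reference-based metrics.
    """
    if not contig_files:
        raise ValueError("at least one assembly file is required")
    if max_ref_number < 0:
        raise ValueError("max_ref_number cannot be negative")
    return [
        "metaquast.py",
        *contig_files,
        "--max-ref-number", str(max_ref_number),
        "-o", out_dir,
    ]


# Hypothetical assembly file names, for illustration only.
cmd = metaquast_command(
    ["sample1_contigs.fasta", "sample2_contigs.fasta"], out_dir="quast_out"
)
print(" ".join(cmd))
```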

> <comment-title>Why not both?</comment-title>
> It is also possible to run both individual assembly and co-assembly, and this approach can recover MAGs effectively. In this case: individual assembly can recover MAGs with a low amount of contamination, while co-assembly also allows for the recovery of low-abundance MAGs, with the downside of potentially more contamination. Although this approach can be effective, it also requires high computational resources and should be considered carefully.
>
> > {% snippet faqs/galaxy/fastq_groupmerge.md %}
Member

Suggested change
> > {% snippet faqs/galaxy/fastq_groupmerge.md %}
> {% snippet faqs/galaxy/fastq_groupmerge.md %}
