Update metagenomics assembly tutorial #6410
base: main
- Remove 'Describe what de-replication is' from objectives; this is in the scope of the binning tutorial
- Add vinisalazar
- Causing jekyll build to fail
shiltemann
left a comment
Thanks @vinisalazar!
paulzierep
left a comment
Very nice, minor updates, will continue review next week
| - Samples from different patients. | ||
| - Samples from the same site, but over different seasons or under different environmental conditions, eg. a patch of soil before and after a bushfire event, a marine site under upwelling vs. under normal conditions. | ||
| If samples differ like described, **individual assembly** is preferred. In the case of individual assembly, if **contigs are binned** after, an extra step of **de-replication** should be used: |
One can bin per sample (which is what is mostly done) and then de-replicate later; that avoids chimeric bins, similar to co-assembly. I would rather suggest de-replicating after binning.
| - Related samples | ||
| If it is not the case, **individual assembly** should be prefered. In this case, an extra step of **de-replication** should be used: | ||
| Examples where co-assembly would be reasonable: |
Can you add this FAQ when it's merged: #6474
| {: .question} | ||
| > <details-title>Co-assembly with MetaSPAdes</details-title> | ||
| > MetaSPAdes supports co-assembly by passing a list of paired-end read files. MEGAHIT, on the other hand, requires concatenating that list of paired-end read files into a single pair of forward and reverse files. |
This can now be done with the tool in the FAQ, and MEGAHIT supports it anyway as a tool parameter.
Can you modify the hands-on box below for that? Thanks
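For readers following this thread, here is a minimal sketch of the concatenation step described in the quoted details box: merging several paired-end samples into a single forward file and a single reverse file before a co-assembly. The function names and file names are hypothetical and are not taken from the tutorial or from any Galaxy tool. The sketch assumes all inputs share the same compression (all plain FASTQ, or all gzip) and that each forward/reverse pair is already in matching read order.

```python
import shutil
from pathlib import Path


def concatenate_reads(inputs, output):
    """Append each input file to `output` byte for byte, in the given order."""
    output = Path(output)
    with output.open("wb") as out_handle:
        for path in inputs:
            with Path(path).open("rb") as in_handle:
                shutil.copyfileobj(in_handle, out_handle)
    return output


def merge_pairs_for_coassembly(pairs, out_forward, out_reverse):
    """Merge (forward, reverse) FASTQ pairs into one forward and one reverse file.

    Both outputs are written in the same sample order, so the n-th read of
    the merged forward file still pairs with the n-th read of the merged
    reverse file.
    """
    forward = [fwd for fwd, _ in pairs]
    reverse = [rev for _, rev in pairs]
    return (
        concatenate_reads(forward, out_forward),
        concatenate_reads(reverse, out_reverse),
    )
```

Keeping the sample order identical for both outputs is the important design point: a paired-end assembler matches mates by position, so any reordering between the two files would silently break the pairing.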
| It makes use of succinct de Bruijn graph (SdBG) to achieve low memory assembly. | ||
| It makes use of the Succinct de Bruijn Graph (SdBG) approach to achieve low memory assembly. | ||
| Both tools are available in Galaxy. But currently, only MEGAHIT can be used in individual mode for several samples. |
This is now easy to do with nested collections. Can you add this FAQ once it's merged: #6476
Where should that be added? Could you do it? Thanks a lot
Thanks a lot for the update. After the suggestions and adding the FAQ, it's good from my side!
Co-authored-by: paulzierep <[email protected]>
bebatut
left a comment
Thanks for the update.
@paulzierep I added some extra suggestions but also comments for you
| - Related samples | ||
| If it is not the case, **individual assembly** should be prefered. In this case, an extra step of **de-replication** should be used: | ||
| Examples where co-assembly would be reasonable: |
| Examples where co-assembly would be reasonable: | |
| {% snippet faqs/galaxy/fastq_groupmerge.md %} | |
| Examples where co-assembly would be reasonable: |
| - Samples from different patients. | ||
| - Samples from the same site, but over different seasons or under different environmental conditions, eg. a patch of soil before and after a bushfire event, a marine site under upwelling vs. under normal conditions. | ||
| If samples differ like described, **individual assembly** is preferred. In the case of individual assembly, if **contigs are binned** after, an extra step of **de-replication** should be used: |
| If samples differ like described, **individual assembly** is preferred. In the case of individual assembly, if **contigs are binned** after, an extra step of **de-replication** should be used: | |
| If samples differ as described, **individual assembly** is preferred. In the case of individual assembly, **contigs should be binned** per sample and an extra step of **de-replication** should be used as binning: |
| {:width="80%"} | ||
| Co-assembly is more commonly used than individual assembly and then de-replication after binning. But in this tutorial, to show all steps, we will run an **individual assembly**. | ||
| For more information on dereplication, check out the [metagenomic binning tutorial]({% link topics/microbiome/tutorials/metagenomics-binning/tutorial.md %}). |
| For more information on dereplication, check out the [metagenomic binning tutorial]({% link topics/microbiome/tutorials/metagenomics-binning/tutorial.md %}). | |
| > <comment-title></comment-title> | |
| > For more information on dereplication, check out the [metagenomic binning tutorial]({% link topics/microbiome/tutorials/metagenomics-binning/tutorial.md %}). | |
| {: .comment} |
| > | ||
| > {% snippet faqs/galaxy/datasets_import_via_link.md %} | ||
| > | ||
| {: .hands_on} |
| {: .hands_on} | |
| > <comment-title></comment-title> | |
| > | |
| > If the QUAST process takes too much time, we can import the results: | |
| > | |
| > > <hands-on-title>Import generated QUAST results</hands-on-title> | |
| > > | |
| > > 1. Import the QUAST report file from [Zenodo]({{ page.zenodo_link }}) or the Shared Data library: | |
| > > | |
| > > ```text | |
| > > {{ page.zenodo_link }}/files/quast_ERR2231567.html | |
| > > {{ page.zenodo_link }}/files/quast_ERR2231568.html | |
| > > {{ page.zenodo_link }}/files/quast_ERR2231569.html | |
| > > {{ page.zenodo_link }}/files/quast_ERR2231570.html | |
| > > {{ page.zenodo_link }}/files/quast_ERR2231571.html | |
| > > {{ page.zenodo_link }}/files/quast_ERR2231572.html | |
| > > ``` | |
| > > | |
| > {: .hands_on} | |
| {: .comment} |
| > 1. {% tool [MetaSPAdes](toolshed.g2.bx.psu.edu/repos/nml/metaspades/metaspades/4.2.0+galaxy0) %} with following parameters | ||
| > - *"Pair-end reads input format"*: `Paired-end: list of dataset pairs` | ||
| > - {% icon param-collection %} *"FASTQ file(s): collection"*: `Raw reads` | ||
| > - *"Select k-mer detection option"*: `User specific` | ||
| > - *"K-mer size values"*: `21,33,55,77` |
| > 1. {% tool [MetaSPAdes](toolshed.g2.bx.psu.edu/repos/nml/metaspades/metaspades/4.2.0+galaxy0) %} with following parameters | |
| > - *"Pair-end reads input format"*: `Paired-end: list of dataset pairs` | |
| > - {% icon param-collection %} *"FASTQ file(s): collection"*: `Raw reads` | |
| > - *"Select k-mer detection option"*: `User specific` | |
| > - *"K-mer size values"*: `21,33,55,77` | |
| > > <hands-on-title>Assembly with MetaSPAdes</hands-on-title> | |
| > > 1. {% tool [MetaSPAdes](toolshed.g2.bx.psu.edu/repos/nml/metaspades/metaspades/4.2.0+galaxy0) %} with following parameters | |
| > > - *"Pair-end reads input format"*: `Paired-end: list of dataset pairs` | |
| > > - {% icon param-collection %} *"FASTQ file(s): collection"*: `Raw reads` | |
| > > - *"Select k-mer detection option"*: `User specific` | |
| > > - *"K-mer size values"*: `21,33,55,77` | |
| > > | |
| > {: .hands_on} |
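As a side note on the *"K-mer size values"* parameter in the suggestion above, the following is a minimal sketch of a sanity check for a comma-separated k-mer list such as `21,33,55,77`. It assumes the constraints stated in the SPAdes documentation as I recall them (odd values, below 128, in ascending order); the helper name `parse_kmer_sizes` and the `max_k` default are hypothetical and not part of the tutorial or the Galaxy wrapper.

```python
def parse_kmer_sizes(value, max_k=127):
    """Parse a comma-separated k-mer list and check basic constraints.

    Assumed constraints (verify against the assembler's manual): every k is
    a positive odd integer no larger than `max_k`, and the list is strictly
    ascending.
    """
    sizes = [int(part) for part in value.split(",")]
    for k in sizes:
        if k <= 0 or k > max_k:
            raise ValueError(f"k-mer size {k} is outside 1..{max_k}")
        if k % 2 == 0:
            raise ValueError(f"k-mer size {k} is even; odd values are expected")
    for previous, current in zip(sizes, sizes[1:]):
        if current <= previous:
            raise ValueError("k-mer sizes must be listed in ascending order")
    return sizes
```

Called with the value from the hands-on box, `parse_kmer_sizes("21,33,55,77")` returns the four sizes as integers; an even or out-of-order value raises `ValueError`.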
| A base in the reference genome is counted as aligned if at least one contig has at least one alignment to this base. | ||
| We did not provide any reference there, but metaQuast try to identify genome content of the metagenome by aligning contigs to [SILVA](https://www.arb-silva.de/) 16S rRNA database. For each assembly, 50 reference genomes with top scores are chosen. The full reference genomes of the identified organisms are afterwards downloaded from NCBI to map the assemblies on them and compute the genome fractions. | ||
| We did not provide any reference genome, but metaQUAST tries to identify the genome content of the metagenome by aligning contigs to [SILVA](https://www.arb-silva.de/) 16S rRNA database. For each assembly, 50 reference genomes with top scores are chosen. The full reference genomes of the identified organisms are afterwards downloaded from NCBI to map the assemblies on them and compute the genome fractions. |
| We did not provide any reference genome, but metaQUAST tries to identify the genome content of the metagenome by aligning contigs to [SILVA](https://www.arb-silva.de/) 16S rRNA database. For each assembly, 50 reference genomes with top scores are chosen. The full reference genomes of the identified organisms are afterwards downloaded from NCBI to map the assemblies on them and compute the genome fractions. | |
| We did not provide any reference genome, but QUAST tries to identify the genome content of the metagenome by aligning contigs to [SILVA](https://www.arb-silva.de/) 16S rRNA database. For each assembly, 50 reference genomes with top scores are chosen. The full reference genomes of the identified organisms are downloaded from NCBI to map the assemblies on them and compute the genome fractions. |
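To make the genome-fraction definition quoted in this thread concrete ("a base in the reference genome is counted as aligned if at least one contig has at least one alignment to this base"), here is a minimal sketch that computes the percentage of covered reference bases from a list of alignment intervals. It is an illustration only, not QUAST's implementation: it assumes a single reference sequence, 0-based half-open `(start, end)` coordinates, and a hypothetical function name.

```python
def genome_fraction(reference_length, alignments):
    """Return the percentage of reference bases covered by at least one alignment.

    `alignments` is an iterable of (start, end) intervals on the reference,
    0-based and half-open. Overlapping intervals are merged so that each
    base is counted once, however many contigs align to it.
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")

    covered = 0
    current_start = current_end = None
    for start, end in sorted(alignments):
        # Clip the interval to the reference and skip empty ones.
        start, end = max(start, 0), min(end, reference_length)
        if start >= end:
            continue
        if current_end is None or start > current_end:
            # Gap before this interval: close the previous merged block.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start

    return 100.0 * covered / reference_length
```

For example, on a 100-base reference with alignments `(0, 30)`, `(20, 50)` and `(80, 90)`, the first two intervals merge into 50 covered bases and the third adds 10, giving a genome fraction of 60%.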
| We did not provide any reference genome, but metaQUAST tries to identify the genome content of the metagenome by aligning contigs to [SILVA](https://www.arb-silva.de/) 16S rRNA database. For each assembly, 50 reference genomes with top scores are chosen. The full reference genomes of the identified organisms are afterwards downloaded from NCBI to map the assemblies on them and compute the genome fractions. | ||
| > <comment-title>Metagenome reference</comment-title> | ||
| > The alignment to automatically downloaded genomes for metagenomes is rather ambiguous and time-consuming. Most large-scale pipelines skip this step and set the **Maximum number of reference genomes (per each assembly) to download after searching in the SILVA database\*** option to `0`. |
| > The alignment to automatically downloaded genomes for metagenomes is rather ambiguous and time-consuming. Most large-scale pipelines skip this step and set the **Maximum number of reference genomes (per each assembly) to download after searching in the SILVA database\*** option to `0`. | |
| > The alignment to automatically downloaded genomes for metagenomes is rather ambiguous and time-consuming. Most large-scale pipelines skip this step and set the **Maximum number of reference genomes (per each assembly) to download after searching in the SILVA database*** option to `0`. |
| > <comment-title>Why not both?</comment-title> | ||
| > It is also possible to run both individual assembly and co-assembly, and this approach can recover MAGs effectively. In this case: individual assembly can recover MAGs with a low amount of contamination, while co-assembly also allows for the recovery of low-abundance MAGs, with the downside of potentially more contamination. Although this approach can be effective, it also requires high computational resources and should be considered carefully. | ||
| > | ||
| > > {% snippet faqs/galaxy/fastq_groupmerge.md %} |
| > > {% snippet faqs/galaxy/fastq_groupmerge.md %} | |
| > {% snippet faqs/galaxy/fastq_groupmerge.md %} |
Supersedes #6408
Work for the FAIRyMAGs 2025 hackathon
Task: update assembly tutorial
Summary of changes: