Commit b112004

Edit phrasing and fix typos
1 parent bb3a08c commit b112004

1 file changed

+28
-28
lines changed
topics/microbiome/tutorials/metagenomics-assembly/tutorial.md

@@ -11,20 +11,18 @@ questions:
 - "How tools based on de Bruijn graph work?"
 - "How to assess the quality of metagenomic data assembly?"
 objectives:
-- "Describe what an assembly is"
-- "Explain the difference between co-assembly and individual assembly"
-- "Explain the difference between reads, contigs and scaffolds"
-- "Explain how tools based on De Bruijn graph work"
-- "Apply appropriate tools for analyzing the quality of metagenomic data"
-- "Construct and apply simple assembly pipelines on short read data"
-- "Apply appropriate tools for analyzing the quality of metagenomic assembly"
-- "Evaluate the Quality of the Assembly with Quast, Bowtie2, and CoverM-Contig"
+- "Describe what an assembly is."
+- "Explain the difference between co-assembly and individual assembly."
+- "Explain the difference between reads, contigs and scaffolds."
+- "Explain how tools based on de Bruijn graph work."
+- "Evaluate the Quality of the Assembly with QUAST, Bowtie2, and CoverM-Contig."
+- "Construct and apply simple assembly pipelines on short read data.""
 time_estimation: "2H"
 key_points:
-- "Assembly groups reads into contigs and scafolds."
-- "de Brujin Graphs use k-mers to assembly reads"
-- "MetaSPAdes and MEGAHIT are assemblers"
-- "Quast is the tool to assess the assembly quality"
+- "Assembly groups reads into contigs and scaffolds."
+- "de Brujin Graphs use k-mers to assembly reads."
+- "MetaSPAdes and MEGAHIT are short-read assemblers."
+- "MetaQUAST is a tool to assess metagenomic assembly quality."
 edam_ontology:
 - topic_3174 # Metagenomics
 - topic_0196 # Sequence assembly
@@ -210,21 +208,21 @@ For more information on dereplication, check out the [metagenomic binning tutori
 
 In this tutorial, to show all steps, we will run an **individual assembly**.
 
-> <comment-title></comment-title>
-> Sometimes it is important to run assembly tools both on individual samples and on all pooled samples, and use both outputs to get the better outputs for the certain dataset.
+> <comment-title>Why not both?</comment-title>
+> Sometimes it is important to run both individual assembly and co-assembly, and use both outputs to get better results for that dataset.
 {: .comment}
 
 As mentioned in the introduction, several tools are available for metagenomic assembly. But 2 are the most used ones:
 
-- **MetaSPAdes** ({%cite nurk2017%}): an short-read assembler designed specifically for large and complex metagenomics datasets
+- **MetaSPAdes** ({%cite nurk2017%}): an short-read assembler designed specifically for large and complex metagenomics datasets.
 
 MetaSPAdes is part of the SPAdes toolkit, which has several assembly pipelines. Since SPAdes handles non-uniform coverage, it is useful for assembling simple communities, but metaSPAdes also handles other problems, allowing it to assemble complex communities' metagenomes.
 
 As input for metaSPAdes it can accept short reads. However, there is an option to use additionally long reads besides short reads to produce hybrid input.
 
 - **MEGAHIT** ({% cite li2015 %}): a single node assembler for large and complex metagenomics NGS reads, such as soil
 
-It makes use of succinct de Bruijn graph (SdBG) to achieve low memory assembly.
+It makes use of the Succinct de Bruijn Graph (SdBG) approach to achieve low memory assembly.
 
 Both tools are available in Galaxy. But currently, only MEGAHIT can be used in individual mode for several samples.
 
@@ -246,9 +244,11 @@ Both tools are available in Galaxy. But currently, only MEGAHIT can be used in i
 >
 {: .hands_on}
 
-**MEGAHIT** produced a collection of output assemblies - one per sample - that can be proceeded further in binning step and then de-replication. The output contains **contigs**, contiguous lengths of genomic sequences in which bases are known to a high degree of certainty.
+**MEGAHIT** produced a collection of output assemblies - one per sample - that can be used for the subsequent step of **metagenomic binning**. The output contains **contigs**, contiguous lengths of genomic sequences in which bases are known to a high degree of certainty.
 
-Contrary to **MetaSPAdes**, **MEGAHIT** does not output **scaffolds**, i.e. segments of genome sequence reconstructed fron contigs and gaps. The gaps occur when reads from the two sequenced ends of at least one fragment overlap with other reads from two different contigs (as long as the arrangement is otherwise consistent with the contigs being adjacent). It is possible to estimate the number of bases between contigs based on fragment lengths.
+<comment-title>Scaffolds</comment-title>
+Contrary to **MetaSPAdes**, **MEGAHIT** does not output **scaffolds**. **Scaffolds** are segments of genome sequence reconstructed fron contigs and gaps. The gaps occur when reads from the two sequenced ends of at least one fragment overlap with other reads from two different contigs (as long as the arrangement is otherwise consistent with the contigs being adjacent). It is possible to estimate the number of bases between contigs based on fragment lengths.
+{:. comment}
 
 > <comment-title></comment-title>
 >
@@ -268,7 +268,7 @@ Contrary to **MetaSPAdes**, **MEGAHIT** does not output **scaffolds**, i.e. segm
 > > ```
 > >
 > >
-> > 2. Create a collection named `MEGAHIT Contig`, rename your pairs with the sample name
+> > 2. Create a collection named `MEGAHIT Contigs`, rename your pairs with the sample name
 > >
 > {: .hands_on}
 {: .comment}
@@ -309,7 +309,7 @@ Assemblies can be evaluated with **metaQUAST** ({%cite mikheenko2016%}), the met
 
 > <hands-on-title>Evaluation assembly quality with metaQUAST</hands-on-title>
 >
-> 1. {% tool [Quast](toolshed.g2.bx.psu.edu/repos/iuc/quast/quast/5.2.0+galaxy1) %} with parameters:
+> 1. {% tool [QUAST](toolshed.g2.bx.psu.edu/repos/iuc/quast/quast/5.2.0+galaxy1) %} with parameters:
 > - *"Assembly mode?*": `Individual assembly (1 contig file per samples)`
 > - *"Use customized names for the input files?"*: `No, use dataset names`
 > - {% icon param-collection %} *"Contigs/scaffolds file"*: output **MEGAHIT**
@@ -328,11 +328,11 @@ Assemblies can be evaluated with **metaQUAST** ({%cite mikheenko2016%}), the met
 
 > <comment-title></comment-title>
 >
-> Since the Quast process would take times we are just going to import the results:
+> Since the QUAST process would take times we are just going to import the results:
 >
-> > <hands-on-title>Import generated metaQuast results</hands-on-title>
+> > <hands-on-title>Import generated metaQUAST results</hands-on-title>
 > >
-> > 1. Import the metaQuast report file from [Zenodo]({{ page.zenodo_link }}) or the Shared Data library:
+> > 1. Import the metaQUAST report file from [Zenodo]({{ page.zenodo_link }}) or the Shared Data library:
 > >
 > > ```text
 > > {{ page.zenodo_link }}/files/quast_ERR2231567.html
@@ -346,7 +346,7 @@ Assemblies can be evaluated with **metaQUAST** ({%cite mikheenko2016%}), the met
 > {: .hands_on}
 {: .comment}
 
-Quast main output are HTML reports which aggregate different metrics.
+QUAST main output are HTML reports which aggregate different metrics.
 
 ## Assembly statistics
 
@@ -358,7 +358,7 @@ On the top of each report is a table with in rows statistics for contigs larger
 
 A base in the reference genome is counted as aligned if at least one contig has at least one alignment to this base.
 
-We did not provide any reference there, but metaQuast try to identify genome content of the metagenome by aligning contigs to [SILVA](https://www.arb-silva.de/) 16S rRNA database. For each assembly, 50 reference genomes with top scores are chosen. The full reference genomes of the identified organisms are afterwards downloaded from NCBI to map the assemblies on them and compute the genome fractions.
+We did not provide any reference there, but metaQUAST try to identify genome content of the metagenome by aligning contigs to [SILVA](https://www.arb-silva.de/) 16S rRNA database. For each assembly, 50 reference genomes with top scores are chosen. The full reference genomes of the identified organisms are afterwards downloaded from NCBI to map the assemblies on them and compute the genome fractions.
 
 For each identified genomes, the genome fraction is given when clicking on **Genome fraction (%)**
 
@@ -475,7 +475,7 @@ On the top of each report is a table with in rows statistics for contigs larger
 
 3. **Misassemblies**: joining sequences that should not be adjacent.
 
-Quast identifies missassemblies by mapping the contigs to the reference genomes of the identified organisms. 3 types of misassemblies can be identified:
+QUAST identifies missassemblies by mapping the contigs to the reference genomes of the identified organisms. 3 types of misassemblies can be identified:
 
 ![Image shows on the top a contig with a blue and a gren parts with white arrows (pointing on the right) on them and below a reference with 2 chromosomes. The 3 types of misassemblies are after schematized. Relocation: the blue and gren parts of the contig are on chr 1 but separated. Inversion: the blue and gren parts of the contig are on chr 1 but separated and with the arrows facing each other. Translocation: the blue part is on chr 1 and gren part on chr 2.](./images/quast_misassemblies.png "Source: <a href="https://quast.sourceforge.net/docs/manual.html#sec3.1.2">QUAST manual</a>"){:width="60%"}
 
@@ -800,8 +800,8 @@ Metagenomic data can be assembled to, ideally, obtain the genomes of the species
 - **different tools** like MetaSPAdes and MEGAHIT
 
 Once the choices made, metagenomic assembly can start:
-1. Input data are assembled to obtain contigs and sometimes scaffolds
-2. Assembly quality is evaluated with various metrics
+1. Input data are assembled to obtain contigs and sometimes scaffolds.
+2. Assembly quality is evaluated with various metrics.
 3. The assembly graph can be visualized.
 
 Once all these steps done, we can move to the next phase to build Metagenomics Assembled Genomes (MAGs): [metagenomic binning](../metagenomics-binning/tutorial.md).
