Commit fa1add7

fix relative links
1 parent 8bfa4e7 commit fa1add7

4 files changed: +16 -16 lines changed

docs/usage/differential_expression_analysis/de_rstudio.md

Lines changed: 3 additions & 3 deletions
@@ -34,7 +34,7 @@ As in all analysis, firstly we need to create a new project:

 2. Select **New Directory**, **New Project**, name the project as shown below and click on **Create Project**;

-![r_project](../img/project_R.png)
+![r_project](../differential_expression_analysis/img/project_R.png)

 3. The new project will be automatically opened in RStudio.

@@ -47,7 +47,7 @@ To store our results in an organized way, we will create a folder named **de_res

 and save the file as **de_script.R**. From now on, each command described in the tutorial can be added to your script. The resulting working directory should look like this:

-![work_dir](../img/workdir_RStudio.png)
+![work_dir](../differential_expression_analysis/img/workdir_RStudio.png)

 The analysis requires several R packages. To utilise them, we need to load the following libraries:

@@ -159,7 +159,7 @@ design(dds_new) # to check the design formula

 Comparing the structure of the newly created dds (`dds_new`) with the one automatically produced by the pipeline (`dds`), we can observe the differences:

-![comparison_dds](../img/dds_comparison.png)
+![comparison_dds](../differential_expression_analysis/img/dds_comparison.png)

 Before running the different steps of the analysis, it is good practice to pre-filter the genes to remove those with very low counts. This improves computational efficiency and enhances interpretability. In general, it is reasonable to keep only genes with a count of at least 10 in a minimum of 3 samples:
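
The pre-filtering rule described in the last context line of this hunk can be sketched in base R. This is a minimal illustration on a hypothetical count matrix (`counts_mat`, with made-up gene and sample names), not the tutorial's own code; in the tutorial the counts would presumably come from the `dds` object (for example via `counts(dds)`).

```r
# Hypothetical raw counts: 4 genes (rows) x 6 samples (columns).
counts_mat <- matrix(
  c( 0,  1,  0,  2,  0,  1,   # gene_a: low everywhere
    15, 20, 12,  3,  5,  2,   # gene_b: >= 10 in exactly 3 samples
    50, 60, 55, 70, 65, 80,   # gene_c: high everywhere
    11,  0, 12,  0,  1,  0),  # gene_d: >= 10 in only 2 samples
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("gene_a", "gene_b", "gene_c", "gene_d"),
    paste0("sample_", 1:6)
  )
)

min_count   <- 10  # minimum count a sample must reach
min_samples <- 3   # minimum number of samples reaching min_count

# TRUE for genes with at least `min_count` reads in at least `min_samples` samples.
keep <- rowSums(counts_mat >= min_count) >= min_samples
filtered <- counts_mat[keep, , drop = FALSE]

print(rownames(filtered))
```

Keeping the two thresholds as named variables makes it easy to adapt the filter to the size of the smallest experimental group.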

docs/usage/differential_expression_analysis/interpretation.md

Lines changed: 7 additions & 7 deletions
@@ -14,13 +14,13 @@ The results illustrated in this section might show slight variations compared to

 The first plot we will examine is the Principal Component Analysis (PCA) plot. Since we're working with simulated data, our metadata is relatively simple, consisting of just three variables: `sample`, `condition`, and `replica`. In a typical RNA-seq experiment, however, metadata can be complex and encompass a wide range of variables that could contribute to sample variation, such as sex, age, and developmental stage.

-![pca](../img/pca_plot.png)
+![pca](../differential_expression_analysis/img/pca_plot.png)

 By plotting the PCA on the PC1 and PC2 axes, using `condition` as the main variable of interest, we can quickly identify the primary source of variation in our data. By accounting for this variation in our design model, we should be able to detect more differentially expressed genes related to `condition`. When working with real data, it's often useful to plot the data using different variables to explore how much variation is explained by the first two PCs. Depending on the results, it may be informative to examine variation on additional PC axes, such as PC3 and PC4, to gain a more comprehensive understanding of the data.

 Next, we will examine the hierarchical clustering plot to explore the relationships between samples based on their gene expression profiles. The heatmap is organized such that samples with similar expression profiles are close to each other, allowing us to identify patterns in the data.

-![cluster](../img/hierarchical_clustering.png)
+![cluster](../differential_expression_analysis/img/hierarchical_clustering.png)

 Remember that to create this plot, we utilized the `dist()` function, so in the legend on the right, a value of 0 corresponds to high correlation, while a value of 5 corresponds to very low correlation. Similar to PCA, we can see that samples tend to cluster together according to `condition`; indeed, we can observe a high degree of correlation between the three control samples and between the three treated samples.
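
To make the legend reading above concrete: base R's `dist()` returns Euclidean distances by default, so 0 means two samples have identical profiles and larger values mean less similar profiles. The sketch below uses a small hypothetical matrix (`expr`, with invented values and sample names) rather than the tutorial's transformed counts.

```r
# Hypothetical transformed expression values: 3 genes x 4 samples.
expr <- matrix(
  c(5.0, 5.0, 9.0, 9.2,
    3.0, 3.0, 1.0, 1.1,
    7.0, 7.0, 7.5, 7.4),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    paste0("gene_", 1:3),
    c("ctrl_1", "ctrl_2", "trt_1", "trt_2")
  )
)

# dist() compares rows, so transpose to obtain sample-to-sample distances.
sample_dist <- as.matrix(dist(t(expr)))

print(round(sample_dist, 2))
```

In this toy example the two control columns are identical, so their distance is 0, while control and treated samples are several units apart.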

@@ -31,7 +31,7 @@ Overall, the integration of these plots suggests that we are working with high-q
 In this part of the tutorial, we will examine plots that are generated after the differential expression analysis. These plots are not quality control plots, but rather plots that help us to interpret the results.
 After running the `results()` function, a good way to get a first impression of the results is to look at the MA plot.

-![ma_plot](../img/MA_plot.png)
+![ma_plot](../differential_expression_analysis/img/MA_plot.png)

 By default, genes are coloured in blue if the padj is less than 0.1 and the log2 fold change greater than or less than 0. Genes that fall outside the plotting region are represented as open triangles. At this stage, we have not yet applied a filter to select only significant DE genes, which we define as those with a padj value less than 0.5 and a log2 fold change of at least 1 or -1.

@@ -48,27 +48,27 @@ ENSG00000156282 481.7624 1.095272 0.2969594 3.688289

 To gain a comprehensive overview of the transcriptional profile, the volcano plot is a highly informative tool.

-![volcano_plot](../img/volcanoplot.png)
+![volcano_plot](../differential_expression_analysis/img/volcanoplot.png)

 The treatment induced differential expression in five genes: one downregulated and four upregulated. This plot visually represents the numerical results reported in the table above.

 After the identification of DE genes, it's informative to visualise the expression of specific genes of interest. The `plotCounts()` function applied directly on the `dds` object allows us to examine individual gene expression profiles without accessing the full `res` object.

-![counts](../img/plotCounts.png)
+![counts](../differential_expression_analysis/img/plotCounts.png)

 In our example, post-treatment, we observe a significant increase in the expression of the _ENSG00000142192_ gene, highlighting its responsiveness to the experimental conditions.

 Finally, we can create a heatmap using the normalised expression counts of DE genes. The resulting heatmap visualises how the expression of significant genes varies across samples. Each row represents a gene, and each column represents a sample. The color intensity in the heatmap reflects the normalised expression levels: red colors indicate higher expression, while blue colors indicate lower expression.

-![heatmap](../img/heatmap_de_genes.png)
+![heatmap](../differential_expression_analysis/img/heatmap_de_genes.png)

 By examining the heatmap, we can visually identify the expression patterns of our five significant differentially expressed genes. This visualisation allows us to identify how these genes respond to the treatment. The heatmap provides a clear and intuitive way to explore gene expression dynamics.

 ## Over Representation Analysis (ORA)

 Finally, we can attempt to assign biological significance to our differentially expressed genes through **Over Representation Analysis (ORA)**. ORA identifies specific biological pathways, molecular functions and cellular processes, according to the **Gene Ontology (GO)** database, that are enriched within our differentially expressed genes.

-![enrichment](../img/enrichment_plot.png)
+![enrichment](../differential_expression_analysis/img/enrichment_plot.png)

 The enrichment analysis reveals a possible involvement of cellular structures and processes, including "clathrin-coated pit", "dendritic spine", "neuron spine" and "endoplasmic reticulum lumen". These terms suggest a focus on cellular transport, structural integrity and protein processing, especially in neural contexts. This pattern points to pathways related to cellular organization and maintenance, possibly playing an important role in the biological condition under study.

docs/usage/differential_expression_analysis/rnaseq.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ In order to carry out a RNA-Seq analysis we will use the nf-core pipeline [rnase

 The pipeline is organised following the different blocks shown below: pre-processing, traditional alignment (or lightweight alignment) and quantification, post-processing and final QC.

-![metromap](../img/nf-core-rnaseq_metro_map_grey.png)
+![metromap](../differential_expression_analysis/img/nf-core-rnaseq_metro_map_grey.png)

 In each process, the users can choose among a range of different options. Importantly, the users can decide to follow one of the two different routes in the alignment and quantification step:

docs/usage/differential_expression_analysis/theory.md

Lines changed: 5 additions & 5 deletions
@@ -12,7 +12,7 @@ Given the central role of RNA in a wide range of molecular functions, RNA-seq ha

 After RNA extraction and reverse transcription into complementary DNA (cDNA), the biological material is sequenced, generating NGS "reads" that correspond to the RNA captured in a specific cell, tissue, or organ at a given time. The sequencing data is then bioinformatically processed through a typical workflow summarised in the diagram below:

-![excalidraw](../img/Excalidraw_RNAseq.png)
+![excalidraw](../differential_expression_analysis/img/Excalidraw_RNAseq.png)

 In the scheme, we can identify three key phases in the workflow:

@@ -92,11 +92,11 @@ The results will not be affected by the order of variables but the common practi

 RNA-seq data typically contain a large number of genes with low expression counts, indicating that many genes are expressed at very low levels across samples. At the same time, RNA-seq data exhibit a skewed distribution with a long right tail due to the absence of an upper limit for gene expression levels. This means that while most genes have low to moderate expression levels, a small number are expressed at high levels. Accurate statistical modelling must therefore account for this distribution to avoid misleading conclusions.

-![count_distribution](../img/count_distribution.png)
+![count_distribution](../differential_expression_analysis/img/count_distribution.png)

 The core of the differential expression analysis is the `DESeq()` function, a wrapper that streamlines several key steps into a single command. The different functions are listed below:

-![deseq2_function](../img/DESeq_function.png)
+![deseq2_function](../differential_expression_analysis/img/DESeq_function.png)

 :::note
 While `DESeq()` combines these steps, a user could choose to perform each function separately to have more control over the whole process.
@@ -122,7 +122,7 @@ While normalised counts are useful for downstream visualisation of results, they

 2. **Estimate dispersion and gene-wise dispersion**: the dispersion is a measure of how much the variance deviates from the mean. The dispersion estimates indicate the variance in gene expression at a specific mean expression level. Importantly, RNA-seq data are characterised by **overdispersion**, where the variance in gene expression levels often exceeds the mean (variance > mean).

-![overdispersion](../img/overdispersion.png)
+![overdispersion](../differential_expression_analysis/img/overdispersion.png)

 DESeq2 addresses this issue by employing the **negative binomial distribution**, which generalises the Poisson distribution by introducing an additional dispersion parameter. This parameter quantifies the extra variability present in RNA-seq data, providing a more realistic representation than the Poisson distribution, which assumes mean = variance. DESeq2 starts by estimating the **common dispersion**, a single estimate of dispersion applicable to all genes in the dataset. This estimate provides a baseline for variability across all genes in the dataset. Next, DESeq2 estimates **gene-wise dispersion**, a separate estimate of dispersion for each individual gene, taking into account that different genes may exhibit varying levels of expression variability due to biological differences.
 The dispersion parameter (α) is related to the mean (μ), and variance of the data, as described by the equation:
@@ -137,7 +137,7 @@ A key feature of DESeq2's dispersion estimates is their negative correlation wit

 4. **Final dispersion estimates**: DESeq2 refines the gene-wise dispersion by shrinking it towards the fitted curve. The "shrinkage" helps control for overfitting, and makes the dispersion estimates more reliable. The strength of the shrinkage depends on the sample size (more samples = less shrinkage), and how close the initial estimates are to the fitted curve.

-![dispersion](../img/dispersion_estimates.png)
+![dispersion](../differential_expression_analysis/img/dispersion_estimates.png)

 The initial estimates (black dots) are shrunk toward the fitted curve (red line) to obtain the final estimates (blue dots). However, genes with exceptionally high dispersion values are not shrunk, as they likely deviate from the model assumptions exhibiting elevated variability due to biological or technical factors. Shrinking these values could lead to false positives.
