cyntsc/meta-analisis-athal

This repository contains the meta-transcriptomic analysis performed to identify genetic patterns associated with the consensus response of Arabidopsis to diverse biotic stresses caused by fungi.

Experimental Design

  • The experiment was designed in 2 blocks. The first block contains 8 transcriptomes from healthy plants arranged in 4 control groups (healthy12, healthy18, healthy24 and healthy30). The second block contains 17 transcriptomes from infected plants interacting with three Ascomycete fungi (Bc = Botrytis cinerea, Ch = Colletotrichum higginsianum and Ss = Sclerotinia sclerotiorum) arranged in 6 treatment groups (Bc12, Bc18, Bc24, Ch22, Ch40 and Ss30). The digits give the sampling time in hours post-inoculation (hpi) with the fungus.
  • The treatments represent 68% of the included samples (32% for C. higginsianum, 24% for B. cinerea and 12% for S. sclerotiorum); the remaining 32% corresponds to the controls.
  • RNA was extracted from leaves sampled between 0 and 48 hpi.
  • Only transcriptomes sequenced on Illumina platforms with a sequence length > 100 and reads > 5G were considered.
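The sample proportions quoted above can be checked with a few lines of Python. This is only a sanity-check sketch: the per-group counts are tallied by hand from the SRA run lists given in this README, and the variable and function names are illustrative, not taken from the project scripts.

```python
# Sample counts per group, tallied by hand from the SRA run lists in this README.
samples = {
    "C. higginsianum": 8,
    "B. cinerea": 6,
    "S. sclerotiorum": 3,
    "healthy controls": 8,
}

total = sum(samples.values())  # 25 transcriptomes in total


def share(n, total):
    """Percentage of the dataset, rounded to the nearest integer."""
    return round(100 * n / total)


percentages = {group: share(n, total) for group, n in samples.items()}
treatment_share = share(total - samples["healthy controls"], total)

for group, pct in percentages.items():
    print(f"{group}: {pct}%")
print(f"all treatments: {treatment_share}%")
```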

Material & Methods

  • Data are bulk RNASeq libraries downloaded from the NCBI SRA. Libraries from infected Arabidopsis, by BioProject ID:
      • PRJNA148307 (SRR364389, SRR364390, SRR364391, SRR364392, SRR364400, SRR364401, SRR364398 and SRR364399): Colletotrichum higginsianum at 22 and 40 hpi.
      • PRJNA315516 (SRR3383696, SRR3383697, SRR3383779 and SRR3383780): Botrytis cinerea at 12 and 18 hpi.
      • PRJNA593073 (SRR10586397 and SRR10586399): Botrytis cinerea at 24 hpi.
      • PRJNA418121 (SRR6283146, SRR6283147 and SRR6283148): Sclerotinia sclerotiorum at 30 hpi.
    Libraries from healthy Arabidopsis were downloaded from the same repository, by BioProject ID:
      • PRJNA315516: mock treatment at 12 hr (SRR3383640 and SRR3383641), mock treatment at 18 hr (SRR3383782 and SRR3383783) and mock treatment at 30 hr (SRR3383821 and SRR3383822).
      • PRJNA418121: mock treatment at 30 hr (SRR6283144 and SRR6283145).
  • For the RNASeq raw counts, I followed the alignment-to-reference-genome approach. The TAIR10 genome (GenBank accessions CP002684 – CP002688) and the Araport11 annotation were used, with a target of 27655 protein-coding sequences (CDS).
  • An unsupervised machine-learning clustering approach was applied with WGCNA. Signed networks were built with the Pearson method. A threshold of β = 0.80 was set for the signed networks, and modules were merged at a Euclidean distance of 0.1. Genetic modules with corr ≷ 0.75 were extracted.
  • Genetic modules of interest in infected plants were identified through logical operations, extracting the modules that differ from those of the healthy plants by 77% to 100%.
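The last step above can be pictured with plain set operations. The sketch below is an assumption about how such a comparison might be done, not the project's actual script: it scores each infected module by the fraction of its genes that are absent from its best-matching healthy module, and keeps modules scoring at least 0.77. The function names, the scoring rule, the module names and the gene IDs are all hypothetical placeholders.

```python
def differentiation(infected_genes, healthy_modules):
    """Fraction of an infected module's genes that are absent from its
    best-matching healthy module (assumed scoring rule, for illustration).
    Assumes infected_genes is a non-empty set."""
    best_overlap = max(
        (len(infected_genes & healthy) for healthy in healthy_modules.values()),
        default=0,
    )
    return 1 - best_overlap / len(infected_genes)


def select_modules(infected_modules, healthy_modules, min_diff=0.77):
    """Keep infected modules whose differentiation score is >= min_diff."""
    selected = {}
    for name, genes in infected_modules.items():
        score = differentiation(genes, healthy_modules)
        if score >= min_diff:
            selected[name] = score
    return selected


# Hypothetical modules; gene IDs are placeholders, not real results.
healthy = {
    "blue": {"g1", "g2", "g3", "g4"},
    "brown": {"g5", "g6"},
}
infected = {
    "red": {"g1", "g2", "g3", "g7"},            # shares 3 of 4 genes with "blue"
    "green": {"g5", "g8", "g9", "g10", "g11"},  # shares 1 of 5 genes with "brown"
    "pink": {"g12", "g13"},                     # shares nothing
}

selected = select_modules(infected, healthy)
print(sorted(selected))
```

With these toy sets, "red" is discarded because most of its genes already co-occur in a healthy module, while "green" and "pink" pass the 77% cut-off.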

Folder content:

HTC_scripts_biotools: scripts to get the raw counts.
R_scripts_WGCNA: R scripts to build the genetic networks with WGCNA.
grep_utilities: built-in grep commands to extract data from files (e.g. alignment stats).
meta-data: mostly files to link external data to the genetic networks.
notebooks: Python 3+ scripts to build the expression matrices.
results-data: statistical results, network results, intermediate results, and anything else that can be considered the product of a process.

Supplementary material


The WGCNA coexpression networks were built with the help of several public online resources. Here are some links that will hopefully help you in your own project.
Please be aware that you will need to adjust the code to your own needs.
https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/
http://pklab.med.harvard.edu/scw2014/WGCNA.html
https://wikis.utexas.edu/display/bioiteam/Clustering+using+WGCNA
https://www.polarmicrobes.org/weighted-gene-correlation-network-analysis-wgcna-applied-to-microbial-communities/


If you just want to have a glance at the outputs of the main scripts of this project, you can access this information by following the links.

Script to build the expression matrices

2_Matrix_A_integrate_raw_counts
https://nbviewer.jupyter.org/github/cyntsc/meta-analisis-athal/blob/master/notebooks/2_Matrix_A_integrate_raw_counts.ipynb

3_Matrix_B_TPM_normalization
https://nbviewer.jupyter.org/github/cyntsc/meta-analisis-athal/blob/master/notebooks/3_Matrix_B_TPM_normalization.ipynb

4_Matrix_C_TPM_standardization
https://nbviewer.jupyter.org/github/cyntsc/meta-analisis-athal/blob/master/notebooks/4_Matrix_C_TPM_standardization.ipynb

5_Matrix_D_E_Log2_Atypicals
https://nbviewer.jupyter.org/github/cyntsc/meta-analisis-athal/blob/master/notebooks/5_Matrix_D_E_Log2_Atypicals.ipynb
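For readers unfamiliar with the TPM step named in the notebooks above, here is a minimal sketch of the standard TPM (transcripts per million) definition for a single sample. It is not copied from the notebooks: the function name, gene names, counts and lengths are illustrative assumptions.

```python
def tpm(counts, lengths_bp):
    """Convert the raw counts of one sample to TPM.

    counts:     gene -> raw read count
    lengths_bp: gene -> gene (or CDS) length in base pairs
    """
    # Step 1: reads per kilobase (RPK) corrects for gene length.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    # Step 2: scale so that the values of the sample sum to one million.
    scale = sum(rpk.values()) / 1_000_000
    if scale == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: value / scale for gene, value in rpk.items()}


# Hypothetical counts and lengths for three genes in one sample.
raw_counts = {"geneA": 100, "geneB": 300, "geneC": 0}
gene_lengths = {"geneA": 1000, "geneB": 3000, "geneC": 500}

tpm_values = tpm(raw_counts, gene_lengths)
print(tpm_values)
```

In this toy sample geneB has three times the reads of geneA but is also three times longer, so both end up with the same TPM value.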

Some interactive HTML files about the expression matrices can be downloaded from the results-data/matrices_de_expresion/ folder (just download the file to your local PC and open it):
Interactive stats for ARABIDOPSIS HEALTHY MatrixD.html
Interactive stats for ARABIDOPSIS INFECTED MatrixE.html
Interactive stats for comparition in ARABIDOPSIS INFECTED MatrixD and E.html
More stats...


Script to get the genetic modules in the expression matrices

WGCNA scripts: WGCNA scripts for Signed-Ntw (pearson)

WGCNA results: Genetic Modules obtained with WGCNA Signed-Ntw (pearson)
Preferred genetic modules results are in folders:
Athal_healthy_mods_merged_MatrixD for A thaliana Healthy
Athal_infected_mods_merged_MatrixE for A thaliana Infected
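As background for the signed Pearson networks mentioned above, the sketch below shows the signed adjacency that WGCNA defines as ((1 + cor) / 2) ** power, written in plain Python for illustration only. The project itself uses the R package; the power value, gene names and expression values here are hypothetical and not taken from the repository.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.
    Assumes neither sequence is constant (non-zero variance)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def signed_adjacency(expression, power):
    """Signed adjacency: ((1 + cor) / 2) ** power for every gene pair."""
    genes = list(expression)
    return {
        (g1, g2): ((1 + pearson(expression[g1], expression[g2])) / 2) ** power
        for g1 in genes
        for g2 in genes
    }


# Hypothetical expression profiles across four samples.
expression = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],  # perfectly correlated with geneA
    "geneC": [4, 3, 2, 1],  # perfectly anti-correlated with geneA
}

adjacency = signed_adjacency(expression, power=6)  # power chosen arbitrarily
print(adjacency[("geneA", "geneB")], adjacency[("geneA", "geneC")])
```

In a signed network, anti-correlated genes get an adjacency near 0 rather than near 1, so they are not placed in the same module.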


Be happy and Enjoy!
Cynthia SC

About

Backup of the meta-analysis project on RNASeq data for the Arabidopsis plant infected by fungi
