luslab/IntergenicTranscription

Intergenic Transcription

This repository contains the code to reproduce the analysis and figures of "Intergenic RNA mainly derives from nascent transcripts of known genes", bioRxiv, 2020.

Requirements

  • stringtie
  • igvtools
  • gffcompare
  • R
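
Before starting, it can help to confirm that the command-line requirements are reachable. The following is a minimal sketch in Python, kept separate from the R analysis itself; it assumes that each tool is installed under the executable name shown in the list above (with R looked up as `R`), which may differ on some systems. The function name `find_missing_tools` is illustrative and not part of this repository.

```python
import shutil

# Executable names assumed to match the Requirements list above.
REQUIRED_TOOLS = ["stringtie", "igvtools", "gffcompare", "R"]


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

An empty result only means that the executables were found; it does not check their versions.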

Required R packages:

  • GenomicFeatures
  • rtracklayer
  • data.table
  • ggplot2
  • ggpubr
  • cowplot
  • ggthemes
  • ggsci
  • ggforce
  • ggExtra
  • ggrepel
  • scales
  • DT
  • circlize
  • BSgenome.Hsapiens.UCSC.hg38
  • ggbio
  • phastCons100way.UCSC.hg38
  • GenomicAlignments
  • genomation
  • VennDiagram
  • viridis

Preliminary steps

RNA-seq (and NET-seq) datasets

Pre-processing, alignment to the human reference genome, and generation of the individual transcriptome assemblies for each dataset were performed with the RNA-seq-pipeline. Supplementary Table 1 lists the accession codes of all datasets used for annotation and validation.

The output files obtained with this procedure should be placed in the following folders:

  • stringtie: GTF files produced by stringtie;
  • counts: QoRTs folders (containing the QC.geneCounts.detailed.txt.gz and QC.summary.txt files) and StrandCheck.out.tab files;
  • RNAseq_bw: stranded (plus and minus) CPM normalised bigWig files;
  • RNAseq_bam: BAM files (post-deduplication).
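
The folder layout above can be checked before running the analysis. This is a small sketch in Python under the assumption that the four folders sit directly under a single project directory; the `project_root` parameter and the function name `missing_input_folders` are hypothetical and only used for illustration.

```python
from pathlib import Path

# Folder names taken from the list above.
EXPECTED_FOLDERS = ["stringtie", "counts", "RNAseq_bw", "RNAseq_bam"]


def missing_input_folders(project_root="."):
    """Return the expected input folders that do not exist under project_root."""
    root = Path(project_root)
    return [name for name in EXPECTED_FOLDERS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_input_folders()
    if missing:
        print("Missing input folders:", ", ".join(missing))
    else:
        print("All expected input folders are present.")
```

The check only looks for the folders themselves; it does not verify that they contain the expected GTF, QoRTs, bigWig, or BAM files.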

Reference annotation

Parsing of the Gencode v27 reference annotation and generation of the R objects used in this analysis were performed with the R_Gencode_Reference processing scripts.

The R objects obtained with this procedure should be placed in the GencodeReference folder (the default); otherwise, set the annotationFolder path to their location on the current machine.

Running the analysis

The main code is in R Markdown format (Rmd), which can be opened and executed in RStudio or other compatible editors. It is divided into multiple 'chunks', so the different tasks can be executed step by step.

Final annotation files (BED format)

All identified TUs

  1. Gencode v27 + Intergenic TUs (all)

TUs expressed in HeLa cells

  1. Gencode v27 + Intergenic TUs (HeLa)

TUs selected for metaprofile analysis

  1. Metaprofiles Proximal TUs

  2. Metaprofiles Linker TUs

  3. Metaprofiles Independent TUs
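
For a quick look at the annotation files outside R, the sketch below parses BED lines in Python. It assumes the standard tab-separated BED layout, in which the first three columns are chromosome, 0-based start, and exclusive end; the exact set of additional columns in the files listed above is not stated here, so they are kept as raw strings. The names `BedInterval` and `parse_bed_lines` are illustrative.

```python
from typing import Iterable, Iterator, NamedTuple


class BedInterval(NamedTuple):
    chrom: str
    start: int    # 0-based, inclusive (BED convention)
    end: int      # exclusive (BED convention)
    extra: tuple  # any remaining columns, kept as raw strings


def parse_bed_lines(lines: Iterable[str]) -> Iterator[BedInterval]:
    """Yield one BedInterval per data line, skipping blank and header lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        yield BedInterval(fields[0], int(fields[1]), int(fields[2]), tuple(fields[3:]))
```

An open file handle can be passed directly, for example `with open(path) as handle: intervals = list(parse_bed_lines(handle))`, where `path` points to one of the downloaded BED files. The length of an interval is `end - start`.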
