Generating kmers from annotation files #19

@drtamermansour


Annotation files include GFF, GTF and BED files. We can use any of these files to generate k-mers in 3 main scenarios:

  1. If the files annotate transcriptomes, we can use gffread (for GFF or GTF files) or getfasta from bedtools (for GFF or BED files). Note: bedtools getfasta has two related options, `-split` (concatenate the blocks of a BED12 entry, i.e. its exons, into one spliced sequence) and `-rna`. We need to examine their effect carefully.

  2. If the user does not want the sequences to be spliced:
    a. If we have a BED file that annotates genomic blocks, getfasta from bedtools can be used directly.
    b. If we have a transcriptome annotation file but the user needs each exon as a separate entry, we need to convert the GFF or GTF file to BED first; then we can use getfasta from bedtools as in (a).

```bash
# gffread can convert GFF to GTF
gffread example.gff -T -o example.gtf

# The UCSC Kent utilities include a binary, gtfToGenePred, that converts GTF to GenePred format
wget https://github.com/drtamermansour/horse_trans/raw/master/scripts/UCSC_kent_commands/gtfToGenePred
chmod +x gtfToGenePred
./gtfToGenePred example.gtf example.gpred

# A script (of unknown origin) that converts GenePred to BED
wget https://raw.githubusercontent.com/drtamermansour/horse_trans/master/scripts/genePredToBed
chmod +x genePredToBed
./genePredToBed < example.gpred > example.bed
```

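
If the BED file produced this way is BED12 (one line per transcript, with the exons stored as blocks), running getfasta on it without `-split` returns the whole genomic span of each transcript, introns included, rather than one entry per exon. A minimal sketch, assuming BED12 input, that splits every transcript into one BED6 line per exon with awk; `bed12_to_exons` is a hypothetical helper name, not an existing tool:

```shell
#!/usr/bin/env bash
# Hypothetical helper: split BED12 transcripts (read on stdin) into one BED6 line per exon.
bed12_to_exons() {
  awk 'BEGIN { OFS = "\t" }
  {
    nb = $10 + 0                   # blockCount: number of exons
    split($11, sizes, ",")         # blockSizes (a trailing comma is harmless)
    split($12, starts, ",")        # blockStarts, relative to chromStart ($2)
    for (i = 1; i <= nb; i++) {
      s = $2 + starts[i]
      e = s + sizes[i]
      # Number exons in transcript order: on the minus strand the rightmost block is exon 1.
      num = ($6 == "-") ? nb - i + 1 : i
      print $1, s, e, $4 "_exon" num, $5, $6
    }
  }'
}
```

For example, `./genePredToBed < example.gpred | bed12_to_exons > example.exons.bed` followed by `bedtools getfasta -fi genome.fa -bed example.exons.bed -s -name -fo example.exons.fa`, where `genome.fa` stands in for the reference genome.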

  3. If we have a transcriptome annotation file but the user needs to generate k-mers from structures other than whole exons (e.g. introns, upstream sequences, downstream sequences, exon-exon junctions), we can transform the annotation files to BED files and then write a simple script that transforms this transcriptome BED file into another BED file representing the user's target loci.
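
As a starting point for that script, here is a sketch that assumes BED12 input (for example `example.bed` from the conversion steps above, if it is BED12) and uses awk to emit introns, strand-aware upstream/downstream flanks, or exon-exon junction windows. The function name `bed12_to_targets`, its modes, and the choice of K-1 bases per side for junctions are illustrative assumptions, not an established convention:

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive target loci from BED12 transcripts read on stdin.
#   bed12_to_targets intron        -> BED6, one line per intron
#   bed12_to_targets upstream N    -> BED6, N bp upstream of each transcript (strand-aware)
#   bed12_to_targets downstream N  -> BED6, N bp downstream of each transcript (strand-aware)
#   bed12_to_targets junction K    -> BED12, up to K-1 bp on each side of every exon-exon junction (K >= 2)
# Flank starts are clipped at 0; flank ends are NOT clipped at the chromosome end.
bed12_to_targets() {
  awk -v mode="$1" -v n="${2:-0}" 'BEGIN { OFS = "\t" }
  {
    nb = $10 + 0                      # number of blocks (exons)
    split($11, sizes, ",")            # block sizes
    split($12, starts, ",")           # block starts, relative to chromStart ($2)
    if (mode == "intron" || mode == "junction") {
      for (i = 1; i < nb; i++) {
        left_end = $2 + starts[i] + sizes[i]      # end of block i
        right_start = $2 + starts[i + 1]          # start of block i+1
        num = ($6 == "-") ? nb - i : i            # number in transcript order
        if (mode == "intron") {
          if (right_start > left_end)
            print $1, left_end, right_start, $4 "_intron" num, $5, $6
        } else {
          w = n - 1                               # K-1 bp per side, so every K-mer in the window crosses the junction
          ls = (sizes[i] + 0 < w) ? sizes[i] + 0 : w
          rs = (sizes[i + 1] + 0 < w) ? sizes[i + 1] + 0 : w
          s = left_end - ls
          e = right_start + rs
          print $1, s, e, $4 "_junction" num, $5, $6, s, e, 0, 2, ls "," rs ",", "0," (right_start - s) ","
        }
      }
    } else if (mode == "upstream" || mode == "downstream") {
      plus = ($6 != "-")
      # Upstream of a plus-strand transcript (or downstream of a minus-strand one) lies left of chromStart.
      if ((mode == "upstream" && plus) || (mode == "downstream" && !plus)) {
        s = $2 - n; e = $2
      } else {
        s = $3; e = $3 + n
      }
      if (s < 0) s = 0
      print $1, s, e, $4 "_" mode, $5, $6
    }
  }'
}
```

Intron and flank output is BED6 and can go straight to `bedtools getfasta -s -name`. The junction output is BED12, so it needs `-split` to join the two pieces into one sequence whose K-mers all cross the junction. If flanks must stay within chromosome bounds, `bedtools flank` with a genome file is an alternative for those two modes.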
