Repository files navigation drosophila_parthenogenesis
RepeatModeler2 to identify repeats
RepeatMasker to mask identified repeats
Annotation to get braker.gtf
Trim raw reads for adapters
Map with STAR against unmasked reference
Merge bam
Use braker2 on merged bam supplying soft-masked genome
D. melanogaster orthologs *
Extract CDS sequence from reference given the braker gtf file using gffread.
Obtain gilist of Drosophila melanogaster using esearch and efetch.
Use blastx to blast CDS of genes to Drosophila melanogaster proteins.
Obtain 'Uniprot accession numbers to gene names' file from flybase (fbgn_NAseq_Uniprot_fb_2020_03.tsv) and edit.
Check effect of filtering BLAST output using different cut-offs (script: check_blast_output_cutoff.R)
Filter BLAST file for minimum evalue (<1e-10) and minimum query coverage (>50), percent identity (>35), and merge 'accession to gene names mapping' file (script: add_fbgn_to_blast_strict.R).
Reverse orthologs (D. melanogaster blasted against D. mercatorum) *
Used Dmerc CDS sequence extracted above, and obtained D. melanogaster CDS sequence (dmel-all-CDS-r6.35_edit.fasta).
Created blast database with Dmerc CDS.
Used tBLASTx to blast Dmel CDS against Dmerc CDS.
Filter BLAST file for minimum evalue (<1e-10) and minimum query coverage (>50), percent identity (>35).
Edited in R (script: create_tblastx_map_file.R).
For all blast analyses, max-target seqs was set to the size of the subject database.
Contigs to D. melanogaster chromosomes mapping (script: Map_to_Dmel_chrom.R) - "GENE-BASED"
Filter BLAST file as above
Extract FBgns (from: FBgn_blast_eval1e10_qcovs50_pident35.txt) and use 'Batch Download' tool from FlyBase to add chromosome info.
Dmerc gene IDs were then combined with FBgns and Dmel chromosomes
Dmerc genes mapping to multiple FBgns from different chromosomal arms were excluded
Then counted how many Dmerc genes from each scaffold map to each Dmel chromosome (output: Scaffold_to_Dmel_chromosome_top_hits.txt)
Contigs to D. melanogaster chromosomes mapping - "NUCLEOTIDE BASED"
Run nucmer on repeat masked sequences (MUMmer3.23)
Reference: dmel-all-chromosome-r6.27.fasta.masked
Query: dmerc.ctg.polished1.fa.masked_headerEdit
Options: --coords --maxgap 500 --maxmatch --prefix Dmerc_Dmel_hardmask_nucmer_maxmatch_maxgap500
Create coordinates file: show-coords Dmerc_Dmel_hardmask_nucmer_maxmatch_maxgap500.delta -l -c
Consider Dmerc contig mapping to Dmel chromosome if >40% alignments to one Dmel chromosome, and all other chromosomes have <20% alignments.
Tried same with D. albomicans, but more confusing because of chromosome fusions, I guess.
Differential expression analysis
Mapped trimmed reads against Dmerc genome assembly considering the braker.gtf gene annotations.
Counted reads using featureCounts.
Decided not to filter genes for a minimum count!
Used DESeq2 with model: read counts ~ Parthenogenesis Group. Then extracted results for pairwise contrasts.
About
No description, website, or topics provided.
Resources
Stars
Watchers
Forks
You can’t perform that action at this time.