Below are a few important questions about my analysis (specifically about moving on to SnpEff or Gowinda), followed by less important questions about visualization and the final plots required.
I can currently filter both the model output and the selection coefficients for significance after adjusting p-values, but how should I filter the Fst values (other than keeping every window with an Fst value)?
Note: Fst values are calculated in windows, so the comparison with the other measures is meant to ensure that Fst is sufficiently high in the window around each position of interest.
Should I downscale the Fst values using the Sel:Sel and Con:Con comparisons? (The plots for generation 115 are below.)
-- Method? (The previous idea was to use (Fst_C:C + Fst_S:S)/2 for scaling.)
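A minimal sketch of how the downscaling could work for a single window. The baseline (Fst_C:C + Fst_S:S)/2 is the scaling term from the note above; subtracting it from the Con:Sel Fst and flooring at zero is an assumption made here for illustration (dividing by the baseline would be another option), and the function name and example values are hypothetical.

```python
def downscale_fst(fst_cs, fst_cc, fst_ss):
    """Downscale a between-treatment window Fst by the within-treatment baseline."""
    # Baseline from the notes: mean of the Con:Con and Sel:Sel Fst for the same window.
    baseline = (fst_cc + fst_ss) / 2
    # Assumption: subtract the baseline and floor at zero, so windows that are no more
    # differentiated than replicates of the same treatment drop out.
    return max(fst_cs - baseline, 0.0)


# Hypothetical windows: (chromosome, window start, Fst_C:S, Fst_C:C, Fst_S:S)
example_windows = [
    ("2L", 1, 0.5, 0.125, 0.375),
    ("2L", 501, 0.125, 0.25, 0.25),
]

scaled = [
    (chrom, start, downscale_fst(cs, cc, ss))
    for chrom, start, cs, cc, ss in example_windows
]
```

With this choice, the second window is set to zero because its Con:Sel Fst is below the within-treatment baseline.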
When running p.adjust, should the adjustment be across the full genome (all positions) or on a per-chromosome basis?
-- Currently I have to adjust per chromosome for the PoolSeq output and am adjusting across the full genome for the model output.
FDR adjustment of the p-values keeps more positions, but Bonferroni gives more visually appealing plots (see below) and is more stringent about which positions are retained.
For plots of outputs: would Bonferroni plots be better?
For finding positions of interest: is FDR still preferred?
Or should I keep the correction consistent between the two (and if so, which method)?
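To make the two adjustment questions above concrete, here is a small sketch that compares genome-wide with per-chromosome adjustment, and FDR with Bonferroni, on the same p-values. The functions are plain-Python stand-ins intended to mirror what R's p.adjust does for the "BH" (alias "fdr") and "bonferroni" methods; the helper names and the four example p-values are hypothetical.

```python
from collections import defaultdict


def bonferroni(pvals):
    """Bonferroni: multiply each p-value by the number of tests, capped at 1."""
    n = len(pvals)
    return [min(1.0, p * n) for p in pvals]


def bh_adjust(pvals):
    """Benjamini-Hochberg (FDR) adjusted p-values, returned in the input order."""
    n = len(pvals)
    # Walk from the largest p-value down, keeping a running minimum of p * n / rank.
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for steps_from_top, i in enumerate(order):
        rank = n - steps_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def adjust_per_group(chroms, pvals, method):
    """Apply an adjustment method separately within each chromosome."""
    groups = defaultdict(list)
    for i, chrom in enumerate(chroms):
        groups[chrom].append(i)
    adjusted = [None] * len(pvals)
    for indices in groups.values():
        for i, value in zip(indices, method([pvals[j] for j in indices])):
            adjusted[i] = value
    return adjusted


# Hypothetical positions on two chromosome arms.
chroms = ["2L", "2L", "2R", "2R"]
pvals = [0.01, 0.04, 0.03, 0.5]

genome_fdr = bh_adjust(pvals)                          # one correction over all positions
chrom_fdr = adjust_per_group(chroms, pvals, bh_adjust)  # one correction per chromosome
genome_bonf = bonferroni(pvals)
```

In this toy example the second position passes a 0.05 cutoff only under the per-chromosome FDR adjustment, so the two schemes can disagree about which positions are kept.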
The current method is to keep any selection coefficients that are significant (after FDR p.adjust) and unique to the predation lines (i.e. no significant selection coefficient for the controls).
The selection coefficient is the average between two mappers (keeping the less significant p-value).
Does this method make sense?
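The method described above could be sketched as follows. This assumes the p-values passed in are already FDR-adjusted and that "unique to predation lines" means the position has no significant coefficient in the controls; the function names, the 0.05 cutoff as a default, and the example values are hypothetical.

```python
def combine_mappers(s_a, p_a, s_b, p_b):
    """Average the selection coefficient from two mappers and keep the
    larger (less significant) of the two p-values."""
    return (s_a + s_b) / 2, max(p_a, p_b)


def unique_to_selection(sel, con, alpha=0.05):
    """Return positions significant in the selection lines but not in the controls.

    sel, con: dict mapping (chromosome, position) -> (mean coefficient, adjusted p-value)
    """
    kept = {}
    for pos, (coef, p) in sel.items():
        significant_in_controls = pos in con and con[pos][1] < alpha
        if p < alpha and not significant_in_controls:
            kept[pos] = coef
    return kept


# Hypothetical values: arguments are (s mapper 1, p mapper 1, s mapper 2, p mapper 2).
sel = {
    ("2L", 1000): combine_mappers(0.5, 0.001, 0.25, 0.01),
    ("2L", 2000): combine_mappers(0.25, 0.02, 0.25, 0.2),
    ("2L", 3000): combine_mappers(0.5, 0.001, 0.5, 0.002),
}
con = {
    ("2L", 3000): combine_mappers(0.25, 0.01, 0.25, 0.03),
}

candidates = unique_to_selection(sel, con)
```

Here only position 1000 survives: position 2000 fails because one mapper is not significant, and position 3000 is also significant in the controls.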
The ancestral nucleotide diversity:
- Necessary for all populations?
-- I have Pi for all populations (for the bowtie and novoalign mappers)
- Average Pi across all mappers?
-- Calculate the bwa Pi and average between all three? Or show one (or two) mappers as a representation?
- Overlay for changes in diversity over time?
-- Do we want overlay plots with ~splines showing the change in diversity from the ancestor to generation 115?
Average pairwise Fst between control and selection replicates
-- Averaged between mappers and replicates
- Downscaling (available for all generations): Is it necessary, and if so, by which method?
-- previous ideas were (Fst_C:C + Fst_S:S)/2 for scaling
meanFst: Selection vs. Control: Generation115
meanFst: Control vs. Control: Generation115
meanFst: Selection vs. Selection: Generation115

- Cutoff for positions?
-- Currently keeping any window with an Fst value for the Con:Sel_115 comparison: is there a way to filter more stringently for peaks?
-- Should I keep the top 50% of Fst values? The top 10%?
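A minimal sketch of a top-percentage cutoff, assuming windows are ranked by their Con:Sel Fst and windows without an Fst value are dropped first. The function name, the tuple layout, and the example values are hypothetical.

```python
def top_fraction(windows, fraction):
    """Keep the windows whose Fst falls in the top `fraction` (e.g. 0.10 for 10%).

    windows: list of (window start, Fst) tuples; Fst is None when no value was computed.
    """
    ranked = sorted(
        (w for w in windows if w[1] is not None),
        key=lambda w: w[1],
        reverse=True,
    )
    if not ranked:
        return []
    # Always keep at least one window so a small fraction does not return nothing.
    n_keep = max(1, int(len(ranked) * fraction))
    return ranked[:n_keep]


# Hypothetical windows on one chromosome arm.
windows = [
    (1, 0.01), (501, 0.5), (1001, None), (1501, 0.02), (2001, 0.4),
    (2501, 0.05), (3001, 0.3), (3501, 0.1), (4001, 0.2),
]

top_half = top_fraction(windows, 0.5)
top_tenth = top_fraction(windows, 0.1)
```

A percentile cutoff like this is relative to the chromosome or genome it is computed on, so the choice of ranking across the full genome or per chromosome matters here as well.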
Plots for the original and the FDR-adjusted values
Uncorrected p-values: TxG -log10(mean p-value)

FDR Corrected P-values: TxG -log10(meanP-value)

- Is Bonferroni a better visualization for the paper (much less going on)?
TxG: -log10(meanP) with Bonferroni Correction for multiple comparisons

Advantage of this: I can create a plot with the values of the regular plot (the first one) and the significant values coloured. This would not look good with FDR:
Output from the PoolSeq package: the selection coefficients that were significant for the predation lines and not for the controls
Still in progress given the slow pace of PoolSeq: 3L and 3R are almost complete
This is the average between two mappers (bwa and novoalign), keeping the less significant p-value.
- Plot like above (and all chromosomes eventually)?
- Any cutoff for selection coefficients, or just any significant selection coefficients unique to the predator lines?
Filtered positions for:
-- p-values < 0.05 after FDR from the model output
-- positions also present among the significant PoolSeq selection coefficients
-- Then found any windows overlapping these positions with Fst values != 0.
Ended up with ~400 positions each for 2L and 2R
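The three filtering steps above could be sketched like this for a single chromosome arm. It assumes "similar positions" means the same coordinate appears in both the model output and the significant PoolSeq coefficients, and that windows are given as inclusive start and end coordinates; all names and example values are hypothetical.

```python
def positions_of_interest(model_padj, poolseq_sig, fst_windows, alpha=0.05):
    """Apply the three filters in order and return the surviving positions.

    model_padj:  dict position -> FDR-adjusted p-value from the model
    poolseq_sig: set of positions with a significant selection coefficient
    fst_windows: list of (start, end, Fst) with inclusive coordinates
    """
    # Steps 1 and 2: significant in the model and present in the PoolSeq set.
    candidates = sorted(
        pos for pos, p in model_padj.items()
        if p < alpha and pos in poolseq_sig
    )
    # Step 3: keep positions that fall in at least one window with non-zero Fst.
    kept = []
    for pos in candidates:
        if any(start <= pos <= end and fst != 0 for start, end, fst in fst_windows):
            kept.append(pos)
    return kept


# Hypothetical inputs for one chromosome arm.
model_padj = {100: 0.01, 250: 0.2, 700: 0.03, 1200: 0.001}
poolseq_sig = {100, 250, 1200}
fst_windows = [(1, 500, 0.05), (501, 1000, 0.1), (1001, 1500, 0.0)]

shared = positions_of_interest(model_padj, poolseq_sig, fst_windows)
```

The linear scan over windows is fine for a sketch; with genome-scale data a sorted window index or an interval-overlap tool would be the more practical choice.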
Trajectories are the mean absolute difference of each treatment from the ancestor
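As a worked example of that definition for one position, the sketch below averages the absolute frequency difference from the ancestor across the replicates of one treatment, per generation. Averaging across replicates is an assumption about how the mean is taken; the function name, the intermediate generation 50, and the frequencies are hypothetical.

```python
def trajectory(ancestor_freq, freqs_by_generation):
    """Mean absolute allele-frequency difference from the ancestor, per generation.

    freqs_by_generation: dict generation -> list of replicate allele frequencies
    """
    result = {}
    for generation in sorted(freqs_by_generation):
        replicate_freqs = freqs_by_generation[generation]
        total = sum(abs(freq - ancestor_freq) for freq in replicate_freqs)
        result[generation] = total / len(replicate_freqs)
    return result


# Hypothetical position: ancestral frequency 0.5, two selection replicates.
selection_trajectory = trajectory(0.5, {50: [0.5, 0.75], 115: [1.0, 0.75]})
```

Because the difference is absolute, this summary shows how far a treatment has moved from the ancestor but not in which direction, which is one argument for also plotting the raw frequencies at a few positions.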
- Plots of individual positions?
-- Is it informative to select some strongly peaked positions that are shared and show the actual frequency trajectories?
- Overlay the positions of interest onto the output from the model:
-- The most interesting plot would be the -log10(p) plot from the model. Should the positions also present in the PoolSeq and Fst results be coloured and used as well?
-- Larger and coloured positions on the model output, for example:
2L with FDR:
2L with Bonferroni:







