Outline of the plots, with questions about these first-draft plots

Most Important Questions for Ian:

Below are a few important questions for my analysis (specifically for moving on to SnpEff or Gowinda), followed by less important questions about visualization and the final plots required.

1. Filtering positions for Fst values:

I can currently filter both the model output and the selection coefficients for significance after adjusting p-values, but how should I filter the Fst values (other than keeping every window that has an Fst value)?

Note: Fst values are calculated in windows, so the comparison with the other measures is meant to ensure that Fst is sufficiently high in the window around each position of interest.

Should I downscale the Fst values using the Sel:Sel and Con:Con comparisons? (The plots for generation 115 are below.)

-- Method? (A previous idea was to use (Fst_C:C + Fst_S:S)/2 for scaling.)
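One possible reading of the scaling idea above, sketched in Python for illustration only. It assumes the mean of the two within-treatment comparisons is treated as a per-window background and subtracted from the Con:Sel value, with negative results clipped to zero. Whether to subtract or divide is exactly the open question, and the function name and the numbers are hypothetical.

```python
def downscale_fst(fst_con_sel, fst_con_con, fst_sel_sel):
    """Subtract the mean within-treatment Fst from the Con:Sel Fst, per window.

    Assumption: "scaling" means subtracting the background
    (Fst_C:C + Fst_S:S) / 2 and clipping negative values to zero.
    """
    scaled = []
    for cs, cc, ss in zip(fst_con_sel, fst_con_con, fst_sel_sel):
        background = (cc + ss) / 2.0
        scaled.append(max(cs - background, 0.0))
    return scaled


# Hypothetical per-window values, not real data.
fst_con_sel = [0.10, 0.30, 0.05]
fst_con_con = [0.04, 0.06, 0.08]
fst_sel_sel = [0.06, 0.10, 0.04]

scaled = downscale_fst(fst_con_sel, fst_con_con, fst_sel_sel)
```

With this reading, a window whose Con:Sel Fst is no higher than the within-treatment background drops to zero and would be removed by any "Fst != 0" filter.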

2. Adjusting p-values: per chromosome or full genome?

When performing p.adjust, should the adjustments be for the full genome (all positions) or on a per chromosome basis?

-- Currently I have to adjust per chromosome for poolseq and am adjusting across the full genome for the model output.
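To make the difference concrete, here is a small Python sketch comparing the two choices. `bh_adjust` is a hand-written Benjamini-Hochberg adjustment meant to mirror what `p.adjust(p, method = "BH")` computes in R. The helper names, chromosome labels, and p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def adjust_per_chromosome(chroms, pvalues):
    """Apply bh_adjust separately within each chromosome."""
    groups = {}
    for i, chrom in enumerate(chroms):
        groups.setdefault(chrom, []).append(i)
    adjusted = [None] * len(pvalues)
    for idxs in groups.values():
        for i, q in zip(idxs, bh_adjust([pvalues[i] for i in idxs])):
            adjusted[i] = q
    return adjusted


# Hypothetical positions on two chromosome arms.
chroms = ["2L", "2L", "2L", "2R", "2R", "2R"]
pvals = [0.001, 0.02, 0.8, 0.012, 0.5, 0.9]

genome_wide = bh_adjust(pvals)
per_chrom = adjust_per_chromosome(chroms, pvals)
```

In this toy example the second 2L position gets an adjusted value of 0.04 genome-wide and 0.03 per chromosome, so the same raw p-value can land on different sides of a threshold depending on the choice.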

3. Bonferroni vs. FDR:

FDR adjustment of the p-values keeps more positions, but Bonferroni gives more visually appealing plots (see below) and a more stringent set of significant positions.

For plots of outputs: would Bonferroni plots be better?

For finding positions of interest: is FDR still preferred?

OR: stay consistent between the two (and if so, which method)?
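The trade-off in this question can be shown on a toy set of p-values. This Python sketch uses hand-written versions of the two corrections (standing in for `p.adjust` with `"bonferroni"` and `"BH"`). The p-values and the 0.05 cutoff are assumptions for illustration.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(p * m, 1.0) for p in pvalues]


def benjamini_hochberg(pvalues):
    """FDR (Benjamini-Hochberg) adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for six positions.
pvals = [0.0001, 0.004, 0.012, 0.03, 0.2, 0.6]
alpha = 0.05

bonf_hits = [i for i, q in enumerate(bonferroni(pvals)) if q < alpha]
fdr_hits = [i for i, q in enumerate(benjamini_hochberg(pvals)) if q < alpha]
```

Here Bonferroni keeps two positions and FDR keeps four, which is the pattern described above: a cleaner plot from Bonferroni, a larger candidate list from FDR.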

4. Selection Coefficient Filtering:

The current method is to keep any significant (after FDR p.adjust) selection coefficients that are unique to predation lines (i.e. no significant selection coefficient for Con).

This is the average selection coefficient between two mappers (keeping the less significant p-value).

Does this method make sense?
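A minimal Python sketch of the filtering described in this item, to make the logic explicit. It assumes each mapper's output can be held as a dictionary from position to (selection coefficient, FDR-adjusted p-value), and that "unique to predation lines" means significant in selection lines and not significant in controls. All names and numbers are hypothetical.

```python
def combine_mappers(mapper_a, mapper_b):
    """Average s and keep the larger (less significant) p-value per shared position."""
    combined = {}
    for pos in mapper_a.keys() & mapper_b.keys():
        s_a, p_a = mapper_a[pos]
        s_b, p_b = mapper_b[pos]
        combined[pos] = ((s_a + s_b) / 2.0, max(p_a, p_b))
    return combined


def unique_to_selection(sel, con, alpha=0.05):
    """Keep positions significant in selection lines but not in controls."""
    kept = {}
    for pos, (s, p) in sel.items():
        con_significant = pos in con and con[pos][1] < alpha
        if p < alpha and not con_significant:
            kept[pos] = s
    return kept


# Hypothetical position -> (selection coefficient, adjusted p-value).
sel_bwa = {100: (0.10, 0.01), 200: (0.20, 0.03), 300: (0.05, 0.20)}
sel_novo = {100: (0.14, 0.04), 200: (0.18, 0.01), 300: (0.07, 0.01)}
con_bwa = {100: (0.01, 0.90), 200: (0.15, 0.02)}
con_novo = {100: (0.03, 0.70), 200: (0.17, 0.03)}

sel = combine_mappers(sel_bwa, sel_novo)
con = combine_mappers(con_bwa, con_novo)
kept = unique_to_selection(sel, con)
```

Position 300 shows the effect of keeping the less significant p-value: it is significant under one mapper only, so it is dropped.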



Plots with specific questions below for plots:



Pi: Ancestral Pi for Novoalign:

Outline

The ancestral nucleotide diversity:

Ancestral Pi Plot for Novoalign

Questions

  1. Necessary for all populations?
-- I have all populations (for both the bowtie and novoalign mappers)
  2. Average Pi across all mappers?
-- Calculate bwa Pi and average between the three, or show one (or two) mappers as a representation?
  3. Overlay for changes in diversity over time?
-- Do we want overlay plots with ~splines showing the change in diversity from Ancestor --> 115?

Fst Plots:

Outline

Average pairwise Fst between control and selection replicates

-- Averaged between mappers and replicates

Generation 38: meanFst for F38

Generation 77: meanFst for F77

Generation 115: meanFst for F115

Questions

  1. Downscaling (available for all generations): is it necessary, and by what method?
-- A previous idea was to use (Fst_C:C + Fst_S:S)/2 for scaling

meanFst, Selection vs. Control, Generation 115: meanFst for F115

meanFst, Control vs. Control, Generation 115: Fst_Con:Con_115

meanFst, Selection vs. Selection, Generation 115: Fst_Sel:Sel_115


  2. Cut-off for positions?
-- Currently keeping any window with an Fst value for the Con:Sel_115 comparison: is there a way to filter more strictly for peaks?

-- Should I keep the top 50% of Fst values? The top 10%?
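For reference, a percentile cut-off like the one asked about could look like the Python sketch below. It assumes windows are held as (window start, Fst) pairs and that at least one window is always kept. The function name and values are hypothetical.

```python
def top_fraction(windows, fraction):
    """Return the windows with the highest Fst, keeping the given fraction.

    `windows` is a list of (window_start, fst) pairs; at least one is kept.
    """
    ranked = sorted(windows, key=lambda w: w[1], reverse=True)
    n_keep = max(1, int(len(ranked) * fraction))
    return ranked[:n_keep]


# Hypothetical Con:Sel Fst values for ten windows.
windows = [
    (1, 0.02), (501, 0.10), (1001, 0.31), (1501, 0.05), (2001, 0.18),
    (2501, 0.01), (3001, 0.07), (3501, 0.25), (4001, 0.03), (4501, 0.12),
]

top_10 = top_fraction(windows, 0.10)
top_50 = top_fraction(windows, 0.50)
```

The top 10% here is a single window, while the top 50% keeps five, so the choice mostly decides how many Fst peaks survive into the overlap with the model and poolseq positions.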

Model Outputs

Outline

Plots for original values and FDR adjusted

Uncorrected p-values: TxG -log10(meanP-value) FullGenomeTxGPlot

FDR Corrected P-values: TxG -log10(meanP-value) FDRcorrection

Questions:

  1. Is Bonferroni a better visualization for the paper (much less going on)?

TxG: -log10(meanP) with Bonferroni Correction for multiple comparisons BonferroniCorrection_2200x1100

Advantage of this: I can create a plot with the values of the regular plot (the first one) and colour the significant values. This would not look good with FDR:

Coloured Sig

Poolseq outputs:

Outline

Output from the PoolSeq package: the selection coefficients that were significant for predation lines and not for controls

Ongoing, given the slow pace of Poolseq: 3L and 3R are almost complete

This is the average between two mappers (bwa and novoalign), keeping the less significant p-value.

poolseq_2L

Questions

  1. Plot like the one above (and eventually for all chromosomes)?

  2. Any cut-off for selection coefficients, or just any significant selection coefficients unique to predator lines?

Trajectories and positions:

Outline

Filtered positions for:

-- p-values < 0.05 after FDR from the model output

-- matching positions among the poolseq significant selection coefficients

-- Then found any Fst windows overlapping these positions with Fst values != 0.

Ended up with ~400 positions each for 2L and 2R
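The three filters above can be sketched in Python as a set intersection followed by a window lookup. This is only an illustration: the window coordinates are assumed to be inclusive (start, end, Fst) tuples, and all positions and values are hypothetical.

```python
def positions_in_nonzero_windows(model_sig, poolseq_sig, fst_windows):
    """Positions significant in both analyses that fall in a window with Fst != 0."""
    shared = sorted(set(model_sig) & set(poolseq_sig))
    kept = []
    for pos in shared:
        for start, end, fst in fst_windows:
            if start <= pos <= end and fst != 0:
                kept.append(pos)
                break  # one qualifying window is enough
    return kept


# Hypothetical inputs for one chromosome arm.
model_sig = [150, 420, 980, 1500]      # FDR-adjusted p < 0.05 in the model
poolseq_sig = [150, 980, 1500, 2100]   # significant selection coefficients
fst_windows = [(1, 500, 0.12), (501, 1000, 0.0), (1001, 1500, 0.30)]

positions = positions_in_nonzero_windows(model_sig, poolseq_sig, fst_windows)
```

Position 980 is shared by both analyses but sits in a window with Fst of zero, so it is dropped. A stricter Fst threshold would replace the `fst != 0` test.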

Trajectories are the mean absolute difference in frequency between each treatment and the ancestor
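As a worked example of this summary, the Python sketch below computes it for a single position. It assumes the mean is taken over the replicates of one treatment at each generation. The frequencies are hypothetical, and averaging over positions instead would follow the same pattern.

```python
def mean_abs_diff_from_ancestor(ancestor_freq, replicate_freqs):
    """Mean absolute difference between replicate frequencies and the ancestor."""
    diffs = [abs(freq - ancestor_freq) for freq in replicate_freqs]
    return sum(diffs) / len(diffs)


# Hypothetical allele frequencies at one position.
ancestor = 0.30
selection_by_generation = {
    38: [0.40, 0.45],
    77: [0.55, 0.60],
    115: [0.70, 0.80],
}

trajectory = {
    generation: mean_abs_diff_from_ancestor(ancestor, freqs)
    for generation, freqs in selection_by_generation.items()
}
```

Because the difference is absolute, the trajectory shows how far a treatment has moved from the ancestor but not in which direction.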

Trajectory_2ndChromo

Questions:

  1. Plots of individual positions?

-- Is it informative to select some shared positions with large peaks and show the actual frequency trajectories?

  2. Overlay the positions of interest onto the output from the model:

-- The most interesting plot would be the -log10(p) plot from the model; should the positions also found by Poolseq and Fst be coloured and shown on it as well?

-- Larger and coloured positions on the model output, for example:

2L with FDR:

FDR_col

2L with Bonferroni:

bonfCOL