
Remove outliers across per-contig VCFs #654

Draft
epiercehoffman wants to merge 7 commits into main from eph_remove_outliers_across_contigs

@epiercehoffman
Collaborator

Updates

New workflow to remove outlier samples.

  • Uses src/sv-pipeline/scripts/downstream_analysis_and_filtering/determine_svcount_outliers.R for plotting and outlier determination. The script only considers SV types with a median of at least 100 SVs per sample
  • Takes per-contig VCFs as input
  • Only performs outlier determination based on autosomes
  • Can be rerun with new inputs and settings to perform SV counting, outlier determination at different thresholds, and filtering as separate steps, without redoing earlier steps
  • Includes bcftools preprocessing step to restrict SVs considered during outlier determination
  • Filters sample list
  • Can provide a list of additional (e.g., withdrawn) samples to exclude at the same time as outlier removal
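
The flow described in the list above can be sketched roughly as follows. This is a minimal illustration, not the workflow's code or the logic of determine_svcount_outliers.R. The function names, the `chr1` to `chr22` autosome naming, the input tuple shape, and the IQR-based cutoff are all assumptions made for the example. Only the autosome restriction, the median threshold of 100 SVs per sample, and the extra exclusion list come from the description.

```python
from statistics import median, quantiles

# Assumption: hg38-style autosome names; the PR only says "autosomes".
AUTOSOMES = {f"chr{i}" for i in range(1, 23)}
# From the PR description: SV types need a median of >= 100 SVs per sample.
MIN_MEDIAN_PER_SAMPLE = 100


def count_svs(records, samples):
    """Count autosomal SVs per SV type and sample.

    records: iterable of (contig, svtype, sample) tuples, one per SV carried
    by a sample (a hypothetical input shape, not the workflow's format).
    """
    counts = {}
    for contig, svtype, sample in records:
        if contig not in AUTOSOMES:
            continue  # sex chromosomes and other contigs are ignored
        per_sample = counts.setdefault(svtype, {s: 0 for s in samples})
        if sample in per_sample:
            per_sample[sample] += 1
    return counts


def find_outliers(counts, n_iqr=3.0):
    """Flag samples whose count for any eligible SV type is extreme.

    Assumption: the cutoff is Q1 - n_iqr * IQR and Q3 + n_iqr * IQR;
    the PR description does not state the rule actually used.
    """
    outliers = set()
    for svtype, per_sample in counts.items():
        values = sorted(per_sample.values())
        if len(values) < 2 or median(values) < MIN_MEDIAN_PER_SAMPLE:
            continue  # too few samples, or SV type too rare to assess
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        low, high = q1 - n_iqr * iqr, q3 + n_iqr * iqr
        outliers.update(s for s, c in per_sample.items() if c < low or c > high)
    return outliers


def filter_samples(samples, outliers, extra_exclusions=()):
    """Drop outliers and any additional (e.g., withdrawn) samples in one pass."""
    drop = set(outliers) | set(extra_exclusions)
    return [s for s in samples if s not in drop]
```

Keeping counting, outlier determination, and filtering as separate functions mirrors the point that each stage can be rerun on its own, for example calling `find_outliers` again with a different `n_iqr` on previously computed counts.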

Testing

Tested on 1kgp reference panel with different settings and inputs.

Marking as draft while development for Phase 2 is ongoing. The workflow was designed for Phase 2 usage, so it may need changes to be more generally applicable.
