AdaptMap data (background): select commercial and marginal populations (likely not subjected to intense artificial selection, but also free of major demographic events like bottlenecks)
ideally with sample size > 50
identify common HRR (across selected commercial populations and marginal naive populations)
likely regions that are conserved irrespective of selection pressure --> regions conserved in C. hircus and that may have a biological function
fix the parameters for the identification of HRR (may be tweaked) and explore the effect of imputing missing SNP genotypes
sex chromosomes? (currently, sex info only on 40% of samples --> exclude sex chromosomes for the moment)
filtering data prior to detectRUNS? MAF (no problems here with monomorphic sites, we are looking for HRR!), missing-rate (standard criteria should suffice)
use only one set of parameters (density, minimum run length, n. of homozygous SNPs etc.) for HRR "detection" (ideally to be gauged on one or more immunity-related loci)
define metrics to evaluate the "accuracy" of detected HRR
later, extend the analysis to other species (e.g. humans)
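One way to read "common HRR" in the notes above is as the genomic intervals covered by an HRR in every selected population. The following is a minimal Python sketch of that intersection, assuming each population's HRRs are already available as (chromosome, start, end) tuples. The function names and the tuple layout are illustrative assumptions, not the output format of detectRUNS.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect_two(a, b):
    """Intersect two sorted, non-overlapping interval lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def common_regions(hrr_by_population):
    """Return {chrom: [(start, end), ...]} covered by an HRR in every population.

    hrr_by_population: {population: [(chrom, start, end), ...]} (assumed layout).
    """
    per_pop = []
    for regions in hrr_by_population.values():
        by_chrom = defaultdict(list)
        for chrom, start, end in regions:
            by_chrom[chrom].append((start, end))
        per_pop.append({c: merge_intervals(iv) for c, iv in by_chrom.items()})
    if not per_pop:
        return {}
    common = per_pop[0]
    for other in per_pop[1:]:
        common = {
            c: intersect_two(iv, other[c])
            for c, iv in common.items()
            if c in other
        }
    # Drop chromosomes where nothing is shared by all populations.
    return {c: iv for c, iv in common.items() if iv}
```

A strict intersection is only one possible definition; a threshold such as "present in at least k populations" would need a different counting step.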
workflow
1.extract_data.sh: script to extract relevant data from the SMARTER goat database
2.descriptive_stats.sh [OPTIONAL]
3.filter_snps.sh: script to filter SNP data based on MAF and call-rate
3.1.update_pop.sh [OPTIONAL]: script to update population information in the PLINK files
3.2.imputation.sh: script to impute (if needed/wanted) missing SNP genotypes (Beagle inside)
4.distance.sh [OPTIONAL]: script to calculate IBS genetic distances between samples
5.detect_hrr.sh: script to detect HRRs (detectRUNS inside)
parse_results.R: script to parse results and produce output tables and plots
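The MAF and call-rate filter in step 3 can be illustrated on a toy genotype matrix. The following is a minimal Python sketch, not the content of 3.filter_snps.sh: it assumes genotypes coded as 0/1/2 copies of one allele, with None for missing calls, and the function names and default thresholds are placeholders chosen for illustration.

```python
def snp_stats(genotypes):
    """Return (call_rate, maf) for one SNP.

    genotypes: list of 0, 1, 2 (allele copies) or None (missing call).
    """
    called = [g for g in genotypes if g is not None]
    call_rate = len(called) / len(genotypes) if genotypes else 0.0
    if not called:
        return call_rate, 0.0
    freq = sum(called) / (2 * len(called))
    return call_rate, min(freq, 1 - freq)


def filter_snps(matrix, min_call_rate=0.95, min_maf=0.0):
    """Return the SNP names whose call rate and MAF meet the thresholds.

    matrix: {snp_name: [genotype per sample]}; thresholds are placeholders.
    """
    kept = []
    for snp, genotypes in matrix.items():
        call_rate, maf = snp_stats(genotypes)
        if call_rate >= min_call_rate and maf >= min_maf:
            kept.append(snp)
    return kept
```

With `min_maf=0.0` monomorphic SNPs are retained; raising it removes them, so the choice depends on how monomorphic sites should be treated when looking for HRR.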