2016 08 10 Haplotype Hackathon Notes
2016-08-10
Goals
- Establish the technical platform for assigning population-cohort IDs and frequency-set IDs, following the pattern of gl-service, MAC service and feature-service. Standardize the input and output file formats as well as the required/optional parameters/variables.
- Develop a strategy for standardizing/comparing methods (building on previous work). Establish methods for evaluating the “quality” of frequency sets, and computational methods for improving frequency sets.
- Identify populations where additional high-resolution typing is needed; set priorities and target cohort sizes algorithmically.

Attendees: Pradeep Bashyal, Hans-Peter Eberhard, Loren Gragert, Michael Halagan, Jan Hofmann, Steven Mack, Martin Maiers, Jurgen Sauter, Joel Schneider.
http://allelefrequencies.net/datasets.asp#tag_4
EMDIS: DONOR_CB/CBU_FULL + HML cohort.xsd
hap.xsd
We need a population id that can reference an external curation service - with controlled vocabularies
Ask Steve Mack about this
- POPULATION: POP_ID (required), CURATOR (required), text description (optional)
- COHORT: COHORT_ID (required), POP_ID (required), attributes of cohort (optional), list of individuals
- METHOD: METHOD_ID (required), attributes of method (optional): software version, black box settings
- HF: HF_ID (required), COHORT_ID (required), METHOD_ID (required)
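
The four record types above reference each other by ID (a cohort points at a population, a frequency set points at a cohort and a method). A minimal Python sketch of that structure follows. The class names, the field types and the `dangling_references` helper are illustrative assumptions, not part of any agreed schema; only the ID fields and their required/optional status come from the notes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Population:
    pop_id: str                         # POP_ID (required)
    curator: str                        # CURATOR (required)
    description: Optional[str] = None   # text description (optional)


@dataclass
class Cohort:
    cohort_id: str                      # COHORT_ID (required)
    pop_id: str                         # POP_ID (required)
    attributes: Dict[str, str] = field(default_factory=dict)   # optional
    individuals: List[str] = field(default_factory=list)       # list of individuals


@dataclass
class Method:
    method_id: str                      # METHOD_ID (required)
    attributes: Dict[str, str] = field(default_factory=dict)   # optional


@dataclass
class HaplotypeFrequencySet:
    hf_id: str                          # HF_ID (required)
    cohort_id: str                      # COHORT_ID (required)
    method_id: str                      # METHOD_ID (required)


def dangling_references(
    populations: List[Population],
    cohorts: List[Cohort],
    methods: List[Method],
    hf_sets: List[HaplotypeFrequencySet],
) -> List[str]:
    """Return one message per ID that points at a record that does not exist."""
    pop_ids = {p.pop_id for p in populations}
    cohort_ids = {c.cohort_id for c in cohorts}
    method_ids = {m.method_id for m in methods}

    problems: List[str] = []
    for c in cohorts:
        if c.pop_id not in pop_ids:
            problems.append(f"COHORT {c.cohort_id}: unknown POP_ID {c.pop_id}")
    for hf in hf_sets:
        if hf.cohort_id not in cohort_ids:
            problems.append(f"HF {hf.hf_id}: unknown COHORT_ID {hf.cohort_id}")
        if hf.method_id not in method_ids:
            problems.append(f"HF {hf.hf_id}: unknown METHOD_ID {hf.method_id}")
    return problems
```

A service built this way could run such a check on upload and reject a frequency set whose cohort or method has not been registered yet.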
Global Predictive Match HF use hierarchy
1. no prediction
2. single locus global
3. haplotype global
4. ethnic code based
5. haplotype frequency id based (HF_ID or POP_ID)
We would prefer registries to exchange at level 5 (HF_ID or POP_ID) as an attribute of each individual. If these fields are blank, we will populate the service with the ISO code associated with each ION.
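
The fallback described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the level numbers follow the order of the hierarchy list, `resolve_frequency_key` and the `ion_to_iso` lookup table are hypothetical names, and preferring HF_ID over POP_ID when both are present is a choice made here, not something the notes specify.

```python
from enum import IntEnum
from typing import Mapping, Optional


class PredictionLevel(IntEnum):
    """Levels of the HF use hierarchy, numbered in list order (assumed)."""
    NO_PREDICTION = 1
    SINGLE_LOCUS_GLOBAL = 2
    HAPLOTYPE_GLOBAL = 3
    ETHNIC_CODE = 4
    HF_ID = 5


def resolve_frequency_key(
    hf_id: Optional[str],
    pop_id: Optional[str],
    ion: str,
    ion_to_iso: Mapping[str, str],
) -> Optional[str]:
    """Pick the key used to look up frequencies for one individual.

    Uses the individual's HF_ID or POP_ID when one is filled in; if both
    are blank, falls back to the ISO code registered for the ION.
    Returns None when the ION is not in the lookup table either.
    """
    for value in (hf_id, pop_id):
        if value is not None and value.strip():
            return value.strip()
    return ion_to_iso.get(ion)
```

For example, with a placeholder table `{"1234": "DE"}`, an individual from ION `1234` with blank HF_ID and POP_ID would resolve to `"DE"`.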
- AFNA = African: North Africa
- AFSS = African: Sub-Saharan Africa
- ASSW = Asian: Southwest Asia (Middle East, Turkey)
- ASSO = Asian: Southern Asia (India, Pakistan, Bangladesh, Sri Lanka, Bhutan, Nepal)
- ASCE = Asian: Central Asia (Eastern Russia, Kazakhstan, Uzbekistan, Kyrgyzstan, Tajikistan)
- ASSE = Asian: Southeast Asia (China, Mongolia, Burma, Laos, Cambodia, Thailand, Vietnam, Taiwan)
- ASNE = Asian: North and Northeast Asia (Japan, North Korea, South Korea)
- ASOC = Asian: Oceania (Pacific Islands excluding Japan, Australia, Taiwan, Sakhalin, Aleutian Islands)
- CAEU = Caucasian: Mainland Europe, Greenland, Iceland, Western Russia
- CAER = Caucasian: Eastern Russia
- CANA = Caucasian: North America (USA, Canada, Mexico)
- CAAU = Caucasian: Australia (Australia, New Zealand)
- HICA = Central America, Caribbean
- HISA = South America
- MX = Mixed / multiple
- OT = Other (e.g. Australian Aborigine)
- UK = Unknown
Access Control:
- haplotypes and genotypes level
- user, organization (ION), group (WMDA, BMDW), global
- license (non-commercial use allowed? redistribution allowed? publication of derived work?)
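
One way to model these access-control dimensions is sketched below in Python. Everything here is a hypothetical illustration: the class names, the `is_allowed` function and the idea of one rule per data level are assumptions made for the sketch, and the ION value in the usage example is a placeholder.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Scope(Enum):
    USER = "user"
    ORGANIZATION = "organization"   # identified by ION
    GROUP = "group"                 # e.g. WMDA, BMDW
    GLOBAL = "global"


@dataclass(frozen=True)
class Requester:
    user: str
    organization: str               # the requester's ION
    groups: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class AccessRule:
    scope: Scope
    grantee: str = ""               # user name, ION or group name; unused for GLOBAL


@dataclass(frozen=True)
class License:
    non_commercial_use: bool
    redistribution: bool
    derived_work_publication: bool


@dataclass(frozen=True)
class FrequencySetPolicy:
    haplotypes: AccessRule          # who may read the haplotype frequencies
    genotypes: AccessRule           # who may read the underlying genotypes
    license: License


def is_allowed(rule: AccessRule, requester: Requester) -> bool:
    """Check a single rule against the identity of the requester."""
    if rule.scope is Scope.GLOBAL:
        return True
    if rule.scope is Scope.USER:
        return requester.user == rule.grantee
    if rule.scope is Scope.ORGANIZATION:
        return requester.organization == rule.grantee
    return rule.grantee in requester.groups   # Scope.GROUP
```

With this shape, a set could publish its haplotypes globally while restricting genotypes to the owning organization, by giving `haplotypes` a `GLOBAL` rule and `genotypes` an `ORGANIZATION` rule.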
Genotype upload optional for curation (of HF)
Genotype upload required for creation (of HF)
--
HPE
- 1 World: 99
- 2 Region: E Eur 33, S Amer 66
- 32 National, e.g. CN = CN+CN1+HK+TW
Armenia is 99 only
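
A national set such as CN = CN+CN1+HK+TW combines several sources under one code. The Python sketch below shows one possible way to express that grouping and pool per-source haplotype counts. Pooling by simple summation is an assumption made for illustration (the notes do not say how the sources are combined), and the names `NATIONAL_SETS` and `pool_counts` as well as the haplotype labels in the usage note are hypothetical.

```python
from collections import Counter
from typing import Dict, Mapping, Tuple

# Set code -> source codes it combines. Only the CN entry comes from the notes.
NATIONAL_SETS: Dict[str, Tuple[str, ...]] = {
    "CN": ("CN", "CN1", "HK", "TW"),
}


def pool_counts(
    set_code: str,
    counts_by_source: Mapping[str, Mapping[str, int]],
) -> Counter:
    """Sum haplotype counts over every source that belongs to a set.

    Sources with no data are skipped. Raises KeyError for an unknown set code.
    """
    pooled: Counter = Counter()
    for source in NATIONAL_SETS[set_code]:
        pooled.update(counts_by_source.get(source, {}))
    return pooled


def to_frequencies(counts: Mapping[str, int]) -> Dict[str, float]:
    """Normalize counts so that the frequencies sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {hap: n / total for hap, n in counts.items()}
```

For instance, pooling `{"CN": {"hapA": 3}, "HK": {"hapA": 1, "hapB": 4}}` under `"CN"` gives counts of 4 for `hapA` and 4 for `hapB`, i.e. a frequency of 0.5 each.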
Discussion: Here are 35 sets for inclusion in the new service
ARS vs P vs G:
P and G are defined in the 2010 Nomenclature
“g” is defined in 2012 IHIW IDAWG (doi: 10.1111/iji.12026)
Halagan’s service on GitHub will convert between these