Commit bc8d7df

NoopDog and claude authored
feat: add consent code eligibility logic for search API (#199)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent b29ba7e commit bc8d7df

9 files changed: +989 -430 lines changed

backend/concept_search/EXTRACT_PROMPT.md

Lines changed: 10 additions & 0 deletions
@@ -85,6 +85,16 @@ For these facets, extract the user's text and leave `values` empty. A resolve ag
8585

8686
**Important:** A disease name is a **consentCode** (not focus) when the query describes what research the data is _consented for_, not what the data is _about_. Context clues: "consented for", "approved for", "data use", "research consented". Example: "diabetes datasets consented for Alzheimer's research" → focus="diabetes", consentCode="Alzheimer's". Also recognize semantic descriptions: "general research use" → consentCode, "health and medical" → consentCode, "not for profit" → consentCode.
8787

88+
**Eligibility language** — the following cue words signal the user is asking about what data they are _allowed to use_, which means consentCode: "what can I use", "what datasets can I use", "eligible for", "consented for", "approved for", "available for my research", "for-profit", "non-profit", "commercial use". When these cues are present, emit a **consentCode** mention in addition to any focus mention. Without these cues, disease mentions stay as `focus` only.
89+
90+
Key rule: "diabetes studies" = focus only. "What diabetes datasets can I use?" = focus + consentCode (because "can I use" signals eligibility).
91+
92+
**Dual mentions:** When eligibility cues appear alongside a disease, emit BOTH a focus mention (what the study is about) AND a consentCode mention (what the data is permitted for). Examples:
93+
94+
- "I'm a nonprofit studying cancer, what's available?" → focus="cancer" + consentCode="nonprofit cancer"
95+
- "for-profit diabetes datasets" → focus="diabetes" + consentCode="for-profit diabetes"
96+
- "what datasets can I use for Alzheimer's research?" → focus="Alzheimer's" + consentCode="Alzheimer's"
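Reviewer sketch (not part of the commit): the cue rule above is applied by the LLM, but the intended behavior can be approximated as a plain substring check. The cue list is copied from the prompt text, plus the bare "can I use" from the key rule; the hyphen normalization and the names `has_eligibility_cue` and `facets_to_emit` are assumptions made for illustration only.

```python
# Hypothetical helper names; the real extraction is done by the LLM prompt.
def _normalize(text: str) -> str:
    """Lowercase and drop hyphens so "non-profit" and "nonprofit" compare equal."""
    return text.lower().replace("-", "")


# Cue phrases taken from the prompt; "can I use" comes from its key rule.
ELIGIBILITY_CUES = tuple(
    _normalize(cue)
    for cue in (
        "can I use",
        "what can I use",
        "what datasets can I use",
        "eligible for",
        "consented for",
        "approved for",
        "available for my research",
        "for-profit",
        "non-profit",
        "commercial use",
    )
)


def has_eligibility_cue(query: str) -> bool:
    """Return True if any cue phrase occurs in the normalized query."""
    text = _normalize(query)
    return any(cue in text for cue in ELIGIBILITY_CUES)


def facets_to_emit(query: str) -> list[str]:
    """Assume the query already contains a disease mention."""
    if has_eligibility_cue(query):
        return ["focus", "consentCode"]
    return ["focus"]
```

A substring check like this only illustrates the rule. It would miss paraphrases that the prompt expects the model to catch.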
97+
8898
## Instructions
8999

90100
1. Determine the query **intent** (`"study"`, `"variable"`, or `"auto"`) — see "Query Intent" above.

backend/concept_search/RESOLVE_PROMPT.md

Lines changed: 37 additions & 16 deletions
@@ -74,27 +74,48 @@ If category drill-down doesn't find a match, fall back to `search_concepts`:
7474
- You can combine both: try category drill-down, then search if needed.
7575
- **Cross-category terms:** Some terms appear in multiple categories (e.g., "cholesterol" exists in Laboratory Tests, Metabolomics, Dietary & Nutrition, and Medications). When a term could fit multiple categories, try 2–3 categories and **pick the concept with the highest study count**. Set `message` to disambiguate: "Did you mean Total Cholesterol (93 studies), HDL Cholesterol (78 studies), or Dietary Cholesterol Intake (9 studies)?"
7676

77-
## Consent Code Facet — Category Drill-Down
77+
## Consent Code Facet — Eligibility Resolution
7878

79-
For **consentCode** mentions, use context-driven drill-down:
79+
For **consentCode** mentions, use one of two patterns:
8080

81-
1. Call `get_consent_code_categories()` to see the base codes (GRU, HMB, DS, etc.) with descriptions, study counts, and modifier definitions.
82-
2. Use your understanding of the mention to pick the right base code:
83-
- "general research use", "open access", "unrestricted" → GRU
84-
- "health research", "biomedical only" → HMB
85-
- Any disease name → DS (disease-specific)
86-
- "not for profit" → look for NPU modifier on the right base
87-
3. If the mention refers to a disease, call `get_disease_specific_codes()` to see all DS-\* disease categories with their full names.
88-
4. Optionally call `get_consent_codes_for_base(base_code)` to see all variants with modifiers (e.g., all GRU-_ or DS-CVD-_ codes).
89-
5. Return the broadest matching code unless the user specifies modifiers.
81+
### Pattern A: Explicit Code
82+
83+
When the mention text IS a consent code (e.g. "GRU", "HMB-IRB", "DS-CVD"):
84+
85+
1. Call `compute_consent_eligibility(explicit_code=<the code>)` to get the code and all its modifier variants.
86+
2. Return ALL eligible codes from the result — these are combined with OR.
9087

9188
**Examples:**
9289

93-
- "general research use" → get categories → pick GRU
94-
- "breast cancer research" → get categories → see DS → get disease codes → pick DS-BRCA
95-
- "HMB-IRB" → direct code, return as-is
96-
- "cardiovascular disease, not for profit" → get disease codes → DS-CVD → get variants → DS-CVD-NPU-MDS or just DS-CVD
97-
- "open access no restrictions" → GRU (semantic match, "open access" = general research use)
90+
- "GRU" → `compute_consent_eligibility(explicit_code="GRU")` → returns GRU, GRU-IRB, GRU-NPU, etc.
91+
- "HMB-IRB" → `compute_consent_eligibility(explicit_code="HMB-IRB")` → returns just HMB-IRB
92+
- "DS-CVD" → `compute_consent_eligibility(explicit_code="DS-CVD")` → returns DS-CVD, DS-CVD-IRB, etc.
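Reviewer sketch (not part of the commit): one plausible reading of "the code and all its modifier variants" is prefix expansion over the known codes. The implementation of `compute_consent_eligibility` is not shown in this diff, so `expand_explicit_code` and the `KNOWN_CODES` sample below are hypothetical.

```python
# Hypothetical sample of known consent codes; the real list lives in the
# index and is not part of this diff.
KNOWN_CODES = [
    "GRU", "GRU-IRB", "GRU-NPU",
    "HMB", "HMB-IRB", "HMB-NPU",
    "DS-CVD", "DS-CVD-IRB", "DS-CA",
]


def expand_explicit_code(code: str, known_codes: list[str]) -> list[str]:
    """Return the code itself plus every known code that extends it with
    further "-MODIFIER" segments, preserving the order of known_codes."""
    prefix = code + "-"
    return [c for c in known_codes if c == code or c.startswith(prefix)]
```

Requiring the trailing hyphen keeps `DS-CA` from matching an unrelated code such as `DS-CAD`.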
93+
94+
### Pattern B: Eligibility / Use-Case
95+
96+
When the mention describes a research use case or eligibility (e.g. "diabetes research", "for-profit cancer datasets", "general health research"):
97+
98+
1. Determine **purpose**: "general" (unrestricted), "health" (health/medical/biomedical), or "disease" (specific disease).
99+
2. Determine **is_nonprofit**: False if mention says "for-profit" or "commercial"; True or None otherwise.
100+
3. Call `compute_consent_eligibility(purpose=..., disease=..., is_nonprofit=...)` — the disease parameter accepts full names ("diabetes", "cancer", "type 1 diabetes") or abbreviations ("DIAB", "CA", "T1D"). The tool resolves names automatically.
101+
4. Return ALL eligible codes from the result.
102+
103+
**Examples:**
104+
105+
- "diabetes research" → `compute_consent_eligibility(purpose="disease", disease="diabetes")` → returns GRU*, HMB*, DS-DIAB*, DS-T1D*, etc.
106+
- "for-profit cancer datasets" → `compute_consent_eligibility(purpose="disease", disease="cancer", is_nonprofit=False)` → returns codes without NPU modifier
107+
- "general health research at a university" → `compute_consent_eligibility(purpose="health")` → returns GRU* + HMB* + HMP + HR
108+
- "general research use" → `compute_consent_eligibility(explicit_code="GRU")`
109+
- "open access no restrictions" → `compute_consent_eligibility(explicit_code="GRU")`
110+
- "type 1 diabetes consent" → `compute_consent_eligibility(purpose="disease", disease="type 1 diabetes")`
111+
- "consented for diabetes only" → `compute_consent_eligibility(purpose="disease", disease="diabetes", disease_only=True)` → returns only DS-DIAB\* codes, not GRU/HMB
112+
- "specifically consented for cancer" → `compute_consent_eligibility(purpose="disease", disease="cancer", disease_only=True)`
113+
114+
**disease_only flag:** Set `disease_only=True` when the user says "only", "specifically", "disease-specific", or otherwise indicates they want datasets with a disease-specific consent code — not all datasets that happen to be eligible. This excludes GRU, HMB, and other broad codes.
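Reviewer sketch (not part of the commit): the Pattern B examples suggest a filter along the following lines. It is an approximation inferred from the examples, not the actual tool. Assumptions: `sketch_eligible_codes` is a hypothetical name, the disease is passed as an abbreviation (name resolution is omitted), "general" maps to GRU only, and sub-disease codes are kept when `disease_only=True` (the prompt does not say either way).

```python
from typing import Optional

# Subset of the disease_hierarchy added to consent_codes.json in this commit.
DISEASE_HIERARCHY = {"DIAB": ["T1D", "T2D", "DRC", "T1DR", "IR"]}

# Broad bases per purpose, inferred from the prompt examples (assumption).
BROAD_BASES = {
    "general": {"GRU"},
    "health": {"GRU", "HMB", "HMP", "HR"},
    "disease": {"GRU", "HMB"},
}


def sketch_eligible_codes(
    codes: list[str],
    purpose: str,
    disease: Optional[str] = None,
    is_nonprofit: Optional[bool] = None,
    disease_only: bool = False,
) -> list[str]:
    """Filter codes by purpose, disease, and profit status."""
    allowed_diseases: set[str] = set()
    if purpose == "disease" and disease is not None:
        allowed_diseases = {disease, *DISEASE_HIERARCHY.get(disease, [])}
    # disease_only drops the broad codes and keeps only DS-* matches.
    broad = set() if disease_only else BROAD_BASES[purpose]
    result = []
    for code in codes:
        parts = code.split("-")
        # For-profit users cannot use codes carrying the NPU modifier.
        if is_nonprofit is False and "NPU" in parts:
            continue
        if parts[0] == "DS":
            if len(parts) > 1 and parts[1] in allowed_diseases:
                result.append(code)
        elif parts[0] in broad:
            result.append(code)
    return result


SAMPLE = ["GRU", "GRU-NPU", "HMB", "HMB-IRB",
          "DS-DIAB", "DS-DIAB-NPU", "DS-T1D", "DS-CVD"]
```

With `SAMPLE`, `purpose="disease", disease="DIAB"` keeps everything except `DS-CVD`. Adding `is_nonprofit=False` also drops the two NPU codes, and `disease_only=True` leaves only the `DS-DIAB*` and `DS-T1D` entries.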
115+
116+
### Exploration Tools (still available)
117+
118+
You can still use `get_consent_code_categories()`, `get_disease_specific_codes()`, and `get_consent_codes_for_base()` to explore codes before calling `compute_consent_eligibility`.
98119

99120
## General Selection Rules
100121

backend/concept_search/consent_codes.json

Lines changed: 26 additions & 76 deletions
@@ -19,81 +19,31 @@
1919
"RD": "Research-specific restrictions or rare disease",
2020
"MDS-GSO": "Methods Development and Genetic Studies Only"
2121
},
22-
"disease_abbreviations": {
23-
"CA": "Cancer (all types)",
24-
"CVD": "Cardiovascular Disease",
25-
"BRCA": "Breast Cancer",
26-
"OVCA": "Ovarian Cancer",
27-
"LC": "Lung Cancer",
28-
"PC": "Prostate Cancer",
29-
"DIAB": "Diabetes",
30-
"T1D": "Type 1 Diabetes",
31-
"T1DR": "Type 1 Diabetes Research",
32-
"T2D": "Type 2 Diabetes",
33-
"COPD": "Chronic Obstructive Pulmonary Disease",
34-
"ASTHMA": "Asthma",
35-
"ASD": "Autism Spectrum Disorder",
36-
"ALZ": "Alzheimer's Disease",
37-
"SCD": "Sickle Cell Disease",
38-
"HIV": "Human Immunodeficiency Virus / AIDS",
39-
"HCC": "Hepatocellular Carcinoma",
40-
"SLE": "Systemic Lupus Erythematosus",
41-
"IBD": "Inflammatory Bowel Disease",
42-
"CF": "Cystic Fibrosis",
43-
"CFD": "Craniofacial Disorders",
44-
"CFR": "Cystic Fibrosis Research",
45-
"CC": "Colorectal Cancer",
46-
"CKD": "Chronic Kidney Disease",
47-
"EP": "Epilepsy",
48-
"CHD": "Congenital Heart Disease",
49-
"ED": "Eating Disorders",
50-
"SZRD": "Schizophrenia-Related Disorders",
51-
"HEM": "Hematologic Diseases",
52-
"PACA": "Pancreatic Cancer",
53-
"NHL": "Non-Hodgkin Lymphoma",
54-
"LEU": "Leukemia",
55-
"AMD": "Age-Related Macular Degeneration",
56-
"LD": "Liver Disease",
57-
"LIVD": "Liver Disease",
58-
"NDR": "Neurodevelopmental Research",
59-
"PD": "Parkinson's Disease",
60-
"STK": "Stroke",
61-
"MS": "Multiple Sclerosis",
62-
"RA": "Rheumatoid Arthritis",
63-
"TB": "Tuberculosis",
64-
"MM": "Multiple Myeloma",
65-
"HNC": "Head and Neck Cancer",
66-
"FDO": "Familial Dysautonomia / Dysautonomic Disorders",
67-
"PEDCR": "Pediatric Cancer Research",
68-
"BOED": "Bone Disease",
69-
"AF": "Atrial Fibrillation",
70-
"IR": "Insulin Resistance",
71-
"DEM": "Dementia",
72-
"HLBS": "Heart, Lung, Blood, and Sleep disorders",
73-
"DRC": "Diabetes-Related Complications",
74-
"LDS": "Lysosomal Storage Diseases",
75-
"ESD": "Esophageal Diseases",
76-
"MSC": "Musculoskeletal Conditions",
77-
"MAL": "Malaria",
78-
"HF": "Heart Failure",
79-
"MI": "Myocardial Infarction",
80-
"ADO": "Autoimmune Disorders",
81-
"BLADDERCA": "Bladder Cancer",
82-
"BROC": "Brain and Other Central Nervous System Cancer",
83-
"BT": "Brain Tumor",
84-
"CS": "Cushing's Syndrome",
85-
"ALC": "Alcohol-Related Disorders",
86-
"ADHD": "Attention Deficit Hyperactivity Disorder",
87-
"OC": "Ovarian Cancer",
88-
"HM": "Hematologic Malignancies",
89-
"CLL": "Chronic Lymphocytic Leukemia",
90-
"RCC": "Renal Cell Carcinoma",
91-
"BPD": "Bronchopulmonary Dysplasia",
92-
"PAD": "Peripheral Arterial Disease",
93-
"SARC": "Sarcoidosis",
94-
"OFC": "Orofacial Clefts",
95-
"UC": "Ulcerative Colitis",
96-
"LCTC": "Lung Cancer and Tobacco Control",
97-
"TMJD": "Temporomandibular Joint Disorders"
22+
"disease_hierarchy": {
23+
"CA": [
24+
"BRCA",
25+
"OVCA",
26+
"LC",
27+
"PC",
28+
"HCC",
29+
"CC",
30+
"PACA",
31+
"NHL",
32+
"LEU",
33+
"HNC",
34+
"MM",
35+
"RCC",
36+
"CLL",
37+
"OC",
38+
"BLADDERCA",
39+
"BROC",
40+
"BT",
41+
"PEDCR",
42+
"LCTC",
43+
"HM"
44+
],
45+
"CVD": ["AF", "HF", "MI", "PAD", "STK", "CHD"],
46+
"DIAB": ["T1D", "T2D", "DRC", "T1DR", "IR"],
47+
"LD": ["LIVD", "ESD"]
9848
}
9949
}
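Reviewer sketch (not part of the commit): the new `disease_hierarchy` maps a parent abbreviation to its child abbreviations, so a reverse lookup is a one-line inversion. The fragment below copies three of the four entries from the diff (the long `CA` list is left out for brevity); `parent_of` and `children_of` are hypothetical helper names.

```python
import json

# Fragment of the disease_hierarchy from consent_codes.json (CA omitted).
HIERARCHY_JSON = """
{
  "disease_hierarchy": {
    "CVD": ["AF", "HF", "MI", "PAD", "STK", "CHD"],
    "DIAB": ["T1D", "T2D", "DRC", "T1DR", "IR"],
    "LD": ["LIVD", "ESD"]
  }
}
"""

hierarchy = json.loads(HIERARCHY_JSON)["disease_hierarchy"]

# Invert parent -> children into child -> parent for upward lookups.
child_to_parent = {
    child: parent
    for parent, children in hierarchy.items()
    for child in children
}


def parent_of(abbr: str):
    """Return the parent abbreviation, or None for a top-level or unknown one."""
    return child_to_parent.get(abbr)


def children_of(abbr: str) -> list[str]:
    """Return the direct children of an abbreviation (empty if none)."""
    return list(hierarchy.get(abbr, []))
```

The inversion assumes each child appears under a single parent, which holds for the entries shown in this diff.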
