Closed
Labels: enhancement (New feature or request)
Description
Summary
Add a consent code compatibility model to the search API so researchers can find datasets by describing their use case ("I'm studying diabetes", "I'm a for-profit company") rather than knowing exact consent code strings.
PRD: docs/PRD-consent-code-compatibility.md
Problem
The search API treats consent codes as opaque strings and matches them exactly. This causes three failures:
- No permission hierarchy: "I'm studying diabetes" resolves to DS-DIAB (~9 studies), missing ~1,500 GRU and ~700 HMB studies also available for diabetes research
- No base code expansion: "GRU" exact-matches 1,081 studies, missing 462 GRU-IRB/GRU-NPU/etc. variants
- No modifier semantics: "I'm a for-profit company" has no mapping — the system doesn't understand that NPU means non-profit only
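The base-code gap can be illustrated with a small sketch. The study IDs, the `STUDIES` mapping, and both helper functions are hypothetical and stand in for the real index; only the consent code strings come from this issue.

```python
# Hypothetical sample data: study id -> consent code.
STUDIES = {
    "study-a": "GRU",
    "study-b": "GRU-IRB",
    "study-c": "GRU-NPU",
    "study-d": "HMB",
}

# Assumption: only bare base codes are expanded to their variants.
BASE_CODES = {"GRU", "HMB"}


def exact_match(query: str) -> list[str]:
    """Opaque string comparison, the behaviour described as today's."""
    return sorted(sid for sid, code in STUDIES.items() if code == query)


def expanded_match(query: str) -> list[str]:
    """Expand a bare base code to its hyphenated variants; otherwise stay exact."""
    if query not in BASE_CODES:
        return exact_match(query)
    return sorted(
        sid
        for sid, code in STUDIES.items()
        if code == query or code.startswith(query + "-")
    )
```

With this sample data, `exact_match("GRU")` finds only `study-a`, while `expanded_match("GRU")` also finds the `GRU-IRB` and `GRU-NPU` studies. A query for `"GRU-IRB"` is not a bare base code, so it stays exact.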
Solution
- Deterministic `consent_logic.py` module that parses codes (base + disease + modifiers), knows the permission hierarchy (GRU ⊇ HMB ⊇ DS-X), and computes eligible codes given a researcher profile
- New resolve agent tool `compute_consent_eligibility(purpose, disease, is_nonprofit, explicit_code)` returns all compatible codes
- Extract agent update to recognize eligibility language ("what can I use", "for-profit", "consented for") and generate consent mentions; consent filtering only activates when asked, not on every disease query
- Base code expansion so "GRU" returns all GRU-* variants; "GRU-IRB" stays exact
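As a rough sketch of the parsing and hierarchy logic described above, the following shows one way the pieces could fit together. The `ConsentCode` dataclass, the `is_eligible` helper, and all signatures are assumptions for illustration and are not the actual `consent_logic.py`. It also assumes that a researcher who names a disease qualifies for HMB, and that modifiers other than NPU add requirements without excluding anyone.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ConsentCode:
    base: str                # "GRU", "HMB", or "DS"
    disease: Optional[str]   # e.g. "DIAB" for DS-DIAB, else None
    modifiers: frozenset     # e.g. {"IRB", "NPU"}


def parse_consent_code(code: str) -> ConsentCode:
    """Split a code into base, optional disease, and trailing modifiers."""
    parts = code.upper().split("-")
    base, rest = parts[0], parts[1:]
    disease = None
    if base == "DS" and rest:
        disease, rest = rest[0], rest[1:]
    return ConsentCode(base, disease, frozenset(rest))


def is_eligible(code: str, disease: Optional[str], is_nonprofit: bool) -> bool:
    """Apply the hierarchy GRU >= HMB >= DS-X plus the NPU restriction."""
    parsed = parse_consent_code(code)
    if "NPU" in parsed.modifiers and not is_nonprofit:
        return False  # NPU means non-profit use only
    if parsed.base == "GRU":
        return True  # general research use covers every purpose
    if parsed.base == "HMB":
        return disease is not None  # assumption: disease research is health research
    if parsed.base == "DS":
        return disease is not None and parsed.disease == disease.upper()
    return False  # unknown base codes are never assumed compatible


def compute_eligible_codes(
    all_codes: Iterable[str], disease: Optional[str], is_nonprofit: bool
) -> list:
    """Return every code in all_codes that the researcher profile may use."""
    return sorted(c for c in all_codes if is_eligible(c, disease, is_nonprofit))
```

For example, a for-profit diabetes researcher checked against `["GRU", "GRU-NPU", "HMB", "DS-DIAB", "DS-DIAB-NPU", "DS-CA"]` would keep `GRU`, `HMB`, and `DS-DIAB`, and lose both NPU variants and the unrelated `DS-CA`. Keeping this logic as pure functions is what makes it deterministic and unit-testable, independent of the agents.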
Acceptance Criteria
- `consent_logic.py` created with `parse_consent_code`, `expand_disease`, `compute_eligible_codes` + unit tests
- `consent_codes.json` updated with `disease_hierarchy` mapping
- Resolve agent has `compute_consent_eligibility` tool
- `RESOLVE_PROMPT.md` updated with eligibility-based resolution (Pattern A: explicit code, Pattern B: eligibility)
- `EXTRACT_PROMPT.md` updated with researcher profile extraction patterns
- Eval cases added for extract and resolve agents
- "GRU" query returns all GRU-* variants (1,543 vs 1,081 today)
- "for-profit cancer research" excludes NPU-modified codes
- "eligible for diabetes research" returns GRU + HMB + DS-DIAB family
- "diabetes studies" (no eligibility cue) remains focus-only — no consent filter added
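The issue does not specify the shape of the `disease_hierarchy` mapping, so the sketch below assumes one possibility: each disease abbreviation maps to a single broader parent, and `expand_disease` walks up that chain. The `T2D`, `T1D`, and `METAB` labels and the function signature are hypothetical; only `DIAB` appears in this issue.

```python
from typing import Optional

# Hypothetical shape: disease abbreviation -> broader parent category.
DISEASE_HIERARCHY = {
    "T2D": "DIAB",
    "T1D": "DIAB",
    "DIAB": "METAB",
}


def expand_disease(disease: str, hierarchy: Optional[dict] = None) -> list:
    """Return the disease followed by each broader ancestor, nearest first."""
    if hierarchy is None:
        hierarchy = DISEASE_HIERARCHY
    chain = []
    seen = set()
    current = disease.upper()
    # The seen set guards against an accidental cycle in the mapping.
    while current is not None and current not in seen:
        seen.add(current)
        chain.append(current)
        current = hierarchy.get(current)
    return chain
```

Under this assumed shape, a researcher studying `T2D` would also be matched against disease-specific codes for the broader categories in the chain, while a disease with no entry expands to itself only.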
Files to Create/Modify
| File | Action |
|---|---|
| `backend/concept_search/consent_logic.py` | Create |
| `backend/concept_search/consent_codes.json` | Modify: add `disease_hierarchy` |
| `backend/concept_search/index.py` | Modify: add `compute_consent_eligibility` |
| `backend/concept_search/resolve_agent.py` | Modify: add tool |
| `backend/concept_search/RESOLVE_PROMPT.md` | Modify: eligibility strategy |
| `backend/concept_search/EXTRACT_PROMPT.md` | Modify: researcher profile patterns |
| `backend/concept_search/eval_extract.py` | Modify: add test cases |
| `backend/concept_search/eval_resolve.py` | Modify: add test cases |
| `backend/tests/test_consent_logic.py` | Create |