feat: consent code compatibility model for search API #198

@NoopDog

Summary

Add a consent code compatibility model to the search API so researchers can find datasets by describing their use case ("I'm studying diabetes", "I'm a for-profit company") rather than knowing exact consent code strings.

PRD: docs/PRD-consent-code-compatibility.md

Problem

The search API treats consent codes as opaque strings and matches them exactly. This causes three failures:

  1. No permission hierarchy: "I'm studying diabetes" resolves only to DS-DIAB (~9 studies), missing the ~1,500 GRU and ~700 HMB studies that are also available for diabetes research
  2. No base code expansion: "GRU" exact-matches 1,081 studies, missing the 462 studies under modified variants (GRU-IRB, GRU-NPU, etc.)
  3. No modifier semantics: "I'm a for-profit company" has no mapping, because the system doesn't understand that NPU means non-profit use only
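
To make the first two failures concrete, here is a minimal sketch of opaque-string matching over a toy index. The study ids and the `exact_match` helper are invented for illustration and are not taken from the search API.

```python
# Hypothetical toy index: study id -> consent code (ids invented for illustration).
STUDY_CODES = {
    "study-1": "GRU",
    "study-2": "GRU-IRB",
    "study-3": "GRU-NPU",
    "study-4": "HMB",
    "study-5": "DS-DIAB",
}


def exact_match(query):
    """Mimic opaque-string matching: a study matches only if its code equals the query."""
    return sorted(sid for sid, code in STUDY_CODES.items() if code == query)


print(exact_match("GRU"))      # ['study-1']: the GRU-IRB and GRU-NPU studies are missed
print(exact_match("DS-DIAB"))  # ['study-5']: the GRU and HMB studies are missed
```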

Solution

  • A deterministic consent_logic.py module that parses codes (base + disease + modifiers), encodes the permission hierarchy (GRU ⊇ HMB ⊇ DS-X), and computes the eligible codes for a given researcher profile
  • A new resolve agent tool, compute_consent_eligibility(purpose, disease, is_nonprofit, explicit_code), that returns all compatible codes
  • An extract agent update that recognizes eligibility language ("what can I use", "for-profit", "consented for") and generates consent mentions. Consent filtering activates only when the query asks for it, not on every disease query
  • Base code expansion, so "GRU" returns all GRU-* variants while "GRU-IRB" stays exact
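
The parsing, hierarchy, and expansion rules above could be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of consent_logic.py: the `ConsentCode` dataclass, the function signatures, the `expand_base_code` helper, and the sample code list are all hypothetical, and only a disease-research profile is modeled.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ConsentCode:
    base: str                  # "GRU", "HMB", or "DS"
    disease: Optional[str]     # e.g. "DIAB" for DS-DIAB, otherwise None
    modifiers: frozenset[str]  # e.g. {"IRB", "NPU"}


def parse_consent_code(code: str) -> ConsentCode:
    """Split a code into base, optional disease, and modifiers (assumed layout)."""
    parts = code.strip().upper().split("-")
    if parts[0] == "DS":
        if len(parts) < 2 or not parts[1]:
            raise ValueError(f"disease-specific code needs a disease: {code!r}")
        return ConsentCode("DS", parts[1], frozenset(parts[2:]))
    return ConsentCode(parts[0], None, frozenset(parts[1:]))


def compute_eligible_codes(available: Iterable[str], disease: str,
                           is_nonprofit: bool) -> list[str]:
    """Codes usable for research on `disease`, following GRU ⊇ HMB ⊇ DS-X."""
    eligible = []
    for code in available:
        parsed = parse_consent_code(code)
        if parsed.base not in {"GRU", "HMB", "DS"}:
            continue  # unknown base: stay conservative and exclude it
        if parsed.base == "DS" and parsed.disease != disease.upper():
            continue  # consented for a different disease
        if "NPU" in parsed.modifiers and not is_nonprofit:
            continue  # NPU means non-profit use only
        eligible.append(code)
    return eligible


def expand_base_code(query: str, available: Iterable[str]) -> list[str]:
    """A bare base code matches its whole family; a modified code stays exact."""
    wanted = parse_consent_code(query)
    matches = []
    for code in available:
        parsed = parse_consent_code(code)
        if wanted.modifiers:
            if parsed == wanted:
                matches.append(code)
        elif (parsed.base, parsed.disease) == (wanted.base, wanted.disease):
            matches.append(code)
    return matches


# Illustrative code list, not real index contents.
CODES = ["GRU", "GRU-IRB", "GRU-NPU", "HMB", "HMB-NPU", "DS-DIAB", "DS-DIAB-NPU", "DS-CA"]
print(compute_eligible_codes(CODES, "DIAB", is_nonprofit=False))
# ['GRU', 'GRU-IRB', 'HMB', 'DS-DIAB']
print(expand_base_code("GRU", CODES))      # ['GRU', 'GRU-IRB', 'GRU-NPU']
print(expand_base_code("GRU-IRB", CODES))  # ['GRU-IRB']
```

Keeping this logic deterministic and outside the agent prompts means the hierarchy and modifier rules can be unit-tested directly, independent of model behavior.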

Acceptance Criteria

  • consent_logic.py created with parse_consent_code, expand_disease, compute_eligible_codes + unit tests
  • consent_codes.json updated with disease_hierarchy mapping
  • Resolve agent has compute_consent_eligibility tool
  • RESOLVE_PROMPT.md updated with eligibility-based resolution (Pattern A: explicit code, Pattern B: eligibility)
  • EXTRACT_PROMPT.md updated with researcher profile extraction patterns
  • Eval cases added for extract and resolve agents
  • "GRU" query returns all GRU-* variants (1,543 vs 1,081 today)
  • "for-profit cancer research" excludes NPU-modified codes
  • "eligible for diabetes research" returns GRU + HMB + DS-DIAB family
  • "diabetes studies" (no eligibility cue) remains focus-only: no consent filter is added
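
The last three criteria hinge on whether a query contains an eligibility cue. The extract agent is prompt-driven, so the keyword check below is only a stand-in that illustrates the intended behavior, not how the agent decides. The cue list combines the phrases quoted in this issue with "eligible" from the criteria above, and `has_eligibility_cue` is a hypothetical name.

```python
# Cue phrases quoted in this issue; a real prompt would likely cover more wording.
ELIGIBILITY_CUES = ("what can i use", "for-profit", "consented for", "eligible")


def has_eligibility_cue(query: str) -> bool:
    """Return True if the query asks about consent eligibility (keyword heuristic)."""
    lowered = query.lower()
    return any(cue in lowered for cue in ELIGIBILITY_CUES)


print(has_eligibility_cue("diabetes studies"))                # False: focus-only
print(has_eligibility_cue("eligible for diabetes research"))  # True
print(has_eligibility_cue("for-profit cancer research"))      # True
```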

Files to Create/Modify

File                                       Action
backend/concept_search/consent_logic.py    Create
backend/concept_search/consent_codes.json  Modify: add disease_hierarchy
backend/concept_search/index.py            Modify: add compute_consent_eligibility
backend/concept_search/resolve_agent.py    Modify: add tool
backend/concept_search/RESOLVE_PROMPT.md   Modify: eligibility strategy
backend/concept_search/EXTRACT_PROMPT.md   Modify: researcher profile patterns
backend/concept_search/eval_extract.py     Modify: add test cases
backend/concept_search/eval_resolve.py     Modify: add test cases
backend/tests/test_consent_logic.py        Create
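
The issue does not spell out the shape of disease_hierarchy or the signature of expand_disease. One plausible reading, sketched here purely as an assumption, is a child-to-parent map that expand_disease walks upward, so that a DS code for a broader category also covers narrower diseases. The abbreviations T2D and MET are invented for illustration.

```python
from __future__ import annotations

# Hypothetical child -> parent map; T2D and MET are invented abbreviations
# and are not taken from consent_codes.json.
DISEASE_HIERARCHY = {"T2D": "DIAB", "DIAB": "MET"}


def expand_disease(disease: str,
                   hierarchy: dict[str, str] = DISEASE_HIERARCHY) -> list[str]:
    """Return the disease followed by its ancestors, stopping on a cycle."""
    chain = []
    current = disease.upper()
    while current is not None and current not in chain:
        chain.append(current)
        current = hierarchy.get(current)
    return chain


print(expand_disease("T2D"))  # ['T2D', 'DIAB', 'MET']
print(expand_disease("CA"))   # ['CA']
```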

Labels

enhancement (New feature or request)
