feat(pipeline): add CRTDL preprocessing for DIMP enrichment#122

Merged
trobanga merged 1 commit into main from feature/75-crtdl-preprocessing
Mar 12, 2026

@trobanga
Collaborator

Summary

Add capability to enrich CRTDL files with additional attributes required by DIMP (pseudonymization) before sending them to TORCH.

  • Add/update attributes in existing CRTDL groups by groupReference
  • Create new groups when addGroupIfNotExists: true
  • Resolve linkedGroups profile URLs to group IDs
  • Support both external JSON file and inline YAML configuration
  • Save enriched CRTDL to job directory for debugging/auditing
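
The first three behaviors listed above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the type names, the `Enrich` and `resolveLinkedGroups` functions, the `enriched-N` ID scheme, and the simplified CRTDL shape (groups with `id`, `groupReference`, and `attributes`) are all assumptions made for the example. Only the field names `groupReference`, `addGroupIfNotExists`, `attributesToAdd`, `attributeRef`, `mustHave`, and `linkedGroups` are taken from this conversation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical, simplified shapes. The real models live in
// internal/models/crtdl_preprocessing.go and may differ.
type Attribute struct {
	AttributeRef string   `json:"attributeRef"`
	MustHave     bool     `json:"mustHave"`
	LinkedGroups []string `json:"linkedGroups,omitempty"`
}

type AttributeGroup struct {
	ID             string      `json:"id"`
	GroupReference string      `json:"groupReference"`
	Attributes     []Attribute `json:"attributes"`
}

type Enrichment struct {
	GroupReference      string      `json:"groupReference"`
	AddGroupIfNotExists bool        `json:"addGroupIfNotExists"`
	AttributesToAdd     []Attribute `json:"attributesToAdd"`
}

// Enrich applies the rules to a copy of groups and returns the copy,
// so the caller's input is left untouched.
func Enrich(groups []AttributeGroup, rules []Enrichment) []AttributeGroup {
	out := make([]AttributeGroup, len(groups))
	for i, g := range groups {
		g.Attributes = append([]Attribute(nil), g.Attributes...)
		out[i] = g
	}
	for _, r := range rules {
		matched := false
		for i := range out {
			if out[i].GroupReference == r.GroupReference {
				out[i].Attributes = merge(out[i].Attributes, r.AttributesToAdd)
				matched = true
			}
		}
		if !matched && r.AddGroupIfNotExists {
			out = append(out, AttributeGroup{
				ID:             fmt.Sprintf("enriched-%d", len(out)+1), // assumed ID scheme
				GroupReference: r.GroupReference,
				Attributes:     merge(nil, r.AttributesToAdd),
			})
		}
	}
	resolveLinkedGroups(out)
	return out
}

// merge adds each attribute, replacing an existing one with the same attributeRef.
func merge(existing, add []Attribute) []Attribute {
	for _, a := range add {
		replaced := false
		for i := range existing {
			if existing[i].AttributeRef == a.AttributeRef {
				existing[i], replaced = a, true
				break
			}
		}
		if !replaced {
			existing = append(existing, a)
		}
	}
	return existing
}

// resolveLinkedGroups swaps profile URLs in linkedGroups for the ID of the
// group with that groupReference; unknown entries are kept as they are.
func resolveLinkedGroups(groups []AttributeGroup) {
	idByRef := make(map[string]string, len(groups))
	for _, g := range groups {
		idByRef[g.GroupReference] = g.ID
	}
	for gi := range groups {
		for ai := range groups[gi].Attributes {
			linked := groups[gi].Attributes[ai].LinkedGroups
			if len(linked) == 0 {
				continue
			}
			resolved := make([]string, len(linked)) // fresh slice: rules stay unmodified
			for i, ref := range linked {
				resolved[i] = ref
				if id, ok := idByRef[ref]; ok {
					resolved[i] = id
				}
			}
			groups[gi].Attributes[ai].LinkedGroups = resolved
		}
	}
}

func main() {
	groups := []AttributeGroup{{ID: "g1", GroupReference: "https://example.org/Condition",
		Attributes: []Attribute{{AttributeRef: "Condition.code", MustHave: true}}}}
	rules := []Enrichment{
		{GroupReference: "https://example.org/Condition", AttributesToAdd: []Attribute{
			{AttributeRef: "Condition.encounter", LinkedGroups: []string{"https://example.org/Encounter"}}}},
		{GroupReference: "https://example.org/Encounter", AddGroupIfNotExists: true,
			AttributesToAdd: []Attribute{{AttributeRef: "Encounter.identifier"}}},
	}
	b, _ := json.MarshalIndent(Enrich(groups, rules), "", "  ")
	fmt.Println(string(b))
}
```

Resolving `linkedGroups` only after all rules have been applied lets an attribute link to a group that a later rule creates. A real implementation would also need to make sure generated IDs do not collide with IDs already present in the CRTDL.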

Files created:

  • internal/models/crtdl_preprocessing.go: Config and enrichment models
  • internal/services/crtdl_preprocessor.go: Pure enrichment functions
  • tests/unit/crtdl_preprocessor_test.go: 20 comprehensive unit tests

Closes #75

Test plan

  • All 20 new unit tests pass
  • Full test suite passes (make test)
  • Code compiles without errors
  • Manual testing with sample CRTDL file and enrichment config

@trobanga trobanga force-pushed the feature/75-crtdl-preprocessing branch from 2f1498c to 4b1dcd7 Compare January 26, 2026 05:51
@codecov-commenter

codecov-commenter commented Jan 26, 2026

Codecov Report

❌ Patch coverage is 85.43417% with 52 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.78%. Comparing base (cfff7f3) to head (f3cdaf3).

Files with missing lines                    Patch %   Lines
internal/services/torch_client.go           54.54%    20 Missing and 5 partials ⚠️
internal/models/crtdl_preprocessing.go      87.59%    9 Missing and 7 partials ⚠️
internal/pipeline/import.go                 78.04%    4 Missing and 5 partials ⚠️
internal/services/crtdl_preprocessor.go     98.27%    1 Missing and 1 partial ⚠️

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #122      +/-   ##
==========================================
- Coverage   84.78%   84.78%   -0.01%     
==========================================
  Files          48       50       +2     
  Lines        5173     5514     +341     
==========================================
+ Hits         4386     4675     +289     
- Misses        598      632      +34     
- Partials      189      207      +18     

@trobanga trobanga force-pushed the feature/75-crtdl-preprocessing branch 2 times, most recently from bbcfcd7 to 67e7f9a Compare January 26, 2026 08:20
@trobanga trobanga force-pushed the feature/75-crtdl-preprocessing branch 4 times, most recently from 43a59b7 to 462fe4d Compare January 26, 2026 14:18
Contributor

@juliangruendner juliangruendner left a comment

This does not seem to have any effect. When using this enrichment file:

[
  {
    "group_reference": "https://www.medizininformatik-initiative.de/fhir/core/modul-fall/StructureDefinition/KontaktGesundheitseinrichtung",
    "add_group_if_not_exists": true,
    "attributes_to_add": [
      {
        "attributeRef": "Encounter.class",
        "mustHave": false
      },
      {
        "attributeRef": "Encounter.status",
        "mustHave": false
      },
      {
        "attributeRef": "Encounter.serviceType",
        "mustHave": false
      }
    ]
  }
]

the group does not appear in the enriched CRTDL.
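
One thing worth ruling out with a file like the one above: its top-level keys are snake_case (`group_reference`, `add_group_if_not_exists`, `attributes_to_add`), while the PR summary uses camelCase (`groupReference`, `addGroupIfNotExists`). This thread does not say how the config is decoded, and the later review with camelCase keys reports the same symptom, so this may not be the cause. The sketch below only shows the general Go behavior, assuming hypothetical structs with camelCase JSON tags: `encoding/json` silently ignores unknown keys unless the decoder is told to reject them.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Hypothetical config structs with camelCase tags; the PR's real types may differ.
type attr struct {
	AttributeRef string `json:"attributeRef"`
	MustHave     bool   `json:"mustHave"`
}

type enrichment struct {
	GroupReference      string `json:"groupReference"`
	AddGroupIfNotExists bool   `json:"addGroupIfNotExists"`
	AttributesToAdd     []attr `json:"attributesToAdd"`
}

// decodeLenient uses json.Unmarshal, which drops keys that match no field.
func decodeLenient(data []byte) ([]enrichment, error) {
	var out []enrichment
	err := json.Unmarshal(data, &out)
	return out, err
}

// decodeStrict rejects keys that match no field instead of dropping them.
func decodeStrict(data []byte) ([]enrichment, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	var out []enrichment
	if err := dec.Decode(&out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	snake := []byte(`[{
		"group_reference": "https://example.org/Encounter",
		"add_group_if_not_exists": true,
		"attributes_to_add": [{"attributeRef": "Encounter.class", "mustHave": false}]
	}]`)

	lenient, err := decodeLenient(snake)
	fmt.Printf("lenient: %+v, err: %v\n", lenient, err)

	_, err = decodeStrict(snake)
	fmt.Printf("strict err: %v\n", err)
}
```

With lenient decoding the rule comes back with an empty group reference and `AddGroupIfNotExists` set to false, so it would match nothing and create nothing. Strict decoding turns the same input into an error that names the unexpected key.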

@trobanga trobanga force-pushed the feature/75-crtdl-preprocessing branch 2 times, most recently from b5ab1a4 to c324ed1 Compare February 4, 2026 10:16
@trobanga trobanga force-pushed the feature/75-crtdl-preprocessing branch 2 times, most recently from 784922a to efa34b4 Compare February 4, 2026 12:22
@trobanga trobanga marked this pull request as draft February 4, 2026 14:13
@trobanga trobanga force-pushed the feature/75-crtdl-preprocessing branch 3 times, most recently from 2d5fb2b to fd028ad Compare February 5, 2026 09:07
@trobanga trobanga marked this pull request as ready for review February 5, 2026 09:08
@trobanga trobanga force-pushed the feature/75-crtdl-preprocessing branch 3 times, most recently from 558c077 to f57469f Compare February 20, 2026 07:54
Contributor

@juliangruendner juliangruendner left a comment

The following enrichment config does not add the groups when they do not already exist in the CRTDL:

[
  {
    "groupReference": "https://www.medizininformatik-initiative.de/fhir/core/modul-person/StructureDefinition/PatientPseudonymisiert",
    "addGroupIfNotExists": true,
    "attributesToAdd": [
      {
        "attributeRef": "Patient.identifier",
        "mustHave": false
      }
    ]
  },
  {
    "groupReference": "https://www.medizininformatik-initiative.de/fhir/core/modul-fall/StructureDefinition/KontaktGesundheitseinrichtung",
    "addGroupIfNotExists": true,
    "attributesToAdd": [
      {
        "attributeRef": "Encounter.identifier",
        "mustHave": false
      }
    ]
  }
]

@trobanga trobanga force-pushed the feature/75-crtdl-preprocessing branch 2 times, most recently from 36b4e20 to 0ded055 Compare March 10, 2026 11:37
@trobanga trobanga force-pushed the feature/75-crtdl-preprocessing branch from 0ded055 to 935806a Compare March 11, 2026 14:00
Add capability to enrich CRTDL files with additional attributes required
by DIMP (pseudonymization) before sending them to TORCH. This addresses
the need for identifiers like Patient.identifier:PseudonymisierterIdentifier
and Encounter.identifier:Aufnahmenummer, which are not part of research
queries but are needed for pseudonymization.

Key features:
- Add/update attributes in existing CRTDL groups by groupReference
- Create new groups when addGroupIfNotExists is true
- Resolve linkedGroups profile URLs to group IDs
- Support both external JSON file and inline YAML configuration
- Save enriched CRTDL to job directory for debugging/auditing

Files created:
- internal/models/crtdl_preprocessing.go: Config and enrichment models
- internal/services/crtdl_preprocessor.go: Pure enrichment functions
- tests/unit/crtdl_preprocessor_test.go: 20 comprehensive unit tests

Closes #75
@trobanga trobanga merged commit 4cae69d into main Mar 12, 2026
12 checks passed
@trobanga trobanga deleted the feature/75-crtdl-preprocessing branch March 12, 2026 09:40

Development

Successfully merging this pull request may close these issues.

Add dimp CRTDL pre-processing step

3 participants