Transfer Of Resources in Clinical Healthcare or Torch is a project that aims to provide a service that allows the execution of data extraction queries on a FHIR-server.
The tool will take a CRTDL and StructureDefinitions to extract specific resources from a FHIR server. It first extracts a cohort based on the cohort definition part of the CRTDL using either CQL or FLARE and FHIR Search. It then extracts resources for the cohort as specified in the cohort extraction part of the CRTDL, which specifies which resources to extract (Filters) and which attributes to extract for each resource.
The tool internally uses the HAPI implementation for handling FHIR Resources.
The Clinical Resource Transfer Definition Language or CRTDL is a JSON format, which specifies a data request. This request is composed of two parts: The cohort definition (for who (which patients) should data be extracted) The data extraction (what data should be extracted)
TORCH interacts with the following components directly:
- a CQL ready FHIR Server like Blaze **OR ** FLARE
- A FHIR Server / FHIR Search API
- Reverse Proxy (NGINX)
The reverse proxy allows for integration into a site's multi-server infrastructure and provides a means of serving the extracted data.
For container images, we use cosign to sign images. This allows users to confirm the image was built by the expected CI pipeline and has not been modified after publication.
cosign verify "ghcr.io/medizininformatik-initiative/torch:v1.0.0" \
--certificate-identity-regexp "https://github.com/medizininformatik-initiative/torch.*" \
--certificate-oidc-issuer "https://token.actions.githubusercontent.com" \
--certificate-github-workflow-ref="refs/tags/v1.0.0" \
-o text
The expected output is:
Verification for ghcr.io/medizininformatik-initiative/torch:v1.0.0 --
The following checks were performed on each of these signatures:
- The cosign claims were validated
- Existence of the claims in the transparency log was verified offline
- The code-signing certificate was verified using trusted certificate authority certificates
This output ensures that the image was build on the GitHub workflow on the repository
medizininformatik-initiative/torch
and tag v1.0.0
.
TORCH supports CQL or FHIR Search for the cohort selection part.
If your FHIR server does not support CQL itself then the FLARE component must be used to extract the cohort based on the cohort definition of the CRTDL.
The cohort evaluation strategy can be set using the TORCH_USE_CQL setting in the enviroment variables
For simplicity torch is integrated in the feasibility-triangle of the feasibility-deploy repository, but can also be installed without it.
For the latest build see: https://github.com/medizininformatik-initiative/torch/pkgs/container/torch
Name | Default | Description |
---|---|---|
SERVER_PORT | 8080 | The Port of the server to use |
TORCH_PROFILE_DIR | structureDefinitions | The directory for profile definitions. |
TORCH_MAPPING_CONSENT | mappings/consent-mappings_fhir.json | The file for consent mappings in FHIR format. |
TORCH_MAPPING_TYPE_TO_CONSENT | mappings/type_to_consent.json | The file mapping Resource Types to time fields against which the consent is checked |
TORCH_FHIR_USER | – | The FHIR server user. |
TORCH_FHIR_PASSWORD | – | The FHIR server password. |
TORCH_FHIR_OAUTH_ISSUER_URI | – | The URI for the OAuth issuer. |
TORCH_FHIR_OAUTH_CLIENT_ID | – | The client ID for OAuth. |
TORCH_FHIR_OAUTH_CLIENT_SECRET | – | The client secret for OAuth. |
TORCH_FHIR_URL | – | The base URL of the FHIR server to use. |
TORCH_FHIR_MAX_CONNECTIONS | 5 | The maximum number of concurrent connections to use - has to be (TORCH_MAXCONCURRENCY + 1) |
TORCH_FHIR_PAGE_COUNT | 500 | The number of pages in a FHIR search response. |
TORCH_FHIR_DISABLE_ASYNC | false | Set to true in order to disable use of Asynchronous Interaction Request Pattern. |
TORCH_FLARE_URL | – | The base URL of the FLARE server to use. |
TORCH_RESULTS_DIR | output/ | The directory for storing results. |
TORCH_RESULTS_PERSISTENCE | PT2160H | Time Block for result persistence in ISO 8601 format in hours/minutes/seconds. Default 90 days |
TORCH_BATCHSIZE | 500 | The batch size used for processing data. |
TORCH_MAXCONCURRENCY | 4 | The maximum concurrency level for processing. |
TORCH_MAPPINGS_FILE | ontology/mapping_cql.json | The file for ontology mappings using CQL. |
TORCH_BUFFERSIZE | 100 | Size in MB of the webclientbuffer that interacts with the FHIR server |
TORCH_CONCEPT_TREE_FILE | ontology/mapping_tree.json | The file for the concept tree mapping. |
TORCH_DSE_MAPPING_TREE_FILE | ontology/dse_mapping_tree.json | The file for DSE concept tree mapping. |
TORCH_USE_CQL | true | Flag indicating if CQL should be used. |
TORCH_BASE_URL | – | The server name before the proxy from which torch is accessed |
TORCH_OUTPUT_FILE_SERVER_URL | – | The URL to access Result location TORCH_RESULTS_DIR served by a proxy/fileserver |
LOG_LEVEL _DE_MEDIZININFORMATIKINITIATIVE_TORCH |
info | Log level for torch core functionality. |
LOG_LEVEL _CA_UHN_FHIR |
error | Log level for HAPI FHIR library. |
SPRING_PROFILES_ACTIVE | active | The active Spring profile. |
SPRING_CODEC_MAX_IN_MEMORY_SIZE | 100MB | The maximum in-memory size for Spring codecs. |
Torch implements the FHIR Asynchronous Bulk Data Request Pattern.
The $extract-data endpoint implements the kick-off request in the Async Bulk Pattern. It receives a FHIR parameters resource with a crtdl parameter containing a valueBase64Binary CRTDL. All examples are with a torch configured with http://localhost:8086.
scripts/create-parameters.sh src/test/resources/CRTDL/CRTDL_observation.json | curl -s 'http://localhost:8086/fhir/$extract-data' -H "Content-Type: application/fhir+json" -d @- -v
The Parameters resource created by scripts/create-parameters.sh
look like this:
{
"resourceType": "Parameters",
"parameter": [
{
"name": "crtdl",
"valueBase64Binary": "<Base64 encoded CRTDL>"
}
]
}
Optionally patient ids can be submitted for a known cohort, bypassing the cohort selection in the CRTDL:
{
"resourceType": "Parameters",
"parameter": [
{
"name": "crtdl",
"valueBase64Binary": "<Base64 encoded CRTDL>"
},
{
"name": "patient",
"valueString": "<Patient Id 1>"
},
{
"name": "patient",
"valueString": "<Patient Id 2>"
}
]
}
- HTTP Status Code of 4XX or 5XX
- HTTP Status Code of 202 Accepted
- Content-Location header with the absolute URL of an endpoint for subsequent status requests (polling location)
That location header can be used in the following status query: E.g. location:"/fhir/__status/1234"
Torch provides a Status Request Endpoint which can be called using the location from the extract Data Endpoint.
curl -s http://localhost:8080/fhir/__status/{location}
- HTTP Status Code of 202 Accepted
- HTTP status code of 4XX or 5XX
- Content-Type header of application/fhir+json
- Operation Outcome with fatal issue
- HTTP status of 200 OK
- Content-Type header of application/fhir+json
- A body containing a JSON file describing the file links to the batched transformation results
curl -s 'http://localhost:8080/fhir/$extract-data' -H "Content-Type: application/fhir+json" -d '<query>'
the result is a looks something like this:
{
"requiresAccessToken": false,
"output": [
{
"type": "Bundle",
"url": "http://localhost:8080/da4a1c56-f5d9-468c-b57a-b8186ea4fea8/6c88f0ff-0e9a-4cf7-b3c9-044c2e844cfc.ndjson"
},
{
"type": "Bundle",
"url": "http://localhost:8080/da4a1c56-f5d9-468c-b57a-b8186ea4fea8/a4dd907c-4d98-4461-9d4c-02d62fc5a88a.ndjson"
},
{
"type": "Bundle",
"url": "http://localhost:8080/da4a1c56-f5d9-468c-b57a-b8186ea4fea8/f33634bd-d51b-463c-a956-93409d96935f.ndjson"
}
],
"request": "http://localhost:8080//fhir/$extract-data",
"deleted": [],
"transactionTime": "2024-09-05T12:30:32.711151718Z",
"error": []
}
After Response Complete is returned the result files in ndjson format are located in Output directory set in enviroment variables. In the case of error an error.json can be found in the Output directory detailing the fatal error.
Note that each patient batch is written to the output directory and the files are written before the process completes. This can be used to track the progress of your data extraction.
If a server is set up for the files e.g. NGINX, the files can be fetched by a Request on the URL set in TORCH_OUTPUT_FILE_SERVER_URL in enviroment variables.
curl -s "http://localhost:8080/da4a1c56-f5d9-468c-b57a-b8186ea4fea8/f33634bd-d51b-463c-a956-93409d96935f.ndjson"
The ndjson will contain one transaction bundle per Patient.
Torch provides a Status Request Endpoint which provides a overview extract Data Endpoint.
curl -s http://localhost:8080/fhir/__status/
Required fields that were not extracted and slices that are unknown in the Structure Definition are set to Data Absent Reason "masked".
For example a CRTDL only extracting Condition.onset will result in this:
{
"resourceType": "Condition",
"id": "TestID",
"meta": {
"profile": [
"https://www.medizininformatik-initiative.de/fhir/core/modul-diagnose/StructureDefinition/Diagnose"
]
},
"code": {
"extension": [
{
"url": "http://hl7.org/fhir/StructureDefinition/data-absent-reason",
"valueCode": "masked"
}
]
},
"subject": {
"extension": [
{
"url": "http://hl7.org/fhir/StructureDefinition/data-absent-reason",
"valueCode": "masked"
}
]
},
"onsetDateTime": "2024-10"
}
TORCH provides a companion transfer script designed to automate the workflow of submitting a data extraction request, polling the status, and transferring the resulting files to a target FHIR server.
The transfer script will:
- Take the CRTDL and generate a FHIR parameters resource to send to TORCH.
- Execute $extract-data operation on the TORCH server using parameters resource as input.
- Poll the TORCH status endpoint until the export is complete.
- Download the resulting patient-oriented FHIR bundles into a temp dir.
- Upload these files to a configured target FHIR server using the
blazectl
tool. - Provide progress feedback and error handling at each step.
./transfer-extraction-to-dup-fhir-server.sh -c src/test/resources/CRTDL/CRTDL_observation.json -t http://target-fhir-server:8080/fhir
The transfer script respects several environment variables to configure server URLs, directories, retry counts, and timing:
Variable | Default | Description |
---|---|---|
TORCH_BASE_URL | http://localhost:8080 | Base URL of the TORCH server |
MAX_RETRIES | 60 | Number of status polling attempts before timing out |
SLEEP_SECONDS | 5 | Seconds to wait between polling attempts |
- Loading of StructureDefinitions
- Redacting and Copying of Resources based on StructureDefinitions
- Parsing CRTDL
- Interaction with a Flare and FHIR Server
- MII Consent handling
- OAuth Support
- Multi Profile Support
- Extended Reference Resolving
- Auto Extracting modifiers
- Black or Whitelisting of certain ElementIDs locally
- Validating against Profiles
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an " AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.