Commit 28a3970

script added to convert tabular data to json. Example input json files added to sandbox. Minor updates for scaffold to work with new json input files. (#493)
1 parent 5ed6c72 commit 28a3970

File tree

106 files changed

+637880
-23
lines changed

data ingestion model/README.md

Lines changed: 85 additions & 8 deletions
@@ -11,15 +11,15 @@ The data was structured using FHIR resources to the extent possible. Since no su
1111
## Performance Data Structure
1212
### Provider information (`PractitionerRole`)
1313
The [`PractitionerRole`](https://build.fhir.org/practitionerrole.html) resource is used to represent message recipients(individuals or organizations), their relationships and their roles. The input data include a table (PractitionerRole.csv) with the following columns:
14-
- **[PractitionerRole.identifier](https://build.fhir.org/practitionerrole-definitions.html#PractitionerRole.identifier)**: Unique identifier for each row in the Practitiner table. This identifier links performance data, history and preferences to each recipient. In those datasets, PractitionerRole.identifier is refered to as subject.
14+
- **[PractitionerRole.identifier](https://build.fhir.org/practitionerrole-definitions.html#PractitionerRole.identifier)**: Unique identifier for each row in the Practitiner table. This identifier links performance data, history and preferences to each recipient. In those datasets, PractitionerRole.identifier is referred to as subject.
1515
- **[PractitionerRole.practitioner](https://build.fhir.org/practitionerrole-definitions.html#PractitionerRole.practitioner)**: Contains the practitioner identifier. If this column has a value, the row represents an individual practitioner otherwise it is aggregate data for a group for example a hospital.
1616
- **[PractitionerRole.organization](https://build.fhir.org/practitionerrole-definitions.html#PractitionerRole.organization)**: Contains the identifier of the institution where the recipient serves. This field, together with `PractitionerRole.code` is used to identify the comparator data associated with each recipient.
1717

1818
- **[PractitionerRole.code](https://build.fhir.org/practitionerrole-definitions.html#PractitionerRole.code)**: Contains the role of the recipient in the institution. Example values for this field could be `Resident`, `Attending` or `CRNA`.
1919
- **type**: Indicates whether the performance data belong to an individual provider or to a group of providers. Accordingly, a `PractitionerRole` may represent either a single provider or a group. This field is not part of the FHIR `PractitionerRole` resource; in our model, it is introduced to classify `PractitionerRole` as either individual or group, allowing us to distinguish between the two types of performance data. Example values include `Practitioner` and `Organization`.
2020

2121
### Performance data (`MeasureReport`)
22-
Performance data are modeled using the [`MeasureReport`](https://build.fhir.org/measurereport.html) resource, which represents the results of a measure evaluation. In SCAFFOLD, each row of performance data is modled as a measure report. Accordingly, the input data include a table (PerformanceMeasureReport.csv) with the following columns:
22+
Performance data are modeled using the [`MeasureReport`](https://build.fhir.org/measurereport.html) resource, which represents the results of a measure evaluation. In SCAFFOLD, each row of performance data is modeled as a measure report. Accordingly, the input data include a table (PerformanceMeasureReport.csv) with the following columns:
2323
- **[identifier](https://build.fhir.org/measurereport-definitions.html#MeasureReport.identifier)**: Uniquely identifies a specific performance data record.
2424
- **[measure](https://build.fhir.org/measurereport-definitions.html#MeasureReport.measure)**:
2525
A reference to the measure with which the performance record is associated.
@@ -86,6 +86,8 @@ Preferences captures providers' choices, priorities, and settings for messages t
8686
}
8787
```
8888
## Data Generator
89+
90+
### Generate Tabular data
8991
First, create a folder for new data (i.e. `new_data`).
9092
If the data is going to be generated for individual recipients, create a config.json inside the new data folder containing
9193
```json
@@ -117,25 +119,100 @@ python data\ ingestion\ model/sandbox\ generator/Preference.py --path new_data
117119
ENV_PATH=/Path/to/your/environment/file/dev.env python data\ ingestion\ model/sandbox\ generator/MessageHistory.py --path new_data
118120
```
119121

120-
This will start by creatig the list of hospitals in PractitionerRole.csv file. Then will generate performance data in PerformanceMeasureReports.csv. Next step will create the comparator data in ComparatorMeasureReport.csv. Then the preferences will be added to preferences.csv. Finall step will use SCAFFOLD to generate the history of messages generated by pipeline for the months before the performance month.
122+
This will start by creating the list of hospitals in PractitionerRole.csv file. Then will generate performance data in PerformanceMeasureReports.csv. Next step will create the comparator data in ComparatorMeasureReport.csv. Then the preferences will be added to preferences.csv. Final step will use SCAFFOLD to generate the history of messages generated by pipeline for the months before the performance month.
121123

122124
This process will start by creating a list of hospitals in the `PractitionerRole.csv` file. Next, performance data will be generated in `PerformanceMeasureReports.csv`. The following step will create comparator data in `ComparatorMeasureReport.csv`. Preferences will then be added to `preferences.csv`. Finally, SCAFFOLD will be used to generate the history of messages produced by the pipeline for the months preceding the performance month and store it in `MessageHistory.csv`.
123125

126+
### Convert Data To JSON-LD Inputs
127+
You can use the script in TabularToJson.py to convert data in tabular format to json-ld input files. Use the following to run this script on a path where the tabular data with all the required files that follow the SCAFFOLD data ingestion model exists to generate json-ld inputs files
128+
129+
```
130+
python data\ ingestion\ model/sandbox\ generator/TabularToJSON.py --path new_data --performance_month 2025-01-01
131+
```
132+
124133
## Example Data
125-
Sandbox hospital-level example data is generated for 100 hospitals and included in the sandbox examples folder. This folder includes:
134+
Sandbox hospital-level example data is generated for 100 hospitals and included in the sandbox examples folder. This folder includes both tabular data and json-ld input files for same hospitals.
135+
136+
137+
### Tabular data
138+
Tabular input data includes:
126139

127-
### Performance Data
140+
#### Performance Data
128141
- PractitionerRole.csv, which contains hospital definitions
129142
- PerformanceMeaasureReport, which contains performance data for each hospital on 12 defined measures in sandbox knowledge base for 12 month.
130143
- config.json, which is required to find the right comparator for each recipient
131144
- ComparatorMeasureReport.csv, which contains the comparator data based on the entire network for each measure, for each month.
132145

133-
### Prioritization Data
146+
#### Prioritization Data
134147
- Preferences.csv, which includes preferences for a small subgroup of recipients.
135148
- MessageHistory.csv, which includes history of generated messages for 11 month before the performance month.
136149

150+
### JSON-LD Input files
151+
The `JSON Messages` folder contains json-ld input files for the same data. Each file is created using the following template. See examples for more detail.
152+
153+
```json
154+
{
155+
"@context": {
156+
"dcterms": "http://purl.org/dc/terms/",
157+
"schema": "http://schema.org/",
158+
"scaffold": "http://displaylab.com/scaffold#",
159+
"psdo": "http://purl.obolibrary.org/obo/",
160+
"slowmo": "http://example.com/slowmo#",
161+
"message_template": {"@id": "psdo:PSDO_0000002"},
162+
"measure": {"@id": "psdo:PSDO_0000102"},
163+
"performance_summary_document": {"@id": "psdo:PSDO_0000098"},
164+
"performance_month": {"@id": "scaffold:performance_month"},
165+
"History": {"@id": "scaffold:History"},
166+
"Preferences": {"@id": "scaffold:Preferences"},
167+
"performance_measure_report": {"@id": "psdo:PSDO_0000107"},
168+
"comparator_measure_report": {"@id": "scaffold:comparator_measure_report"},
169+
"PractitionerRole": {"@id": "scaffold:PractitionerRole"},
170+
"subject": {"@id": "scaffold:subject"},
171+
},
172+
"message_instance_id": "",
173+
"performance_month": "2025-01-01",
174+
"@type": "psdo:performance_summary_document",
175+
"subject": "",
176+
"PractitionerRole": [
177+
[
178+
"PractitionerRole.identifier",
179+
"PractitionerRole.practitioner",
180+
"PractitionerRole.organization",
181+
"PractitionerRole.code",
182+
],
183+
],
184+
"performance_measure_report": [
185+
[
186+
"identifier",
187+
"measure",
188+
"subject",
189+
"period.start",
190+
"period.end",
191+
"measureScore.rate",
192+
"measureScore.denominator",
193+
"measureScore.range",
194+
],
195+
],
196+
"comparator_measure_report": [
197+
[
198+
"identifier",
199+
"measure",
200+
"period.start",
201+
"period.end",
202+
"measureScore.rate",
203+
"measureScore.denominator",
204+
"group.subject",
205+
"group.code",
206+
"PractitionerRole.code",
207+
],
208+
],
209+
"History": [],
210+
"Preferences": {},
211+
}
212+
```
213+
137214
## Run SCAFFOLD
138-
To run SCAFFOLD on sandbox data you need to prepare the environment and install SCAFFOLD. For more detail, follow the `Quick start` section of the [main SCAFFOLD documentation page](../README.md). Skip `Run SCAFFOLD API` and `Run SCAFFOLD CLI with JSON inputs` sections and continue with `Run SCAFFOLD CLI with CSV inputs`.
215+
To run SCAFFOLD on sandbox data you need to prepare the environment and install SCAFFOLD. For more detail, follow the `Quick start` section of the [main SCAFFOLD documentation page](../README.md). You can process JSON input files using `Run SCAFFOLD API` or `Run SCAFFOLD CLI with JSON inputs` sections. Use `Run SCAFFOLD CLI with CSV inputs` section to process tabular data.
139216

140217
## Expected Output
141218
Here is an example of the output from SCAFFOLD after processing the sandbox example data:
@@ -206,4 +283,4 @@ Successful: 100, Failed: 0
206283
| Quiet-Rating-01 | 361 | 9.7 | 2.25 | 12 | 12.0 | 2.84 | 3.3 |
207284
| Transfer-01 | 387 | 10.4 | 2.22 | 13 | 13.0 | 2.92 | 3.4 |
208285

209-
SCAFFOLD also creates a `messages` folder, which contains a summary of the generated candidates (`candidates.csv`) and a detailed JSON file for each generated message. Each JSON file includes information about the selected message, all created candidates with their scoreing details, and any generated images.
286+
SCAFFOLD also creates a `messages` folder, which contains a summary of the generated candidates (`candidates.csv`) and a detailed JSON file for each generated message. Each JSON file includes information about the selected message, all created candidates with their scoring details, and any generated images.
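
As an illustration of the JSON-LD input template this commit adds to the README, the following minimal Python sketch builds one such document from tabular rows. It is not the repository's `TabularToJSON.py`: the helper names `table_with_header` and `build_input_document`, the sample rows, and the choice to select rows by matching the `subject` value are all assumptions made for illustration. The `@context` block is omitted for brevity, and the sketch assumes, as the template suggests, that each table is a list whose first row holds the column names.

```python
import csv
import io
import json


def table_with_header(csv_text, keep=None):
    """Return [header, row, ...] from CSV text, optionally filtering data rows.

    `keep` is a hypothetical predicate that receives each data row as a dict.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    if keep is not None:
        data = [row for row in data if keep(dict(zip(header, row)))]
    return [header] + data


def build_input_document(subject, performance_month, roles_csv, reports_csv):
    """Assemble one input document shaped like the README template (assumed layout)."""
    return {
        "@type": "psdo:performance_summary_document",
        "message_instance_id": "",
        "performance_month": performance_month,
        "subject": subject,
        "PractitionerRole": table_with_header(
            roles_csv, lambda r: r["PractitionerRole.identifier"] == subject
        ),
        "performance_measure_report": table_with_header(
            reports_csv, lambda r: r["subject"] == subject
        ),
        # Left empty here; the real inputs would carry comparator rows, history
        # and preferences for the recipient.
        "comparator_measure_report": [],
        "History": [],
        "Preferences": {},
    }


# Hypothetical sample data using the column names from the template.
ROLES_CSV = "\n".join([
    "PractitionerRole.identifier,PractitionerRole.practitioner,"
    "PractitionerRole.organization,PractitionerRole.code",
    "hosp-1,,org-1,Hospital",
    "hosp-2,,org-2,Hospital",
])
REPORTS_CSV = "\n".join([
    "identifier,measure,subject,period.start,period.end,"
    "measureScore.rate,measureScore.denominator,measureScore.range",
    "r1,Transfer-01,hosp-1,2025-01-01,2025-01-31,0.9,100,",
    "r2,Transfer-01,hosp-2,2025-01-01,2025-01-31,0.8,120,",
])

doc = build_input_document("hosp-1", "2025-01-01", ROLES_CSV, REPORTS_CSV)
print(json.dumps(doc, indent=2))
```

Note that the template as shown in the README contains trailing commas, so they would need to be removed before a strict parser such as Python's `json.loads` accepts it.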
