Commit fd56693

Update README
1 parent 3d08ee2 commit fd56693

1 file changed

+76
-22
lines changed

apps/bfd-model-idr/README.md

Lines changed: 76 additions & 22 deletions
@@ -75,7 +75,7 @@ uv run compile_resources.py \
7575

7676
#### Patient Data - `patient_generator.py`
7777

78-
##### Usage
78+
##### `patient_generator.py` usage
7979

8080
```text
8181
usage: patient_generator.py [-h] [--patients PATIENTS] [--claims]
@@ -113,7 +113,7 @@ options:
113113
114114
```
115115

116-
##### Generating Data
116+
##### Generating patient data
117117

118118
To generate synthetic patient data, the patient_generator.py script is used.
119119
To use it to generate an entirely _new_ set of data from scratch:
@@ -146,33 +146,87 @@ The files output will be in the `out` folder:
146146

147147
The patient generator creates synthetic beneficiary data with realistic but _synthetic_ MBIs, coverage information, and historical records. It can generate multiple MBI versions per beneficiary and handles beneficiary cross-references with kill credit switches.
148148

149-
#### Claims data
149+
#### Claims data - `claims_generator.py`
150150

151-
To generate synthetic claims data, the claims_generator.py script is used.
152-
To utilize it:
151+
<!-- TODO: Provide an official location for downloading synthetic claims data -->
152+
> [!IMPORTANT]
153+
> Synthetic claims data is _much_ larger than patient data, so it is not stored in the repository under `./synthetic-data`. If you are looking to regenerate this data, please reach out in #bfd so that the existing dataset can be provided to you.
154+
155+
#### `claims_generator.py` usage
156+
157+
```text
158+
Usage: claims_generator.py [OPTIONS] [PATHS]...
159+
160+
Generate synthetic claims data. Provided file PATHS will be updated with new
161+
fields.
162+
163+
Options:
164+
--sushi / --no-sushi Generate new StructureDefinitions. Use when
165+
testing locally if new .fsh files have been
166+
added.
167+
--min-claims INTEGER Minimum number of claims to generate per
168+
person
169+
--max-claims INTEGER Maximum number of claims to generate per
170+
person
171+
--force-pac-claims / --no-force-pac-claims
172+
Generate _new_ partially-adjudicated claims
173+
when existing pac claims tables exist in the
174+
synthetic data provided
175+
--help Show this message and exit.
176+
```
177+
178+
#### Generating claims data
179+
180+
> [!WARNING]
181+
> Either `SYNTHETIC_CLM.csv` or `SYNTHETIC_BENE_HSTRY.csv` **must** be provided, because claims data generation requires an existing `BENE_SK` or `CLM` to generate or regenerate data.
182+
183+
Use the `claims_generator.py` script to generate synthetic claims data.
184+
185+
##### Using `SYNTHETIC_BENE_HSTRY.csv`
186+
187+
The following command generates _entirely new claims_ for the `BENE_SK`s in the provided file:
153188

154189
```sh
155190
uv run claims_generator.py \
156191
--sushi \
157192
out/SYNTHETIC_BENE_HSTRY.csv
158193
```
159194

160-
--sushi is not strictly needed, if you have a local copy of the compiled shorthand files, but recommended to reduce drift. To specify a list of benes, pass in a .csv file containing a column named BENE_SK.
161-
The files output will be in the out folder, there are several files:
162-
SYNTHETIC_CLM.csv
163-
SYNTHETIC_CLM_LINE.csv
164-
SYNTHETIC_CLM_VAL.csv
165-
SYNTHETIC_CLM_DT_SGNTR.csv
166-
SYNTHETIC_CLM_PROD.csv
167-
SYNTHETIC_CLM_INSTNL.csv
168-
SYNTHETIC_CLM_LINE_INSTNL.csv
169-
SYNTHETIC_CLM_DCMTN.csv
170-
SYNTHETIC_CLM_FISS.csv
171-
SYNTHETIC_CLM_PRFNL.csv
172-
SYNTHETIC_CLM_LINE_PRFNL.csv
173-
SYNTHETIC_CLM_ANSI_SGNTR.csv
174-
175-
These files represent the schema of the tables the information is sourced from, although for tables other than CLM_DT_SGNTR, the CLM_UNIQ_ID is propagated instead of the 5 part unique key from the IDR.
195+
##### Regenerating existing claims data
196+
197+
The following command _regenerates_ **existing claims data** (here `<PATH_TO_CLAIMS_DATA>` is assumed to be a local directory containing synthetic claims data):
198+
199+
```sh
200+
uv run claims_generator.py \
201+
--sushi \
202+
./synthetic-data <PATH_TO_CLAIMS_DATA>
203+
```
204+
205+
If _any_ claims-related tables have had columns added to their respective generation functions, those new columns will be populated with values without impacting existing values in other columns.
206+
207+
> [!CAUTION]
208+
> If an **existing column value** must be updated, that column value **MUST BE DELETED** from the respective table CSV first so that the values can be regenerated.
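One way to clear a column before regenerating is sketched below. It assumes that "deleted" means emptying the cells while keeping the header row; if the generator instead expects the column to be absent, the sketch would need adjusting. The helper name `clear_column` is hypothetical and is not part of the repository.

```python
import csv
from pathlib import Path


def clear_column(csv_path, column):
    """Blank every value in `column`; the header and other columns are untouched.

    Hypothetical helper. Assumes an empty cell is what the generator treats
    as "deleted". Returns the number of data rows rewritten.
    """
    csv_path = Path(csv_path)
    with csv_path.open(newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = list(reader.fieldnames or [])
        if column not in fieldnames:
            raise KeyError(f"{column!r} is not a column in {csv_path}")
        # Loads the whole table into memory; large claims tables may need
        # a streaming rewrite instead.
        rows = list(reader)

    for row in rows:
        row[column] = ""

    with csv_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Working on a copy of the table CSV first is a sensible precaution, since the rewrite is done in place.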
209+
210+
#### `--sushi`
211+
212+
`--sushi` is not strictly needed if you have a local copy of the compiled shorthand files, but it is recommended to reduce drift. To specify a list of beneficiaries, pass in a `.csv` file containing a column named `BENE_SK`.
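A beneficiary list file of this shape could be produced with a short script. The following is a minimal sketch: the helper name `write_bene_list`, the file name, and the sample `BENE_SK` values are illustrative assumptions, and the only requirement taken from the text is the `BENE_SK` column header.

```python
import csv
from pathlib import Path


def write_bene_list(path, bene_sks):
    """Write a one-column CSV with a BENE_SK header (hypothetical helper)."""
    path = Path(path)
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["BENE_SK"])  # column name the generator looks for
        for sk in bene_sks:
            writer.writerow([sk])
    return path


if __name__ == "__main__":
    # Placeholder values; real BENE_SKs would come from existing synthetic data.
    write_bene_list("my_benes.csv", [101, 102, 103])
```

The resulting file would then be passed as a `PATHS` argument in the same position as `out/SYNTHETIC_BENE_HSTRY.csv`.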
213+
214+
The output files are written to the `./out` folder:
215+
216+
- `SYNTHETIC_CLM.csv`
217+
- `SYNTHETIC_CLM_LINE.csv`
218+
- `SYNTHETIC_CLM_VAL.csv`
219+
- `SYNTHETIC_CLM_DT_SGNTR.csv`
220+
- `SYNTHETIC_CLM_PROD.csv`
221+
- `SYNTHETIC_CLM_INSTNL.csv`
222+
- `SYNTHETIC_CLM_LINE_INSTNL.csv`
223+
- `SYNTHETIC_CLM_DCMTN.csv`
224+
- `SYNTHETIC_CLM_FISS.csv`
225+
- `SYNTHETIC_CLM_PRFNL.csv`
226+
- `SYNTHETIC_CLM_LINE_PRFNL.csv`
227+
- `SYNTHETIC_CLM_ANSI_SGNTR.csv`
228+
229+
These files mirror the schema of the tables the information is sourced from; however, for tables other than `CLM_DT_SGNTR`, the `CLM_UNIQ_ID` is propagated instead of the 5-part unique key from the IDR.
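After a run, a quick check that all of the files listed above were written can catch a partial or failed generation. This is a minimal sketch: the function name `missing_outputs` is hypothetical, and the default `out` directory is assumed to be relative to the directory the generator was run from.

```python
from pathlib import Path

# File names taken from the list of outputs in this README.
EXPECTED_FILES = [
    "SYNTHETIC_CLM.csv",
    "SYNTHETIC_CLM_LINE.csv",
    "SYNTHETIC_CLM_VAL.csv",
    "SYNTHETIC_CLM_DT_SGNTR.csv",
    "SYNTHETIC_CLM_PROD.csv",
    "SYNTHETIC_CLM_INSTNL.csv",
    "SYNTHETIC_CLM_LINE_INSTNL.csv",
    "SYNTHETIC_CLM_DCMTN.csv",
    "SYNTHETIC_CLM_FISS.csv",
    "SYNTHETIC_CLM_PRFNL.csv",
    "SYNTHETIC_CLM_LINE_PRFNL.csv",
    "SYNTHETIC_CLM_ANSI_SGNTR.csv",
]


def missing_outputs(out_dir="out"):
    """Return the expected file names that are not present in `out_dir`."""
    out = Path(out_dir)
    return [name for name in EXPECTED_FILES if not (out / name).is_file()]


if __name__ == "__main__":
    missing = missing_outputs()
    if missing:
        print("Missing output files:", ", ".join(missing))
    else:
        print("All expected claims files are present.")
```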
176230

177231
## Data Dictionary
178232

@@ -193,4 +247,4 @@ Run:
193247
DESCRIBE VIEW CMS_VDM_VIEW_MDCR_PRD.{TABLE_NAME}
194248
```
195249

196-
Export the results as a CSV named {TABLE_NAME}.csv and save it under ReferenceTables.
250+
Export the results as a CSV named {TABLE_NAME}.csv and save it under ReferenceTables.