Commit 998fde7

Update README
1 parent 04bac0b commit 998fde7

1 file changed

+75
-21
lines changed

apps/bfd-model-idr/README.md

Lines changed: 75 additions & 21 deletions
@@ -102,7 +102,7 @@ uv run compile_resources.py \
102102

103103
#### Patient Data - `patient_generator.py`
104104

105-
##### Usage
105+
##### `patient_generator.py` usage
106106

107107
```text
108108
usage: patient_generator.py [-h] [--patients PATIENTS] [--claims]
@@ -140,7 +140,7 @@ options:
140140
141141
```
142142

143-
##### Generating Data
143+
##### Generating patient data
144144

145145
To generate synthetic patient data, use the `patient_generator.py` script.
146146
To generate an entirely _new_ set of data from scratch:
@@ -173,33 +173,87 @@ The files output will be in the `out` folder:
173173

174174
The patient generator creates synthetic beneficiary data with realistic but _synthetic_ MBIs, coverage information, and historical records. It can generate multiple MBI versions per beneficiary and handles beneficiary cross-references with kill credit switches.
175175

176-
#### Claims data
176+
#### Claims data - `claims_generator.py`
177177

178-
To generate synthetic claims data, the claims_generator.py script is used.
179-
To utilize it:
178+
<!-- TODO: Provide an official location for downloading synthetic claims data -->
179+
> [!IMPORTANT]
180+
> Synthetic claims data is _much_ larger than patient data, so it is not stored in the repository under `./synthetic-data`. If you need to regenerate this data, please reach out in #bfd so that the existing dataset can be provided to you.
181+
182+
#### `claims_generator.py` usage
183+
184+
```text
185+
Usage: claims_generator.py [OPTIONS] [PATHS]...
186+
187+
Generate synthetic claims data. Provided file PATHS will be updated with new
188+
fields.
189+
190+
Options:
191+
--sushi / --no-sushi Generate new StructureDefinitions. Use when
192+
testing locally if new .fsh files have been
193+
added.
194+
--min-claims INTEGER Minimum number of claims to generate per
195+
person
196+
--max-claims INTEGER Maximum number of claims to generate per
197+
person
198+
--force-pac-claims / --no-force-pac-claims
199+
Generate _new_ partially-adjudicated claims
200+
when existing pac claims tables exist in the
201+
synthetic data provided
202+
--help Show this message and exit.
203+
```
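
As an illustration of how the options above combine, here is a minimal Python sketch that assembles the command line as an argument list. The helper name `build_claims_command` is hypothetical and not part of the repository; only the flags shown in the usage text are used, and the range check is an added convenience rather than documented behavior of the script.

```python
def build_claims_command(paths, min_claims=None, max_claims=None, sushi=True):
    """Build an argv list for claims_generator.py (hypothetical helper)."""
    if min_claims is not None and max_claims is not None and min_claims > max_claims:
        raise ValueError("min_claims must not exceed max_claims")

    command = ["uv", "run", "claims_generator.py"]
    # --sushi / --no-sushi is a paired on/off flag in the usage text.
    command.append("--sushi" if sushi else "--no-sushi")
    if min_claims is not None:
        command += ["--min-claims", str(min_claims)]
    if max_claims is not None:
        command += ["--max-claims", str(max_claims)]
    # Positional PATHS come last.
    command += [str(path) for path in paths]
    return command
```

The resulting list can be passed to `subprocess.run(..., check=True)` from the directory that contains `claims_generator.py`.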
204+
205+
#### Generating claims data
206+
207+
> [!WARNING]
208+
> Either `SYNTHETIC_CLM.csv` or `SYNTHETIC_BENE_HSTRY.csv` **must** be provided, because claims data generation requires an existing `BENE_SK` or `CLM` to generate or regenerate data.
209+
210+
To generate synthetic claims data, use the `claims_generator.py` script.
211+
212+
##### Using `SYNTHETIC_BENE_HSTRY.csv`
213+
214+
The following command generates _entirely new claims_ for the `BENE_SK`s in the provided file:
180215

181216
```sh
182217
uv run claims_generator.py \
183218
--sushi \
184219
out/SYNTHETIC_BENE_HSTRY.csv
185220
```
186221

187-
--sushi is not strictly needed, if you have a local copy of the compiled shorthand files, but recommended to reduce drift. To specify a list of benes, pass in a .csv file containing a column named BENE_SK.
188-
The files output will be in the out folder, there are several files:
189-
SYNTHETIC_CLM.csv
190-
SYNTHETIC_CLM_LINE.csv
191-
SYNTHETIC_CLM_VAL.csv
192-
SYNTHETIC_CLM_DT_SGNTR.csv
193-
SYNTHETIC_CLM_PROD.csv
194-
SYNTHETIC_CLM_INSTNL.csv
195-
SYNTHETIC_CLM_LINE_INSTNL.csv
196-
SYNTHETIC_CLM_DCMTN.csv
197-
SYNTHETIC_CLM_FISS.csv
198-
SYNTHETIC_CLM_PRFNL.csv
199-
SYNTHETIC_CLM_LINE_PRFNL.csv
200-
SYNTHETIC_CLM_ANSI_SGNTR.csv
201-
202-
These files represent the schema of the tables the information is sourced from, although for tables other than CLM_DT_SGNTR, the CLM_UNIQ_ID is propagated instead of the 5 part unique key from the IDR.
222+
##### Regenerating existing claims data
223+
224+
The following command _regenerates_ **existing claims data** (assuming `<PATH_TO_CLAIMS_DATA>` is a local directory containing synthetic claims data):
225+
226+
```sh
227+
uv run claims_generator.py \
228+
--sushi \
229+
./synthetic-data <PATH_TO_CLAIMS_DATA>
230+
```
231+
232+
If _any_ claims-related tables have had columns added to their respective generation functions, those new columns will be populated with values without impacting existing values in other columns.
233+
234+
> [!CAUTION]
235+
> If an **existing column value** must be updated, that column value **MUST BE DELETED** from the respective table CSV first so that the values can be regenerated.
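
One way to clear a column before regenerating is sketched below. It assumes that "deleted" means emptying the cells while keeping the header, and that the generator treats empty cells as missing; neither assumption is confirmed by this README, so adjust it if the generator expects something else. The helper name `clear_column` is hypothetical.

```python
import csv
from pathlib import Path


def clear_column(csv_path, column):
    """Blank every value in `column`, keeping the header and other columns."""
    csv_path = Path(csv_path)
    with csv_path.open(newline="") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames or []
        if column not in fieldnames:
            raise KeyError(f"{column!r} is not a column in {csv_path}")
        rows = list(reader)  # loads the whole table into memory

    for row in rows:
        row[column] = ""

    with csv_path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)  # number of rows whose value was cleared
```

Because the sketch reads the entire file into memory, it may need a streaming rewrite for the larger claims tables. Back up the CSV first, since the file is overwritten in place.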
236+
237+
#### `--sushi`
238+
239+
`--sushi` is not strictly needed if you have a local copy of the compiled shorthand files, but it is recommended to reduce drift. To specify a list of benes, pass in a `.csv` file containing a column named `BENE_SK`.
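
A bene list of that shape could be produced with a short Python sketch like the one below. The helper name `write_bene_list` is hypothetical; the only detail taken from this README is the `BENE_SK` column header. Whether the generator accepts a file with no other columns is an assumption.

```python
import csv
from pathlib import Path


def write_bene_list(path, bene_sks):
    """Write a one-column CSV whose header is BENE_SK (hypothetical helper)."""
    path = Path(path)
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["BENE_SK"])
        for bene_sk in bene_sks:
            writer.writerow([bene_sk])
    return path
```

The resulting file can then be passed as one of the `PATHS` arguments to `claims_generator.py`.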
240+
241+
The output files are written to the `./out` folder:
242+
243+
- `SYNTHETIC_CLM.csv`
244+
- `SYNTHETIC_CLM_LINE.csv`
245+
- `SYNTHETIC_CLM_VAL.csv`
246+
- `SYNTHETIC_CLM_DT_SGNTR.csv`
247+
- `SYNTHETIC_CLM_PROD.csv`
248+
- `SYNTHETIC_CLM_INSTNL.csv`
249+
- `SYNTHETIC_CLM_LINE_INSTNL.csv`
250+
- `SYNTHETIC_CLM_DCMTN.csv`
251+
- `SYNTHETIC_CLM_FISS.csv`
252+
- `SYNTHETIC_CLM_PRFNL.csv`
253+
- `SYNTHETIC_CLM_LINE_PRFNL.csv`
254+
- `SYNTHETIC_CLM_ANSI_SGNTR.csv`
255+
256+
These files mirror the schema of the source tables, although for tables other than `CLM_DT_SGNTR`, the `CLM_UNIQ_ID` is propagated instead of the five-part unique key from the IDR.
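
To confirm that a run produced every file listed above, a small check along these lines may help. It is a sketch: the function name `missing_claims_files` is hypothetical, and the default `out` directory assumes the script was run from the directory that contains it.

```python
from pathlib import Path

# File names copied from the list above.
EXPECTED_CLAIMS_FILES = [
    "SYNTHETIC_CLM.csv",
    "SYNTHETIC_CLM_LINE.csv",
    "SYNTHETIC_CLM_VAL.csv",
    "SYNTHETIC_CLM_DT_SGNTR.csv",
    "SYNTHETIC_CLM_PROD.csv",
    "SYNTHETIC_CLM_INSTNL.csv",
    "SYNTHETIC_CLM_LINE_INSTNL.csv",
    "SYNTHETIC_CLM_DCMTN.csv",
    "SYNTHETIC_CLM_FISS.csv",
    "SYNTHETIC_CLM_PRFNL.csv",
    "SYNTHETIC_CLM_LINE_PRFNL.csv",
    "SYNTHETIC_CLM_ANSI_SGNTR.csv",
]


def missing_claims_files(out_dir="out"):
    """Return the expected claims files that are absent from out_dir."""
    out_dir = Path(out_dir)
    return [name for name in EXPECTED_CLAIMS_FILES if not (out_dir / name).is_file()]
```

An empty list means all twelve files are present; it does not validate their contents.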
203257

204258
## Data Dictionary
205259
