Commit de82aea

Merge pull request #376 from NHSDigital/VED-163-FHIR-FLAT-JSON
VED-163-fhir to flat json date conversion
2 parents 56a47a0 + 7d3ca2e commit de82aea

16 files changed

+501
-39
lines changed

delta_backend/.coverage

0 Bytes
Binary file not shown.

delta_backend/.gitignore

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+output.json
+output.csv

delta_backend/Makefile

Lines changed: 3 additions & 0 deletions
@@ -5,4 +5,7 @@ package: build
 	mkdir -p build
 	docker run --rm -v $(shell pwd)/build:/build delta-lambda-build
 
+test:
+	python -m unittest
+
 .PHONY: build package

delta_backend/README.md

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
+# 🩺 FHIR to Flat JSON Conversion Engine
+
+This project converts FHIR-compliant JSON data (e.g., Immunization records) into a flat JSON format based on a configurable schema layout. It is intended to support synchronization of data generated by the Immunisation API from external sources to the DPS (Data Processing System).
+
+---
+
+## 📁 File Structure Overview
+
+| File Name | What It Does |
+|------------------------|---------------|
+| **`Converter.py`** | 🧠 The main brain: applies the schema, runs conversions, handles errors. |
+| **`FHIRParser.py`** | 🪜 Knows how to dig into nested FHIR structures and pull out values like dates, IDs, and patient names. |
+| **`SchemaParser.py`** | 📐 Reads your schema layout and tells the converter which FHIR fields to extract and how to rename/format them. |
+| **`ConversionLayout.py`** | ✍️ A plain Python structure that defines which fields you want, and how they should be formatted (e.g. date format, renaming rules). |
+| **`ConversionChecker.py`** | 🔧 Handles transformation logic, e.g. turning a FHIR datetime into `YYYY-MM-DD`, applying lookups, gender codes, defaults, etc. |
+| **`Extractor.py`** | 🎣 Specialized logic to pull practitioner names, site codes, addresses, and apply time-aware rules. |
+| **`ExceptionMessages.py`** | 🚨 Holds reusable error messages and codes for clean debugging and validation feedback. |
+
+---
+
+
+## 🛠️ Key Features
+
+- Schema-driven field extraction and formatting
+- Support for custom date formats like `YYYYMMDD`, and CSV-safe UTC timestamps
+- Robust handling of patient, practitioner, and address data using time-aware logic
+- Extendable structure with static helper methods and modular architecture
+
+---
+
+## 📦 Example Use Case
+
+- Input: FHIR `Immunization` resource (with nested fields)
+- Output: Flat JSON object with 34 standardized key-value pairs
+- Purpose: To export into CSV or push into downstream ETL systems
+
+---
+
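The input/output shape described in the README can be illustrated with a minimal sketch of schema-driven flattening. This is not the project's `Converter.py`: the `fhir_path` key, the `PATIENT_REF` and `SITE_CODE` field names, and both helper functions are hypothetical and exist only for illustration. Only `fieldNameFlat` and `DATE_AND_TIME` appear in this commit.

```python
# Hypothetical three-entry layout. The real ConversionLayout.py is larger,
# and the name of its FHIR-path key is not visible in this diff.
LAYOUT = [
    {"fhir_path": "occurrenceDateTime", "fieldNameFlat": "DATE_AND_TIME"},
    {"fhir_path": "patient.reference", "fieldNameFlat": "PATIENT_REF"},
    {"fhir_path": "performer.0.actor.identifier.value", "fieldNameFlat": "SITE_CODE"},
]


def get_path(resource, dotted_path):
    """Walk a dotted path through nested dicts and lists; return "" if a step is missing."""
    current = resource
    for part in dotted_path.split("."):
        if isinstance(current, dict):
            current = current.get(part)
        elif isinstance(current, list) and part.isdigit() and int(part) < len(current):
            current = current[int(part)]
        else:
            return ""
        if current is None:
            return ""
    return current


def flatten(resource, layout):
    """Build one flat record: one key per layout entry, empty string when absent."""
    return {entry["fieldNameFlat"]: get_path(resource, entry["fhir_path"]) for entry in layout}


sample = {
    "resourceType": "Immunization",
    "occurrenceDateTime": "2024-05-01T10:30:00+00:00",
    "patient": {"reference": "Patient/123"},
}
flat = flatten(sample, LAYOUT)
```

Missing paths become empty strings here so that every record has the same keys, which keeps a later CSV export rectangular. Whether the real converter makes the same choice is not shown in this diff.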
+## ✅ Getting Started with `check_conversion.py`
+
+To quickly test your conversion, use the provided `check_conversion.py` script.
+This script loads sample FHIR data, runs it through the converter, and automatically saves the output in both JSON and CSV formats.
+
+### 🔄 How to Use It
+
+1. Add your FHIR data (e.g., a dictionary or sample JSON) into the `fhir_sample` variable inside `check_conversion.py`
+2. Ensure the field mapping in `ConversionLayout.py` matches your desired output
+3. Run the script from the `tests` folder:
+
+```bash
+python check_conversion.py
+```
+
+### 📁 Output Location
+When the script runs, it will automatically:
+- Save a **flat JSON file** as `output.json`
+- Save a **CSV file** as `output.csv`
+
+These will be located one level up from the `src/` folder:
+
+```
+/mnt/c/Users/USER/desktop/shn/immunisation-fhir-api/delta_backend/output.json
+/mnt/c/Users/USER/desktop/shn/immunisation-fhir-api/delta_backend/output.csv
+```
+
+### 👀 Visualization
+You can now:
+- Open `output.csv` in Excel or Google Sheets to view cleanly structured records
+- Inspect `output.json` to validate the flat key-value output programmatically
+
+---
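`check_conversion.py` itself is not part of this diff, so the following is only a sketch of how a flat record might be written to `output.json` and `output.csv` with the standard library. The `save_outputs` name and its `out_dir` parameter are assumptions.

```python
import csv
import json
from pathlib import Path


def save_outputs(flat_record, out_dir):
    """Write one flat record as output.json and as a one-row output.csv in out_dir."""
    out_dir = Path(out_dir)
    json_path = out_dir / "output.json"
    csv_path = out_dir / "output.csv"

    json_path.write_text(json.dumps(flat_record, indent=2), encoding="utf-8")

    # newline="" lets the csv module control line endings itself
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(flat_record))
        writer.writeheader()
        writer.writerow(flat_record)

    return json_path, csv_path
```

Taking the directory as a parameter, rather than hard-coding an absolute path, keeps the output location independent of the machine the script runs on.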

delta_backend/__init__.py

Whitespace-only changes.

delta_backend/poetry.lock

Lines changed: 110 additions & 9 deletions
Some generated files are not rendered by default.

delta_backend/pyproject.toml

Lines changed: 3 additions & 0 deletions
@@ -11,6 +11,9 @@ boto3 = "~1.26.90"
 mypy-boto3-dynamodb = "^1.26.164"
 moto = "~4.2.11"
 
+[tool.poetry.group.dev.dependencies]
+coverage = "^7.8.0"
+
 [build-system]
 requires = ["poetry-core"]
 build-backend = "poetry.core.masonry.api"

delta_backend/src/ConversionChecker.py

Lines changed: 73 additions & 10 deletions
@@ -1,13 +1,15 @@
+
+# Handles the transformation logic for each field based on the schema
 # Root and base type expression checker functions
 import ExceptionMessages
-import datetime
-import uuid
+from datetime import datetime,timedelta
+from zoneinfo import ZoneInfo
 import re
 from LookUpData import LookUpData
 
 
 # --------------------------------------------------------------------------------------------------------
-# record exception capture
+# Custom error type to handle validation failures
 class RecordError(Exception):
 
     def __init__(self, code=None, message=None, details=None):
@@ -24,6 +26,7 @@ def __repr__(self):
 
 # ---------------------------------------------------------------------------------------------------------
 # main conversion checker
+# Conversion engine for expression-based field transformation
 class ConversionChecker:
     # checker settings
     summarise = False
@@ -37,13 +40,17 @@ def __init__(self, dataParser, summarise, report_unexpected_exception):
         self.summarise = summarise # instance attribute
         self.report_unexpected_exception = report_unexpected_exception # instance attribute
 
-    # exposed functions
+    # Main entry point called by converter.py
     def convertData(self, expressionType, expressionRule, fieldName, fieldValue):
         match expressionType:
             case "DATECONVERT":
                 return self._convertToDate(
                     expressionRule, fieldName, fieldValue, self.summarise, self.report_unexpected_exception
                 )
+            case "DATETIME":
+                return self._convertToDateTime(
+                    expressionRule, fieldName, fieldValue, self.summarise, self.report_unexpected_exception
+                )
             case "NOTEMPTY":
                 return self._convertToNotEmpty(
                     expressionRule, fieldName, fieldValue, self.summarise, self.report_unexpected_exception
@@ -75,15 +82,71 @@ def convertData(self, expressionType, expressionRule, fieldName, fieldValue):
             case _:
                 return "Schema expression not found! Check your expression type : " + expressionType
 
-    # iso8086 date time validate
+    # Convert ISO date string to a specific format (e.g. YYYYMMDD)
     def _convertToDate(self, expressionRule, fieldName, fieldValue, summarise, report_unexpected_exception):
+        if not fieldValue:
+            return ""
+
+        if not isinstance(fieldValue, str):
+            raise RecordError(
+                ExceptionMessages.RECORD_CHECK_FAILED,
+                f"{fieldName} rejected: not a string.",
+                f"Received: {type(fieldValue)}",
+            )
+        # Reject partial dates like "2024" or "2024-05"
+        if re.match(r"^\d{4}(-\d{2})?$", fieldValue):
+            raise RecordError(
+                ExceptionMessages.RECORD_CHECK_FAILED,
+                f"{fieldName} rejected: partial date not accepted.",
+                f"Invalid partial date: {fieldValue}",
+            )
         try:
-            convertDate = datetime.datetime.fromisoformat(fieldValue)
-            return convertDate.strftime(expressionRule)
-        except Exception as e:
+            dt = datetime.fromisoformat(fieldValue)
+            format_str = expressionRule.replace("format:", "")
+            return dt.strftime(format_str)
+        except ValueError:
             if report_unexpected_exception:
-                message = ExceptionMessages.MESSAGES[ExceptionMessages.UNEXPECTED_EXCEPTION] % (e.__class__.__name__, e)
-                return message
+                return f"Unexpected format: {fieldValue}"
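Review note: the new `_convertToDate` logic can be exercised in isolation with a standalone sketch such as the one below. It is a simplified stand-in, not the committed method: `convert_date` is a hypothetical name, and it raises a plain `ValueError` where the commit raises `RecordError`.

```python
import re
from datetime import datetime

# Matches year-only ("2024") and year-month ("2024-05") values
PARTIAL_DATE = re.compile(r"^\d{4}(-\d{2})?$")


def convert_date(rule, value):
    """Format an ISO date string using a rule such as "format:%Y%m%d"."""
    if not value:
        return ""
    if PARTIAL_DATE.match(value):
        raise ValueError(f"partial date not accepted: {value}")
    format_str = rule.replace("format:", "")
    return datetime.fromisoformat(value).strftime(format_str)
```

One thing worth checking in the committed version: when `fromisoformat` fails and `report_unexpected_exception` is false, the method falls off the end and returns `None` rather than a string.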
+
+    # Convert FHIR datetime into CSV-safe UTC format
+    def _convertToDateTime(self, expressionRule, fieldName, fieldValue, summarise, report_unexpected_exception):
+        if not fieldValue:
+            return ""
+
+        # Reject partial dates like "2024" or "2024-05"
+        if re.match(r"^\d{4}(-\d{2})?$", fieldValue):
+            raise RecordError(
+                ExceptionMessages.RECORD_CHECK_FAILED,
+                f"{fieldName} rejected: partial datetime not accepted.",
+                f"Invalid partial datetime: {fieldValue}",
+            )
+        try:
+            dt = datetime.fromisoformat(fieldValue)
+        except ValueError:
+            if report_unexpected_exception:
+                return f"Unexpected format: {fieldValue}"
+
+        # Allow only +00:00 or +01:00 offsets (UTC and BST) and reject unsupported timezones
+        offset = dt.utcoffset()
+        allowed_offsets = [ZoneInfo("UTC").utcoffset(dt),
+                           ZoneInfo("Europe/London").utcoffset(dt)]
+        if offset not in allowed_offsets:
+            raise RecordError(
+                ExceptionMessages.RECORD_CHECK_FAILED,
+                f"{fieldName} rejected: unsupported timezone.",
+                f"Unsupported offset: {offset}",
+            )
+
+        # Convert to UTC
+        dt_utc = dt.astimezone(ZoneInfo("UTC")).replace(microsecond=0)
+
+        format_str = expressionRule.replace("format:", "")
+
+        if format_str == "csv-utc":
+            formatted = dt_utc.strftime("%Y%m%dT%H%M%S%z")
+            return formatted.replace("+0000", "00").replace("+0100", "01")
+
+        return dt_utc.strftime(format_str)
 
     # Not Empty Validate
     def _convertToNotEmpty(self, expressionRule, fieldName, fieldValue, summarise, report_unexpected_exception):
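Review note: a minimal sketch of the `csv-utc` normalization, under stated simplifications. `to_csv_utc` is a hypothetical name, it raises `ValueError` instead of `RecordError`, and it uses a fixed allow-list of offsets. The commit instead asks `ZoneInfo("Europe/London")` for the offset on the given date, so the two are not equivalent: this sketch accepts `+01:00` all year.

```python
from datetime import datetime, timedelta, timezone

# Simplification: fixed offsets, not the date-dependent ZoneInfo lookup in the diff
ALLOWED_OFFSETS = {timedelta(0), timedelta(hours=1)}


def to_csv_utc(value):
    """Parse an offset-aware ISO datetime and render it as UTC in YYYYMMDDTHHMMSS00 form."""
    dt = datetime.fromisoformat(value)
    # Naive datetimes have utcoffset() None, so they are rejected here as well
    if dt.utcoffset() not in ALLOWED_OFFSETS:
        raise ValueError(f"unsupported offset: {dt.utcoffset()}")
    dt_utc = dt.astimezone(timezone.utc).replace(microsecond=0)
    # %z renders UTC as "+0000"; the suffix is shortened to "00" as in the diff
    return dt_utc.strftime("%Y%m%dT%H%M%S%z").replace("+0000", "00")
```

Two points to verify in the committed method. Because the value is converted to UTC before formatting, `%z` always renders `+0000`, so the `.replace("+0100", "01")` branch appears never to match. And if parsing fails while `report_unexpected_exception` is false, `dt` is never assigned, so the later `dt.utcoffset()` call would raise `UnboundLocalError`.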

delta_backend/src/ConversionLayout.py

Lines changed: 6 additions & 6 deletions
@@ -1,6 +1,6 @@
 
-#This is the base layout for converting from FHIR to Flat JSON
-#See the readme for an explanation of how this works
+# This file holds the schema/base layout that maps FHIR fields to flat JSON fields
+# Each entry tells the converter how to extract and transform a specific value
 
 ConvertLayout = {
     "id": "7d78e9a6-d859-45d3-bb05-df9c405acbdb",
@@ -67,8 +67,8 @@
             "fieldNameFlat": "DATE_AND_TIME",
             "expression": {
                 "expressionName": "Date Convert",
-                "expressionType": "DATECONVERT",
-                "expressionRule": "%Y%m%dT%H%M%S"
+                "expressionType": "DATETIME",
+                "expressionRule": "format:csv-utc"
             }
         },
         {
@@ -140,7 +140,7 @@
             "expression": {
                 "expressionName": "Date Convert",
                 "expressionType": "DATECONVERT",
-                "expressionRule": "%Y%m%d"
+                "expressionRule": "format:%Y%m%d"
             }
         },
         {
@@ -221,7 +221,7 @@
             "expression": {
                 "expressionName": "Date Convert",
                 "expressionType": "DATECONVERT",
-                "expressionRule": "%Y%m%d"
+                "expressionRule": "format:%Y%m%d"
             }
         },
         {
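Review note: the layout change routes `DATE_AND_TIME` to a new expression type and prefixes every rule with `format:`. The sketch below shows one way an `expression` entry could be dispatched to a handler. The `HANDLERS` table, `apply_expression`, and both handler functions are hypothetical; the commit dispatches with a `match` statement inside `ConversionChecker.convertData`, and its handlers do more validation than these.

```python
from datetime import datetime, timezone


def strip_format_prefix(rule):
    """Turn "format:%Y%m%d" into "%Y%m%d"; rules without the prefix pass through unchanged."""
    return rule.replace("format:", "")


def handle_date(rule, value):
    return datetime.fromisoformat(value).strftime(strip_format_prefix(rule))


def handle_datetime(rule, value):
    # Assumes an offset-aware input; a naive value would be read as local time
    dt_utc = datetime.fromisoformat(value).astimezone(timezone.utc)
    fmt = strip_format_prefix(rule)
    if fmt == "csv-utc":
        return dt_utc.strftime("%Y%m%dT%H%M%S") + "00"
    return dt_utc.strftime(fmt)


HANDLERS = {"DATECONVERT": handle_date, "DATETIME": handle_datetime}


def apply_expression(expression, value):
    """Look up the handler named by expressionType and apply expressionRule to value."""
    handler = HANDLERS.get(expression["expressionType"])
    if handler is None:
        return "Schema expression not found! Check your expression type : " + expression["expressionType"]
    return handler(expression["expressionRule"], value)
```

Since `replace("format:", "")` leaves unprefixed rules untouched, older layout entries such as `"%Y%m%d"` would still work with this approach.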

delta_backend/src/Converter.py

Lines changed: 4 additions & 3 deletions
@@ -29,13 +29,14 @@ def __init__(self, fhir_data):
         self.FHIRData = fhir_data # Store JSON data directly
         self.SchemaFile = ConversionLayout.ConvertLayout
 
-    # create a FHIR parser - uses fhir json data from delta
+    # create a FHIR parser - uses fhir json data from delta
+    # (helper methods to extract values from the nested FHIR structure)
     def _getFHIRParser(self, fhir_data):
         fhirParser = FHIRParser()
         fhirParser.parseFHIRData(fhir_data)
         return fhirParser
 
-    # create a schema parser
+    # create a schema parser - parses the schema that defines how FHIR fields should be mapped into flat fields.
     def _getSchemaParser(self, schemafile):
         schemaParser = SchemaParser()
         schemaParser.parseSchema(schemafile)
@@ -120,7 +121,7 @@ def extract_patient_details(self, json_data, FlatFieldName):
             self._cached_values = {}
 
         if not self._cached_values:
-            occurrence_time = datetime.strptime(json_data.get("occurrenceDateTime", ""), "%Y-%m-%dT%H:%M:%S%z")
+            occurrence_time = datetime.fromisoformat(json_data.get("occurrenceDateTime", ""))
             patient = get_patient(json_data)
             if not patient:
                 return None
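Review note: the switch from `strptime` to `fromisoformat` widens the set of accepted inputs. The sketch below compares the two on a timestamp with fractional seconds. `parse_strict` and `parse_iso` are hypothetical wrapper names used only for the comparison.

```python
from datetime import datetime

# The format string used before this commit
STRICT_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def parse_strict(value):
    """Old behaviour: the value must match STRICT_FORMAT exactly."""
    return datetime.strptime(value, STRICT_FORMAT)


def parse_iso(value):
    """New behaviour: any string datetime.fromisoformat accepts."""
    return datetime.fromisoformat(value)


plain = "2024-05-01T10:30:00+00:00"
fractional = "2024-05-01T10:30:00.123+00:00"

same_result = parse_strict(plain) == parse_iso(plain)

try:
    parse_strict(fractional)
    strict_accepts_fraction = True
except ValueError:
    strict_accepts_fraction = False

iso_microseconds = parse_iso(fractional).microsecond
```

Both versions still raise `ValueError` when `occurrenceDateTime` is missing, because the `""` default is not a valid datetime string for either parser.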
