VED-754 Handle non utf 8 encoded csv (bugfix) #792
dlzhry2nhs wants to merge 3 commits into release-2025-09-04 from
The release branch does not yet have the fix from main which ensures the correct version of Python + Poetry is used, so the unit tests do not run. See:
I have verified that the unit test I have added exercises the new code.
b4ae9f1 to 93448ef
file_bytes.seek(0)
return False

file_bytes.seek(0)
For discussion: this is a bit naughty. It stops the function from having the side effect of leaving the stream advanced to some arbitrary position, and ensures the stream is always set back to the start.
The other option would be to create two BytesIO objects: one for encoding validation, and one to pass to the dict reader. However, I concluded that minimising the number of objects is better, even if we have to call .seek(0) manually.
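A minimal sketch of the pattern described in this comment, for illustration only. The helper name `is_utf8_encoded`, its signature and the sample data are assumptions; only the `file_bytes.seek(0)` / `return False` lines are visible in the diff. The idea is that the check reads the whole stream, then rewinds it on both the failure and the success path, so the same `BytesIO` can be handed to `csv.DictReader` afterwards.

```python
import csv
from io import BytesIO, TextIOWrapper


def is_utf8_encoded(file_bytes: BytesIO) -> bool:
    """Hypothetical helper: return True if the whole stream decodes as UTF-8.

    The stream is rewound to the start on every path, so the caller never
    sees a read position left over from the validation.
    """
    try:
        file_bytes.read().decode("utf-8")
    except UnicodeDecodeError:
        file_bytes.seek(0)
        return False

    file_bytes.seek(0)
    return True


# Usage: validate first, then reuse the same object for parsing.
stream = BytesIO("NAME|TOWN\nZoë|Leeds\n".encode("utf-8"))
valid = is_utf8_encoded(stream)
position_after_check = stream.tell()  # back at the start of the stream

text_stream = TextIOWrapper(stream, encoding="utf-8", newline="")
rows = list(csv.DictReader(text_stream, delimiter="|"))
```

The two-object alternative mentioned above would avoid the manual rewind at the cost of holding a second copy of the file contents in memory.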
Closing - superseded by a separate fix.


Summary
See https://nhsd-jira.digital.nhs.uk/browse/VED-754 for the investigation and root cause. Essentially, until we update the specification to stipulate UTF-8 encoding and suppliers come into line, it is fair and reasonable to put temporary handling in place.
We want to remove this handling as soon as we can, and reject files that do not conform to the spec.
Note: this only occurs in a small minority of cases, as the differences in encoding only surface for non-ASCII characters, which appear quite rarely within the batch extracts.
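To illustrate why only non-ASCII characters are affected, here is a short sketch. It assumes the non-conforming files use Windows-1252 (`cp1252`); that encoding is a guess for the example, as the actual supplier encoding is documented in the Jira ticket rather than here. The function `decode_with_fallback` is likewise hypothetical and is not the code in this PR.

```python
ascii_text = "SMITH|LEEDS"
accented_text = "Zoë|Leeds"

# Pure ASCII text yields identical bytes in both encodings,
# so such rows parse the same whichever encoding was used.
ascii_bytes_match = ascii_text.encode("utf-8") == ascii_text.encode("cp1252")

# "ë" is a single byte (0xEB) in cp1252 but two bytes (0xC3 0xAB) in UTF-8.
cp1252_bytes = accented_text.encode("cp1252")
utf8_bytes = accented_text.encode("utf-8")


def decode_with_fallback(raw: bytes, fallback: str = "cp1252") -> str:
    """Hypothetical temporary handling: try UTF-8, then an assumed fallback.

    Note that cp1252 leaves a few byte values undefined, so the fallback
    decode can itself raise UnicodeDecodeError for arbitrary input.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


recovered = decode_with_fallback(cp1252_bytes)
```

Trying UTF-8 first means conforming files are unaffected, and the fallback branch can be deleted outright once non-conforming files are rejected.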
Reviews Required
Review Checklist
ℹ️ This section is to be filled in by the reviewer.