Skip to content

VED-754 Handle non utf 8 encoded csv (bugfix)#792

Closed
dlzhry2nhs wants to merge 3 commits intorelease-2025-09-04from
bugfix/VED-754-handle-non-utf-8-encoded-csv
Closed

VED-754 Handle non utf 8 encoded csv (bugfix)#792
dlzhry2nhs wants to merge 3 commits intorelease-2025-09-04from
bugfix/VED-754-handle-non-utf-8-encoded-csv

Conversation

@dlzhry2nhs
Copy link
Collaborator

@dlzhry2nhs dlzhry2nhs commented Sep 5, 2025

Summary

  • Bugfix/handling for specific supplier settings

See https://nhsd-jira.digital.nhs.uk/browse/VED-754 for the investigation and root cause. Essentially, until we update the specification to stipulate that we expect utf-8 encoding and suppliers come into line, it is fair and reasonable to put in temporary handling.

We would want to get rid of this as soon as we can, and reject files that do not conform to spec.

Note: this only occurs in a slim minority of cases, as the differences in encoding only surfaces for non-ASCII characters, which appear quite rarely within the batch extracts.

Reviews Required

  • Dev

Review Checklist

ℹ️ This section is to be filled in by the reviewer.

  • I have reviewed the changes in this PR and they fill all or part of the acceptance criteria of the ticket, and the code is in a mergeable state.
  • If there were infrastructure, operational, or build changes, I have made sure there is sufficient evidence that the changes will work.
  • I have ensured the changelog has been updated by the submitter, if necessary.

@dlzhry2nhs
Copy link
Collaborator Author

Quality Gate Failed Quality Gate failed

Failed conditions 0.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

The release branch does not yet have the fix from main which ensures the correct version of Python + Poetry is used, so the unit tests do not run. See: coverage-fhir-api step from https://github.com/NHSDigital/immunisation-fhir-api/actions/runs/17486786687/job/49667303142?pr=792

I have verified that the unit test I have added exercises the new code.

@dlzhry2nhs dlzhry2nhs force-pushed the bugfix/VED-754-handle-non-utf-8-encoded-csv branch from b4ae9f1 to 93448ef Compare September 5, 2025 08:02
@NHSDigital NHSDigital deleted a comment from sonarqubecloud bot Sep 5, 2025
@dlzhry2nhs dlzhry2nhs marked this pull request as ready for review September 5, 2025 13:34
file_bytes.seek(0)
return False

file_bytes.seek(0)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For discussion, this is a bit naughty. It avoids the function having a side effect of fast forwarding the file to a certain point and ensures it always sets the stream back to the start.

Other option would be to create to BytesIO object. One for encoding validation, and then one to pass to the dict reader. However, concluded minimising the objects is better even is we have to manually call .seek(0).

@sonarqubecloud
Copy link

sonarqubecloud bot commented Sep 5, 2025

Quality Gate Failed Quality Gate failed

Failed conditions
0.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

@dlzhry2nhs
Copy link
Collaborator Author

Closing - superseded by a separate fix.

@dlzhry2nhs dlzhry2nhs closed this Sep 11, 2025
@dlzhry2nhs dlzhry2nhs deleted the bugfix/VED-754-handle-non-utf-8-encoded-csv branch September 11, 2025 07:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant