@dlzhry2nhs dlzhry2nhs commented Oct 9, 2025

Summary

  • Routine Change

The problem is summarised in full on the ticket: https://nhsd-jira.digital.nhs.uk/browse/VED-859

This PR aims to address two problems:

  1. Brittle completion checking: we currently count the newlines in the source file and compare that count to the ack file. However, the source file can contain newlines within cells, so the newline count can overstate the number of records.
  2. Slowness: very large files can take an extremely long time to process. In particular, building the ack file is slow because of the amount of file reading we do.
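
To illustrate the first problem, here is a minimal sketch using only the Python standard library. The sample data and the function names `count_by_newlines` and `count_by_csv_reader` are hypothetical and are not taken from the codebase; the point is only that a quoted cell containing a newline inflates a naive newline count, while `csv.DictReader` still sees the correct number of records.

```python
import csv
import io

# Hypothetical three-record CSV. The second record has a newline
# inside a quoted cell, which is legal CSV.
SOURCE = (
    "id,note\n"
    "1,first dose\n"
    '2,"line one\nline two"\n'
    "3,booster\n"
)


def count_by_newlines(text: str) -> int:
    """Naive count: treat every newline after the header as one record."""
    return text.count("\n") - 1


def count_by_csv_reader(text: str) -> int:
    """Count records with csv.DictReader, which respects quoted newlines."""
    return sum(1 for _ in csv.DictReader(io.StringIO(text)))


if __name__ == "__main__":
    print("newline count:", count_by_newlines(SOURCE))
    print("DictReader count:", count_by_csv_reader(SOURCE))
```

With this sample the two counts disagree, so a completion check based on the newline count would never match the number of acknowledged records.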

Solution
Store the number of records processed by ECS in DynamoDB. This is a reliable full record count because it comes from the CSV DictReader.
Use the row id in the ack lambda events (format: {batch_message_id}^{record_number}) to check whether we have reached the final record, by comparing its record number to the value saved in DynamoDB.
This means we no longer read the full S3 source file on every invocation to perform the expensive row count check.
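
The completion check described above could look roughly like the sketch below. It is not the code from this PR: the function names `parse_row_id` and `is_final_record` are hypothetical, the DynamoDB lookup is left out (the stored count is passed in as `total_records`), and it assumes record numbers start at 1 so that the final record number equals the total.

```python
def parse_row_id(row_id: str) -> tuple[str, int]:
    """Split a row id of the form {batch_message_id}^{record_number}."""
    # rpartition splits on the last "^", so the record number is always
    # the trailing part even if the batch message id contained a "^".
    batch_message_id, _, record_number = row_id.rpartition("^")
    if not batch_message_id or not record_number.isdigit():
        raise ValueError(f"Unexpected row id format: {row_id!r}")
    return batch_message_id, int(record_number)


def is_final_record(row_id: str, total_records: int) -> bool:
    """Return True when the row id refers to the last record of the batch.

    total_records stands in for the count that ECS saved to DynamoDB.
    Assumption: record numbers are 1-based.
    """
    _, record_number = parse_row_id(row_id)
    return record_number == total_records


if __name__ == "__main__":
    # Hypothetical batch of 250 records.
    print(is_final_record("batch-abc^249", 250))  # False
    print(is_final_record("batch-abc^250", 250))  # True
```

The check only needs the row id from the event and one stored integer, which is why the source file does not have to be re-read.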

There is still some slowness, as record ordering is critical and we cannot have parallel writers to the same S3 object, so we need to append each time to the ack file.

More details are on the ticket, but on a 50MB file we saw a 33% decrease in overall processing time. The saving may grow with larger files.

Further options considered
Before settling on the above option, I considered reading the source file once (properly, with a CSV reader, to avoid the newline issue) and caching the record count. However, this would be susceptible to the same encoding issue we already have to handle in ECS, and ECS already counts the total records anyway.

Additional changes:

  • Fix imports and make PYTHONPATH consistent in the Makefile and pipeline for the ack backend
  • Update the ack backend moto version and use the @mock_aws decorator
  • Update and add robust tests (especially using moto) to cover the new functionality

Reviews Required

  • Dev

Review Checklist

ℹ️ This section is to be filled in by the reviewer.

  • I have reviewed the changes in this PR and they fill all or part of the acceptance criteria of the ticket, and the code is in a mergeable state.
  • If there were infrastructure, operational, or build changes, I have made sure there is sufficient evidence that the changes will work.
  • I have ensured the changelog has been updated by the submitter, if necessary.

github-actions bot commented Oct 9, 2025

This branch is working on a ticket in the NHS England VED JIRA Project. Here's a handy link to the ticket:

VED-859

@dlzhry2nhs dlzhry2nhs force-pushed the feature/VED-859-improve-ack-backend-process branch 3 times, most recently from 47f6a81 to 9812230 Compare October 9, 2025 14:30
@dlzhry2nhs dlzhry2nhs force-pushed the feature/VED-859-improve-ack-backend-process branch from 9812230 to 2606e31 Compare October 10, 2025 10:30
@dlzhry2nhs dlzhry2nhs marked this pull request as ready for review October 10, 2025 10:30
@dlzhry2nhs dlzhry2nhs merged commit 88435d5 into master Oct 10, 2025
9 checks passed
@dlzhry2nhs dlzhry2nhs deleted the feature/VED-859-improve-ack-backend-process branch October 10, 2025 13:31