@cameronhargreaves1-nhs
Contributor

@cameronhargreaves1-nhs cameronhargreaves1-nhs commented Nov 12, 2025

Description

We want to properly model the relationship between Appointments and the Extract they were booked in, so that we can check sequence numbers more easily.

Whenever appointments are created, a new Extract will need to be created as well.

Jira link

Review notes

Review checklist

  • Check database queries are correctly scoped to current_provider

@cameronhargreaves1-nhs cameronhargreaves1-nhs changed the title Dtoss 11551 add extract model [DTOSS-11551] Add extract model Nov 19, 2025
@cameronhargreaves1-nhs cameronhargreaves1-nhs force-pushed the DTOSS-11551-add-extract-model branch from cfb2f0f to bd17c1d Compare November 19, 2025 08:30
Contributor

@Harriethw Harriethw left a comment

Looks pretty good so far! Just some minor changes/questions.

@cameronhargreaves1-nhs cameronhargreaves1-nhs force-pushed the DTOSS-11551-add-extract-model branch 2 times, most recently from 54d67c6 to 5ff80f1 Compare November 21, 2025 15:42
@cameronhargreaves1-nhs cameronhargreaves1-nhs marked this pull request as ready for review November 24, 2025 09:17
@cameronhargreaves1-nhs cameronhargreaves1-nhs requested a review from a team November 24, 2025 09:17
Contributor

@steventux steventux left a comment

Looking good, a couple of small things. The main thing is how we read the header; I think we can make it more readable.


file_headers = self.get_file_header(blob_content)

extract = Extract.objects.create(sequence_number = int(file_headers[1].strip()),
Contributor

There are a lot of magic indexes in what we are doing here; it makes the code hard to read when we don't know what file_headers[1] or file_headers[4] refer to.
Could we have a pattern like:

def create_extract(filename: str, raw_data: str) -> Extract:
    bso_code = filename.split("_")[0]
    # The header is the first CR/LF-terminated line; its fields are pipe-delimited
    header = raw_data.split("\r\n")[0]
    type_id, extract_id, start_date, start_time, record_count = header.split("|")
    # Maybe raise if the above fails
    return Extract.objects.create(
        sequence_number=extract_id,
        bso_code=bso_code,
        filename=filename,
        record_count=record_count,
    )
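
To make the "maybe raise if the above fails" part concrete, here is a minimal standalone sketch of parsing the header into named fields with an explicit check on the field count. The `ExtractHeader` dataclass, the `parse_header` name and the sample header values are hypothetical, and it assumes the header fields may be wrapped in double quotes; it is not the code in this PR.

```python
from dataclasses import dataclass

# type_id, extract_id, start_date, start_time, record_count
HEADER_FIELD_COUNT = 5


@dataclass(frozen=True)
class ExtractHeader:
    """Hypothetical container for the named header fields."""

    type_id: str
    extract_id: int
    start_date: str
    start_time: str
    record_count: int


def parse_header(raw_data: str) -> ExtractHeader:
    """Parse the first line of a pipe-delimited extract into named fields."""
    lines = raw_data.splitlines()
    if not lines:
        raise ValueError("extract is empty")
    # Assumption: fields may be quoted, so strip whitespace and double quotes.
    fields = [field.strip().strip('"') for field in lines[0].split("|")]
    if len(fields) != HEADER_FIELD_COUNT:
        raise ValueError(
            f"expected {HEADER_FIELD_COUNT} header fields, got {len(fields)}"
        )
    type_id, extract_id, start_date, start_time, record_count = fields
    return ExtractHeader(
        type_id, int(extract_id), start_date, start_time, int(record_count)
    )


# Made-up sample data, only to show the shape of the call.
sample = '"HDR"|"00000013"|"20250128"|"170922"|"000003"\r\n"DATA"|"row 1"\r\n'
header = parse_header(sample)
```

Failing early with a `ValueError` keeps a malformed header from turning into a confusing unpacking error further down.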

Contributor

just seen this doesn't look like it was added, will take a look

Contributor

@steventux have pushed in latest commit - IMO the string manipulation is harder to understand than just converting to a dataframe, what do you think?

Contributor

there were a few more steps to parsing the string than in your original suggestion - i guess we get all that "for free" from the pandas stuff

@Harriethw Harriethw force-pushed the DTOSS-11551-add-extract-model branch from 6d65503 to 96a4d2d Compare November 26, 2025 12:58
@Harriethw Harriethw marked this pull request as draft November 26, 2025 13:38
@Harriethw Harriethw force-pushed the DTOSS-11551-add-extract-model branch 4 times, most recently from 6936293 to d7f1e57 Compare November 26, 2025 16:00
@Harriethw Harriethw dismissed their stale review November 26, 2025 16:06

addressed changes

@Harriethw Harriethw force-pushed the DTOSS-11551-add-extract-model branch 3 times, most recently from e2962e9 to 9a1b6c9 Compare November 26, 2025 16:37
@Harriethw
Contributor

I tidied up the commits to make it easier to grep, but there are a few changes of note from the original PR:

  • adding the related_name on the ManyToMany allows us to query extracts from the Appointment, which I think is useful to have
  • the validation on a unique constraint (e.g. if we try to process the same Extract file twice) does happen on create once the Model is set up right - so no new Extract would be created, and we would be alerted.
  • I had to pass transaction=True into the unique test because pytest was struggling and didn't seem to think the transaction was atomic - I think it should be because we've wrapped everything in a try/catch block, but not 100% sure 🤔
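
For anyone reading along, the duplicate-file behaviour in the second bullet can be sketched outside Django. This uses the standard-library `sqlite3` module directly rather than the Django ORM, and the table name, the columns and the choice of `(bso_code, sequence_number)` as the unique key are assumptions made for illustration, not the schema in this PR.

```python
import sqlite3


def record_extract(
    conn: sqlite3.Connection, bso_code: str, sequence_number: int
) -> bool:
    """Insert an extract row; return False if the same extract already exists."""
    try:
        # "with conn" commits on success and rolls back if the block raises.
        with conn:
            conn.execute(
                "INSERT INTO extract (bso_code, sequence_number) VALUES (?, ?)",
                (bso_code, sequence_number),
            )
        return True
    except sqlite3.IntegrityError:
        # The unique constraint rejected the duplicate; nothing was written.
        return False


conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per extract, unique per BSO and sequence number.
conn.execute(
    "CREATE TABLE extract ("
    " bso_code TEXT NOT NULL,"
    " sequence_number INTEGER NOT NULL,"
    " UNIQUE (bso_code, sequence_number))"
)

first_attempt = record_extract(conn, "ABC", 13)
second_attempt = record_extract(conn, "ABC", 13)  # same file processed twice
row_count = conn.execute("SELECT COUNT(*) FROM extract").fetchone()[0]
```

The second call fails at the database level and the transaction is rolled back, so only one row exists afterwards.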

@Harriethw Harriethw marked this pull request as ready for review November 26, 2025 16:42
Contributor

@steventux steventux left a comment

This looks great, very clean and well tested 💯 🥇 🛳️

def create_extract(self, filename: str, raw_data: str) -> Extract:
    bso_code = filename.split("/")[1].split("_")[0]
    type_id, extract_id, start_date, start_time, record_count = raw_data.split(
        "\n"
Contributor

I don't know if this matters or is even accurate, but the spec states that the line separator is CR/LF, which is \r\n in our money. I suppose the only side effect of splitting on \n would be rogue carriage returns.
Perhaps as we are already stripping quotes we could attempt to strip \r?
Not a dealbreaker.

Contributor

done!
I didn't know how to get the \r to show up in the data we have though 🤷 at least it will get removed if it does turn up!
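
A small sketch of the behaviour discussed in this thread, using a made-up header string with explicit `\r\n` line endings (the field values are hypothetical). It shows the rogue carriage return that splitting on `\n` leaves behind, and two ways of getting rid of it.

```python
# Made-up data with CR/LF line endings, as the spec describes.
raw_data = '"HDR"|"00000013"|"000003"\r\n"DATA"|"row 1"\r\n'

# Splitting on "\n" alone leaves the carriage return on the last header field.
naive_fields = raw_data.split("\n")[0].split("|")

# Stripping "\r" together with the quotes removes it; data without "\r"
# is unaffected, so this is safe for both line-ending styles.
clean_fields = [
    field.strip('\r"') for field in raw_data.split("\n")[0].split("|")
]

# Alternative: str.splitlines() treats "\r\n", "\n" and "\r" as line boundaries.
splitlines_fields = [
    field.strip('"') for field in raw_data.splitlines()[0].split("|")
]
```

A string literal like the one above may also be a way to get `\r` into test data when the sample files do not contain it.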

To store information about the .dat files we receive, from which we extract Appointment info.
Here we add an Appointment to an Extract wherever it is created.
To avoid converting to data frame again.
@Harriethw Harriethw force-pushed the DTOSS-11551-add-extract-model branch from 55e3a23 to 5a29d3f Compare November 27, 2025 10:16
@Harriethw Harriethw merged commit fb844a3 into main Nov 27, 2025
12 checks passed
@Harriethw Harriethw deleted the DTOSS-11551-add-extract-model branch November 27, 2025 10:22

3 participants