Commit cb70feb

Merge branch 'main' into feature/BCSS-20629-playwright-nhs-number-tools-markdown-improvements
2 parents 0b9c79b + 1980343 commit cb70feb

4 files changed

+232
-361
lines changed

docs/utility-guides/BatchProcessing.md

Lines changed: 30 additions & 10 deletions
@@ -1,23 +1,40 @@
11
# Utility Guide: Batch Processing
22

3-
The Batch Processing utility allows for the processing of batches on the active batch list page to be completed in one method
3+
The Batch Processing utility provides a one-stop function for processing batches on the active batch list page, streamlining all necessary steps into a single call. **To process a batch, call the `batch_processing` function as described below.**
44

55
## Table of Contents
66

77
- [Utility Guide: Batch Processing](#utility-guide-batch-processing)
88
- [Table of Contents](#table-of-contents)
9+
- [Example Usage](#example-usage)
910
- [Functions Overview](#functions-overview)
1011
- [Batch Processing](#batch-processing)
1112
- [Required Arguments](#required-arguments)
1213
- [Optional Arguments](#optional-arguments)
1314
- [How This Function Works](#how-this-function-works)
1415
- [Prepare And Print Batch](#prepare-and-print-batch)
1516
- [Arguments](#arguments)
17+
- [Optional Arguments](#optional-arguments-1)
1618
- [How This Function Works](#how-this-function-works-1)
1719
- [Check Batch In Archived Batch List](#check-batch-in-archived-batch-list)
1820
- [Arguments](#arguments-1)
1921
- [How This Function Works](#how-this-function-works-2)
2022

23+
## Example Usage
24+
25+
```python
26+
from utils.batch_processing import batch_processing
27+
28+
batch_processing(
29+
page=page,
30+
batch_type="S1",
31+
batch_description="Pre-invitation (FIT)",
32+
latest_event_status=["Status1", "Status2"], # Can be str or list[str]
33+
run_timed_events=True,
34+
get_subjects_from_pdf=False
35+
)
36+
```
37+
2138
## Functions Overview
2239

2340
For this utility we have the following functions:
@@ -28,8 +45,7 @@ For this utility we have the following functions:
2845

2946
### Batch Processing
3047

31-
This is the main function that is called in order to process a batch.
32-
This will call the other two functions in order to successfully process a batch.
48+
This is the **main entry point function** that should be called to process a batch. It manages and coordinates all the required steps by internally calling the other two functions and auxiliary utilities as needed.
3349

3450
#### Required Arguments
3551

@@ -43,14 +59,15 @@ This will call the other two functions in order to successfully process a batch.
4359
- Type: `str`
4460
- This is the description of the batch. For example: **Pre-invitation (FIT)** or **Post-investigation Appointment NOT Required**
4561
- `latest_event_status`:
46-
- Type: `str | None`
47-
- This is the status the subject will get updated to after the batch has been processed. It is used to check that the subject has been updated to the correct status after a batch has been printed. If there are multiple different status in the same batch, provide them all in a list.
62+
- Type: `str | list[str]`
63+
- This is the status or list of statuses the subject(s) will get updated to after the batch has been processed. It is used to check that the subject(s) have been updated to the correct status after a batch has been printed.
4864

4965
#### Optional Arguments
5066

5167
- `run_timed_events`:
5268
- Type: `bool`
5369
- If this is set to **True**, then bcss_timed_events will be executed against all the subjects found in the batch
70+
- These timed events simulate the passage of time-dependent processing steps.
5471
- `get_subjects_from_pdf`:
5572
- Type: `bool`
5673
- If this is set to **True**, then the subjects will be retrieved from the downloaded PDF file instead of from the DB
@@ -65,8 +82,7 @@ This will call the other two functions in order to successfully process a batch.
6582
5. Now it extracts the ID of the batch and stores it in the local variable `link_text`; this is used later on to extract the subjects in the batch from the DB
6683
6. After the ID is stored, it clicks on the ID to get to the Manage Active Batch page
6784
7. From here it calls the `prepare_and_print_batch` function.
68-
1. If `get_subjects_from_pdf` was set to False it calls `get_nhs_no_from_batch_id`, which is imported from *utils.oracle.oracle_specific_functions*, to get the subjects from the batch and stores them as a pandas DataFrame - **nhs_no_df**
69-
2. For more Info on `get_nhs_no_from_batch_id` please look at: [`PDFReader`](PDFReader.md)
85+
1. If `get_subjects_from_pdf` was set to False it calls `get_nhs_no_from_batch_id`, which is imported from *utils.oracle.oracle_specific_functions*, to get the subjects from the DB and stores them as a pandas DataFrame - **nhs_no_df**
7086
8. Once this is complete it calls the `check_batch_in_archived_batch_list` function
7187
9. Finally, once that function is complete it calls `verify_subject_event_status_by_nhs_no` which is imported from *utils/screening_subject_page_searcher*
7288

@@ -83,17 +99,21 @@ It is in charge of pressing on the following button: **Prepare Batch**, **Retrie
8399
- `link_text`:
84100
- Type: `str`
85101
- This is the batch ID of the batch currently being processed
102+
103+
#### Optional Arguments
104+
86105
- `get_subjects_from_pdf`:
87106
- Type: `bool`
88-
- This is an optional argument and if this is set to **True**, then the subjects will be retrieved from the downloaded PDF file instead of from the DB
107+
- If this is set to **True**, then the subjects will be retrieved from the downloaded PDF file instead of from the DB
89108

90109
#### How This Function Works
91110

92111
1. It starts off by clicking on the **Prepare Batch** button.
93112
2. After this it waits for the button to turn into **Re-Prepare Batch**. Once this happens it means that the batch is ready to be printed.
94113
3. Now it clicks on each visible **Retrieve** button.
95-
1. If `get_subjects_from_pdf` was set to True and the file is a **.pdf**, then it calls `extract_nhs_no_from_pdf`, which is imported from *utils/pdf_reader*, to get the subjects from the batch and stores them as a pandas DataFrame - **nhs_no_df**
96-
2. After a file is downloaded, it gets deleted.
114+
1. If `get_subjects_from_pdf` was set to True and the file is a **.pdf**, then it calls `extract_nhs_no_from_pdf`, which is imported from *utils/pdf_reader*, to get the subjects from the PDF and stores them as a pandas DataFrame - **nhs_no_df**
115+
2. For more information on `extract_nhs_no_from_pdf`, see [`PDFReader`](PDFReader.md)
116+
3. After a file is downloaded, it gets deleted.
97117
4. Then it clicks on each **Confirm Printed** button and handles the dialog that appears.
98118
5. Finally it checks for the message: *Batch Successfully Archived and Printed*
99119
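The `latest_event_status` argument documented in this file accepts either a single status or a list of statuses. One way a caller or helper might treat both forms uniformly is to normalise the value to a list before checking subjects. The sketch below is illustrative only: `normalise_statuses` is a hypothetical helper name and is not taken from the repository.

```python
from __future__ import annotations


def normalise_statuses(latest_event_status: str | list[str]) -> list[str]:
    """Return the expected statuses as a list (hypothetical helper, not from the repo)."""
    # A single status string becomes a one-element list.
    if isinstance(latest_event_status, str):
        return [latest_event_status]
    # Copy the list so the caller's original is never mutated.
    return list(latest_event_status)


single = normalise_statuses("Status1")
several = normalise_statuses(["Status1", "Status2"])
```

With this shape, any later verification step can loop over the statuses without branching on the argument type.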

docs/utility-guides/PDFReader.md

Lines changed: 37 additions & 8 deletions
@@ -1,6 +1,6 @@
11
# Utility Guide: PDF Reader
22

3-
The PDF Reader utility allows for reading of PDF files and performing specific tasks on them.
3+
The PDF Reader utility allows for reading PDF files and extracting NHS numbers from them.
44

55
## Table of Contents
66

@@ -10,26 +10,55 @@ The PDF Reader utility allows for reading of PDF files and performing specific t
1010
- [Extract NHS No From PDF](#extract-nhs-no-from-pdf)
1111
- [Required Arguments](#required-arguments)
1212
- [How This Function Works](#how-this-function-works)
13+
- [Example Usage](#example-usage)
1314

1415
## Functions Overview
1516

16-
For this utility we have the following functions/methods:
17+
For this utility, the following function is available:
1718

1819
- `extract_nhs_no_from_pdf`
1920

2021
### Extract NHS No From PDF
2122

22-
This is called to extract all NHS numbers from a PDF file.
23-
The way it finds an NHS number is by looking for the string **"NHS No:"**
23+
This function extracts all NHS numbers from a PDF file by searching for the string **"NHS No:"** on each page.
2424

2525
#### Required Arguments
2626

2727
- `file`:
2828
- Type: `str`
29-
- This is the file path stored as a string.
29+
- The file path to the PDF file as a string.
3030

3131
#### How This Function Works
3232

33-
1. It starts off by storing the PDF file as a PdfReader object, this is from the `pypdf` package.
34-
2. Then it loops through each page.
35-
3. If it finds the string *"NHS No"* in the page, it extracts it and removes any whitespaces, then adds it to a pandas DataFrame - `nhs_no_df`
33+
1. Loads the PDF file using the `PdfReader` object from the `pypdf` package.
34+
2. Loops through each page of the PDF.
35+
3. Searches for the string *"NHS No"* on each page.
36+
4. If found, extracts the NHS number, removes any whitespace, and adds it to a pandas DataFrame (`nhs_no_df`).
37+
5. If no NHS numbers are found on that page, it goes to the next page.
38+
6. Returns the DataFrame containing all extracted NHS numbers.
39+
40+
#### Example Usage
41+
42+
You can use this utility to extract NHS numbers from a PDF file as part of the [`Batch Processing`](BatchProcessing.md) utility or by providing the file path as a string.
43+
44+
**Extracting NHS numbers using a file path:**
45+
46+
```python
47+
from utils.pdf_reader import extract_nhs_no_from_pdf
48+
file_path = "path/to/your/file.pdf"
49+
nhs_no_df = extract_nhs_no_from_pdf(file_path)
50+
```
51+
52+
**Extracting NHS numbers using batch processing:**
53+
54+
```python
55+
from utils.pdf_reader import extract_nhs_no_from_pdf
56+
get_subjects_from_pdf = True
57+
file = download_file.suggested_filename # This is done via playwright when the "Retrieve button" on a batch is clicked.
58+
59+
nhs_no_df = (
60+
extract_nhs_no_from_pdf(file)
61+
if file.endswith(".pdf") and get_subjects_from_pdf
62+
else None
63+
)
64+
```
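The extraction steps described in this file (search each page for **"NHS No:"**, strip whitespace, collect the results) can be sketched on plain page text. This is a minimal illustration rather than the repository's implementation: the documented function loads pages with `pypdf` and returns a pandas DataFrame, whereas this sketch takes already-extracted page strings and returns a list. The name `extract_nhs_numbers` and the regular expression (ten digits, optionally separated by spaces, directly after the label) are assumptions.

```python
from __future__ import annotations

import re

# Assumed layout: the label "NHS No:" followed by ten digits, optionally spaced.
NHS_NO_PATTERN = re.compile(r"NHS No:\s*((?:\d\s*){10})")


def extract_nhs_numbers(page_texts: list[str]) -> list[str]:
    """Collect NHS numbers from page text (hypothetical helper, not from the repo)."""
    nhs_numbers = []
    for text in page_texts:
        # Pages without the label simply contribute no matches.
        for match in NHS_NO_PATTERN.finditer(text):
            # Remove all whitespace so "943 476 5919" becomes "9434765919".
            nhs_numbers.append(re.sub(r"\s+", "", match.group(1)))
    return nhs_numbers


pages = [
    "Dear patient\nNHS No: 943 476 5919\n",
    "This page has no number.",
    "NHS No:9434765870",
]
numbers = extract_nhs_numbers(pages)
```

Working on page strings keeps the matching logic testable without a PDF on disk; the page text itself would come from whatever PDF library is in use.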
