Commit 135661a

Altering docstrings/markdowns and adding a markdown document for the PDF Reader util
1 parent 739e012 commit 135661a

4 files changed: 80 additions and 21 deletions


docs/utility-guides/BatchProcessing.md

Lines changed: 13 additions & 12 deletions
@@ -20,7 +20,7 @@ The Batch Processing utility allows for the processing of batches on the active
2020

2121
## Functions Overview
2222

23-
For this utility we have the following functions/methods:
23+
For this utility we have the following functions:
2424

2525
- `batch_processing`
2626
- `prepare_and_print_batch`
@@ -34,25 +34,25 @@ This will call the other two functions in order to successfully process a batch.
3434
#### Required Arguments
3535

3636
- `page`:
37-
- Type: **Page**
37+
- Type: `Page`
3838
- This is the playwright page object which is used to tell playwright what page the test is currently on.
3939
- `batch_type`:
40-
- Type: **str**
40+
- Type: `str`
4141
- This is the event code for the batch. For example: **S1** or **A323**
4242
- `batch_description`:
43-
- Type: **str**
43+
- Type: `str`
4444
- This is the description of the batch. For example: **Pre-invitation (FIT)** or **Post-investigation Appointment NOT Required**
4545
- `latest_event_status`:
46-
- Type: **str**
46+
- Type: `str`
4747
- This is the status the subject will get updated to after the batch has been processed. It is used to check that the subject has been updated to the correct status after a batch has been printed
4848

4949
#### Optional Arguments
5050

5151
- `run_timed_events`:
52-
- Type: **bool**
52+
- Type: `bool`
5353
- If this is set to **True**, then bcss_timed_events will be executed against all the subjects found in the batch
5454
- `get_subjects_from_pdf`:
55-
- Type: **bool**
55+
- Type: `bool`
5656
- If this is set to **True**, then the subjects will be retrieved from the downloaded PDF file instead of from the DB
5757

5858
#### How This Function Works
@@ -66,6 +66,7 @@ This will call the other two functions in order to successfully process a batch.
6666
6. After the ID is stored, it clicks on the ID to get to the Manage Active Batch page
6767
7. From here it calls the `prepare_and_print_batch` function.
6868
1. If `get_subjects_from_pdf` was set to False it calls `get_nhs_no_from_batch_id`, which is imported from *utils.oracle.oracle_specific_functions*, to get the subjects from the batch and stores them as a pandas DataFrame - **nhs_no_df**
69+
2. For more information on `get_nhs_no_from_batch_id`, see [PDFReader](PDFReader.md)
6970
8. Once this is complete it calls the `check_batch_in_archived_batch_list` function
7071
9. Finally, once that function is complete it calls `verify_subject_event_status_by_nhs_no` which is imported from *utils/screening_subject_page_searcher*
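
Steps 7 and 7.1 above describe a fallback: `prepare_and_print_batch` is documented as returning a DataFrame only when the subjects were read from the PDF, and `None` otherwise, in which case the subjects come from the database. The following is a minimal sketch of that branching only, using plain lists instead of DataFrames. The names `resolve_subjects`, `subjects_from_pdf` and `fetch_subjects_from_db` are hypothetical stand-ins and are not part of the utility.

```python
from typing import Callable, Optional, Sequence


def resolve_subjects(
    subjects_from_pdf: Optional[Sequence[str]],
    fetch_subjects_from_db: Callable[[], Sequence[str]],
) -> list[str]:
    """Return the subjects read from the PDF, or fall back to the database.

    Hypothetical helper: `subjects_from_pdf` stands in for the value returned
    by `prepare_and_print_batch`, and `fetch_subjects_from_db` stands in for
    a call such as `get_nhs_no_from_batch_id`.
    """
    if subjects_from_pdf is None:
        # The PDF was not used, so the database lookup is the only source.
        return list(fetch_subjects_from_db())
    # The PDF already supplied the subjects, so the database is not queried.
    return list(subjects_from_pdf)
```

Passing the database lookup as a callable means it is only executed when the PDF route was not taken.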
7172

@@ -77,13 +78,13 @@ It is in charge of pressing on the following button: **Prepare Batch**, **Retrie
7778
#### Arguments
7879

7980
- `page`:
80-
- Type: **Page**
81+
- Type: `Page`
8182
- This is the playwright page object which is used to tell playwright what page the test is currently on.
8283
- `link_text`:
83-
- Type: **str**
84+
- Type: `str`
8485
- This is the batch ID of the batch currently being processed
8586
- `get_subjects_from_pdf`:
86-
- Type: **bool**
87+
- Type: `bool`
8788
- This is an optional argument and if this is set to **True**, then the subjects will be retrieved from the downloaded PDF file instead of from the DB
8889

8990
#### How This Function Works
@@ -103,10 +104,10 @@ This function checks that the batch that was just prepared and printed is now vi
103104
#### Arguments
104105

105106
- `page`:
106-
- Type: **Page**
107+
- Type: `Page`
107108
- This is the playwright page object which is used to tell playwright what page the test is currently on.
108109
- `link_text`:
109-
- Type: **str**
110+
- Type: `str`
110111
- This is the batch ID of the batch currently being processed
111112

112113
#### How This Function Works

docs/utility-guides/PDFReader.md

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
1+
# Utility Guide: PDF Reader
2+
3+
The PDF Reader utility allows for reading of PDF files and performing specific tasks on them.
4+
5+
## Table of Contents
6+
7+
- [Utility Guide: PDF Reader](#utility-guide-pdf-reader)
8+
- [Table of Contents](#table-of-contents)
9+
- [Functions Overview](#functions-overview)
10+
- [Extract NHS No From PDF](#extract-nhs-no-from-pdf)
11+
- [Required Arguments](#required-arguments)
12+
- [How This Function Works](#how-this-function-works)
13+
14+
## Functions Overview
15+
16+
This utility provides the following function:
17+
18+
- `extract_nhs_no_from_pdf`
19+
20+
### Extract NHS No From PDF
21+
22+
This function extracts all of the NHS numbers found in a PDF file and returns them in a pandas DataFrame.
24+
25+
#### Required Arguments
26+
27+
- `file`:
28+
- Type: `str`
29+
- This is the file path stored as a string.
30+
31+
#### How This Function Works
32+
33+
1. It starts by loading the PDF file into a `PdfReader` object from the `pypdf` package.
34+
2. Then it loops through each page.
35+
3. If it finds the string *"NHS No"* in the page, it extracts the NHS number, removes any whitespace, and adds it to a pandas DataFrame - `nhs_no_df`
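
The extraction in steps 2 and 3 can be sketched as follows. This is an illustration, not the code in `utils/pdf_reader.py`: it works on a list of page texts (with `pypdf` these could be obtained from `page.extract_text()` for each page in `PdfReader(file).pages`) and returns a plain list rather than a DataFrame. The function name `extract_nhs_numbers` and the regular expression are assumptions, in particular that the ten-digit NHS number directly follows the label "NHS No", optionally after a colon.

```python
import re

# Assumed layout: the label "NHS No", an optional colon, then ten digits
# that may be separated by whitespace (for example "943 476 5919").
NHS_NO_PATTERN = re.compile(r"NHS No\s*:?\s*((?:\d\s*){10})")


def extract_nhs_numbers(page_texts: list[str]) -> list[str]:
    """Collect the NHS numbers found in the text of each page."""
    nhs_numbers: list[str] = []
    for text in page_texts:
        for match in NHS_NO_PATTERN.finditer(text):
            # Remove all whitespace so every number is stored as ten digits.
            nhs_numbers.append(re.sub(r"\s+", "", match.group(1)))
    return nhs_numbers
```

Pages without the label contribute nothing, so the result contains one entry per NHS number found.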

utils/batch_processing.py

Lines changed: 23 additions & 9 deletions
@@ -27,13 +27,15 @@ def batch_processing(
2727
get_subjects_from_pdf: bool = False,
2828
) -> None:
2929
"""
30-
This util is used to process batches. It expects the following inputs:
31-
- page: This is playwright page variable
32-
- batch_type: This is the event code of the batch. E.g. S1 or S9
33-
- batch_description: This is the description of the batch. E.g. Pre-invitation (FIT)
34-
- latest_event_status: This is the status the subject will get updated to after the batch has been processed.
35-
- run_timed_events: This is an optional input that executes bcss_timed_events if set to True
36-
- get_subjects_from_pdf: This is an optial input to change the method of retrieving subjects from the batch from the Db to the PDF file.
30+
This is used to process batches.
31+
32+
Args:
33+
page (Page): This is the playwright page object
34+
batch_type (str): The event code of the batch. E.g. S1 or S9
35+
batch_description (str): The description of the batch. E.g. Pre-invitation (FIT)
36+
latest_event_status (str): The status the subject will get updated to after the batch has been processed.
37+
run_timed_events (bool): An optional input that executes bcss_timed_events if set to True
38+
get_subjects_from_pdf (bool): An optional input to change the method of retrieving subjects from the batch from the DB to the PDF file.
3739
"""
3840
logging.info(f"Processing {batch_type} - {batch_description} batch")
3941
BasePage(page).click_main_menu_link()
@@ -86,8 +88,16 @@ def prepare_and_print_batch(
8688
page: Page, link_text: str, get_subjects_from_pdf: bool = False
8789
) -> pd.DataFrame | None:
8890
"""
89-
This method prepares the batch, retreives the files and confirms them as printed
91+
This prepares the batch, retrieves the files and confirms them as printed
9092
Once those buttons have been pressed it waits for the message 'Batch Successfully Archived'
93+
94+
Args:
95+
page (Page): This is the playwright page object
96+
link_text (str): The batch ID
97+
get_subjects_from_pdf (bool): An optional input to change the method of retrieving subjects from the batch from the DB to the PDF file.
98+
99+
Returns:
100+
nhs_no_df (pd.DataFrame | None): if get_subjects_from_pdf is True, this is a DataFrame with the column 'subject_nhs_number' and each NHS number being a record, otherwise it is None
91101
"""
92102
ManageActiveBatch(page).click_prepare_button()
93103
page.wait_for_timeout(
@@ -142,7 +152,11 @@ def prepare_and_print_batch(
142152

143153
def check_batch_in_archived_batch_list(page: Page, link_text) -> None:
144154
"""
145-
This method checks the the batch that was just prepared and printed is now visible in the archived batch list
155+
Checks that the batch that was just prepared and printed is now visible in the archived batch list.
156+
157+
Args:
158+
page (Page): This is the playwright page object
159+
link_text (str): The batch ID
146160
"""
147161
BasePage(page).click_main_menu_link()
148162
BasePage(page).go_to_communications_production_page()

utils/pdf_reader.py

Lines changed: 9 additions & 0 deletions
@@ -3,6 +3,15 @@
33

44

55
def extract_nhs_no_from_pdf(file: str) -> pd.DataFrame:
6+
"""
7+
Extracts all of the NHS Numbers in a PDF file and stores them in a pandas DataFrame.
8+
9+
Args:
10+
file (str): The file path stored as a string.
11+
12+
Returns:
13+
nhs_no_df (pd.DataFrame): A DataFrame with the column 'subject_nhs_number' and each NHS number being a record
14+
"""
615
reader = PdfReader(file)
716
nhs_no_df = pd.DataFrame(columns=["subject_nhs_number"])
817
# For loop looping through all pages of the file to find the NHS Number
