python_scripts
Andrew edited this page Oct 13, 2023 · 1 revision
The `main.py` script parses the command line arguments and then calls two functions in turn: `scrape_days_from_api` and `refresh_app_db`.
flowchart LR
A(Parse command line args)-->B(scrape_days_from_api)
B-->C(refresh_app_db)
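The flow above could be sketched roughly as follows. This is a minimal illustration, not the actual `main.py`: the `--start-date` and `--days` options, the function signatures and the placeholder bodies are all assumptions made for the example.

```python
import argparse
from datetime import date, timedelta


def parse_args(argv=None):
    """Parse command line args (option names here are hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Scrape bookings, then refresh the app database"
    )
    parser.add_argument("--start-date", type=date.fromisoformat, default=None)
    parser.add_argument("--days", type=int, default=1)
    return parser.parse_args(argv)


def scrape_days_from_api(start, days):
    """Placeholder for the real scraper: returns the dates it would scrape."""
    return [start + timedelta(days=i) for i in range(days)]


def refresh_app_db():
    """Placeholder for the real database refresh."""
    return "refreshed"


def main(argv=None):
    args = parse_args(argv)
    start = args.start_date or date.today()
    # Scrape first, so the refresh sees the newly written files.
    scraped = scrape_days_from_api(start, args.days)
    status = refresh_app_db()
    return scraped, status
```

Calling `main(["--start-date", "2023-10-13", "--days", "2"])` runs both steps in order for two consecutive days.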
`scrape_days_from_api` is the main scraper function. It writes two kinds of parquet file to s3:
- a daily `bookings` file named in the format `yyyy-mm-dd.parquet`; and
- a single `locations` file.
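As a small illustration of the daily naming convention, the file name can be derived from the scraped date. The helper name `bookings_file_name` is hypothetical and not taken from the scripts themselves.

```python
from datetime import date


def bookings_file_name(scrape_date: date) -> str:
    """Return the daily bookings file name in yyyy-mm-dd.parquet format."""
    # date.isoformat() always yields a zero-padded yyyy-mm-dd string.
    return f"{scrape_date.isoformat()}.parquet"
```

Because the date is zero-padded, the daily files sort chronologically when listed by name.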
flowchart LR
B[Create session]
B--Authenticate-->C[Scrape page]
C--Repeat-->C
C-->D[Extract bookings]
C-->E[Extract locations]
subgraph pd [Process scraped data]
pd1[Normalize json]-->pd2[Rename columns]-->pd3[Impose metadata]-->pd4[Convert to parquet]
end
D-->pd
E-->pd
pd-->F[(s3: Bookings\nDaily file)]
pd-->G[(s3: Locations)]
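The "Process scraped data" steps might look something like the sketch below, written with the standard library only. The field names, the rename map and the metadata (column to type) mapping are invented for illustration; the real scripts may well use a dataframe library instead. The final parquet conversion is left out because it needs a third-party library such as pandas or pyarrow.

```python
# Hypothetical rename map: scraped field name -> column name.
RENAMES = {"locationId": "location_id", "timeFrom": "time_from"}

# Hypothetical metadata: the columns to keep and the type each must have.
METADATA = {"id": int, "location_id": int, "time_from": str, "owner_name": str}


def normalize(record, parent="", sep="_"):
    """Flatten nested dicts into one level, joining keys with sep."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(normalize(value, name, sep))
        else:
            flat[name] = value
    return flat


def rename_columns(row, renames):
    """Rename the keys listed in renames; leave the others unchanged."""
    return {renames.get(key, key): value for key, value in row.items()}


def impose_metadata(row, metadata):
    """Keep only declared columns, cast to the declared type, None if absent."""
    return {
        col: (typ(row[col]) if row.get(col) is not None else None)
        for col, typ in metadata.items()
    }


def process(records):
    """Apply normalize -> rename -> impose metadata to every record."""
    return [
        impose_metadata(rename_columns(normalize(r), RENAMES), METADATA)
        for r in records
    ]
```

Imposing the metadata last means every daily file ends up with the same columns and types, even when a scraped record is missing a field.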
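`refresh_app_db` drops and recreates `matrixbooking_app_db`, then builds the `bookings`, `locations` and `sensors` tables:
flowchart LR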
A[Drop matrixbooking_app_db] -.-> B[Delete underlying data]
A--->C[Create matrixbooking_app_db]
subgraph book [Create table: bookings]
subgraph book_joined [Create table: Bookings with joined rooms]
book_joined1[(matrix_db.bookings)]--left join--- book_joined2[(matrix_db.joined_rooms)]
end
book_joined-- inner join --- book1[(matrix_db.locations)]
end
subgraph locations [Create table: locations]
locations1[(matrix_db.locations)]-- inner join --- locations2[(occupeye_db_live.sensors)]
end
subgraph sensors [Create table: sensors]
sensors1[(occupeye_db_live.sensor_observations)]-- inner join --- sensors2[(occupeye_db_live.sensors)]
sensors2-- inner join --- sensors3[(matrix_db.locations)]
end
C-->book
C-->locations
C-->sensors
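To make the join order for the `bookings` table concrete, here is a toy version using an in-memory SQLite database as a stand-in for the real databases. All column names and rows are made up for the example; only the join types (left join to joined rooms, then inner join to locations) come from the diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE bookings (id INTEGER, location_id INTEGER);
    CREATE TABLE joined_rooms (booking_id INTEGER, joined_name TEXT);
    CREATE TABLE locations (id INTEGER, name TEXT);
    INSERT INTO bookings VALUES (1, 10), (2, 20), (3, 99);
    INSERT INTO joined_rooms VALUES (2, 'Room A+B');
    INSERT INTO locations VALUES (10, 'Room A'), (20, 'Room B');
    """
)

# Left join keeps bookings with no joined room (joined_name is NULL);
# the inner join then drops bookings whose location is unknown.
rows = conn.execute(
    """
    SELECT b.id, l.name, j.joined_name
    FROM bookings AS b
    LEFT JOIN joined_rooms AS j ON j.booking_id = b.id
    INNER JOIN locations AS l ON l.id = b.location_id
    ORDER BY b.id
    """
).fetchall()
```

In this toy data, booking 1 survives without a joined room, while booking 3 is dropped because its location has no match in `locations`.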