
python_scripts

Andrew edited this page Oct 13, 2023 · 1 revision

main.py

The main.py script parses the command-line arguments and then calls two functions in order: scrape_days_from_api and refresh_app_db.

flowchart LR
A(Parse command line args)-->B(scrape_days_from_api)
B-->C(refresh_app_db)
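A minimal sketch of that control flow in Python. Both functions are stubs, and the `--start-date`/`--end-date` arguments are hypothetical: this page does not say which command-line arguments main.py actually accepts.

```python
from __future__ import annotations

import argparse
from datetime import date, timedelta


def scrape_days_from_api(start: date, end: date) -> list[date]:
    """Stub: the real function scrapes the API and writes parquet files to S3."""
    return [start + timedelta(days=offset) for offset in range((end - start).days + 1)]


def refresh_app_db() -> str:
    """Stub: the real function rebuilds matrixbooking_app_db."""
    return "refreshed"


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    # Hypothetical arguments; the real main.py may accept different ones.
    parser = argparse.ArgumentParser(description="Scrape bookings, then refresh the app database")
    parser.add_argument("--start-date", type=date.fromisoformat, required=True)
    parser.add_argument("--end-date", type=date.fromisoformat, required=True)
    return parser.parse_args(argv)


def main(argv: list[str] | None = None):
    args = parse_args(argv)                                      # 1. parse command line args
    days = scrape_days_from_api(args.start_date, args.end_date)  # 2. scrape and write to S3
    status = refresh_app_db()                                    # 3. rebuild the app database
    return days, status
```

Passing `argv` explicitly keeps `main` testable; when run as a script it would normally be called with no arguments so that argparse reads `sys.argv`.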


scrape_days_from_api

The main scraper function. It writes two parquet files to S3:

  • a daily bookings file named in the format yyyy-mm-dd.parquet; and
  • a single locations file.
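A sketch of how the daily object key could be derived from a date, and parsed back. Only the `yyyy-mm-dd.parquet` format comes from this page; the `bookings/` prefix and the `bookings_key`/`day_from_key` helpers are hypothetical.

```python
from datetime import date


def bookings_key(day: date, prefix: str = "bookings") -> str:
    """Build the S3 object key for one day's bookings file (prefix is hypothetical)."""
    return f"{prefix}/{day:%Y-%m-%d}.parquet"


def day_from_key(key: str) -> date:
    """Recover the date from a key such as 'bookings/2023-10-13.parquet'."""
    filename = key.rsplit("/", 1)[-1]
    return date.fromisoformat(filename[: -len(".parquet")])
```

Because months and days are zero-padded, the keys sort lexicographically in date order, which makes listing a date range under one prefix straightforward.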
flowchart LR
B[Create session]
B--Authenticate-->C[Scrape page]
C--Repeat-->C
C-->D[Extract bookings]
C-->E[Extract locations]
subgraph pd [Process scraped data]
pd1[Normalize json]-->pd2[Rename columns]-->pd3[Impose metadata]-->pd4[Convert to parquet]
end
D-->pd
E-->pd
pd-->F[(s3: Bookings\nDaily file)]
pd-->G[(s3: Locations)]
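The scrape loop in the diagram (request a page, repeat, then pull bookings and locations out of the results) might look roughly like this. The page shape, including the `has_more` flag and the `id` field on locations, is an assumption, and `fetch_page` stands in for a request made through the authenticated session; none of these names come from the real scraper. De-duplicating locations by id is a choice of this sketch.

```python
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical page shape: {"bookings": [...], "locations": [...], "has_more": bool}
Page = Dict[str, object]


def iter_pages(fetch_page: Callable[[int], Page]) -> Iterator[Page]:
    """Yield pages from fetch_page until a page reports that there are no more."""
    page_number = 0
    while True:
        page = fetch_page(page_number)
        yield page
        if not page.get("has_more"):
            break
        page_number += 1


def scrape(fetch_page: Callable[[int], Page]) -> Tuple[List[dict], List[dict]]:
    """Collect bookings from every page; de-duplicate locations by their id."""
    bookings: List[dict] = []
    locations: Dict[object, dict] = {}
    for page in iter_pages(fetch_page):
        bookings.extend(page.get("bookings", []))
        for location in page.get("locations", []):
            locations[location["id"]] = location  # later pages overwrite earlier copies
    return bookings, list(locations.values())
```

Injecting `fetch_page` keeps the paging logic separate from authentication, so it can be exercised with canned pages.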
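The first three steps of "Process scraped data" can be sketched with the standard library alone. `normalize` flattens nested JSON into dotted column names, similar in spirit to `pandas.json_normalize`; the `RENAMES` mapping and `SCHEMA` types are hypothetical stand-ins for the scraper's real metadata. The final parquet conversion is not shown; with pandas and a parquet engine installed it would typically be a call such as `pandas.DataFrame.to_parquet`.

```python
from __future__ import annotations

from datetime import datetime


def normalize(record: dict, parent: str = "", sep: str = ".") -> dict:
    """Flatten nested dicts into dotted keys, e.g. {"owner": {"name": x}} -> {"owner.name": x}."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(normalize(value, name, sep))
        else:
            flat[name] = value
    return flat


# Hypothetical mapping and schema; the scraper's real metadata is not shown on this page.
RENAMES = {"id": "booking_id", "owner.name": "owner_name", "timeFrom": "time_from"}
SCHEMA = {"booking_id": int, "owner_name": str, "time_from": datetime.fromisoformat}


def process(records: list[dict]) -> list[dict]:
    rows = []
    for record in records:
        renamed = {RENAMES.get(k, k): v for k, v in normalize(record).items()}
        # Impose metadata: keep only schema columns, cast each one, fill missing values with None
        rows.append({
            column: None if renamed.get(column) is None else cast(renamed[column])
            for column, cast in SCHEMA.items()
        })
    return rows
```

Imposing a fixed schema before writing means every daily file has the same columns and types, even on days when the API omits a field.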

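refresh_app_db

Rebuilds the matrixbooking_app_db database: it drops the existing database, recreates it, and then creates three tables (bookings, locations and sensors) by joining tables from matrix_db and occupeye_db_live.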

flowchart LR
A[Drop matrixbooking_app_db] -.-> B[Delete underlying data]
A--->C[Create matrixbooking_app_db]
subgraph book [Create table: bookings]
    subgraph book_joined [Create table: Bookings with joined rooms]
        book_joined1[(matrix_db.bookings)]--left join--- book_joined2[(matrix_db.joined_rooms)]
    end
    book_joined-- inner join --- book1[(matrix_db.locations)]
end

subgraph locations [Create table: locations]
    locations1[(matrix_db.locations)]-- inner join --- locations2[(occupeye_db_live.sensors)]
end

subgraph sensors [Create table: sensors]
    sensors1[(occupeye_db_live.sensor_observations)]-- inner join --- sensors2[(occupeye_db_live.sensors)]
    sensors2-- inner join --- sensors3[(matrix_db.locations)]
end
C-->book
C-->locations
C-->sensors
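To illustrate the join semantics of the bookings table, here is a sketch that uses SQLite attached databases as stand-ins for matrix_db and matrixbooking_app_db. This page does not say which engine hosts the real databases, and the column names and join key (`location_id` and so on) are hypothetical. Detaching and re-attaching the in-memory app database mimics "drop, then create", since detaching an in-memory database discards its data.

```python
import sqlite3


def make_source_db() -> sqlite3.Connection:
    """Build tiny stand-ins for matrix_db tables (hypothetical columns and rows)."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit, so ATTACH is allowed
    conn.execute("ATTACH DATABASE ':memory:' AS matrix_db")
    conn.executescript("""
        CREATE TABLE matrix_db.bookings (booking_id INTEGER, location_id INTEGER);
        CREATE TABLE matrix_db.joined_rooms (location_id INTEGER, joined_room_id INTEGER);
        CREATE TABLE matrix_db.locations (location_id INTEGER, location_name TEXT);
        INSERT INTO matrix_db.bookings VALUES (1, 10), (2, 20), (3, 99);
        INSERT INTO matrix_db.joined_rooms VALUES (10, 11);
        INSERT INTO matrix_db.locations VALUES (10, 'Room A'), (20, 'Room B');
    """)
    return conn


def build_app_db(conn: sqlite3.Connection) -> None:
    conn.commit()  # ATTACH/DETACH cannot run inside an open transaction
    attached = [row[1] for row in conn.execute("PRAGMA database_list")]
    if "matrixbooking_app_db" in attached:
        conn.execute("DETACH DATABASE matrixbooking_app_db")  # "drop" (data is discarded)
    conn.execute("ATTACH DATABASE ':memory:' AS matrixbooking_app_db")  # "create"
    conn.execute("""
        CREATE TABLE matrixbooking_app_db.bookings AS
        SELECT b.booking_id, b.location_id, j.joined_room_id, l.location_name
        FROM matrix_db.bookings AS b
        LEFT JOIN matrix_db.joined_rooms AS j ON b.location_id = j.location_id
        INNER JOIN matrix_db.locations AS l ON b.location_id = l.location_id
    """)
```

With the sample rows, the left join keeps booking 2 even though it has no joined room, while the inner join with locations drops booking 3 because location 99 is unknown. The locations and sensors tables in the diagram follow the same pattern using inner joins only.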
