BCWAT Architecture

The BC Government's BCWAT application is composed of microservices in three groupings:

  • bcwat microservices
  • airflow microservices
  • backend databases

The bcwat services (bcwat openshift helm target) contain:

  • bcwat-nginx : Nginx wrapped application server to serve the frontend
  • bcwat-api : Python API service that retrieves data from the backend

The Airflow services (airflow openshift helm target) contain:

  • airflow scheduler : to schedule all data acquisition (scraper) jobs
  • airflow trigger : module to run all scraper jobs
  • airflow webserver : user interface dashboard to monitor scrapers

The backend databases contain:

  • bcwat PostGIS database
  • bcwat PostGIS database backup (bcwat-db-repo)
  • bcwat PostGIS database backup job (bcwat-db-backup)

In addition, each scraper pod scheduled by the airflow trigger will appear as a deployed pod, for example: drive-bc-dag-drive-bc-scraper (pod)

Architecture Diagram

bcwat-nginx

Simple Nginx service running in a container pod that serves the VueJS application to the users' browsers.

Build

See client/src/Dockerfile and client/src/entrypoint.sh for how the Docker image is built.

Deployment

Two environment variables are injected into the container at runtime:

  • The base URL of the API service (VITE_BASE_API_URL)
  • The mapbox token to generate mapbox maps (VITE_APP_MAPBOX_TOKEN)

Components

The frontend application (bc-wat-app) is a VueJS (VueJS 3.x) application that uses the following main libraries:

  • quasar : Developer-oriented, front-end framework with VueJS components for best-in-class high-performance and responsive websites with good support for desktop and mobile browsers

  • d3 : Charting library for custom dynamic visualizations, with features such as selections, scales, shapes, interactions, layouts, and geographic maps. It is the module used for bar charts and graphs

  • mapbox : Client-side JavaScript library for building web maps and web applications with user interactions that allows:

    • Visualizing and displaying geographic data
    • Querying and filtering features on a map
    • Placing data between layers of a Mapbox style
    • Dynamically displaying and styling custom client-side data on a map
    • Data visualizations and animations
    • Adding markers and popups to maps programmatically

bcwat-api

Python API service that provides a REST interface to the frontend application

To start the API, first create a venv:

cd backend
python3 -m venv /path/to/venv/directory
source /path/to/venv/directory/bin/activate
pip install -r requirements.txt

Start the API by running the startup script:

cd backend
chmod +x ./startup.sh
./startup.sh

Swagger documentation is served on port 8000 at /docs and conforms to OpenAPI Specification 3.0. To test a route, expand the relevant endpoint name and method and click 'Try it out'. The response body shows the structure of the returned JSON, which is the format used to populate various components on the front end.
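
To confirm from a script that the documentation page is being served, a minimal sketch along these lines may help. It assumes the API was started locally, so the host `localhost` is an assumption (only port 8000 and the /docs path come from this README), and the helper names `docs_url` and `docs_available` are hypothetical.

```python
from urllib.error import URLError
from urllib.parse import urljoin
from urllib.request import urlopen


def docs_url(base_url: str = "http://localhost:8000") -> str:
    """Build the Swagger docs URL; the default host is an assumption."""
    return urljoin(base_url.rstrip("/") + "/", "docs")


def docs_available(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> bool:
    """Return True if the docs page answers with HTTP 200, False otherwise."""
    try:
        with urlopen(docs_url(base_url), timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False
```

Calling `docs_available()` after running startup.sh gives a quick yes/no answer before opening the page in a browser.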

bcwat-db

Crunchy Postgres Database with GIS extensions

Back Up

The database follows a schedule of full and incremental backups:

| Backup Type | Time UTC | Time PDT (UTC-7) |
| --- | --- | --- |
| Full Backup | 11:00 | 04:00 |
| Incremental Backup | 17:00 and 23:00 | 10:00 and 16:00 |
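
The local time of a job scheduled in UTC depends on whether daylight saving time is in effect. As a rough sketch, a UTC time can be converted to both Pacific offsets as follows. The helper name `utc_to_pacific` is hypothetical, and fixed UTC-8 and UTC-7 offsets are assumed instead of a time zone database.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets (assumption): PST is UTC-8, PDT is UTC-7.
PST = timezone(timedelta(hours=-8), "PST")
PDT = timezone(timedelta(hours=-7), "PDT")


def utc_to_pacific(hhmm: str) -> dict[str, str]:
    """Convert an 'HH:MM' UTC time to its PST and PDT equivalents."""
    hours, minutes = map(int, hhmm.split(":"))
    # The calendar date is arbitrary; only the time of day is reported.
    utc_time = datetime(2000, 1, 1, hours, minutes, tzinfo=timezone.utc)
    return {
        "PST": utc_time.astimezone(PST).strftime("%H:%M"),
        "PDT": utc_time.astimezone(PDT).strftime("%H:%M"),
    }


# 11:00 UTC is 03:00 at UTC-8 (PST) and 04:00 at UTC-7 (PDT).
print(utc_to_pacific("11:00"))
```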

Components

The database consists of three schemas: bcwat_lic, bcwat_obs, and bcwat_ws. The first stores water licensing data, the second stores water and climate observations collected from stations throughout BC, and the third stores information on watersheds, such as their land cover and water use.

Once a database has been created, it can be populated with the schemas and with all the data that must be in place before the scrapers can write to it. The associated documentation and scripts are located in the database_initialization README.

Airflow scrapers

Airflow is an Apache open-source platform for developing, scheduling, and monitoring batch-oriented workflows.

Each scraper gets its own Directed Acyclic Graph (DAG) file in Airflow. The DAG files are located in the airflow/dags directory. Each DAG file is a Python file that contains the definition of the workflow. A DAG can have multiple tasks, but because the scraping work is not complex, each scraper is combined into a single task so that no intermediate data storage is required. For a more detailed description of the DAG files, see the Airflow documentation in the airflow README.
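
The single-task design can be illustrated without Airflow itself. The sketch below is not the project's scraper code: `run_scraper` and the three stand-in functions are hypothetical, and the sample rows are placeholders. It only shows how download, transform, and load steps can run in one callable so the data stays in memory between steps. In a DAG file, a callable like this would be the body of the one task.

```python
import csv
import io
from typing import Callable, Iterable

Row = dict[str, str]


def run_scraper(
    download: Callable[[], str],
    transform: Callable[[str], Iterable[Row]],
    load: Callable[[list[Row]], int],
) -> int:
    """Run download -> transform -> load in one call, with no intermediate storage."""
    raw_text = download()
    rows = list(transform(raw_text))
    return load(rows)


# Hypothetical stand-ins for a real source and a real database.
def fake_download() -> str:
    return "station_id,variable,value\nSTATION_A,temperature,1.5\nSTATION_B,temperature,2.0\n"


def parse_csv(raw_text: str) -> Iterable[Row]:
    return csv.DictReader(io.StringIO(raw_text))


stored_rows: list[Row] = []


def store_in_memory(rows: list[Row]) -> int:
    stored_rows.extend(rows)
    return len(rows)
```

Calling `run_scraper(fake_download, parse_csv, store_in_memory)` returns the number of rows stored. Splitting the same steps across several tasks would require passing the downloaded data between them, which is the intermediate storage the single-task approach avoids.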

The following table lists each DAG ID, the source it scrapes from, a description of the data, and the variables it collects.

| DAG ID | Source | Description | Variables |
| --- | --- | --- | --- |
| asp_dag | BC Ministry of Environment | Automated Snow Pillow (ASP) data from automated stations. | Temperature, Precipitation, Snow Depth, Snow Water Equivalent (SWE) |
| ec_xml_dag | MSC Data Mart | MSC Data Mart XML Scraper. | Temperature, Precipitation, Wind, Snow Amount |
| env_aqn_dag | BC Ministry of Environment | Data from the Ministry of Environment. This data originally came from PCIC. | Temperature, Precipitation |
| env_hydro_dag | BC Ministry of Environment | Water stage and discharge from BC Government. | Discharge, Level |
| flnro_wmb_dag | BC Ministry of Forest | FLNRO-WMB data from the Ministry of Forest. Originally from the PCIC data portal. | Temperature, Precipitation |
| flowworks_dag | Data from FlowWorks API | Access to the FlowWorks API requires a bearer token. | Temperature, Precipitation, Discharge, Level, Snow Water Equivalent, Rainfall |
| gw_moe_dag | BC Ministry of Environment | Groundwater data from the Ministry of Environment. | Groundwater Level |
| msp_dag | MSC Data Mart BC Ministry of Environment | Manual Snow Pillow data from the Ministry of Environment. | |
| wsc_hydro_dag | | Hydrometric Data from MSC. | Discharge, Water Level |
| water_licences_bcer_dag | BC-ER ArcGIS Layer | Data from an ArcGIS data layer. | Short Term Approvals |
| weather_farm_prd_dag | BC Peace River Regional District Data | Data from BC Peace River Regional District weather stations. Some of the stations do not return data. | Temperature, Rainfall |
| wls_water_approval_dag | DataBC Data Catalogue | Data from DataBC scraped using the bcdata Python package. This scraper scrapes the Water Rights Approval Points. | Water Rights Approval Points |
| wra_wrl_dag | DataBC Data Catalogue | Data from DataBC scraped using the bcdata Python package. This scraper scrapes the Public Water Rights Applications. | Public Water Rights Applications |

The following quarterly scrapers should be run when a new HYDAT version is available:

| DAG ID | Source | Description | Variables |
| --- | --- | --- | --- |
| quarterly_climate_ec_update_dag | MSC Data Mart | BC Climate daily data from MSC Data Mart | Temperature, Precipitation, Snow Depth, Snow Amount |
| quarterly_gw_moe_dag | BC Ministry of Environment | Groundwater data from the Ministry of Environment. Similar source to the daily gw_moe scraper, but this takes the average .csv file. | Groundwater Level |
| quarterly_hydat_import_dag | Hydat | Hydat database which comes in a .zip format. Must be decompressed to be accessed. | Water Discharge, Water Level |
| quarterly_water_quality_eccc_dag | ECCC Data Catalogue | Water quality data from various locations. Gathered via the ECCC Data Catalogue API. | Water Quality |
| quarterly_moe_hydrometric_historic_dag | ECCC Data Catalogue | Discharge and Stage data from the Ministry of Environment | Discharge, Stage |
| quarterly_ems_water_quality_dag | BC Data Catalogue | Water Quality data from the Government of BC | Water Quality |

Airflow scheduler

The schedule for each DAG is listed below:

| DAG ID | Run Time UTC | Run Time PST/PDT | Frequency | Notes |
| --- | --- | --- | --- | --- |
| asp_dag | 08:05 | 00:05/01:05 | Daily | |
| ec_xml_dag | 08:00 | 00:00/01:00 | Daily | |
| env_aqn_dag | 08:00 | 00:00/01:00 | Daily | |
| env_hydro_dag | 08:10 | 00:10/01:10 | Daily | |
| flnro_wmb_dag | 08:00 | 00:00/01:00 | Daily | |
| flowworks_dag | 08:00 | 00:00/01:00 | Daily | |
| gw_moe_dag | 08:15 | 00:15/01:15 | Daily | |
| msp_dag | 08:00 | 00:00/01:00 | Daily | |
| weather_farm_prd_dag | 08:00 | 00:00/01:00 | Daily | |
| wsc_hydro_dag | 08:00 | 00:00/01:00 | Daily | |
| water_licences_bcer_dag | 06:00 | 22:00/23:00 | Daily | |
| wls_water_approval_dag | 06:00 | 22:00/23:00 | Daily | |
| wra_wrl_dag | 06:05 | 22:05/23:05 | Daily | |
| quarterly_climate_ec_update_dag | 08:30 | 00:30/01:30 | Quarterly | First of the Month |
| quarterly_gw_moe_dag | 09:00 | 01:00/02:00 | Quarterly | First of the Month |
| quarterly_hydat_import_dag | 09:30 | 01:30/02:30 | Quarterly | 1st and 15th of the Month* |
| quarterly_water_quality_eccc_dag | 10:00 | 02:00/03:00 | Quarterly | First of the Month |
| quarterly_moe_hydrometric_historic_dag | 10:15 | 02:15/03:15 | Quarterly | First of the Month |
| quarterly_ems_water_quality_dag | 08:30 | 00:30/01:30 | Quarterly | Second of the Month |
| update_sation_year_var_status_dag | 13:30 | 05:30/06:30 | Daily | |

* The HYDAT sqlite3 database is updated every quarter, but not on a fixed schedule. The DAG therefore checks on each run whether a new version is available, which keeps the data in the app current. If no new version is available when it attempts to scrape HYDAT, it does not scrape.
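
The check described in this note can be sketched as a small decision function. This is an illustration only: how the DAG actually identifies a HYDAT version is not described here, so the sketch assumes each release is identified by its release date, and the name `should_import_hydat` is hypothetical.

```python
from datetime import date
from typing import Optional


def should_import_hydat(available: date, last_imported: Optional[date]) -> bool:
    """Return True only when the available release is newer than the last import.

    Assumption: a HYDAT release is identified by its release date.
    """
    if last_imported is None:
        # Nothing has been imported yet, so any available release is new.
        return True
    return available > last_imported
```

With a check like this, running the DAG more often than HYDAT is released is harmless, because runs that find no newer release do nothing.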

License

Copyright 2022 Province of British Columbia

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.