The BC Government's BCWAT application is composed of microservices in three groupings:
- bcwat microservices
- airflow microservices
- backend databases
The bcwat services (bcwat openshift helm target) contain:
- bcwat-nginx : Nginx wrapped application server to serve the frontend
- bcwat-api : Python API service that retrieves data from the backend
The Airflow services (airflow openshift helm target) contain:
- airflow scheduler : to schedule all data acquisition (scraper) jobs
- airflow trigger : module to run all scraper jobs
- airflow webserver : user interface dashboard to monitor scrapers
The backend databases contain:
- bcwat PostGIS database
- bcwat PostGIS database backup (bcwat-db-repo)
- bcwat PostGIS database backup job (bcwat-db-backup)
In addition, each scraper pod scheduled by the airflow trigger will appear as a deployed pod, for example: drive-bc-dag-drive-bc-scraper (pod)
Simple Nginx service running in a container pod that serves the VueJS application to users' browsers.
Build
See client/src/Dockerfile and client/src/entrypoint.sh to see how the docker image is built.
Deployment
Two environment variables are injected into the container at runtime:
- The base URL of the API service (VITE_BASE_API_URL)
- The mapbox token to generate mapbox maps (VITE_APP_MAPBOX_TOKEN)
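Because both variables are required at runtime, a deployment can fail fast if either is missing. A minimal sketch of such a pre-flight check (this helper is illustrative, not part of the project's actual entrypoint):

```python
import os

# The two variables the container expects at runtime.
REQUIRED_VARS = ["VITE_BASE_API_URL", "VITE_APP_MAPBOX_TOKEN"]

def check_required_env(names):
    """Return the required variables that are missing or empty."""
    return [name for name in names if not os.environ.get(name)]

missing = check_required_env(REQUIRED_VARS)
print("missing:", missing)
```

Running this inside the container before starting Nginx surfaces a misconfigured deployment immediately instead of serving a frontend with a broken API URL or map.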
Components
The frontend application (bc-wat-app) is a VueJS (Vue 3.x) application that uses the following main libraries:
- quasar : Developer-oriented front-end framework with VueJS components for high-performance, responsive websites, with good support for desktop and mobile browsers
- d3 : Charting library for custom dynamic visualizations, with data features such as selections, scales, shapes, interactions, layouts, and geographic maps; used for bar charts and graphs
- mapbox : Client-side JavaScript library for building web maps and web applications with user interactions that allows:
  - Visualizing and displaying geographic data
  - Querying and filtering features on a map
  - Placing data between layers of a Mapbox style
  - Dynamically displaying and styling custom client-side data on a map
  - Data visualizations and animations
  - Adding markers and popups to maps programmatically
Python API service that provides a REST interface to the frontend application.
To start the API, first create a venv and install the dependencies:

```shell
cd backend
python3 -m venv /path/to/venv/directory
source /path/to/venv/directory/bin/activate
pip install -r requirements.txt
```
Start the API by running the startup script:

```shell
cd backend
chmod +x ./startup.sh
./startup.sh
```
Swagger documentation is served on port 8000 at /docs and conforms to the OpenAPI Specification 3.0.
Routes can be tested by expanding the relevant endpoint name and method and clicking 'Try it out'. A response body containing the structure of the JSON will be displayed. This format is used to populate various components on the front end.
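For illustration, a response body shaped like those shown under 'Try it out' could be consumed as follows. The endpoint payload and field names below are invented for the example and are not the actual API contract:

```python
import json

# Hypothetical response body, shaped like the JSON shown in the Swagger docs.
# The field names are assumptions for illustration only.
sample_response = """
{
  "stations": [
    {"id": "08GA010", "name": "Example Creek", "type": "hydrometric"},
    {"id": "1100030", "name": "Example Climate Station", "type": "climate"}
  ]
}
"""

def station_names(body: str) -> list[str]:
    """Extract the display names a frontend component might render."""
    payload = json.loads(body)
    return [station["name"] for station in payload["stations"]]

print(station_names(sample_response))
# → ['Example Creek', 'Example Climate Station']
```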
Crunchy Postgres Database with GIS extensions
Back Up
The database has a full back up and incremental back up schedule:

| Back Up Type | Time UTC | Time PST |
|---|---|---|
| Full Back Up | 11:00 | 04:00 |
| Incremental Back Up | 17:00 and 23:00 | 10:00 and 16:00 |
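Note that the UTC-to-Pacific offset shifts with daylight saving time, so a UTC backup time lands at different local times across the year. A quick way to check, using Python's standard zoneinfo module:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def utc_to_pacific(hour: int, minute: int, month: int) -> str:
    """Convert a UTC wall-clock time to America/Vancouver on the 1st of a month."""
    utc_time = datetime(2024, month, 1, hour, minute, tzinfo=timezone.utc)
    local = utc_time.astimezone(ZoneInfo("America/Vancouver"))
    return local.strftime("%H:%M")

# The 11:00 UTC full backup:
print(utc_to_pacific(11, 0, month=7))  # during daylight saving (PDT) → 04:00
print(utc_to_pacific(11, 0, month=1))  # during standard time (PST) → 03:00
```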
Components
The database consists of 3 schemas: bcwat_lic, bcwat_obs, and bcwat_ws. The first stores water licensing data, the second stores water and climate observations collected from stations throughout BC, and the last stores information on watersheds, such as their land cover, water use, etc.
Once a database has been created, it can be populated with the schemas and the static data that must be loaded before the scrapers can write into it. The associated documentation and scripts are located in the database_initialization README.
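As a minimal sketch of the schema layout (the actual DDL lives in the database_initialization scripts; this is illustrative only):

```python
# Hypothetical sketch: the real DDL lives in the database_initialization scripts.
SCHEMAS = ["bcwat_lic", "bcwat_obs", "bcwat_ws"]

def schema_ddl(names):
    """Build idempotent CREATE SCHEMA statements for the listed schemas."""
    return [f"CREATE SCHEMA IF NOT EXISTS {name};" for name in names]

for statement in schema_ddl(SCHEMAS):
    print(statement)
```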
Airflow is an Apache open-source platform for developing, scheduling, and monitoring batch-oriented workflows.
Each scraper gets its own Directed Acyclic Graph (DAG) file in Airflow. The DAG files are located in the airflow/dags directory. Each DAG file is a Python file that defines the workflow. A DAG can have multiple tasks, but since the scraping tasks are not complex, each workflow has been combined into a single task, so that no intermediate data storage is required. For a more detailed description of the DAG files, see the Airflow documentation in the airflow README.
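The single-task pattern can be sketched as follows. The function names and record shapes below are invented for illustration; the point is that fetch, transform, and load run inside one callable, so nothing needs to be persisted between Airflow tasks:

```python
# Illustrative only: a scrape combined into one callable, mirroring how each
# DAG runs as a single task with no intermediate data storage.
def fetch(source: str) -> list[dict]:
    """Stand-in for an HTTP/API fetch; returns raw records."""
    return [{"station": "demo", "value": "1.5"}] if source else []

def transform(raw: list[dict]) -> list[dict]:
    """Parse and clean the raw records."""
    return [{"station": r["station"], "value": float(r["value"])} for r in raw]

def load(rows: list[dict]) -> int:
    """Stand-in for inserting into the bcwat PostGIS database."""
    return len(rows)

def scrape_task(source: str) -> int:
    """The single task body: fetch -> transform -> load in one step."""
    return load(transform(fetch(source)))

print(scrape_task("https://example.invalid/data"))  # → 1
```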
The following table has the DAG ID, the source that it is scraping from, the description of the data that it is scraping, and the variables that it is scraping.
| DAG ID | Source | Description | Variables |
|---|---|---|---|
| asp_dag | BC Ministry of Environment | Automated Snow Pillow (ASP) data from automated stations. | |
| ec_xml_dag | MSC Data Mart | MSC Data Mart XML Scraper. | |
| env_aqn_dag | BC Ministry of Environment | Data from the Ministry of Environment. This data originally came from PCIC. | |
| env_hydro_dag | BC Ministry of Environment | Water stage and discharge from the BC Government. | |
| flnro_wmb_dag | BC Ministry of Forest | FLNRO-WMB data from the Ministry of Forest. Was originally from the PCIC data portal. | |
| flowworks_dag | FlowWorks API | Data from the FlowWorks API. Access to the API requires a bearer token. | |
| gw_moe_dag | BC Ministry of Environment | Groundwater data from the Ministry of Environment. | |
| msp_dag | BC Ministry of Environment | Manual Snow Pillow data from the Ministry of Environment. | |
| wsc_hydro_dag | MSC Data Mart | Hydrometric data from MSC. | |
| water_licences_bcer_dag | BC-ER ArcGIS Layer | Data from an ArcGIS data layer. | |
| weather_farm_prd_dag | BC Peace River Regional District | Data from BC Peace River Regional District weather stations. Some of the stations are not returning data, but some of them work. | |
| wls_water_approval_dag | DataBC Data Catalogue | Data from DataBC scraped using the bcdata Python package. This scraper scrapes the Water Rights Approval Points. | |
| wra_wrl_dag | DataBC Data Catalogue | Data from DataBC scraped using the bcdata Python package. This scraper scrapes the Public Water Rights Applications. | |
The following quarterly scrapers should be run when a new Hydat version is available:
| DAG ID | Source | Description | Variables |
|---|---|---|---|
| quarterly_climate_ec_update_dag | MSC Data Mart | BC climate daily data from MSC Data Mart. | |
| quarterly_gw_moe_dag | BC Ministry of Environment | Groundwater data from the Ministry of Environment. Similar source to the daily gw_moe scraper, but this takes the average .csv file. | |
| quarterly_hydat_import_dag | Hydat | Hydat database which comes in a .zip format. Must be decompressed to be accessed. | |
| quarterly_water_quality_eccc_dag | ECCC Data Catalogue | Water quality data from various locations. Gathered via the ECCC Data Catalogue API. | |
| quarterly_moe_hydrometric_historic_dag | ECCC Data Catalogue | Discharge and stage data from the Ministry of Environment. | |
| quarterly_ems_water_quality_dag | BC Data Catalogue | Water quality data from the Government of BC. | |
The schedule for each DAG is listed below:

| DAG ID | Run Time UTC | Run Time PST/PDT | Frequency | Notes |
|---|---|---|---|---|
| asp_dag | 08:05 | 00:05/01:05 | Daily | |
| ec_xml_dag | 08:00 | 00:00/01:00 | Daily | |
| env_aqn_dag | 08:00 | 00:00/01:00 | Daily | |
| env_hydro_dag | 08:10 | 00:10/01:10 | Daily | |
| flnro_wmb_dag | 08:00 | 00:00/01:00 | Daily | |
| flowworks_dag | 08:00 | 00:00/01:00 | Daily | |
| gw_moe_dag | 08:15 | 00:15/01:15 | Daily | |
| msp_dag | 08:00 | 00:00/01:00 | Daily | |
| weather_farm_prd_dag | 08:00 | 00:00/01:00 | Daily | |
| wsc_hydro_dag | 08:00 | 00:00/01:00 | Daily | |
| water_licences_bcer_dag | 06:00 | 22:00/23:00 | Daily | |
| wls_water_approval_dag | 06:00 | 22:00/23:00 | Daily | |
| wra_wrl_dag | 06:05 | 22:05/23:05 | Daily | |
| quarterly_climate_ec_update_dag | 08:30 | 00:30/01:30 | Quarterly | First of the Month |
| quarterly_gw_moe_dag | 09:00 | 01:00/02:00 | Quarterly | First of the Month |
| quarterly_hydat_import_dag | 09:30 | 01:30/02:30 | Quarterly | 1st and 15th of the Month* |
| quarterly_water_quality_eccc_dag | 10:00 | 02:00/03:00 | Quarterly | First of the Month |
| quarterly_moe_hydrometric_historic_dag | 10:15 | 02:15/03:15 | Quarterly | First of the Month |
| quarterly_ems_water_quality_dag | 08:30 | 00:30/01:30 | Quarterly | Second of the Month |
| update_sation_year_var_status_dag | 13:30 | 05:30/06:30 | Daily | |
* The HYDAT sqlite3 database is only updated every quarter, but it does not have a fixed schedule. Checking whether new data is available ensures that the newest data makes it into the app. When the scraper attempts to scrape HYDAT and a new version is not available, it does not scrape.
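That version gate can be sketched as follows. The version-label format and comparison here are hypothetical; the real scraper's version source may differ:

```python
# Hypothetical sketch of the HYDAT version gate; the real scraper's
# version source and comparison logic may differ.
def parse_version(label: str) -> int:
    """Extract a comparable date stamp from a label like 'Hydat_sqlite3_20240401'."""
    return int(label.rsplit("_", 1)[-1])

def should_scrape(published: str, imported: str) -> bool:
    """Scrape only when the published HYDAT release is newer than the imported one."""
    return parse_version(published) > parse_version(imported)

print(should_scrape("Hydat_sqlite3_20240401", "Hydat_sqlite3_20240101"))  # → True
print(should_scrape("Hydat_sqlite3_20240101", "Hydat_sqlite3_20240101"))  # → False
```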
Copyright 2022 Province of British Columbia
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.