The BC Government's BCWAT application is composed of microservices in three groupings:
- bcwat microservices
- airflow microservices
- backend databases
The bcwat services (bcwat openshift helm target) contain:
- bcwat-nginx : Nginx wrapped application server to serve the frontend
- bcwat-api : Python API service that retrieves data from the backend
The Airflow services (airflow openshift helm target) contain:
- airflow scheduler : to schedule all data acquisition (scraper) jobs
- airflow trigger : module to run all scraper jobs
- airflow webserver : user interface dashboard to monitor scrapers
The backend databases contain:
- bcwat PostGIS database
- bcwat PostGIS database backup (bcwat-db-repo)
- bcwat PostGIS database backup job (bcwat-db-backup)
In addition, each scraper pod scheduled by the airflow trigger will appear as a deployed pod, for example: drive-bc-dag-drive-bc-scraper (pod)
Simple Nginx service running in a container pod that serves the VueJS application to users' browsers.
Build
See client/src/Dockerfile and client/src/entrypoint.sh to see how the docker image is built.
Deployment
Two environment variables are injected into the container at runtime:
- The base URL of the API service (VITE_BASE_API_URL)
- The mapbox token to generate mapbox maps (VITE_APP_MAPBOX_TOKEN)
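Because both variables are required at runtime, a deployment can fail fast if either is missing. A minimal sketch of such a pre-flight check (this helper is illustrative, not part of the project's actual entrypoint):

```python
import os

# The two variables the container expects at runtime.
REQUIRED_VARS = ["VITE_BASE_API_URL", "VITE_APP_MAPBOX_TOKEN"]

def check_required_env(names):
    """Return the required variables that are missing or empty."""
    return [name for name in names if not os.environ.get(name)]

missing = check_required_env(REQUIRED_VARS)
print("missing:", missing)
```

Running this inside the container before starting Nginx surfaces a misconfigured deployment immediately instead of serving a frontend with a broken API URL or map.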
Components
The frontend application (bc-wat-app) is a VueJS (Vue 3.x) application that uses the following main libraries:
- quasar : Developer-oriented front-end framework with VueJS components for high-performance, responsive websites, with good support for desktop and mobile browsers
- d3 : Charting library for custom dynamic visualizations, with data features such as selections, scales, shapes, interactions, layouts, and geographic maps; used for bar charts and graphs
- mapbox : Client-side JavaScript library for building web maps and web applications with user interactions that allows:
  - Visualizing and displaying geographic data
  - Querying and filtering features on a map
  - Placing data between layers of a Mapbox style
  - Dynamically displaying and styling custom client-side data on a map
  - Data visualizations and animations
  - Adding markers and popups to maps programmatically
Python API service that provides a REST interface to the frontend application.
To start the API, first create a venv and install the dependencies:

```shell
cd backend
python3 -m venv /path/to/venv/directory
source /path/to/venv/directory/bin/activate
pip install -r requirements.txt
```
Start the API by running the startup script:

```shell
cd backend
chmod +x ./startup.sh
./startup.sh
```
Swagger documentation is served on port 8000 at /docs and conforms to the OpenAPI Specification 3.0.
Routes can be tested by expanding the relevant endpoint name and method and clicking 'Try it out'. A response body containing the structure of the JSON will be displayed. This format is used to populate various components on the front end.
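For illustration, a response body shaped like those shown under 'Try it out' could be consumed as follows. The endpoint payload and field names below are invented for the example and are not the actual API contract:

```python
import json

# Hypothetical response body, shaped like the JSON shown in the Swagger docs.
# The field names are assumptions for illustration only.
sample_response = """
{
  "stations": [
    {"id": "08GA010", "name": "Example Creek", "type": "hydrometric"},
    {"id": "1100030", "name": "Example Climate Station", "type": "climate"}
  ]
}
"""

def station_names(body: str) -> list[str]:
    """Extract the display names a frontend component might render."""
    payload = json.loads(body)
    return [station["name"] for station in payload["stations"]]

print(station_names(sample_response))
# → ['Example Creek', 'Example Climate Station']
```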
Crunchy Postgres Database with GIS extensions
Back Up
The database has a full back up and incremental back up schedule:

| Back Up Type | Time UTC | Time PST |
|---|---|---|
| Full Back Up | 11:00 | 04:00 |
| Incremental Back Up | 17:00 and 23:00 | 10:00 and 16:00 |
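Note that the UTC-to-Pacific offset shifts with daylight saving time, so a UTC backup time lands at different local times across the year. A quick way to check, using Python's standard zoneinfo module:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def utc_to_pacific(hour: int, minute: int, month: int) -> str:
    """Convert a UTC wall-clock time to America/Vancouver on the 1st of a month."""
    utc_time = datetime(2024, month, 1, hour, minute, tzinfo=timezone.utc)
    local = utc_time.astimezone(ZoneInfo("America/Vancouver"))
    return local.strftime("%H:%M")

# The 11:00 UTC full backup:
print(utc_to_pacific(11, 0, month=7))  # during daylight saving (PDT) → 04:00
print(utc_to_pacific(11, 0, month=1))  # during standard time (PST) → 03:00
```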
Components
The database consists of 3 schemas: bcwat_lic, bcwat_obs, and bcwat_ws. The first stores water licensing data, the second stores water and climate observations collected from stations throughout BC, and the last stores information on watersheds, such as their land cover, water use, etc.
Once a database has been created, it can be populated with the schemas and the static data that must be loaded before the scrapers can write into it. The associated documentation and scripts are located in the database_initialization README.
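As a minimal sketch of the schema layout (the actual DDL lives in the database_initialization scripts; this is illustrative only):

```python
# Hypothetical sketch: the real DDL lives in the database_initialization scripts.
SCHEMAS = ["bcwat_lic", "bcwat_obs", "bcwat_ws"]

def schema_ddl(names):
    """Build idempotent CREATE SCHEMA statements for the listed schemas."""
    return [f"CREATE SCHEMA IF NOT EXISTS {name};" for name in names]

for statement in schema_ddl(SCHEMAS):
    print(statement)
```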
Airflow is an Apache open-source platform for developing, scheduling, and monitoring batch-oriented workflows.
Each scraper gets its own Directed Acyclic Graph (DAG) file in Airflow. The DAG files are located in the airflow/dags directory. Each DAG file is a Python file that defines the workflow. A DAG can have multiple tasks, but since the scraping tasks are not complex, each workflow has been combined into a single task, so that no intermediate data storage is required. For a more detailed description of the DAG files, see the Airflow documentation in the airflow README.
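The single-task pattern can be sketched as follows. The function names and record shapes below are invented for illustration; the point is that fetch, transform, and load run inside one callable, so nothing needs to be persisted between Airflow tasks:

```python
# Illustrative only: a scrape combined into one callable, mirroring how each
# DAG runs as a single task with no intermediate data storage.
def fetch(source: str) -> list[dict]:
    """Stand-in for an HTTP/API fetch; returns raw records."""
    return [{"station": "demo", "value": "1.5"}] if source else []

def transform(raw: list[dict]) -> list[dict]:
    """Parse and clean the raw records."""
    return [{"station": r["station"], "value": float(r["value"])} for r in raw]

def load(rows: list[dict]) -> int:
    """Stand-in for inserting into the bcwat PostGIS database."""
    return len(rows)

def scrape_task(source: str) -> int:
    """The single task body: fetch -> transform -> load in one step."""
    return load(transform(fetch(source)))

print(scrape_task("https://example.invalid/data"))  # → 1
```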
The following table has the DAG ID, the source that it is scraping from, the description of the data that it is scraping, and the variables that it is scraping.
| DAG ID | Source | Description | Variables |
|---|---|---|---|
| asp_dag | BC Ministry of Environment | Automated Snow Pillow (ASP) data from automated stations. | |
| ec_xml_dag | MSC Data Mart | MSC Data Mart XML Scraper. | |
| env_aqn_dag | BC Ministry of Environment | Data from the Ministry of Environment. This data originally came from PCIC. | |
| env_hydro_dag | BC Ministry of Environment | Water stage and discharge from the BC Government. | |
| flnro_wmb_dag | BC Ministry of Forest | FLNRO-WMB data from the Ministry of Forest. Was originally from the PCIC data portal. | |
| flowworks_dag | FlowWorks API | Data from the FlowWorks API. Access to the API requires a bearer token. | |
| gw_moe_dag | BC Ministry of Environment | Groundwater data from the Ministry of Environment. | |
| msp_dag | BC Ministry of Environment | Manual Snow Pillow data from the Ministry of Environment. | |
| wsc_hydro_dag | MSC Data Mart | Hydrometric data from MSC. | |
| water_licences_bcer_dag | BC-ER ArcGIS Layer | Data from an ArcGIS data layer. | |
| weather_farm_prd_dag | BC Peace River Regional District | Data from BC Peace River Regional District weather stations. Some of the stations are not returning data, but some of them work. | |
| wls_water_approval_dag | DataBC Data Catalogue | Data from DataBC scraped using the bcdata Python package. This scraper scrapes the Water Rights Approval Points. | |
| wra_wrl_dag | DataBC Data Catalogue | Data from DataBC scraped using the bcdata Python package. This scraper scrapes the Public Water Rights Applications. | |
The following quarterly scrapers should be run when a new Hydat version is available:
| DAG ID | Source | Description | Variables |
|---|---|---|---|
| quarterly_climate_ec_update_dag | MSC Data Mart | BC climate daily data from MSC Data Mart. | |
| quarterly_gw_moe_dag | BC Ministry of Environment | Groundwater data from the Ministry of Environment. Similar source to the daily gw_moe scraper, but this takes the average .csv file. | |
| quarterly_hydat_import_dag | Hydat | Hydat database which comes in a .zip format. Must be decompressed to be accessed. | |
| quarterly_water_quality_eccc_dag | ECCC Data Catalogue | Water quality data from various locations. Gathered via the ECCC Data Catalogue API. | |
| quarterly_moe_hydrometric_historic_dag | ECCC Data Catalogue | Discharge and stage data from the Ministry of Environment. | |
| quarterly_ems_water_quality_dag | BC Data Catalogue | Water quality data from the Government of BC. | |
The schedule for each DAG is listed below:

| DAG ID | Run Time UTC | Run Time PST/PDT | Frequency | Notes |
|---|---|---|---|---|
| asp_dag | 08:05 | 00:05/01:05 | Daily | |
| ec_xml_dag | 08:00 | 00:00/01:00 | Daily | |
| env_aqn_dag | 08:00 | 00:00/01:00 | Daily | |
| env_hydro_dag | 08:10 | 00:10/01:10 | Daily | |
| flnro_wmb_dag | 08:00 | 00:00/01:00 | Daily | |
| flowworks_dag | 08:00 | 00:00/01:00 | Daily | |
| gw_moe_dag | 08:15 | 00:15/01:15 | Daily | |
| msp_dag | 08:00 | 00:00/01:00 | Daily | |
| weather_farm_prd_dag | 08:00 | 00:00/01:00 | Daily | |
| wsc_hydro_dag | 08:00 | 00:00/01:00 | Daily | |
| water_licences_bcer_dag | 06:00 | 22:00/23:00 | Daily | |
| wls_water_approval_dag | 06:00 | 22:00/23:00 | Daily | |
| wra_wrl_dag | 06:05 | 22:05/23:05 | Daily | |
| quarterly_climate_ec_update_dag | 08:30 | 00:30/01:30 | Quarterly | First of the Month |
| quarterly_gw_moe_dag | 09:00 | 01:00/02:00 | Quarterly | First of the Month |
| quarterly_hydat_import_dag | 09:30 | 01:30/02:30 | Quarterly | 1st and 15th of the Month* |
| quarterly_water_quality_eccc_dag | 10:00 | 02:00/03:00 | Quarterly | First of the Month |
| quarterly_moe_hydrometric_historic_dag | 10:15 | 02:15/03:15 | Quarterly | First of the Month |
| quarterly_ems_water_quality_dag | 08:30 | 00:30/01:30 | Quarterly | Second of the Month |
| update_sation_year_var_status_dag | 13:30 | 05:30/06:30 | Daily | |
* The HYDAT sqlite3 database is only updated every quarter, but it does not have a fixed schedule. Checking whether new data is available ensures that the newest data makes it into the app. When the scraper attempts to scrape HYDAT and a new version is not available, it does not scrape.
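That version gate can be sketched as follows. The version-label format and comparison here are hypothetical; the real scraper's version source may differ:

```python
# Hypothetical sketch of the HYDAT version gate; the real scraper's
# version source and comparison logic may differ.
def parse_version(label: str) -> int:
    """Extract a comparable date stamp from a label like 'Hydat_sqlite3_20240401'."""
    return int(label.rsplit("_", 1)[-1])

def should_scrape(published: str, imported: str) -> bool:
    """Scrape only when the published HYDAT release is newer than the imported one."""
    return parse_version(published) > parse_version(imported)

print(should_scrape("Hydat_sqlite3_20240401", "Hydat_sqlite3_20240101"))  # → True
print(should_scrape("Hydat_sqlite3_20240101", "Hydat_sqlite3_20240101"))  # → False
```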
Copyright 2022 Province of British Columbia
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.