Monitors can be defined alongside derived datasets in bigquery-etl. Monitoring in Bigeye for a specific table can be enabled by adding `monitoring` metadata to the `metadata.yaml` file:

```yaml
friendly_name: Some Table [warn]
monitoring:
  enabled: true # Enables monitoring for the table in Bigeye and deploys freshness and volume metrics
  collection: Test # An existing collection these monitors should be part of in Bigeye
```

Enabling monitoring for a table automatically deploys freshness and volume metrics for this table.

Bigeye monitors are triggered automatically via Airflow for queries that have `monitoring` set to `enabled: true`. The checks are executed after the ETL run for the table has completed.

To indicate whether a failing check should block downstream Airflow tasks, a `[warn]` or `[fail]` tag can be added to the name of the Bigeye metric. By default, all metrics without either tag are treated as `[warn]`: they do not block downstream Airflow tasks when checks fail, but any failing check still appears in the Bigeye dashboard. Metrics with `[fail]` in their names block the execution of downstream Airflow tasks when a check fails.

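For example, to make a table's default freshness and volume checks blocking, the `[fail]` tag can be used instead. This is a minimal sketch mirroring the `metadata.yaml` example above, and it assumes the tag on `friendly_name` carries over to the names of the metrics deployed for the table:

```yaml
friendly_name: Some Table [fail] # Failing checks will block downstream Airflow tasks
monitoring:
  enabled: true
  collection: Test
```
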
## Bigconfig

Additional and custom monitors can be defined in a [Bigconfig](https://docs.bigeye.com/docs/bigconfig#example-template) `bigconfig.yml` file that is stored in the same directory as the table query. Bigconfig allows users to deploy other pre-defined monitors, such as row counts or null checks, on a table or column level.

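As an illustration, a minimal `bigconfig.yml` following the [Bigconfig template](https://docs.bigeye.com/docs/bigconfig#example-template) might deploy a row-count metric on the table and a null check on one of its columns. This is a sketch only: the table and column names below are hypothetical, and the available keys and predefined metrics are defined by Bigeye:

```yaml
type: BIGCONFIG_FILE
table_deployments:
  - deployments:
      # Hypothetical fully-qualified table name; the exact format depends on
      # how the warehouse is registered in Bigeye
      - fq_table_name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.telemetry_derived.some_table_v1
        table_metrics:
          # Track the number of rows loaded per run
          - metric_type:
              predefined_metric: COUNT_ROWS
        columns:
          - column_name: client_id
            metrics:
              # Alert on an unexpected share of NULL values in the column
              - metric_type:
                  predefined_metric: PERCENT_NULL
```
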
## Custom SQL Rules

> This is a temporary workaround until custom SQL rules are supported in Bigconfig, which is currently being worked on.

Custom SQL rules can be configured in a separate `bigeye_custom_rules.sql` file alongside the query. This file can contain various rules:

```sql
-- {
--   "name": "Fenix releases version format",
--   "alert_conditions": "value",
--   "range": {
--     "min": 0,
--     "max": 1
--   },
--   "collections": ["Test"],
--   "owner": "",
--   "schedule": "Default Schedule - 13:00 UTC"
-- }
SELECT
  ROUND((COUNTIF(NOT REGEXP_CONTAINS(version, r"^[0-9]+\..+$"))) / COUNT(*) * 100, 2) AS perc
FROM
  `{{ project_id }}.{{ dataset_id }}.{{ table_name }}`;

-- {
--   "name": "Fenix releases product check",
--   "alert_conditions": "value",
--   "range": {
--     "min": 0,
--     "max": 1
--   },
--   "collections": ["Test"],
--   "owner": "",
--   "schedule": "Default Schedule - 13:00 UTC"
-- }
SELECT
  ROUND((COUNTIF(product != "fenix")) / COUNT(*) * 100, 2) AS perc
FROM
  `{{ project_id }}.{{ dataset_id }}.{{ table_name }}`;
```
The SQL comment preceding each rule's SQL has to be a JSON object that contains the configuration parameters for the rule:

- `name`: the name of the SQL rule. Specify `[warn]` or `[fail]` to indicate whether a rule failure should block downstream Airflow tasks
- `alert_conditions`: one of `value` (alerts based on the returned value) or `count` (alerts based on whether the query returns rows)
- `collections`: list of collections this rule should be part of
- `owner`: email address of the rule owner
- `schedule`: optional schedule for when this rule should be triggered. The rule also gets triggered as part of Airflow
- `range`: optional range of allowed values when `"alert_conditions": "value"`

## Deployment

To generate a `bigconfig.yml` file with the default metrics when monitoring is enabled, run: `bqetl monitoring update [PATH]`.
The created file can be manually edited. For tables that do not have a `bigconfig.yml` checked into the repository, the file will be generated automatically before deployment to Bigeye. Files only need to be checked in if there are customizations.

The deployment of Bigconfig files runs automatically as part of the [artifact deployment process](../../concepts/pipeline/artifact_deployment.md), after tables and views have been deployed.