README.md: 34 additions & 85 deletions
Original file line number
Diff line number
Diff line change
@@ -275,17 +275,16 @@ also use the DOT user interface for tests, for more details please see section s
275
275
The DOT runs tests against user-defined views onto the underlying data. These views are called "entities" and are defined in the table `dot.configured_entities`:
276
276
277
277
278
-
| Column | Description |
279
-
| :----------- | :----------- |
280
-
| entity_id | UUID of the entity |
281
-
| entity_name | Name of the entity e.g. ancview_danger_sign |
| entity_id | Name of the entity e.g. ancview_danger_sign |
282
281
| entity_category | Category of the entity e.g. anc => needs to be in `dot.entity_categories`|
283
-
| entity_definition | String for the SQL query that defines the entity |
282
+
| entity_definition | String for the SQL query that defines the entity |
284
283
285
284
For example, this would be an insert command to create `ancview_danger_sign`:
286
285
287
286
```postgres-sql
288
-
INSERT INTO dot.configured_entities VALUES('b05f1f9c-2176-46b0-8e8f-d6690f696b9b',
287
+
INSERT INTO dot.configured_entities (project_id,entity_id,entity_category,entity_definition,date_added,date_modified,last_updated_by) VALUES('Project1',
INSERT INTO dot.configured_tests VALUES(TRUE, 'ScanProject1', '3081f033-e8f4-4f3b-aea8-36f8c5df05dc', 'INCONSISTENT-1', 3,
431
-
'Wrong treatment/dosage arising from wrong age of children (WT-1)', '', '', 'baf349c9-c919-40ff-a611-61ddc59c2d52',
428
+
'Wrong treatment/dosage arising from wrong age of children (WT-1)', '', '', 'ancview_pregnancy',
432
429
'expression_is_true', '', '',
433
430
$${"name": "t_under_24_months_wrong_dosage", "expression": "malaria_act_dosage is not null", "condition": "(patient_age_in_months<24) and (malaria_give_act is not null)"}$$,
Custom SQL queries are a special case because they must have `primary_table` and `primary_table_id_field` specified within the SQL query, as shown below:
439
436
```
440
-
INSERT INTO dot.configured_tests VALUES(TRUE, 'ScanProject1', 'c4a3da8f-32f4-4e9b-b135-354de203ca90', 'TREAT-1', 6, 'Test for new family planning method (NFP-1)', '', '', '95bd0f60-ab59-48fc-a62e-f256f5f3e6de', 'custom_sql', '', '',
437
+
INSERT INTO dot.configured_tests VALUES(TRUE, 'ScanProject1', 'c4a3da8f-32f4-4e9b-b135-354de203ca90', 'TREAT-1', 6, 'Test for new family planning method (NFP-1)', '', '', 'ancview_pregnancy', 'custom_sql', '', '',
441
438
format('{%s: %s}',
442
439
to_json('query'::text),
443
440
to_json($query$
@@ -527,72 +524,7 @@ custom SQL query. Given this, there is a useful Postgres function which will ret
527
524
see 'Seeing the raw data for failed tests' above.
528
525
529
526
530
-
## More complex configuration options
531
-
532
-
All the configuration files must be located under the [config](dot/config) folder of the DOT.
533
-
534
-
### Main config file
535
-
536
-
The main config file must be called `dot_config.yml` and located at the top [config](dot/config) folder. Note that
537
-
this file will be ignored for version control. You may use the [example dot_config yaml](dot/config/example/dot_config.yml)
538
-
as a template.
539
-
540
-
Besides the DOT DB connection in the paragraph above, see below for additional config options.
541
-
542
-
#### Connection parameters for each of the projects to run
543
-
544
-
For each of the projects you would like to run, add a key to the DOT config yaml with the following structure:
545
-
```
546
-
<project_name>_db:
547
-
type: connection type e.g. postgres
548
-
host: host
549
-
user: username
550
-
pass: password
551
-
port: port number e.g 5432
552
-
dbname: database name
553
-
schema: schema name, e.g. public
554
-
  threads: number of threads for DBT, e.g. 4
555
-
```
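As a rough illustration of the structure above, the following Python sketch checks that a project's connection entry carries every key listed in the template. The function name `missing_connection_keys` and the sample values are hypothetical and not part of the DOT; only the key names come from the template.

```python
# Keys taken from the connection template above.
REQUIRED_KEYS = ("type", "host", "user", "pass", "port", "dbname", "schema", "threads")


def missing_connection_keys(connection: dict) -> list:
    """Return the template keys that are absent from a connection entry."""
    return [key for key in REQUIRED_KEYS if key not in connection]


# Hypothetical entry, as it might look after loading `<project_name>_db` from the YAML file.
example_connection = {
    "type": "postgres",
    "host": "localhost",
    "user": "dot_user",
    "pass": "change-me",
    "port": 5432,
    "dbname": "dot_db",
    "schema": "public",
}

print(missing_connection_keys(example_connection))  # ['threads']
```

A check like this could run before a project is started, so that a missing key surfaces as a clear message rather than a connection error.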
556
-
557
-
#### Output schema suffix
558
-
559
-
The DOT generates 2 kinds of database objects:
560
-
- Entities of the models that are being tested, e.g. assessments, follow ups, patients
561
-
- Results of the failing tests
562
-
563
-
If nothing is done, these objects would be created in the same schema as the original data for the project
564
-
(thus polluting the DB). If the key `output_schema_suffix` is added, its value will be added as a suffix; i.e. if the
565
-
project data is stored in a certain schema, the output objects will go to `<project_schema>_<schema_suffix>`
566
-
(e.g. to `public_tests` if the project schema is `public` and the suffix is set to `tests` in the lines above).
567
-
568
-
Note that this mechanism uses a DBT feature, and that the same applies to the GE tests.
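The naming rule described above can be sketched in a few lines of Python. This only mirrors the rule as stated in the text (`<project_schema>_<schema_suffix>`); the actual schema handling is done by DBT, and the helper name `output_schema` is a hypothetical one chosen for illustration.

```python
from typing import Optional


def output_schema(project_schema: str, output_schema_suffix: Optional[str] = None) -> str:
    """Compose the schema name used for entities and test results."""
    if not output_schema_suffix:
        # No suffix configured: objects land in the same schema as the original data.
        return project_schema
    return f"{project_schema}_{output_schema_suffix}"


print(output_schema("public", "tests"))  # public_tests
print(output_schema("public"))           # public
```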
569
-
570
-
#### Save passed tests
571
-
572
-
The key `save_passed_tests` accepts boolean values. If set to true, the results of the passing tests will also be stored
573
-
to the DOT DB. If not, only the results of failing tests will be stored.
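The effect of this flag can be pictured with a small Python sketch. The record shape (`test_id`, `passed`) and the function `results_to_store` are assumptions made for illustration; they do not describe how the DOT stores results internally.

```python
def results_to_store(results: list, save_passed_tests: bool) -> list:
    """Always keep failing results; keep passing ones only when the flag is set."""
    return [r for r in results if save_passed_tests or not r["passed"]]


# Hypothetical test outcomes.
results = [
    {"test_id": "t1", "passed": True},
    {"test_id": "t2", "passed": False},
]

print([r["test_id"] for r in results_to_store(results, save_passed_tests=False)])  # ['t2']
print([r["test_id"] for r in results_to_store(results, save_passed_tests=True)])   # ['t1', 't2']
```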
574
-
575
-
### Other config file locations
576
-
Optional configuration for DBT and Great Expectations can be added, per project, in a structure as follows.
577
-
578
-
```bash
579
-
|____config
580
-
| |____<project_name>
581
-
| | |____dbt
582
-
| | | |____profiles.yml
583
-
| | | |____dbt_project.yml
584
-
| | |____ge
585
-
| | | |____great_expectations.yml
586
-
| | | |____config_variables.yml
587
-
| | | |____batch_config.json
588
-
```
589
-
In general these customizations will not be needed, but only in some scenarios with particular requirements; these
590
-
require a deeper knowledge of the DOT and of either DBT and/or Great Expectations.
591
-
592
-
There are examples for all the files above under [this folder](dot/config/example/project_name). For each of the
593
-
files you want to customize, you may copy and adapt the examples provided following the directory structure above.
594
-
595
-
More details in the [config README](dot/config/README.md).
527
+
Please refer to [CONTRIBUTING.md](./CONTRIBUTING.md) for information on more complex configuration options.
596
528
597
529
## How to visualize the results using Superset
598
530
@@ -706,7 +638,7 @@ NOTE: You might need to use docker-compose on some hosts.
706
638
707
639
`docker compose -f docker-compose-with-airflow.yml down -v`
708
640
709
-
### Running the DOT in Airflow
641
+
### Running the DOT in Airflow (Demo)
710
642
711
643
A DAG has been included which copies data from the uploaded DB dump into the DOT DB 'data_ScanProject1' schema, and then runs
712
644
the toolkit against this data. To do this ...
@@ -733,6 +665,23 @@ Or to run just DOT stage ...
733
665
734
666
`airflow tasks test run_dot_project run_dot 2022-03-01`
735
667
668
+
669
+
### Running the DOT in Airflow (Connecting to external databases)
670
+
671
+
The following instructions illustrate how to use a local Airflow environment that connects to external databases for the data and the DOT.
672
+
673
+
**NOTE:** These are for illustrative purposes only. If using Airflow in production it's important that it is set up correctly
674
+
and does not expose an HTTP connection to the internet, and that it has adequate network security (firewall, strong passwords, etc.)
675
+
676
+
1. Edit `./dot/dot_config.yml` and set the correct parameters for your external dot_db
677
+
2. Create a section for your data databases and set connection parameters
678
+
3. If you have a DAG json file `dot_projects.json` already, deploy it into `./airflow/dags`
679
+
4. Run steps 1-11 in [Configuring/Building Airflow Docker environment](#configuringbuilding-airflow-docker-environment)
680
+
5. Run steps 12 and 13, but use the values for your external databases you configured in `dot_config.yml`
681
+
682
+
You will need to configure DOT tests and the DAG json file appropriately for your installation.
683
+
684
+
736
685
#### Adding more projects
737
686
738
687
If configuring Airflow in production, you will need to adjust `./docker/dot/dot_config.yml` accordingly. You can also