|
2 | 2 | title: "R Data Validation: pointblank"
|
3 | 3 | ---
|
4 | 4 |
|
5 |
| -TBD |
| 5 | +- See this workshop repo for reference: <https://github.com/posit-conf-2024/ds-workflows-r> |
| 6 | +- Specifically this slide deck: <https://posit-conf-2024.github.io/ds-workflows-r/slides/03_data_validation_and_alerting.html#/> |
6 | 7 |
|
7 |
| -See this workshop repo for reference: <https://github.com/posit-conf-2024/ds-workflows-r> |
| 8 | + |
| 9 | +```{r} |
| 10 | +library(pointblank) |
| 11 | +data(small_table) |
| 12 | +small_table |
| 13 | +``` |
| 14 | + |
| 15 | +## Validation Rules |
| 16 | + |
| 17 | +Most of the validation functions begin with `col_` (for example, `col_vals_lt()`), alongside a few row- and table-level checks such as `rows_distinct()`: |
| 18 | +<https://rstudio.github.io/pointblank/reference/index.html#validation-expectation-and-test-functions> |
| 19 | + |
| 20 | +Here we want to check that all values in the `a` column are less than `10`. |
| 21 | +We can use the `col_vals_lt()` function: |
| 22 | + |
| 23 | + |
| 24 | +```{r} |
| 25 | +small_table |> |
| 26 | + col_vals_lt(a, value = 10) |
| 27 | +``` |
| 28 | + |
| 29 | +If every value passes the rule, the input table is returned unchanged. |
| 30 | +Otherwise the check stops with an error: |
| 31 | + |
| 32 | +```{r} |
| 33 | +#| error: true |
| 34 | +
|
| 35 | +
|
| 36 | +small_table |> |
| 37 | + col_vals_lt(a, value = 5) |
| 38 | +``` |
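If you would rather get a logical value than an error, `{pointblank}` also provides `test_*()` variants of the validation functions. The sketch below assumes the same `small_table` dataset and checks the two thresholds used above; the object names `passes_lt_10` and `passes_lt_5` are only illustrative.

```r
library(pointblank)
data(small_table)

# test_*() variants return a single TRUE/FALSE instead of raising an error
passes_lt_10 <- test_col_vals_lt(small_table, columns = a, value = 10)
passes_lt_5 <- test_col_vals_lt(small_table, columns = a, value = 5)

passes_lt_10
passes_lt_5
```

This form can be convenient inside an `if ()` branch, where an error would interrupt the script.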
| 39 | + |
| 40 | +Because a passing check returns the table, you can chain validation rules; the chain stops at the first rule that fails. |
| 41 | + |
| 42 | +```{r} |
| 43 | +#| error: true |
| 44 | +
|
| 45 | +
|
| 46 | +small_table |> |
| 47 | + col_vals_lt(a, value = 10) |> |
| 48 | + col_vals_between(d, left = 0, right = 5000) |> |
| 49 | + col_vals_in_set(f, set = c("low", "mid", "high")) |> |
| 50 | + col_vals_regex(b, regex = "^[0-9]-[a-z]{3}-[0-9]{3}$") |
| 51 | +``` |
| 52 | + |
| 53 | +The chain fails at `col_vals_between()` because column `d` contains a value above `5000`. |
| 54 | +We can fix this by either correcting the data or adjusting the test. |
| 55 | + |
| 56 | +```{r} |
| 57 | +small_table |> |
| 58 | + col_vals_lt(a, value = 10) |> |
| 59 | + col_vals_between(d, left = 0, right = 10000) |> |
| 60 | + col_vals_in_set(f, set = c("low", "mid", "high")) |> |
| 61 | + col_vals_regex(b, regex = "^[0-9]-[a-z]{3}-[0-9]{3}$") |
| 62 | +``` |
| 63 | + |
| 64 | +## Validation Table |
| 65 | + |
| 66 | +From the docs: |
| 67 | + |
| 68 | +> There are three things that should be noted here: |
| 69 | +> |
| 70 | +> - Validation steps: each step is a separate test on the table, focused on a certain aspect of the table. |
| 71 | +> - Validation rules: the validation type is provided here along with key constraints. |
| 72 | +> - Validation results: interrogation results are provided here, with a breakdown of test units (total, passing, and failing), threshold flags, and more. |
| 73 | +
|
| 74 | +Create the `agent`, apply validation rules, then `interrogate()` it. |
| 75 | + |
| 76 | +```{r} |
| 77 | +agent <- small_table |> |
| 78 | + create_agent() |> |
| 79 | + col_vals_lt(a, value = 10) |> |
| 80 | + col_vals_between(d, left = 0, right = 5000) |> |
| 81 | + col_vals_in_set(f, set = c("low", "mid", "high")) |> |
| 82 | + col_vals_regex(b, regex = "^[0-9]-[a-z]{3}-[0-9]{3}$") |
| 83 | +``` |
| 84 | + |
| 85 | +Running `interrogate()` on the `agent` executes the validation steps and prints the validation table, |
| 86 | +but also returns the `agent` with the results stored for later use. |
| 87 | + |
| 88 | +```{r} |
| 89 | +agent |> |
| 90 | + interrogate() |
| 91 | +``` |
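Besides reading the printed report, you can inspect an interrogated agent programmatically. The following is a minimal sketch using `all_passed()` and `get_agent_x_list()`; it uses a shortened two-step agent for brevity, and the name `x` is only illustrative.

```r
library(pointblank)
data(small_table)

agent <- small_table |>
  create_agent() |>
  col_vals_lt(a, value = 10) |>
  col_vals_between(d, left = 0, right = 5000) |>
  interrogate()

# A single logical: did every validation step pass with no failing units?
all_passed(agent)

# The "x-list" exposes per-step results as plain vectors
x <- get_agent_x_list(agent)
x$n_failed  # number of failing test units for each step
```

A check like `all_passed(agent)` is one way to decide whether a downstream step should run.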
| 92 | + |
| 93 | +## Post-interrogation |
| 94 | + |
| 95 | +There are a few |
| 96 | +[post-interrogation](https://rstudio.github.io/pointblank/reference/index.html#post-interrogation) |
| 97 | +steps you can do with your agents. |
| 98 | +One of the more useful ones may be separately looking at the passing and failing data. |
| 99 | + |
| 100 | +The `get_sundered_data()` and `get_data_extracts()` functions are two useful ways to do this. |
| 101 | + |
| 102 | +Here is our agent that has failing validation checks. |
| 103 | + |
| 104 | +:::{.callout-important} |
| 105 | +Don't forget to `interrogate()` your `agent` before running post-interrogation functions. |
| 106 | +::: |
| 107 | + |
| 108 | +```{r} |
| 109 | +agent <- small_table |> |
| 110 | + create_agent() |> |
| 111 | + col_vals_lt(a, value = 10) |> |
| 112 | + col_vals_between(d, left = 0, right = 5000) |> |
| 113 | + col_vals_in_set(f, set = c("low", "mid", "high")) |> |
| 114 | + col_vals_regex(b, regex = "^[0-9]-[a-z]{3}-[0-9]{3}$") |> |
| 115 | + interrogate() |
| 116 | +``` |
| 117 | + |
| 118 | +We can get a separate or combined version of our passing and failing observations. |
| 119 | + |
| 120 | + |
| 121 | +```{r} |
| 122 | +get_sundered_data(agent, type = "pass") |
| 123 | +``` |
| 124 | + |
| 125 | +```{r} |
| 126 | +get_sundered_data(agent, type = "fail") |
| 127 | +``` |
| 128 | + |
| 129 | +Setting `type = "combined"` adds a `.pb_combined` column that you can use in a downstream process. |
| 130 | + |
| 131 | +```{r} |
| 132 | +get_sundered_data(agent, type = "combined") |
| 133 | +``` |
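As a sketch of what a downstream process might look like, the example below rebuilds the agent and uses the flag column (named `.pb_combined` in the pointblank documentation) to count and filter rows. The names `combined` and `passing_rows` are illustrative, and the example assumes the default `"pass"`/`"fail"` labels.

```r
library(pointblank)
data(small_table)

agent <- small_table |>
  create_agent() |>
  col_vals_lt(a, value = 10) |>
  col_vals_between(d, left = 0, right = 5000) |>
  interrogate()

combined <- get_sundered_data(agent, type = "combined")

# Count how many rows were flagged as passing or failing
table(combined$.pb_combined)

# Keep only the rows that passed every validation step
passing_rows <- combined[combined$.pb_combined == "pass", ]
```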
| 134 | + |
| 135 | +You can also use the `get_data_extracts()` function to get the failing rows for each validation step as a list. |
| 136 | + |
| 137 | + |
| 138 | +```{r} |
| 139 | +get_data_extracts(agent) |
| 140 | +``` |
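If you only need the failing rows from one step, `get_data_extracts()` accepts a step index through its `i` argument. The sketch below assumes a two-step agent in which step 2 is the `col_vals_between()` check on `d`; `failed_d` is an illustrative name.

```r
library(pointblank)
data(small_table)

agent <- small_table |>
  create_agent() |>
  col_vals_lt(a, value = 10) |>
  col_vals_between(d, left = 0, right = 5000) |>
  interrogate()

# Rows that failed step 2 (values of `d` outside 0 to 5000)
failed_d <- get_data_extracts(agent, i = 2)
failed_d
```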
| 141 | + |
| 142 | +## Pipeline Data Validation |
| 143 | + |
| 144 | +When you want to run your validation checks non-interactively, |
| 145 | +you may want to use `{pointblank}` in the |
| 146 | +[Pipeline Data Validation Workflow](https://rstudio.github.io/pointblank/articles/VALID-II.html). |
| 147 | + |
| 148 | +In this workflow we do not need to create an `agent` object. |
| 149 | +Instead, we rely on the warnings or errors raised by the validation checks themselves. |
| 150 | + |
| 151 | +```{r} |
| 152 | +#| error: true |
| 153 | +
|
| 154 | +
|
| 155 | +small_table |> |
| 156 | + col_is_posix(date_time) |> |
| 157 | + col_vals_in_set(f, set = c("low", "mid", "high")) |> |
| 158 | + col_vals_lt(a, value = 10) |> |
| 159 | + col_vals_regex(b, regex = "^[0-9]-[a-z]{3}-[0-9]{3}$") |> |
| 160 | + col_vals_between(d, left = 0, right = 5000) |
| 161 | +``` |
| 162 | + |
| 163 | +You can also set thresholds for when validation checks throw errors or warnings: |
| 164 | +<https://rstudio.github.io/pointblank/articles/VALID-II.html#using-warn_on_fail-and-stop_on_fail-functions-to-generate-simple-action_levels> |
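A minimal sketch of those thresholds, assuming the same `small_table` data: `warn_on_fail()` and `stop_on_fail()` are passed through the `actions` argument of a validation function. The `0.5` threshold and the names `checked` and `checked_lenient` are illustrative choices, not recommendations.

```r
library(pointblank)
data(small_table)

# warn_on_fail() downgrades the failure to a warning; the table is still returned
checked <- small_table |>
  col_vals_between(d, left = 0, right = 5000, actions = warn_on_fail())

# stop_on_fail() with a fractional threshold raises an error only when that
# proportion of test units fails (here: half or more of the rows)
checked_lenient <- small_table |>
  col_vals_between(d, left = 0, right = 5000, actions = stop_on_fail(stop_at = 0.5))
```

With a threshold like this, a pipeline can tolerate a small number of bad rows while still halting when the data is badly off.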