Commit a04e203

committed
pointblank r content
1 parent 9ef2df5 commit a04e203

2 files changed

+165
-8
lines changed

book/lectures/131-data_validation-r-pointblank.qmd

Lines changed: 159 additions & 2 deletions
@@ -2,6 +2,163 @@
22
title: "R Data Validation: pointblank"
33
---
44

5-
TBD
5+
- See this workshop repo for reference: <https://github.com/posit-conf-2024/ds-workflows-r>
6+
- Specifically this slide deck: <https://posit-conf-2024.github.io/ds-workflows-r/slides/03_data_validation_and_alerting.html#/>
67

7-
See this workshop repo for reference: <https://github.com/posit-conf-2024/ds-workflows-r>
8+
9+
```{r}
10+
library(pointblank)
11+
data(small_table)
12+
small_table
13+
```
14+
15+
## Validation Rules
16+
17+
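Most of the validation rule functions begin with `col_`, such as `col_vals_*()` and `col_is_*()`; others, like `rows_distinct()`, work on whole rows. The full list is here: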
18+
<https://rstudio.github.io/pointblank/reference/index.html#validation-expectation-and-test-functions>
19+
20+
Here we want to check that all values in the `a` column are less than `10`.
21+
We can use `col_vals_lt()` for this.
22+
23+
24+
```{r}
25+
small_table |>
26+
col_vals_lt(a, value = 10)
27+
```
28+
29+
If the table passes the rule, you get the original table back.
30+
Otherwise you will get an error:
31+
32+
```{r}
33+
#| error: true
34+
35+
36+
small_table |>
37+
col_vals_lt(a, value = 5)
38+
```
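
When an error is not what you want, for example inside an `if` statement, each validation function also has a `test_*()` variant that returns a single `TRUE` or `FALSE`. A minimal sketch using `test_col_vals_lt()` on the same column (the object names `a_below_10` and `a_below_5` are illustrative):

```r
library(pointblank)

# test_*() variants return a single logical instead of the table or an error
a_below_10 <- test_col_vals_lt(small_table, columns = a, value = 10)
a_below_5 <- test_col_vals_lt(small_table, columns = a, value = 5)

a_below_10
a_below_5
```

This form can be convenient for branching logic, while the plain `col_vals_lt()` form suits pipelines that should halt on bad data.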
39+
40+
Because a passing check returns the table, you can chain validation rules; the chain stops at the first rule that fails.
41+
42+
```{r}
43+
#| error: true
44+
45+
46+
small_table |>
47+
col_vals_lt(a, value = 10) |>
48+
col_vals_between(d, left = 0, right = 5000) |>
49+
col_vals_in_set(f, set = c("low", "mid", "high")) |>
50+
col_vals_regex(b, regex = "^[0-9]-[a-z]{3}-[0-9]{3}$")
51+
```
52+
53+
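The check on `d` fails because one value is above 5000. We can fix this by either correcting the data
54+
or relaxing the test, as done here by raising the upper bound.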
55+
56+
```{r}
57+
small_table |>
58+
col_vals_lt(a, value = 10) |>
59+
col_vals_between(d, left = 0, right = 10000) |>
60+
col_vals_in_set(f, set = c("low", "mid", "high")) |>
61+
col_vals_regex(b, regex = "^[0-9]-[a-z]{3}-[0-9]{3}$")
62+
```
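
To decide between the two fixes, it can help to look at the offending rows first. A small base R sketch (the name `too_large` is illustrative) that pulls out the rows where `d` is above the original upper bound of 5000:

```r
library(pointblank)

# Rows that violate the original rule: d must be between 0 and 5000
too_large <- subset(small_table, d > 5000)

# Show a few identifying columns alongside the offending value
too_large[, c("date", "b", "d")]
```

If these rows turn out to be legitimate values, relaxing the bound is reasonable; if they are data entry errors, the data should be corrected instead.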
63+
64+
## Validation Table
65+
66+
From the docs:
67+
68+
> There are three things that should be noted here:
69+
>
70+
> - Validation steps: each step is a separate test on the table, focused on a certain aspect of the table.
71+
> - Validation rules: the validation type is provided here along with key constraints.
72+
> - Validation results: interrogation results are provided here, with a breakdown of test units (total, passing, and failing), threshold flags, and more.
73+
74+
Create the `agent`, apply validation rules, then `interrogate()` it.
75+
76+
```{r}
77+
agent <- small_table |>
78+
create_agent() |>
79+
col_vals_lt(a, value = 10) |>
80+
col_vals_between(d, left = 0, right = 5000) |>
81+
col_vals_in_set(f, set = c("low", "mid", "high")) |>
82+
col_vals_regex(b, regex = "^[0-9]-[a-z]{3}-[0-9]{3}$")
83+
```
84+
85+
Running `interrogate()` on the `agent` prints the validation table
86+
and also returns the agent with the interrogation results attached.
87+
88+
```{r}
89+
agent |>
90+
interrogate()
91+
```
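
For a programmatic check rather than a visual one, the interrogated agent can be queried directly. A hedged sketch using `all_passed()` and `get_agent_x_list()`; the object names `checked` and `x` are illustrative, and only two of the rules above are repeated to keep it short:

```r
library(pointblank)

checked <- small_table |>
  create_agent() |>
  col_vals_lt(a, value = 10) |>
  col_vals_between(d, left = 0, right = 5000) |>
  interrogate()

# TRUE only if every validation step had no failing test units
all_passed(checked)

# The x-list exposes per-step results, e.g. the number of failing units
x <- get_agent_x_list(checked)
x$n_failed
```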
92+
93+
## Post-interrogation
94+
95+
There are a few
96+
[post-interrogation](https://rstudio.github.io/pointblank/reference/index.html#post-interrogation)
97+
steps you can do with your agents.
98+
One of the more useful ones is looking at the passing and failing rows separately.
99+
100+
`get_sundered_data()` and `get_data_extracts()` are two useful functions for this.
101+
102+
Here is our agent that has failing validation checks.
103+
104+
:::{.callout-important}
105+
Don't forget to `interrogate()` your `agent` before running post-interrogation functions.
106+
:::
107+
108+
```{r}
109+
agent <- small_table |>
110+
create_agent() |>
111+
col_vals_lt(a, value = 10) |>
112+
col_vals_between(d, left = 0, right = 5000) |>
113+
col_vals_in_set(f, set = c("low", "mid", "high")) |>
114+
col_vals_regex(b, regex = "^[0-9]-[a-z]{3}-[0-9]{3}$") |>
115+
interrogate()
116+
```
117+
118+
We can get a separate or combined version of our passing and failing observations.
119+
120+
121+
```{r}
122+
get_sundered_data(agent, type = "pass")
123+
```
124+
125+
```{r}
126+
get_sundered_data(agent, type = "fail")
127+
```
128+
129+
The `combined` option adds a `.pb_combined` column, labeling each row as passing or failing, that you can use in a downstream process.
130+
131+
```{r}
132+
get_sundered_data(agent, type = "combined")
133+
```
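
As an illustration of such a downstream process, the sketch below counts the labels in `.pb_combined` and keeps only the passing rows. The object names `checked`, `combined`, and `clean` are illustrative, and a single rule is used to keep the example short:

```r
library(pointblank)

checked <- small_table |>
  create_agent() |>
  col_vals_between(d, left = 0, right = 5000) |>
  interrogate()

combined <- get_sundered_data(checked, type = "combined")

# Count how many rows were labelled as passing and failing
table(combined$.pb_combined)

# Keep only the passing rows for a later step
clean <- combined[combined$.pb_combined == "pass", ]
```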
134+
135+
You can also use the `get_data_extracts()` function to get the failing rows for each validation step as a list.
136+
137+
138+
```{r}
139+
get_data_extracts(agent)
140+
```
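
To pull the failing rows for one step instead of the whole list, `get_data_extracts()` accepts a step index `i`. A minimal sketch, assuming the step order shown in the code (the names `checked` and `failing_d` are illustrative):

```r
library(pointblank)

checked <- small_table |>
  create_agent() |>
  col_vals_lt(a, value = 10) |>
  col_vals_between(d, left = 0, right = 5000) |>
  interrogate()

# Step 2 is the col_vals_between() check on d
failing_d <- get_data_extracts(checked, i = 2)
failing_d
```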
141+
142+
## Pipeline Data Validation
143+
144+
When you want to run your validation checks non-interactively,
145+
you may want to use `{pointblank}` in the
146+
[Pipeline Data Validation Workflow](https://rstudio.github.io/pointblank/articles/VALID-II.html).
147+
148+
In this workflow we do not need to create an `agent` object,
149+
and instead rely on the warnings or errors raised by the validation checks.
150+
151+
```{r}
152+
#| error: true
153+
154+
155+
small_table %>%
156+
col_is_posix(date_time) %>%
157+
col_vals_in_set(f, set = c("low", "mid", "high")) %>%
158+
col_vals_lt(a, value = 10) %>%
159+
col_vals_regex(b, regex = "^[0-9]-[a-z]{3}-[0-9]{3}$") %>%
160+
col_vals_between(d, left = 0, right = 5000)
161+
```
162+
163+
You can also set thresholds for when validation checks throw errors or warnings:
164+
<https://rstudio.github.io/pointblank/articles/VALID-II.html#using-warn_on_fail-and-stop_on_fail-functions-to-generate-simple-action_levels>
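
As a rough sketch of what this looks like, passing `actions = warn_on_fail()` to a check turns a failure into a warning, so the table is still returned. The handler and the name `result` are illustrative choices, not part of the lecture:

```r
library(pointblank)

# warn_on_fail() downgrades a failing check from an error to a warning,
# so the table is still returned and the pipeline can continue
result <- withCallingHandlers(
  small_table |>
    col_vals_between(d, left = 0, right = 5000, actions = warn_on_fail()),
  warning = function(w) {
    # Report the validation warning, then let evaluation continue
    message("Validation warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

nrow(result)
```

`stop_on_fail()` works the same way but raises an error once its threshold is reached.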

renv.lock

Lines changed: 6 additions & 6 deletions
@@ -408,7 +408,7 @@
408408
},
409409
"knitr": {
410410
"Package": "knitr",
411-
"Version": "1.49",
411+
"Version": "1.50",
412412
"Source": "Repository",
413413
"Repository": "CRAN",
414414
"Requirements": [
@@ -420,7 +420,7 @@
420420
"xfun",
421421
"yaml"
422422
],
423-
"Hash": "9fcb189926d93c636dea94fbe4f44480"
423+
"Hash": "5a07d8ec459d7b80bd4acca5f4a6e062"
424424
},
425425
"labeling": {
426426
"Package": "labeling",
@@ -739,7 +739,7 @@
739739
},
740740
"testthat": {
741741
"Package": "testthat",
742-
"Version": "3.2.1.1",
742+
"Version": "3.2.3",
743743
"Source": "Repository",
744744
"Repository": "CRAN",
745745
"Requirements": [
@@ -764,7 +764,7 @@
764764
"waldo",
765765
"withr"
766766
],
767-
"Hash": "3f6e7e5e2220856ff865e4834766bf2b"
767+
"Hash": "42f889439ccb14c55fc3d75c9c755056"
768768
},
769769
"tibble": {
770770
"Package": "tibble",
@@ -874,7 +874,7 @@
874874
},
875875
"xfun": {
876876
"Package": "xfun",
877-
"Version": "0.49",
877+
"Version": "0.51",
878878
"Source": "Repository",
879879
"Repository": "CRAN",
880880
"Requirements": [
@@ -883,7 +883,7 @@
883883
"stats",
884884
"tools"
885885
],
886-
"Hash": "8687398773806cfff9401a2feca96298"
886+
"Hash": "e1a3c06389a46d065c18bd4bbc27c64c"
887887
},
888888
"yaml": {
889889
"Package": "yaml",
