@@ -121,7 +121,9 @@ rules <- validator( r1 = age > 0
 With `validate::confront` we can see that rule `r2` is violated (record 2).
 
 ```{r}
-summary(confront(d, rules))
+d |>
+  confront(rules) |>
+  summary()
 ```
 
 
@@ -143,7 +145,7 @@ With `replace_errors` we can remove the errors (which still need to be imputed).
 
 ```{r}
 d_fixed <- replace_errors(d, le)
-summary(confront( d_fixed, rules))
+d_fixed |> confront(rules) |> summary()
 ```
 In which `replace_errors` set all faulty values to `NA`.
 
@@ -153,9 +155,10 @@ d_fixed
 
 ### Weights
 
-`locate_errors` allows for supplying weigths for the variables.
+`locate_errors` allows for supplying weights for the variables.
 It is common that the quality of the observed variables differs.
-When we have more trust in `age` we can give it more weight so it chooses
+When we have more trust in `age` because it was retrieved from the
+official population register, we can give it more weight so it chooses
 income when it has to decide between the two (record 2):
 
161164``` {r}
@@ -184,12 +187,12 @@ For example given the rule:
 and the following data:
 
 ```{r, echo=FALSE}
-d <- data.frame(age = 4, married = TRUE)
-d
+d2 <- data.frame(age = 4, married = TRUE)
+d2
 ```
 Then either `age` or `married` can be considered faulty.
 When no weights are specified, `locate_errors` will randomly choose one of the two.
-It does this by adding a small amount of random noise to the weights internally .
+Internally, it does this by adding a small amount of random noise to the weights.
 To make sure that the results are reproducible, it is good practice to use `set.seed`
 before calling `locate_errors`.
 
@@ -210,6 +213,8 @@ indicates the time spent to find a solution for each record. This can
 be restricted using the argument `timeout` (s).
 
 ```{r}
-# duration is in seconds.
+# restrict time per record to max 30 seconds
+le <- locate_errors(d, rules, timeout = 30)
+# duration is in seconds.
 le$duration
 ```
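
A minimal sketch of how the `set.seed` advice and the `weight` argument interact on the `d2` example from this change. The rule itself is not visible in the diff, so the one below is an assumption chosen purely for illustration, and the names `rules2`, `le2` and `le2_w` are hypothetical.

```r
library(validate)
library(errorlocate)

# Assumed rule (not taken from the diff): married persons must be at least 16.
rules2 <- validator(if (married == TRUE) age >= 16)
d2 <- data.frame(age = 4, married = TRUE)

# Without weights either field may be flagged; fixing the seed makes the
# internal random tie-breaking reproducible between runs.
set.seed(42)
le2 <- locate_errors(d2, rules2)
values(le2)

# With a higher weight on `age`, flagging `married` is the cheaper solution.
le2_w <- locate_errors(d2, rules2, weight = c(age = 2, married = 1))
values(le2_w)
```

With the weighted call one would expect `married`, not `age`, to be marked as the error, which mirrors the `age` versus `income` example in the Weights section.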