Commit fbcf9dc

updating documentation
1 parent 3d30581 commit fbcf9dc

4 files changed: 37 additions, 10 deletions

R/locate-errors.R

Lines changed: 2 additions & 1 deletion

@@ -2,7 +2,8 @@
 #'
 #' Find out which fields in a data.frame are "faulty" using validation rules
 #' This method returns found errors, according to the specified method `x`.
-#' Use method [replace_errors()], to automatically remove these errors.
+#' Use method [replace_errors()], to automatically remove these errors. Use
+#' `[base::set.seed()]` beforehand to make the function call reproducible.
 #' `
 #'
 #' Use an `Inf` `weight` specification to fixate variables that can not be changed.

examples/locate_errors.R

Lines changed: 10 additions & 0 deletions

@@ -6,6 +6,9 @@ data <- data.frame( profit = 755
 , cost = 125
 , turnover = 200
 )
+
+# use set.seed to make results reproducible
+set.seed(42)
 le <- locate_errors(data, rules)
 
 print(le)
@@ -29,13 +32,17 @@ v_logical <- validator( citizen %in% c(TRUE, FALSE)
 )
 
 data <- data.frame(voted = TRUE, citizen = FALSE)
+
+set.seed(42)
 locate_errors(data, v_logical, weight=c(2,1))$errors
 
 # try a conditional rule
 v <- validator( married %in% c(TRUE, FALSE)
 , if (married==TRUE) age >= 17
 )
 data <- data.frame( married = TRUE, age = 16)
+
+set.seed(42)
 locate_errors(data, v, weight=c(married=1, age=2))$errors
 
 
@@ -52,9 +59,12 @@ weight <- read.csv(text=
 2, 1
 ", strip.white = TRUE)
 
+set.seed(42)
 locate_errors(data, v, weight = weight)$errors
 
 # fixate / exclude a variable from error localization
 # using an Inf weight
 weight <- c(age = Inf)
+
+set.seed(42)
 locate_errors(data, v, weight = weight)$errors

man/locate_errors.Rd

Lines changed: 12 additions & 1 deletion
Generated file; diff not rendered.

vignettes/errorlocate.Rmd

Lines changed: 13 additions & 8 deletions

@@ -121,7 +121,9 @@ rules <- validator( r1 = age > 0
 With `validate::confront` we can see that rule `r2` is violated (record 2).
 
 ```{r}
-summary(confront(d, rules))
+d |>
+confront(rules) |>
+summary()
 ```
 
 
@@ -143,7 +145,7 @@ With `replace_errors` we can remove the errors (which still need to be imputed).
 
 ```{r}
 d_fixed <- replace_errors(d, le)
-summary(confront(d_fixed, rules))
+d_fixed |> confront(rules) |>summary()
 ```
 In which `replace_errors` set all faulty values to `NA`.
 
@@ -153,9 +155,10 @@ d_fixed
 
 ### Weights
 
-`locate_errors` allows for supplying weigths for the variables.
+`locate_errors` allows for supplying weights for the variables.
 It is common that the quality of the observed variables differs.
-When we have more trust in `age` we can give it more weight so it chooses
+When we have more trust in `age` because it was retrieved from the
+official population register, we can give it more weight so it chooses
 income when it has to decide between the two (record 2):
 
 ```{r}
@@ -184,12 +187,12 @@ For example given the rule:
 and the following data:
 
 ```{r, echo=FALSE}
-d <- data.frame(age = 4, married = TRUE)
-d
+d2 <- data.frame(age = 4, married = TRUE)
+d2
 ```
 Then either `age` or `married` can be considered faulty.
 When no weights are specified, `locate_errors` will randomly choose one of the two.
-It does this by adding a small amount of random noise to the weights internally.
+It does this by adding internally a small amount of random noise to the weights.
 To make sure that the results are reproducible, it is good practice to use `set.seed`
 before calling `locate_errors`.
 
@@ -210,6 +213,8 @@ indicates the time spent to find a solution for each record. This can
 be restricted using the argument `timeout` (s).
 
 ```{r}
-# duration is in seconds.
+# restrict time per record to max 30 seconds
+le <- locate_errors(d, rules, timeout=30)
+# duration is in seconds.
 le$duration
 ```
