Commit ddc4207

simplify explanation of constituent tables
1 parent e1409e8 commit ddc4207

1 file changed: +21 -23 lines changed

vignettes/datatable-joins.Rmd

Lines changed: 21 additions & 23 deletions
@@ -38,11 +38,11 @@ It assumes familiarity with the `data.table` syntax. If that is not the case, pl
 
 ## 1. Defining example data
 
-To illustrate how to use the method available with real life examples, let's simulate a **normalized database** from a little supermarket by performing the following steps:
+To illustrate how to use the method available with real life examples, let's simulate a **normalized database** from a little supermarket by defining the following tables in a database:
 
-1. Defining a `data.table` where each product is represented by a row with some qualities, but leaving one product without `id` to show how the framework deals with ***missing values***.
+1. `Products`, a table with rows giving characteristics of various products. To show how the framework deals with ***missing values***, one `id` is `NA`.
 
-```{r}
+```{r define_products}
 Products = data.table(
 id = c(1:4,
 NA_integer_),
@@ -68,9 +68,9 @@ Products = data.table(
 Products
 ```
 
-2. Defining a `data.table` showing the proportion of taxes to be applied for processed products based on their units.
+2. `NewTax`, a table with rows defining some taxes associated with processed products based on their units.
 
-```{r}
+```{r define_new_tax}
 NewTax = data.table(
 unit = c("unit","ounce"),
 type = "processed",
@@ -81,38 +81,36 @@ NewTax
 ```
 
 
-3. Defining a `data.table` simulating the products received every Monday with a `product_id` that is not present in the `Products` table.
+3. `ProductReceived`, a table with rows simulating weekly incoming inventory.
 
-```{r}
+```{r define_product_received}
 set.seed(2156)
 
+# NB: Jan 8, 2024 is a Monday.
+receipt_dates = seq(from=as.IDate("2024-01-08"), length.out=10L, by="week")
+
 ProductReceived = data.table(
-id = 1:10,
-date = seq(from = as.IDate("2024-01-08"), length.out = 10L, by = "week"),
-product_id = sample(c(NA_integer_, 1:3, 6L), size = 10L, replace = TRUE),
-count = sample(c(50L, 100L, 150L), size = 10L, replace = TRUE)
+id=1:10, # unique identifier for an supply transaction
+date=receipt_dates,
+product_id=sample(c(NA, 1:3, 6L), size=10L, replace=TRUE), # NB: product '6' is not recorded in Products above.
+count=sample(c(50L, 100L, 150L), size=10L, replace=TRUE)
 )
 
 ProductReceived
 ```
 
-4. Defining a `data.table` to show some sales that can take place on weekdays with another `product_id` that is not present in the `Products` table.
-
-```{r}
-sample_date = function(from, to, size, ...){
-all_days = seq(from = from, to = to, by = "day")
-weekdays = all_days[wday(all_days) %in% 2:6]
-days_sample = sample(weekdays, size, ...)
-days_sample_desc = sort(days_sample)
-days_sample_desc
-}
+4. `ProductSales`, a table with rows simulating customer transactions.
 
+```{r define_product_sales}
 set.seed(5415)
 
+# Monday-Friday (4 days later) for each of the weeks present in ProductReceived
+possible_weekdays <- as.IDate(sapply(receipt_dates, `+`, 0:4))
+
 ProductSales = data.table(
 id = 1:10,
-date = ProductReceived[, sample_date(min(date), max(date), 10L)],
-product_id = sample(c(1:3, 7L), size = 10L, replace = TRUE),
+date = sort(sample(possible_weekdays, 10L)),
+product_id = sample(c(1:3, 7L), size = 10L, replace = TRUE), # NB: product '7' is in neither Products nor ProductReceived.
 count = sample(c(50L, 100L, 150L), size = 10L, replace = TRUE)
 )

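The date construction introduced in the last hunk can be checked in isolation. The following is a minimal sketch in base R, not part of the commit: it uses `as.Date` in place of data.table's `as.IDate` (an assumption made only to keep the snippet dependency-free) and verifies that every receipt date is a Monday and that adding the offsets `0:4` yields only Monday-to-Friday dates.

```r
# Base-R stand-in for the vignette's receipt_dates (as.Date instead of as.IDate).
receipt_dates <- seq(from = as.Date("2024-01-08"), length.out = 10L, by = "week")

# "%u" formats the ISO weekday number as a string: "1" = Monday, ..., "7" = Sunday.
stopifnot(all(format(receipt_dates, "%u") == "1"))

# sapply() drops the Date class and returns a 5 x 10 numeric matrix of day counts,
# so the result has to be converted back to dates afterwards.
offsets <- sapply(receipt_dates, `+`, 0:4)
possible_weekdays <- as.Date(as.vector(offsets), origin = "1970-01-01")

# 10 weeks x 5 days, all falling on Monday through Friday.
stopifnot(length(possible_weekdays) == 50L)
stopifnot(all(format(possible_weekdays, "%u") %in% as.character(1:5)))
```

Because `sapply()` simplifies to a plain numeric matrix, the explicit conversion back to a date class is what makes the sampled values usable as a `date` column; the commit does the same with `as.IDate()`.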