Commit e18779a

simplify explanation of constituent tables (#7084)
1 parent dd94b92 commit e18779a

1 file changed: +21 -23 lines changed

vignettes/datatable-joins.Rmd

Lines changed: 21 additions & 23 deletions
Original file line number | Diff line number | Diff line change
@@ -38,11 +38,11 @@ It assumes familiarity with the `data.table` syntax. If that is not the case, pl
3838

3939
## 1. Defining example data
4040

41-
To illustrate how to use the method available with real life examples, let's simulate a **normalized database** from a little supermarket by performing the following steps:
41+
To illustrate how to use the method available with real life examples, let's simulate a **normalized database** from a little supermarket by defining the following tables in a database:
4242

43-
1. Defining a `data.table` where each product is represented by a row with some qualities, but leaving one product without `id` to show how the framework deals with ***missing values***.
43+
1. `Products`, a table with rows giving characteristics of various products. To show how the framework deals with ***missing values***, one `id` is `NA`.
4444

45-
```{r}
45+
```{r, define_products}
4646
Products = rowwiseDT(
4747
id=, name=, price=, unit=, type=,
4848
1L, "banana", 0.63, "unit", "natural",
@@ -53,9 +53,9 @@ Products = rowwiseDT(
5353
)
5454
```
5555

56-
2. Defining a `data.table` showing the proportion of taxes to be applied for processed products based on their units.
56+
2. `NewTax`, a table with rows defining some taxes associated with processed products based on their units.
5757

58-
```{r}
58+
```{r define_new_tax}
5959
NewTax = data.table(
6060
unit = c("unit", "ounce"),
6161
type = "processed",
@@ -66,38 +66,36 @@ NewTax
6666
```
6767

6868

69-
3. Defining a `data.table` simulating the products received every Monday with a `product_id` that is not present in the `Products` table.
69+
3. `ProductReceived`, a table with rows simulating weekly incoming inventory.
7070

71-
```{r}
71+
```{r define_product_received}
7272
set.seed(2156)
7373
74+
# NB: Jan 8, 2024 is a Monday.
75+
receipt_dates = seq(from=as.IDate("2024-01-08"), length.out=10L, by="week")
76+
7477
ProductReceived = data.table(
75-
id = 1:10,
76-
date = seq(from = as.IDate("2024-01-08"), length.out = 10L, by = "week"),
77-
product_id = sample(c(NA_integer_, 1:3, 6L), size = 10L, replace = TRUE),
78-
count = sample(c(50L, 100L, 150L), size = 10L, replace = TRUE)
78+
id=1:10, # unique identifier for a supply transaction
79+
date=receipt_dates,
80+
product_id=sample(c(NA, 1:3, 6L), size=10L, replace=TRUE), # NB: product '6' is not recorded in Products above.
81+
count=sample(c(50L, 100L, 150L), size=10L, replace=TRUE)
7982
)
8083
8184
ProductReceived
8285
```
8386

84-
4. Defining a `data.table` to show some sales that can take place on weekdays with another `product_id` that is not present in the `Products` table.
85-
86-
```{r}
87-
sample_date = function(from, to, size, ...){
88-
all_days = seq(from = from, to = to, by = "day")
89-
weekdays = all_days[wday(all_days) %in% 2:6]
90-
days_sample = sample(weekdays, size, ...)
91-
days_sample_desc = sort(days_sample)
92-
days_sample_desc
93-
}
87+
4. `ProductSales`, a table with rows simulating customer transactions.
9488

89+
```{r define_product_sales}
9590
set.seed(5415)
9691
92+
# Monday-Friday (4 days later) for each of the weeks present in ProductReceived
93+
possible_weekdays <- as.IDate(sapply(receipt_dates, `+`, 0:4))
94+
9795
ProductSales = data.table(
9896
id = 1:10,
99-
date = ProductReceived[, sample_date(min(date), max(date), 10L)],
100-
product_id = sample(c(1:3, 7L), size = 10L, replace = TRUE),
97+
date = sort(sample(possible_weekdays, 10L)),
98+
product_id = sample(c(1:3, 7L), size = 10L, replace = TRUE), # NB: product '7' is in neither Products nor ProductReceived.
10199
count = sample(c(50L, 100L, 150L), size = 10L, replace = TRUE)
102100
)
103101
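
For anyone checking the new `possible_weekdays` line in the last hunk, here is a minimal base-R sketch of the same Monday-to-Friday expansion. It makes two assumptions that are not in the commit: plain `Date` stands in for data.table's `IDate`, and a `rep()`-based form replaces the committed `sapply()` call. It is an illustration of the idea, not the vignette's code.

```r
# Ten consecutive Mondays starting Jan 8, 2024.
# Assumption: base Date is used here instead of data.table's IDate.
receipt_dates <- seq(from = as.Date("2024-01-08"), length.out = 10L, by = "week")

# Repeat each Monday five times, then add the recycled offsets 0..4
# so that every week contributes Monday through Friday.
possible_weekdays <- rep(receipt_dates, each = 5L) + 0:4

# Sanity checks: 10 weeks x 5 days, and every date has an ISO weekday
# number between 1 (Monday) and 5 (Friday).
stopifnot(
  length(possible_weekdays) == 50L,
  all(as.integer(format(possible_weekdays, "%u")) %in% 1:5)
)
```

Sampling ten of these dates and sorting them, as the commit does with `sort(sample(possible_weekdays, 10L))`, therefore can only produce weekdays within the weeks covered by `receipt_dates`.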
