Skip to content

Commit dac4255

Browse files
#2611 Update week() calculation so week 1 contains ydays 1–7 (#7380)
* accomodated breaking change * sequential and legacy * updated test * Update NEWS.md --------- Co-authored-by: Benjamin Schwendinger <[email protected]>
1 parent b3acf2a commit dac4255

File tree

3 files changed

+51
-4
lines changed

3 files changed

+51
-4
lines changed

NEWS.md

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -38,6 +38,11 @@
3838
3939
1. `data.table(x=1, <expr>)`, where `<expr>` is an expression resulting in a 1-column matrix without column names, will eventually have names `x` and `V2`, not `x` and `V1`, consistent with `data.table(x=1, <expr>)` where `<expr>` results in an atomic vector, for example `data.table(x=1, cbind(1))` and `data.table(x=1, 1)` will both have columns named `x` and `V2`. In this release, the matrix case continues to be named `V1`, but the new behavior can be activated by setting `options(datatable.old.matrix.autoname)` to `FALSE`. See point 5 under Bug Fixes for more context; this change will provide more internal consistency as well as more consistency with `data.frame()`.
4040
41+
2. The behavior of `week()` will be changed in a future release to calculate weeks sequentially (days 1-7 as week 1), which is a potential breaking change. For now, the current "legacy" behavior, where week numbers advance every 7th day of the year (e.g., day 7 starts week 2), remains the default, and a deprecation warning will be issued when the old and new behaviors differ. Users can control this behavior with the temporary option `options(datatable.week = "...")`:
42+
* `"sequential"`: Opt-in to the new, sequential behavior (no warning).
43+
* `"legacy"`: Continue using the legacy behavior but suppress the deprecation warning.
44+
See [#2611](https://github.com/Rdatatable/data.table/issues/2611) for details. Thanks @MichaelChirico for the report and @venom1204 for the implementation.
45+
4146
### NEW FEATURES
4247
4348
1. New `sort_by()` method for data.tables, [#6662](https://github.com/Rdatatable/data.table/issues/6662). It uses `forder()` to improve upon the data.frame method and also matches `DT[order(...)]` behavior with respect to locale. Thanks @rikivillalba for the suggestion and PR.

inst/tests/tests.Rraw

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -18384,7 +18384,7 @@ x = c("1111-11-11", "2019-01-01", "2019-02-28", "2019-03-01", "2019-12-31", "202
1838418384
test(2236.1, yday(x), c(315L, 1L, 59L, 60L, 365L, 60L, 61L, 366L, 1L, 366L, 60L, NA))
1838518385
test(2236.2, mday(x), c(11L, 1L, 28L, 1L, 31L, 29L, 1L, 31L, 1L, 31L, 1L, NA))
1838618386
test(2236.3, wday(x), c(7L, 3L, 5L, 6L, 3L, 7L, 1L, 5L, 1L, 2L, 2L, NA))
18387-
test(2236.4, week(x), c(46L, 1L, 9L, 9L, 53L, 9L, 9L, 53L, 1L, 53L, 9L, NA))
18387+
test(2236.4, options = c(datatable.week = "legacy"), week(x), c(46L, 1L, 9L, 9L, 53L, 9L, 9L, 53L, 1L, 53L, 9L, NA))
1838818388
test(2236.5, month(x), c(11L, 1L, 2L, 3L, 12L, 2L, 3L, 12L, 1L, 12L, 3L, NA))
1838918389
test(2236.6, quarter(x), c(4L, 1L, 1L, 1L, 4L, 1L, 1L, 4L, 1L, 4L, 1L, NA))
1839018390
test(2236.7, year(x), c(1111L, 2019L, 2019L, 2019L, 2019L, 2020L, 2020L, 2020L, 2040L, 2040L, 2100L, NA))
@@ -21815,3 +21815,12 @@ test(2341.24, fread('a
2181521815
# leading cmnt
2181621816
b
2181721817
', comment.char = '#', strip.white = FALSE, sep = ","), data.table(a=c(" ", "b")))
21818+
21819+
# week() sequential numbering fix tests #2611
21820+
test(2342.1, options = c(datatable.week = "sequential"), week(as.IDate("1970-01-01") + 0:7), c(1L,1L,1L,1L,1L,1L,1L,2L))
21821+
test(2342.2, options = c(datatable.week = "sequential"), week(as.IDate(c("2012-02-28","2012-02-29","2012-03-01"))), c(9L,9L,9L))
21822+
test(2342.3, options = c(datatable.week = "sequential"), week(as.IDate(c("2019-12-31","2020-01-01"))), c(53L,1L))
21823+
test(2342.4, options = c(datatable.week = "sequential"), week(as.IDate(c("2020-12-31","2021-01-01"))), c(53L,1L))
21824+
test(2342.5, options = c(datatable.week = "sequential"), week(as.IDate("2021-01-06") + 0:6), c(1L,1L,2L,2L,2L,2L,2L))
21825+
test(2342.6, options = c(datatable.week = "sequential"), week(as.IDate(c("2016-02-27","2016-02-28","2016-02-29","2016-03-01","2016-03-02"))), c(9L,9L,9L,9L,9L))
21826+
test(2342.7, options = c(datatable.week = "default"), week(as.IDate("1970-01-07")), 2L, warning = "The default behavior of week() is changing")

src/idatetime.c

Lines changed: 36 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -58,13 +58,11 @@ void convertSingleDate(int x, datetype type, void *out)
5858

5959
int leap = !years1 && (years4 || !years100);
6060

61-
if (type == YDAY || type == WEEK) {
61+
if (type == YDAY) {
6262
int yday = days + 31 + 28 + leap;
6363
if (yday >= YEARS1 + leap)
6464
yday -= YEARS1 + leap;
6565
*(int *)out = ++yday;
66-
if (type == WEEK)
67-
*(int *)out = (*(int *)out / 7) + 1;
6866
return;
6967
}
7068

@@ -143,6 +141,41 @@ SEXP convertDate(SEXP x, SEXP type)
143141
else if (!strcmp(ctype_str, "yearqtr")) { ctype = YEARQTR; ansint = false; }
144142
else internal_error(__func__, "invalid type, should have been caught before"); // # nocov
145143

144+
if (ctype == WEEK) {
145+
SEXP ans = PROTECT(allocVector(INTSXP, n));
146+
int *ansp = INTEGER(ans);
147+
148+
SEXP opt = GetOption(install("datatable.week"), R_NilValue);
149+
const char *mode = isString(opt) && length(opt) == 1 ? CHAR(STRING_ELT(opt, 0)) : "default";
150+
151+
bool use_sequential = !strcmp(mode, "sequential");
152+
bool use_legacy = !strcmp(mode, "legacy");
153+
bool can_warn = !use_sequential && !use_legacy;
154+
155+
for (int i = 0; i < n; i++) {
156+
if (ix[i] == NA_INTEGER) {
157+
ansp[i] = NA_INTEGER;
158+
continue;
159+
}
160+
int yday;
161+
convertSingleDate(ix[i], YDAY, &yday);
162+
int new_week = ((yday - 1) / 7) + 1;
163+
164+
if (use_sequential) {
165+
ansp[i] = new_week;
166+
} else {
167+
int old_week = (yday / 7) + 1;
168+
ansp[i] = old_week;
169+
if (can_warn && new_week != old_week) {
170+
warning(_("The default behavior of week() is changing. Previously ('legacy' mode), week numbers advanced every 7th day of the year. The new 'sequential' mode ensures the first week always has 7 days. For example, as.IDate('2023-01-07') returns week 2 in legacy mode but week 1 in sequential mode (week 2 starts on '2023-01-08'). To adopt the new behavior now, set options(datatable.week = 'sequential'). To keep the old results and silence this warning, set options(datatable.week = 'legacy'). See https://github.com/Rdatatable/data.table/issues/2611"));
171+
can_warn = false;
172+
}
173+
}
174+
}
175+
UNPROTECT(1);
176+
return ans;
177+
}
178+
146179
if (ansint) {
147180
SEXP ans = PROTECT(allocVector(INTSXP, n));
148181
int *ansp = INTEGER(ans);

0 commit comments

Comments
 (0)