Commit 16d1d65

Merge branch 'master' into new_branch
2 parents 26c40bf + c0469f4 commit 16d1d65

10 files changed: +169 -11 lines changed

.github/CODE_OF_CONDUCT.md

Lines changed: 35 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,35 @@
1+
As contributors and maintainers of this project, and in the interest of fostering an open and welcoming community, we pledge to respect all people who contribute through reporting issues, posting feature requests, updating documentation, submitting pull requests or patches, and other activities.
2+
3+
We are committed to making participation in this project a harassment-free experience for everyone, regardless of level of experience, gender, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, religion, or nationality.
4+
5+
Examples of unacceptable behavior by participants include:
6+
7+
* The use of sexualized language or imagery
8+
* Personal attacks
9+
* Trolling or insulting/derogatory comments
10+
* Public or private harassment
11+
* Publishing others' private information, such as physical or electronic addresses, without explicit permission
12+
* Other unethical or unprofessional conduct
13+
14+
Project members with the Committer role have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.
15+
16+
By adopting this Code of Conduct, project members commit themselves to fairly and consistently apply these principles to every aspect of managing this project. Project maintainers who do not follow or enforce the Code of Conduct may be permanently removed from the project team.
17+
18+
This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community.
19+
20+
21+
## Reporting
22+
23+
Project members with the Committer role or the CRAN Maintainer role are pledged to promptly address any reported issues. Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to any individual with this role.
24+
25+
Those who prefer to report in a way that is independent of the current Committers and Maintainer may instead contact the Community Engagement Coordinator by e-mailing [r.data.table\@gmail.com](mailto:r.data.table@gmail.com). Messages sent to this e-mail address will be visible only to the current Community Engagement Coordinator, a position always held by an individual who is not a Committer or CRAN Maintainer of the package.
26+
27+
The current Committers are Toby Dylan Hocking (@tdhock), Matt Dowle (@mattdowle), Arun Srinivasan (@arunsrinivasan), Jan Gorecki (@jangorecki), Michael Chirico (@MichaelChirico), and Benjamin Schwendinger (@ben-schwen).
28+
29+
The current CRAN Maintainer is Tyson Barrett (@tysonstanley).
30+
31+
The current Community Engagement Coordinator is Kelly Bodwin (@kbodwin).
32+
33+
All complaints will be reviewed and investigated and will result in a response that is deemed necessary and appropriate to the circumstances. Complaint respondents are obligated to maintain confidentiality with regard to the reporter of an incident.
34+
35+
This Code of Conduct is adapted from the [Contributor Covenant, version 1.3.0](https://www.contributor-covenant.org/version/1/3/0/code-of-conduct/), available at [https://www.contributor-covenant.org/version/1/3/0/](https://www.contributor-covenant.org/version/1/3/0/), and the Swift Code of Conduct.

GOVERNANCE.md

Lines changed: 33 additions & 7 deletions
@@ -72,6 +72,14 @@ Functionality that is out of current scope:
7272
* How to obtain this role: (1) merge into master a PR adding role="cre" to DESCRIPTION, and (2) submit updated package to CRAN (previous CRAN Maintainer will have to confirm change by email to CRAN).
7373
* How this role is recognized: credited via role="cre" in DESCRIPTION, so they appear as Maintainer on CRAN.
7474

75+
## Community Engagement Coordinator
76+
77+
* Definition: An individual who is involved in the project but does **not** also occupy the Committer or CRAN Maintainer role. In charge of maintaining The Raft blog, preparing Seal of Approval Applications, addressing Code of Conduct violations, and planning social or community events.
78+
79+
* How to obtain this role: At the discretion of the current Community Engagement Coordinator(s) in conversation with the current Committers.
80+
81+
* How this role is recognized: Holds the Owner role in the [rdatatable-community organization](https://github.com/orgs/rdatatable-community/people) on GitHub.
82+
7583
# Decision-making processes
7684

7785
## Definition of Consensus
@@ -98,19 +106,35 @@ There is no special process for changing this document. Submit a PR and ask for
98106

99107
Please also make a note in the change log under [`# Governance history`](#governance-history)
100108

101-
# Code of conduct
109+
# Finances and Funding
110+
111+
There is currently no mechanism for the data.table project to receive funding as an entity.
112+
113+
Funding support for this project therefore may come in two forms:
114+
115+
## Individual external funding
102116

103-
As contributors of this project, we pledge to respect all people who contribute through reporting issues, posting feature requests, updating documentation, submitting pull requests or patches, and other activities.
117+
Any individual developer or community member of data.table may apply for and receive funding for their work on the project. Individuals or groups seeking funding support are strongly encouraged to consult directly with the data.table Project Members (by initiating an Issue on GitHub) to ensure funds are used meaningfully. Formally, however, decisions about use of funds are governed by the individual grantee(s) and their contract with the funding agency.
104118

105-
We are committed to making participation in this project a harassment-free experience for everyone, regardless of level of experience, gender, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, religion, etc.
119+
There is no guarantee that funded work will be incorporated into the data.table package; any contributions, whether funded or unfunded, are subject to the same review process as outlined above.
106120

107-
Examples of unacceptable behavior by participants include the use of sexual language or imagery, derogatory comments or personal attacks, trolling, public or private harassment, insults, or other unprofessional conduct.
121+
## Direct donations
108122

109-
Committers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct. A person with special roles who does not follow the Code of Conduct may have their roles revoked.
123+
Direct donations to the project may be made via GitHub Sponsorships, which allow individuals to fund a specific developer. If the current CRAN Maintainer offers a personal sponsorship option, donations may be made to them to support the project in general.
110124

111-
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by opening an issue or emailing one or more of the Committers.
125+
## Decision-making for future opportunities
112126

113-
This Code of Conduct is adapted from Tidyverse code of conduct.
127+
We here outline a procedure for disbursing funds, should this project in the future become a directly fundable entity (e.g. an LLC or a subsidiary of an umbrella LLC).
128+
129+
Funds acquired by the data.table project will be disbursed at the discretion of the **Committers**, defined as above. The **CRAN Maintainer** will have authority to make final decisions in the event that no consensus is reached among committers prior to deadlines for use of funds, and will be responsible for disbursement logistics.
130+
131+
# Code of conduct
132+
133+
The full Code of Conduct can be found [here](CODE_OF_CONDUCT.md), including details for reporting violations.
134+
135+
## Reporting Responsibility
136+
137+
Committers and the Community Engagement Coordinator pledge to address any publicly posted issues or privately sent messages regarding Code of Conduct violations in a respectful and timely manner.
114138

115139
# Version numbering
116140

@@ -124,6 +148,8 @@ data.table Version line in DESCRIPTION typically has the following meanings
124148

125149
# Governance history
126150

151+
Feb 2025: add Finances and Funding section, update Code of Conduct section to be a brief summary and reference the broader CoC document.
152+
127153
Jan 2025: clarify that edits to governance should notify all committers, and that role names are proper nouns (i.e., upper-case) throughout.
128154

129155
Feb 2024: change team name/link maintainers to committers, to be consistent with role defined in governance.

NAMESPACE

Lines changed: 3 additions & 0 deletions
@@ -206,3 +206,6 @@ S3method(format_list_item, data.frame)
206206

207207
export(fdroplevels, setdroplevels)
208208
S3method(droplevels, data.table)
209+
210+
# sort_by added in R 4.4.0, #6662, https://stat.ethz.ch/pipermail/r-announce/2024/000701.html
211+
if (getRversion() >= "4.4.0") S3method(sort_by, data.table)
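
Note on the guard above: the `getRversion() >= "4.4.0"` check in NAMESPACE and the `exists("sort_by", "package:base")` check used in `inst/tests/tests.Rraw` are expected to select the same R versions, since `sort_by` was added to base R in 4.4.0. A minimal base-R sketch for confirming this on a given installation follows; the variable names are illustrative and not part of the commit.

```r
# Illustrative check only; `has_sort_by` and `new_enough` are hypothetical names.
has_sort_by <- exists("sort_by", envir = baseenv(), inherits = FALSE)
new_enough  <- getRversion() >= "4.4.0"

# Both guards should agree on whether the generic is available.
stopifnot(identical(has_sort_by, new_enough))
```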

NEWS.md

Lines changed: 2 additions & 0 deletions
@@ -2,7 +2,9 @@
22

33
# data.table [v1.17.99](https://github.com/Rdatatable/data.table/milestone/35) (in development)
44

5+
## NEW FEATURES
56

7+
1. New `sort_by()` method for data.tables, [#6662](https://github.com/Rdatatable/data.table/issues/6662). It uses `forder()` to improve upon the data.frame method and also match `DT[order(...)]` behavior with respect to locale. Thanks @rikivillalba for the suggestion and PR.
68

79
# data.table [v1.17.0](https://github.com/Rdatatable/data.table/milestone/34) (20 Feb 2025)
810

R/data.table.R

Lines changed: 13 additions & 1 deletion
@@ -2454,7 +2454,7 @@ split.data.table = function(x, f, drop = FALSE, by, sorted = FALSE, keep.by = TR
24542454
# same as split.data.frame - handling all exceptions, factor orders etc, in a single stream of processing was a nightmare in factor and drop consistency
24552455
# evaluate formula mirroring split.data.frame #5392. Mimics base::.formula2varlist.
24562456
if (inherits(f, "formula"))
2457-
f = eval(attr(terms(f), "variables"), x, environment(f))
2457+
f = formula_vars(f, x)
24582458
# be sure to use x[ind, , drop = FALSE], not x[ind], in case downstream methods don't follow the same subsetting semantics (#5365)
24592459
return(lapply(split(x = seq_len(nrow(x)), f = f, drop = drop, ...), function(ind) x[ind, , drop = FALSE]))
24602460
}
@@ -2526,6 +2526,18 @@ split.data.table = function(x, f, drop = FALSE, by, sorted = FALSE, keep.by = TR
25262526
}
25272527
}
25282528

2529+
sort_by.data.table <- function(x, y, ...)
2530+
{
2531+
if (!cedta()) return(NextMethod()) # nocov
2532+
if (inherits(y, "formula"))
2533+
y <- formula_vars(y, x)
2534+
if (!is.list(y))
2535+
y <- list(y)
2536+
# use forder instead of base 'order'
2537+
o <- do.call(forder, c(unname(y), list(...)))
2538+
x[o, , drop=FALSE]
2539+
}
2540+
25292541
# TO DO, add more warnings e.g. for by.data.table(), telling user what the data.table syntax is but letting them dispatch to data.frame if they want
25302542

25312543
copy = function(x) {

R/utils.R

Lines changed: 8 additions & 0 deletions
@@ -212,3 +212,11 @@ rss = function() { #5515 #5517
212212
round(ans / 1024.0, 1L) # return MB
213213
# nocov end
214214
}
215+
216+
formula_vars = function(f, x) { # .formula2varlist is not API and seems to have appeared after R-4.2, #6841
217+
terms <- terms(f)
218+
setNames(
219+
eval(attr(terms, "variables"), x, environment(f)),
220+
attr(terms, "term.labels")
221+
)
222+
}
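
To make the helper's behavior concrete, here is a minimal base-R sketch that copies the definition from this hunk and applies it to a plain data.frame. The data and the names `df`, `vars`, and `o` are illustrative assumptions; only `terms()`, `eval()`, `setNames()`, and `order()` from base R are used, with `order()` standing in for data.table's `forder()`.

```r
# Copy of the helper added in R/utils.R, reproduced so the sketch is self-contained.
formula_vars <- function(f, x) {
  tt <- terms(f)
  setNames(
    eval(attr(tt, "variables"), x, environment(f)),
    attr(tt, "term.labels")
  )
}

# Hypothetical input data (not from the commit).
df <- data.frame(a = c(1, 3, 2), b = 4:2)

# One list element per term, evaluated inside `df` and named by term label.
vars <- formula_vars(~ a + b, df)
str(vars)

# The unnamed list can be handed to an ordering function,
# much as sort_by.data.table passes it to forder().
o <- do.call(order, unname(vars))
df[o, , drop = FALSE]
```

The names line up one-to-one with the variables for simple additive formulas such as `~ a + b`; a formula with an interaction term like `a:b` yields one label for two variables, so the pairing would not hold there.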

Seal_of_Approval.md

Lines changed: 1 addition & 0 deletions
@@ -31,3 +31,4 @@ Translates `data.table` syntax to a different syntax, or provides helper functio
3131
Not necessarily directly connected to `data.table`, but deliberately follows the [core philosophies of `data.table`](https://github.com/Rdatatable/data.table/blob/master/GOVERNANCE.md#the-r-package).
3232

3333
- [collapse](https://github.com/SebKrantz/collapse): Advanced and Fast Data Transformation in R.
34+

inst/tests/tests.Rraw

Lines changed: 28 additions & 0 deletions
@@ -21041,3 +21041,31 @@ test(2304.100, set(copy(DT), i=2L, j=c("L1", "L2"), value=list(list(NULL), list(
2104121041

2104221042
# the integer overflow in #6729 is only noticeable with UBSan
2104321043
test(2305, { fread(testDir("issue_6729.txt.bz2")); TRUE })
21044+
21045+
if (exists("sort_by", "package:base")) {
21046+
# sort_by.data.table
21047+
DT1 = data.table(a=c(1, 3, 2, NA, 3), b=4:0)
21048+
DT2 = data.table(a=c("c", "a", "B")) # data.table uses C-locale and should sort_by if cedta()
21049+
DT3 = data.table(a=c(1, 2, 3), b=list(c("a", "b", "", NA), c(1, 3, 2, 0), c(TRUE, TRUE, FALSE, NA))) # list column
21050+
21051+
# sort_by.data.table: basics
21052+
test(2306.01, sort_by(DT1, ~a + b), data.table(a=c(1, 2, 3, 3, NA), b=c(4L, 2L, 0L, 3L, 1L)))
21053+
test(2306.02, sort_by(DT1, ~I(a + b)), data.table(a=c(3, 2, 1, 3, NA), b=c(0L, 2L, 4L, 3L, 1L)))
21054+
test(2306.03, sort_by(DT2, ~a), data.table(a=c("B", "a", "c")))
21055+
21056+
# sort_by.data.table: list columns.
21057+
# NOTE 1: .formula2varlist works well with list columns.
21058+
# NOTE 2: 4 elem in DT of 3 row because forderv takes a list column as a DT.
21059+
test(2306.04, sort_by(DT3, ~b), DT3[order(b)]) # should be consistent.
21060+
21061+
# sort_by.data.table: additional C-locale sorting
21062+
test(2306.10, DT2[, sort_by(.SD, a)], data.table(a=c("B", "a", "c")))
21063+
test(2306.11, DT2[, sort_by(.SD, ~a)], data.table(a=c("B", "a", "c")))
21064+
21065+
# sort_by.data.table: various working interfaces
21066+
test(2306.20, sort_by(DT1, list(DT1$a, DT1$b)), data.table(a=c(1, 2, 3, 3, NA), b=c(4L, 2L, 0L, 3L, 1L)))
21067+
test(2306.21, sort_by(DT1, DT1[, .(a, b)]), data.table(a=c(1, 2, 3, 3, NA), b=c(4L, 2L, 0L, 3L, 1L)))
21068+
test(2306.22, DT1[, sort_by(.SD, .(a, b))], data.table(a=c(1, 2, 3, 3, NA), b=c(4L, 2L, 0L, 3L, 1L)))
21069+
test(2306.23, DT1[, sort_by(.SD, ~a + b)], data.table(a=c(1, 2, 3, 3, NA), b=c(4L, 2L, 0L, 3L, 1L)))
21070+
test(2306.24, DT1[, sort_by(.SD, ~.(a, b))], data.table(a=c(1, 2, 3, 3, NA), b=c(4L, 2L, 0L, 3L, 1L)))
21071+
}
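
The expected value `c("B", "a", "c")` in tests 2306.03, 2306.10, and 2306.11 follows from C-locale ordering, in which upper-case letters sort before lower-case ones. A rough base-R illustration is given below. It does not call data.table at all and assumes only that `method = "radix"` sorts character vectors in the C-locale, as documented in `?sort`.

```r
# Character data matching DT2 in the tests above.
a <- c("c", "a", "B")

# method = "radix" sorts character vectors in the C-locale,
# where "B" comes before "a".
c_sorted <- sort(a, method = "radix")
c_sorted

# The default method follows the session's collation locale,
# so its result may differ between machines.
sort(a)
```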

man/setorder.Rd

Lines changed: 10 additions & 3 deletions
@@ -5,6 +5,7 @@
55
\alias{fastorder}
66
\alias{forder}
77
\alias{forderv}
8+
\alias{sort_by}
89

910
\title{Fast row reordering of a data.table by reference}
1011
\description{
@@ -32,6 +33,7 @@ setorderv(x, cols = colnames(x), order=1L, na.last=FALSE)
3233
# optimised to use data.table's internal fast order
3334
# x[order(., na.last=TRUE)]
3435
# x[order(., decreasing=TRUE)]
36+
# sort_by(x, ., na.last=TRUE, decreasing=FALSE) # R >= 4.4.0
3537
}
3638
\arguments{
3739
\item{x}{ A \code{data.table}. }
@@ -46,7 +48,7 @@ when \code{b} is of type \code{character} as well. }
4648
\code{order} must be either \code{1} or equal to that of \code{cols}. If
4749
\code{length(order) == 1}, it is recycled to \code{length(cols)}. }
4850
\item{na.last}{ \code{logical}. If \code{TRUE}, missing values in the data are placed last; if \code{FALSE}, they are placed first; if \code{NA} they are removed.
49-
\code{na.last=NA} is valid only for \code{x[order(., na.last)]} and its
51+
\code{na.last=NA} is valid only for \code{x[order(., na.last)]} and related \code{sort_by(x, .)} (\eqn{\R \ge 4.4.0}) and its
5052
default is \code{TRUE}. \code{setorder} and \code{setorderv} only accept
5153
\code{TRUE}/\code{FALSE} with default \code{FALSE}. }
5254
}
@@ -71,8 +73,8 @@ sets the \code{sorted} attribute.
7173

7274
\code{na.last} argument, by default, is \code{FALSE} for \code{setorder} and
7375
\code{setorderv} to be consistent with \code{data.table}'s \code{setkey} and
74-
is \code{TRUE} for \code{x[order(.)]} to be consistent with \code{base::order}.
75-
Only \code{x[order(.)]} can have \code{na.last = NA} as it is a subset operation
76+
is \code{TRUE} for \code{x[order(.)]} and \code{sort_by(x, .)} (\eqn{\R \ge 4.4.0}) to be consistent with \code{base::order}.
77+
Only \code{x[order(.)]} (and related \code{sort_by(x, .)}) can have \code{na.last = NA} as it is a subset operation
7678
as opposed to \code{setorder} or \code{setorderv} which reorders the data.table
7779
by reference.
7880
@@ -96,6 +98,11 @@ was started in. By contrast, \code{"america" < "BRAZIL"} is always \code{FALSE}
9698
9799
If \code{setorder} results in reordering of the rows of a keyed \code{data.table},
98100
then its key will be set to \code{NULL}.
101+
102+
Starting from \R 4.4.0, \code{sort_by(x, y, \dots)} is the S3 method for the generic \code{sort_by} for \code{data.table}s.
103+
It uses the same formula or list interfaces as data.frame's \code{sort_by} but internally uses \code{data.table}'s fast ordering,
104+
hence it behaves the same as \code{x[order(.)]} and takes the same optional named arguments and their defaults.
105+
99106
}
100107
\value{
101108
The input is modified by reference, and returned (invisibly) so it can be used

vignettes/datatable-joins.Rmd

Lines changed: 36 additions & 0 deletions
@@ -569,6 +569,42 @@ ProductReceivedProd2[ProductSalesProd2,
569569
nomatch = NULL]
570570
```
571571

572+
### 4.1 Output column names in non-equi joins
573+
574+
When performing non-equi joins (`<`, `>`, `<=`, `>=`), column names are assigned as follows:
575+
576+
- The left operand (`x` column) determines the column name in the result.
577+
- The right operand (`i` column) contributes values but does not retain its original name.
578+
- By default, `data.table` does not retain the `i` column used in the join condition unless explicitly requested.
579+
580+
In non-equi joins, the left side of the operator (e.g., `x_int` in `x_int >= i_int`) must be a column from `x`, while the right side (e.g., `i_int`) must be a column from `i`.
581+
582+
Non-equi joins do not currently support arbitrary expressions (but see [#1639](https://github.com/Rdatatable/data.table/issues/1639)). For example, `on = .(x_int >= i_int)` is valid, but `on = .(x_int >= i_int + 1L)` is not. To perform such a non-equi join, first add the expression as a new column, e.g. `i[, i_int_plus_one := i_int + 1L]`, then use `on = .(x_int >= i_int_plus_one)`.
583+
584+
```{r non_equi_join_example}
585+
x <- data.table(x_int = 2:4, lower = letters[1:3])
586+
i <- data.table(i_int = c(2L, 4L, 5L), UPPER = LETTERS[1:3])
587+
x[i, on = .(x_int >= i_int)]
588+
```
589+
590+
Key Takeaways:
591+
- The name of the output column (`x_int`) comes from `x`, but the values come from `i_int` in `i`.
592+
- The last row contains `NA` because no rows in `x` match the last row in `i` (`UPPER == "C"`).
593+
- Multiple rows in `x` are returned to match the first row in `i` with `UPPER == "A"`.
594+
595+
If you want to keep the `i_int` column from `i`, you need to explicitly select it in the result:
596+
597+
```{r retain_i_column}
598+
x[i, on = .(x_int >= i_int), .(i_int = i.i_int, x_int = x.x_int, lower, UPPER)]
599+
```
600+
601+
Using prefixes (`x.` and `i.`) is not strictly necessary in this case since the names are unambiguous, but using them ensures the output clearly distinguishes `i_int` (from `i`) and `x_int` (from `x`).
602+
603+
If you want to exclude unmatched rows (an _inner join_), use `nomatch = NULL`:
604+
605+
```{r retain_i_column_inner_join}
606+
x[i, on = .(x_int >= i_int), .(i_int = i.i_int, x_int = x.x_int, lower, UPPER), nomatch = NULL]
607+
```
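
The workaround for expressions in `on` described earlier in this section can be sketched as follows. This is a minimal illustration that rebuilds the `x` and `i` tables from the example so it runs on its own; it assumes the data.table package is installed, and `i_int_plus_one` is simply the helper column name suggested in the text.

```r
library(data.table)

x <- data.table(x_int = 2:4, lower = letters[1:3])
i <- data.table(i_int = c(2L, 4L, 5L), UPPER = LETTERS[1:3])

# `on = .(x_int >= i_int + 1L)` is not supported,
# so materialise the expression as a column of `i` first.
i[, i_int_plus_one := i_int + 1L]

# Then join on the precomputed column.
res <- x[i, on = .(x_int >= i_int_plus_one)]
res
```

As in the earlier example, the `x_int` column of the result holds the values of the `i` join column (here `i_int_plus_one`), and rows of `i` without a match are kept with `NA` in `lower` unless `nomatch = NULL` is given.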
572608

573609
## 5. Rolling join
574610
