Fix incorrect keying after merge of keyed, non-alphabetic `factor` and `character` columns #5362

ben-schwen · 2022-04-06T21:05:51Z

Implements option 1 of #5361 (comment). Problems 2 and 3 mentioned there still exist; fixing them needs an additional `is.sorted` check.

codecov · 2022-04-06T21:19:45Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.47%. Comparing base (c16f320) to head (a1cbe53).
Report is 1 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #5362   +/-   ##
=======================================
  Coverage   98.47%   98.47%           
=======================================
  Files          81       81           
  Lines       15005    15019   +14     
=======================================
+ Hits        14776    14790   +14     
  Misses        229      229

MichaelChirico

Thanks!

github-actions · 2024-09-09T20:45:26Z

No obvious timing issues in HEAD=merge_factor_char_key

Generated via commit a1cbe53

Download link for the artifact containing the test results: ↓ atime-results.zip

Task	Duration
R setup and installing dependencies	2 minutes and 53 seconds
Installing different package versions	38 seconds
Running and plotting the test cases	2 minutes and 26 seconds

MichaelChirico · 2025-06-30T16:22:27Z

I think this is ready to merge, WDYT @ben-schwen?

MichaelChirico · 2025-06-30T18:26:08Z

R/data.table.R

+    return(NULL)
+
+  ## check key on i as well!
+  if (is.logical(i))


This `is.logical(i)` check has been there since the initial check-in in 2008 (2ec50ec; it was `is.logical(irows)` then). I'm not sure it's possible to reach it.

For queries like DT[<logical subset>], we return here:

data.table/R/data.table.R, lines 681 to 684 in 6029f2f:

    if (!length(leftcols)) {
      # basic x[i] subset, #2951
      if (is.null(irows)) return(shallow(x)) # e.g. DT[TRUE] (#3214); otherwise CsubsetDT would materialize a deep copy
      else return(.Call(CsubsetDT, x, irows, seq_along(x)) )

For queries like DT[<logical subset>, .(key, other)], the key is retained & the value returned here:

data.table/R/data.table.R, lines 1459 to 1467 in 6029f2f:

      if (is.null(irows) && !is.null(shared_keys)) {
        setattr(jval, 'sorted', shared_keys)
      # potentially inefficient backup -- check if jval is sorted by key(x)
      } else if (haskey(x) && all(key(x) %chin% names(jval)) && is.sorted(jval, by=key(x))) {
        setattr(jval, 'sorted', key(x))
      }
      if (any(vapply_1b(jval, is.null))) internal_error("j has created a data.table result containing a NULL column") # nocov
    }
    return(jval)

So I'm pretty sure it's not possible to reach this. We can see if revdeps turn anything up.
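
For readers who want to try the two query shapes discussed above, here is a minimal sketch. The table `DT` and the logical vector `keep` are made-up illustration names, not taken from the PR. It only runs both shapes and prints the key each result carries, so the key handling can be checked against an installed data.table version rather than taken on trust.

```r
library(data.table)

# Illustrative data only: `DT` and `keep` are hypothetical names, not from the PR.
DT = data.table(id = c("a", "b", "c", "d"), v = 1:4, key = "id")
keep = c(TRUE, FALSE, TRUE, TRUE)

# Shape 1: logical i and no j, i.e. the basic x[i] subset.
res_subset = DT[keep]

# Shape 2: logical i plus a j that returns the key column and another column.
res_j = DT[keep, .(id, v)]

# Inspect which key (if any) each result carries.
print(key(res_subset))
print(key(res_j))
```

Neither call passes a logical `i` down to the point where the `is.logical(i)` check sits, if the reading of the two excerpts above is right.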

MichaelChirico

@ben-schwen feel free to merge once you've reviewed my own edits.

ancient

ben-schwen added 4 commits April 6, 2022 22:46

add fix

34a744e

add test

80b0b91

add more tests

1f6c41e

add NEWS

4ab82c1

MichaelChirico added 2 commits April 6, 2022 21:40

typo

5a3e3c7

sentence structure

8d70e6d

MichaelChirico reviewed Apr 7, 2022

NEWS.md (outdated, resolved)

MichaelChirico reviewed Apr 7, 2022

R/data.table.R (outdated, resolved)

ben-schwen added 4 commits April 7, 2022 10:18

state bug more precisely

b1f4892

unwield lengthy if with ws

2183cbb

more tests

f42d498

extend NEWS for mirror casse

96437fb

MichaelChirico approved these changes Apr 10, 2022

mattdowle previously requested changes May 17, 2022

NEWS.md (outdated, resolved)

update NEWS

f45564a

MichaelChirico reviewed Sep 9, 2024

R/data.table.R (outdated, resolved)

ben-schwen added 2 commits September 9, 2024 22:23

vapply on single columns instead of whole subset

9ce6737

Merge branch 'master' into merge_factor_char_key

b7bbb5a

MichaelChirico reviewed Sep 9, 2024

R/data.table.R (outdated, resolved)

use .shallow

608599c

MichaelChirico changed the title ~~Merge on factor and character returns wrongly keyed data.table~~ Fix incorrect keying after merge of keyed, non-alphabetic factor and character columns Jun 30, 2025

MichaelChirico reviewed Jun 30, 2025

NEWS.md (outdated, resolved)

MichaelChirico added 5 commits June 30, 2025 16:23

Merge branch 'master' into merge_factor_char_key

fedde2d

suggested NEWS wording

0d5f5a5

add OPs original example more exactly to the regression test

c204850

add some tests of multiple join columns

6750e27

avoid using column x in table x

49a8f1a

MichaelChirico and others added 6 commits June 30, 2025 10:00

attempt to refactor into huge helper (🤞)

c9a33af

trailing ws

07e3acc

fix (?) tests

fa3be64

need to pass 'ans' too

dc95644

don't reuse overloaded name 'let'

5ad17ad

typo

7c8d2f7

MichaelChirico reviewed Jun 30, 2025

remove apparently vestigial check

088f876

MichaelChirico approved these changes Jun 30, 2025

Merge remote-tracking branch 'origin/master' into merge_factor_char_key

dda8722

MichaelChirico mentioned this pull request Jun 30, 2025

Add options= to test(), convert most Rraw scripts #5845

Draft

Merge branch 'master' into merge_factor_char_key

a1cbe53

ben-schwen merged commit c806849 into master Jul 2, 2025
12 checks passed

MichaelChirico deleted the merge_factor_char_key branch July 8, 2025 17:54
