Consistent Replacement of List Column with NULL #6167
Generated via commit e14d93c Download link for the artifact containing the test results: ↓ atime-results.zip
|
tdhock
left a comment
these changes and tests look good, can you please add a NEWS item.
Anirban166
left a comment
Thanks for adding the NEWS entry and the changes look good to me as well!
As for the tests, they are good too and work (just tested), but I think it might be neat to comment or separate them out a bit so one can quickly see what each test is doing and how it differs from the others. For example, for your tests from top to bottom in order, a comment could convey that you replaced a list column with standard assignment to NULL, did the same but using the := syntax or modified in place, compared with another data.table, replaced multiple elements with NULL, and then followed up with tests similar to the single-element replacement case.
|
I know I'm quite late to the party, but in my opinion, ideally, we would bring the |
|
Hm.. not sure I understand what you mean here, I thought the simplest fix would be in |
|
this may be a breaking change (revdep checks could fail as a result) |
Currently, there are multiple ways to alter/add new columns to a |
|
Oh I see. Do you propose that we try to include the changes in this PR, or is it worth filing a separate issue? |
|
There are already multiple issues about the divergence of set and :=. It does not have to be this PR, I just thought that this might be an interesting topic to work on in GSoC (maybe as a stacked PR)
@tdhock Reference Semantics Vignette
This vignette implies that the two forms both exist, with the primary difference being syntax and the functional form being more chatty. IMO, it implies(?) that the two are the same.

Assignment by reference doc
I think this documentation also implies that the different usages both work. It does state that let and the functional form are equivalent. So, I will add some tests in this PR to check that using

Although these two documents imply that the results of either form are largely the same, I haven't found anywhere in the documentation that says they are always guaranteed to be the same. While searching for this on Google, I found this Stack Overflow thread talking about different results when using the functional form and assigning by reference: https://stackoverflow.com/questions/44067091/different-results-for-standard-form-and-functional-form-of-data-table-assigne-by

Jan explained here that there are slight differences in how the RHS is handled, causing a difference in output between the two forms depending on whether the data we are assigning is a vector or a list:

    dt <- data.table(a = c('a','b','c'))
    v <- c('A','B','C')
    l <- list(v)
    print(copy(dt)[, new := l])
    print(copy(dt)[, `:=` (new = l)])

            a    new
       <char> <char>
    1:      a      A
    2:      b      B
    3:      c      C
            a    new
       <char> <list>
    1:      a  A,B,C
    2:      b  A,B,C
    3:      c  A,B,C

This is still true as of current master (just tested), so I believe we shouldn't explicitly state that the results will be exactly the same. But we should note that in most cases the two forms are the same, which I believe the current documentation implies.
This was my interpretation when I commented on the issue! |
|
would be good to clarify the docs, explicitly write that they should be the same, and when they are expected to be different
|
TBH @tdhock I'm still a little confused on the exact differences between the standard and functional forms of assigning by reference. I want to ask for some of Jan's (and others') input to help me understand it. Plus it'll keep the logs on this PR a little clearer, as this PR didn't intend to fix documentation but is only slightly related. WDYT about filing a separate issue for that? Otherwise, if you think the vignette update is clear enough then we could keep it in this PR; however, I'm having trouble reasoning why exactly the above behavior happens. My line of thinking at the moment is that because

    dt[, `:=`(new = list(1:3))]

it is essentially equivalent to:

    dt[, new := list(new = list(1:3))]

Since this is true (just tried), I wonder how the wrapping of RHS by list in standard form vs not wrapping in functional form is relevant
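For readers who want to try the comparison discussed in this comment, here is a minimal sketch. It assumes a three-row table; the names `std`, `fun`, and `wrapped` are illustrative only and do not come from the PR. It contrasts the standard form with the functional form and with the hand-wrapped equivalent proposed above. Results may differ between data.table versions, so treat it as something to run rather than a specification.

```r
library(data.table)

dt <- data.table(a = c("a", "b", "c"))

# Standard form: list(1:3) on the RHS is read as "a list of columns",
# so `new` should come out as a plain integer column.
std <- copy(dt)[, new := list(1:3)]

# Functional form: the arguments of `:=`() are collected into a list() call,
# so list(1:3) ends up one level deeper and `new` should be a list column.
fun <- copy(dt)[, `:=`(new = list(1:3))]

# Standard form with the extra list() written out by hand, as in the comment.
wrapped <- copy(dt)[, new := list(new = list(1:3))]

print(class(std$new))
print(class(fun$new))
print(identical(fun$new, wrapped$new))
```

Comparing the columns rather than the whole tables keeps the check independent of any table-level attributes.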
|
Hmm.. It seems that there's been an oversight on my end. While revisiting the code/documentation change again, I realized that since we know that the functional form wraps

    > DT = data.table(L = list('A'), i = 1)
    > DT[, `:=`(L = NULL)]
    > DT
    #         L     i
    #    <list> <num>
    # 1: [NULL]     1

I think this can be fixed, but I'll need some time to think of a good solution, suggestions are welcome. I'll be reorganizing the unit tests to be more comprehensive and use all forms of assignment to thoroughly test. Thanks for everyone's patience!
|
Organized and added some tests, changed list wrapping behavior of |
|
looks good to me, thanks for the extensive tests |
|
This is a truly excellent PR, sorry it took so long to review! I tidied things up very slightly, and added one more set of tests to cover one more situation: sub-assignment (i.e., cases where only some but not all rows are edited). |
|
I suspect this will cause some revdep breakages. We should consider whether it's possible to retain the old behavior. I think some inconsistency here is impossible to avoid. We've gone from apparent inconsistency:

    DT[, list_col := list(NULL)] # delete
    DT[, char_col := 2L] # overwrite

To apparent inconsistency:

    DT[, one_col := list(NULL)] # overwrite
    DT[, (two_cols) := list(NULL, NULL)] # delete

All of this is ultimately a consequence of the convenience to have "naked" RHS of |
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

    @@           Coverage Diff           @@
    ##           master    #6167   +/-   ##
    =======================================
      Coverage   98.62%   98.62%
    =======================================
      Files          79       79
      Lines       14642    14645    +3
    =======================================
    + Hits        14441    14444    +3
      Misses        201      201
No worries! This one is quite a lot of reading so I appreciate the time you took to review it. I am always leaning towards the side that we should make users' experience as smooth as possible, and if revdeps are a concern then I'm always happy to stick with current behavior. With that being said, IMO this behavior:

    DT[, one_col := list(NULL)] # overwrite
    DT[, (two_cols) := list(NULL, NULL)] # delete

doesn't look too inconsistent to me. This is mainly because when I see multiple entries on the LHS, I'd expect each column on the LHS to be assigned a corresponding value from the RHS. In this case both columns specified in the LHS would be assigned a value of NULL each, hence deleting both, and that makes sense to me.
|
this definitely fixes my original issue thanks! |
    names(jsub)=""
    jsub[[1L]]=as.name("list")
    # dont wrap the RHS in list if it is a singular NULL and if not creating a new column
    if (length(jsub[-1L]) == 1L && as.character(jsub[-1L]) == 'NULL' && all(lhs %chin% names_x)) jsub[[1L]]=as.name("identity") else jsub[[1L]]=as.name("list")
@joshhwuu reading a bit more carefully here, part of the issue is relying on literal NULL being used (as opposed to using null_variable where null_variable=NULL). At a minimum, we should check the actual value of j[[1L]], see markfairbanks/tidytable#831.
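To illustrate the distinction raised here in plain base R, a small sketch follows. `null_checks` is a hypothetical helper written for this example and is not part of data.table. It shows that inspecting the unevaluated expression only recognizes a literal NULL, while inspecting the evaluated value also catches a variable that holds NULL.

```r
# Hypothetical helper, for illustration only: reports whether the argument
# is NULL as written by the caller and whether it is NULL once evaluated.
null_checks <- function(x) {
  expr <- substitute(x)        # the argument as typed, not evaluated
  c(literal = is.null(expr),   # TRUE only when the caller wrote NULL itself
    evaluated = is.null(x))    # TRUE whenever the value is NULL
}

null_variable <- NULL

print(null_checks(NULL))           # literal NULL
print(null_checks(null_variable))  # a symbol whose value is NULL
```

A check that relies on the expression alone therefore treats the two calls differently, which is the situation described in the linked tidytable issue.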
* revert #6167 (new rules on list(NULL) assignment)
* restore last missing line
* restore tests to working state on current master

Closes #5558
Previous behavior
From @tdhock:
This was reported to be inconsistent with column replacement with more than one row, see:
Additionally, there was this inconsistency as well:
Changes
In assign.c, add a new check to see if the passed-in values is list(NULL). If so, replace the list column with a list of NULL(s) of the same length.
This is the new behavior:
We no longer delete the column, instead replace the column rows with NULLs.
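As an R-level illustration of what "a list of NULL(s) of the same length" means, here is a short sketch. The actual check described above lives in C in assign.c; the names `n` and `nulls` are made up for this example and the code does not call data.table at all.

```r
n <- 3L  # stands in for the number of rows in the table

# vector("list", n) builds a list of length n whose elements are all NULL,
# i.e. the shape a list column takes when every row holds NULL.
nulls <- vector("list", n)

print(length(nulls))
print(all(vapply(nulls, is.null, logical(1))))
```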
This PR also changes behavior when doing more than one row, to be more consistent with data.frame replacement:
Of course, this works with the other assignment methods.
Had to change one old test, test(2058.20), to reflect the new behavior as well.