-
Notifications
You must be signed in to change notification settings - Fork 1k
closes #4519 added a reference page for .internal.selfref #6696
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #6696 +/- ##
=======================================
Coverage 98.61% 98.61%
=======================================
Files 79 79
Lines 14553 14553
=======================================
Hits 14352 14352
Misses 201 201 ☔ View full report in Codecov by Sentry. |
|
I introduced the changes requested by you |
|
Thank you very much for the fixes! When I mentioned linking to |
|
Yha the link will not be working from .md file to .rd file that whi I provided the directory path for the internal.selfref . |
|
Writing good documentation is hard. When reading the help pages for
data.table, where would the user benefit the most from knowing about
the .internal.selfref attribute? I think it would be appropriate to
link to the new help page from data.table.Rd, assign.Rd, and copy.Rd.
|
aitap
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is better, thank you for the improvements. A few more changes and then we'll ask someone with commit rights to merge.
man/assign.Rd
Outdated
| DT = data.table(a = LETTERS[c(3L,1:3)], b = 4:7) | ||
| DT[, c := 8] # add a numeric column, 8 for all rows | ||
| DT[, d := 9L] # add an integer column, 9L for all rows | ||
| DT[, d := 9L] # add an integer column, 9L for all rows\code{\link{.Last.updated}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This syntax doesn't work inside \examples. Unfortunately, the Rd format has a lot of little details. Try rendering the help pages using R CMD Rdconv or installing the package and browsing its help pages before pushing commits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanx for the guidance i will introduce all the changes and submit it till toady or tomorrow evening
NEWS.0.md
Outdated
| query once and will never have noticed these, but anyone looping calls to grouping (such as when running in parallel, or benchmarking) may have suffered. Tests added. Thanks to many including vc273 and Y T for reporting [here](https://stackoverflow.com/questions/20349159/memory-leak-in-data-table-grouped-assignment-by-reference) and [here](https://stackoverflow.com/questions/15651515/slow-memory-leak-in-data-table-when-returning-named-lists-in-j-trying-to-reshap) on SO. | ||
| 2. In long running computations where data.table is called many times repetitively the following error could sometimes occur, #2647: *"Internal error: .internal.selfref prot is not itself an extptr"*. Now fixed. Thanks to theEricStone, StevieP and JasonB for (difficult) reproducible examples [here](https://stackoverflow.com/questions/15342227/getting-a-random-internal-selfref-error-in-data-table-for-r). | ||
| for more info about internal.selfref Refer to [internal.selfref](../man/internal.selfref.Rd) for additional information. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's revert this one.
NEWS.md
Outdated
| 27. `as.data.frame(DT)`, `setDF(DT)` and `as.list(DT)` now remove the `"index"` attribute which contains any indices (a.k.a. secondary keys), as they already did for other `data.table`-only attributes such as the primary key stored in the `"sorted"` attribute. When indices were left intact, a subsequent subset, assign, or reorder of the `data.frame` by `data.frame`-code in base R or other packages would not update the indices, causing incorrect results if then converted back to `data.table`, [#4889](https://github.com/Rdatatable/data.table/issues/4889). Thanks @OfekShilon for the report and the PR. | ||
| 28. `dplyr::arrange(DT)` uses `vctrs::vec_slice` which retains `data.table`'s class but uses C to bypass `[` method dispatch and does not adjust `data.table`'s attributes containing the index row numbers, [#5042](https://github.com/Rdatatable/data.table/issues/5042). `data.table`'s long-standing `.internal.selfref` mechanism to detect such operations by other packages was not being checked by `data.table` when using indexes, causing `data.table` filters and joins to use invalid indexes and return incorrect results after a `dplyr::arrange(DT)`. Thanks to @Waldi73 for reporting; @avimallu, @tlapak, @MichaelChirico, @jangorecki and @hadley for investigating and suggestions; and @mattdowle for the PR. The intended way to use `data.table` is `data.table::setkey(DT, col1, col2, ...)` which reorders `DT` by reference in parallel, sets the primary key for automatic use by subsequent `data.table` queries, and permits rowname-like usage such as `DT["foo",]` which returns the now-contiguous-in-memory block of rows where the first column of `DT`'s key contains `"foo"`. Multi-column-rownames (i.e. a primary key of more than one column) can be looked up using `DT[.("foo",20210728L), ]`. Using `==` in `i` is also optimized to use the key or indices, if you prefer using column names explicitly and `==`. An alternative to `setkey(DT)` is returning a new ordered result using `DT[order(col1, col2, ...), ]`. | ||
| Refer to [internal.selfref](../man/internal.selfref.Rd) for additional information. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will also need to be reverted.
man/internal.selfref.rd
Outdated
| library(data.table) | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not add library(data.table) to the examples. It's okay to assume that the package is already attached. Both R CMD check and example() will attach the package for us.
man/assign.Rd
Outdated
| \code{DT} is modified by reference and returned invisibly. If you require a copy, take a \code{\link{copy}} first (using \code{DT2 = copy(DT)}). | ||
| } | ||
| \seealso{ \code{\link{data.table}}, \code{\link{copy}}, \code{\link{setalloccol}}, \code{\link{truelength}}, \code{\link{set}}, \code{\link{.Last.updated}} | ||
| \seealso{ \code{\link{data.table}}, \code{\link{copy}}, \code{\link{setalloccol}}, \code{\link{truelength}}, \code{\link{set}}, \code{\link{.Last.updated}},\code{\link{internal.selfref}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When linking to a help page, use the \alias{...} name of the help page (i.e. .internal.selfref here), not its file name (i.e. not internal.selfref). All other occurrences will need to be fixed too.
man/internal.selfref.rd
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency with other help files, let's rename the file to internal.selfref.Rd, with capital R in the extension.
man/internal.selfref.rd
Outdated
| \alias{.internal.selfref} | ||
| \title{Internal Self-Reference Attribute in data.table} | ||
| \description{ | ||
| The \code{.internal.selfref} attribute is an internal mechanism used by \code{data.table} to optimize memory management and performance. It acts as a pointer that allows \code{data.table} objects to reference their own memory location. While the \code{.internal.selfref} attribute may appear to always point to \code{NULL} when inspected directly, this is a result of its implementation in R's memory management system. The true significance of this attribute lies in its role in supporting reference semantics, which enables efficient in-place modification of \code{data.table} objects without unnecessary copying. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not that the external pointer appears to point at NULL; that's done deliberately so that identical() would be TRUE for such attributes from two different data.tables. (Why is that needed? So that identical() would work on data.tables with otherwise identical contents.) The actual self-reference pointer lives in the prot part of the external pointer, visible using .Internal(inspect(x)), wrapped into another external pointer to avoid creating a reference loop visible to R. Once the data.table is duplicated, its address will be different from what is stored in the self-reference attribute, making it possible to detect such copies.
Try summarising the comments and the code of _selfrefok to document what it actually does.
|
@aitap can you now review this and plss tell me if you have any more suggestions. |
|
Thanks for all this! Honestly, I am not sure it's the best idea to write this page. To me, Perhaps a better approach here is to completely remove mention of WDYT @jangorecki / @tdhock / @ben-schwen? *Well, certainly there are already downstreams making assumptions about these internal details, but I'm more comfortable with breaking changes of undocumented code. |
|
I agree with Michael that a manpage actually puts more burden on us in maintaining the described functionality than in helping the average user so in favor of closing #4519 and replacing the mentioning of However, I think that this implementation detail is still worth either a wiki entry or a blog post on the Raft for the interested reader (maybe as a part of a broader topic like reference semantics). |
|
I defer to @MichaelChirico and @ben-schwen. Sorry for making you do extra work, @venom1204. |
let's not document internal implementation details
|
@aitap No worries! I’ve gained a lot of valuable knowledge throughout this process, and I truly appreciate the guidance and insights you’ve provided. Thank you! |
|
Thanks for your cooperation @venom1204! Glad you found the process nevertheless insightful & hope you could learn about data.table more deeply --> more contributions in the future :) BTW, it can't hurt to check in on an issue's discussion, especially for older issues, e.g. "I am thinking of working on this". That might save effort if we're able to raise concerns about whether our minds have changed about actually fixing a given issue :) (the issue backlog is quite large and old, so certainly there are some issues in there that should just be closed 😢) |
closes #4519
i added a seperate file internal.selfref.rd
containing its
@MichaelChirico can you please review it.