Skip to content

Conversation

@venom1204
Copy link
Contributor

closes #4519
i added a seperate file internal.selfref.rd
containing its

  1. Purpose and Functionality
  2. Behavior and Use
  3. User Interaction
  4. Examples
  5. Reference to Related Functions

@MichaelChirico can you please review it.

@codecov
Copy link

codecov bot commented Dec 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.61%. Comparing base (70c64ac) to head (9ccbede).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6696   +/-   ##
=======================================
  Coverage   98.61%   98.61%           
=======================================
  Files          79       79           
  Lines       14553    14553           
=======================================
  Hits        14352    14352           
  Misses        201      201           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@venom1204
Copy link
Contributor Author

I introduced the changes requested by you
and for the reference of the internal.selfref i have added its guide path in the NEWS.md and the NEWS0.md
as they were the places where i find the internal.selfref .
is it fine now @aitap @MichaelChirico can you review it or there are any more changes i need to do ?

@aitap
Copy link
Member

aitap commented Dec 28, 2024

Thank you very much for the fixes!

When I mentioned linking to internal.selfref from other help pages, I meant the help pages in man/*.Rd, not inst/NEWS*md. I don't think that a link from NEWS.md to an Rd file will work in an installed R package; but \link{...}s will work if you run (for example) help(data.table, data.table, help_type='html'). The idea is to make the new help page a part of the same graph, reachable by links from other help pages.

@venom1204
Copy link
Contributor Author

Yha the link will not be working from .md file to .rd file that whi I provided the directory path for the internal.selfref .
And I did not include it in the man.r as I did not find the use of internal.selfref much .
Should I make change again ?

@aitap
Copy link
Member

aitap commented Dec 28, 2024 via email

aitap
aitap previously requested changes Dec 29, 2024
Copy link
Member

@aitap aitap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is better, thank you for the improvements. A few more changes and then we'll ask someone with commit rights to merge.

man/assign.Rd Outdated
DT = data.table(a = LETTERS[c(3L,1:3)], b = 4:7)
DT[, c := 8] # add a numeric column, 8 for all rows
DT[, d := 9L] # add an integer column, 9L for all rows
DT[, d := 9L] # add an integer column, 9L for all rows\code{\link{.Last.updated}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This syntax doesn't work inside \examples. Unfortunately, the Rd format has a lot of little details. Try rendering the help pages using R CMD Rdconv or installing the package and browsing its help pages before pushing commits.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanx for the guidance i will introduce all the changes and submit it till toady or tomorrow evening

NEWS.0.md Outdated
query once and will never have noticed these, but anyone looping calls to grouping (such as when running in parallel, or benchmarking) may have suffered. Tests added. Thanks to many including vc273 and Y T for reporting [here](https://stackoverflow.com/questions/20349159/memory-leak-in-data-table-grouped-assignment-by-reference) and [here](https://stackoverflow.com/questions/15651515/slow-memory-leak-in-data-table-when-returning-named-lists-in-j-trying-to-reshap) on SO.
2. In long running computations where data.table is called many times repetitively the following error could sometimes occur, #2647: *"Internal error: .internal.selfref prot is not itself an extptr"*. Now fixed. Thanks to theEricStone, StevieP and JasonB for (difficult) reproducible examples [here](https://stackoverflow.com/questions/15342227/getting-a-random-internal-selfref-error-in-data-table-for-r).
for more info about internal.selfref Refer to [internal.selfref](../man/internal.selfref.Rd) for additional information.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's revert this one.

NEWS.md Outdated
27. `as.data.frame(DT)`, `setDF(DT)` and `as.list(DT)` now remove the `"index"` attribute which contains any indices (a.k.a. secondary keys), as they already did for other `data.table`-only attributes such as the primary key stored in the `"sorted"` attribute. When indices were left intact, a subsequent subset, assign, or reorder of the `data.frame` by `data.frame`-code in base R or other packages would not update the indices, causing incorrect results if then converted back to `data.table`, [#4889](https://github.com/Rdatatable/data.table/issues/4889). Thanks @OfekShilon for the report and the PR.
28. `dplyr::arrange(DT)` uses `vctrs::vec_slice` which retains `data.table`'s class but uses C to bypass `[` method dispatch and does not adjust `data.table`'s attributes containing the index row numbers, [#5042](https://github.com/Rdatatable/data.table/issues/5042). `data.table`'s long-standing `.internal.selfref` mechanism to detect such operations by other packages was not being checked by `data.table` when using indexes, causing `data.table` filters and joins to use invalid indexes and return incorrect results after a `dplyr::arrange(DT)`. Thanks to @Waldi73 for reporting; @avimallu, @tlapak, @MichaelChirico, @jangorecki and @hadley for investigating and suggestions; and @mattdowle for the PR. The intended way to use `data.table` is `data.table::setkey(DT, col1, col2, ...)` which reorders `DT` by reference in parallel, sets the primary key for automatic use by subsequent `data.table` queries, and permits rowname-like usage such as `DT["foo",]` which returns the now-contiguous-in-memory block of rows where the first column of `DT`'s key contains `"foo"`. Multi-column-rownames (i.e. a primary key of more than one column) can be looked up using `DT[.("foo",20210728L), ]`. Using `==` in `i` is also optimized to use the key or indices, if you prefer using column names explicitly and `==`. An alternative to `setkey(DT)` is returning a new ordered result using `DT[order(col1, col2, ...), ]`.
Refer to [internal.selfref](../man/internal.selfref.Rd) for additional information.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will also need to be reverted.

Comment on lines 23 to 24
library(data.table)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not add library(data.table) to the examples. It's okay to assume that the package is already attached. Both R CMD check and example() will attach the package for us.

man/assign.Rd Outdated
\code{DT} is modified by reference and returned invisibly. If you require a copy, take a \code{\link{copy}} first (using \code{DT2 = copy(DT)}).
}
\seealso{ \code{\link{data.table}}, \code{\link{copy}}, \code{\link{setalloccol}}, \code{\link{truelength}}, \code{\link{set}}, \code{\link{.Last.updated}}
\seealso{ \code{\link{data.table}}, \code{\link{copy}}, \code{\link{setalloccol}}, \code{\link{truelength}}, \code{\link{set}}, \code{\link{.Last.updated}},\code{\link{internal.selfref}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When linking to a help page, use the \alias{...} name of the help page (i.e. .internal.selfref here), not its file name (i.e. not internal.selfref). All other occurrences will need to be fixed too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency with other help files, let's rename the file to internal.selfref.Rd, with capital R in the extension.

\alias{.internal.selfref}
\title{Internal Self-Reference Attribute in data.table}
\description{
The \code{.internal.selfref} attribute is an internal mechanism used by \code{data.table} to optimize memory management and performance. It acts as a pointer that allows \code{data.table} objects to reference their own memory location. While the \code{.internal.selfref} attribute may appear to always point to \code{NULL} when inspected directly, this is a result of its implementation in R's memory management system. The true significance of this attribute lies in its role in supporting reference semantics, which enables efficient in-place modification of \code{data.table} objects without unnecessary copying.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not that the external pointer appears to point at NULL; that's done deliberately so that identical() would be TRUE for such attributes from two different data.tables. (Why is that needed? So that identical() would work on data.tables with otherwise identical contents.) The actual self-reference pointer lives in the prot part of the external pointer, visible using .Internal(inspect(x)), wrapped into another external pointer to avoid creating a reference loop visible to R. Once the data.table is duplicated, its address will be different from what is stored in the self-reference attribute, making it possible to detect such copies.

Try summarising the comments and the code of _selfrefok to document what it actually does.

@venom1204
Copy link
Contributor Author

@aitap can you now review this and plss tell me if you have any more suggestions.

@MichaelChirico
Copy link
Member

Thanks for all this!

Honestly, I am not sure it's the best idea to write this page. To me, .internal.selfref is really an private implementation detail of the data.table class; over-documenting it effectively turns it into a public part of the API & prevents us from making back-end changes*

Perhaps a better approach here is to completely remove mention of .internal.selfref from user-facing messages. We can describe what's gone wrong without anchoring on .internal.selfref per se. I reckon doing so would also make the messages more helpful as an average user seeing .internal.selfref is more likely to be confused by this obscurely-named object, even if ?.internal.selfref works.

WDYT @jangorecki / @tdhock / @ben-schwen?

*Well, certainly there are already downstreams making assumptions about these internal details, but I'm more comfortable with breaking changes of undocumented code.

@ben-schwen
Copy link
Member

I agree with Michael that a manpage actually puts more burden on us in maintaining the described functionality than in helping the average user so in favor of closing #4519 and replacing the mentioning of .internal.selfref with more user-appropriate information.

However, I think that this implementation detail is still worth either a wiki entry or a blog post on the Raft for the interested reader (maybe as a part of a broader topic like reference semantics).

@aitap
Copy link
Member

aitap commented Dec 31, 2024

I defer to @MichaelChirico and @ben-schwen. Sorry for making you do extra work, @venom1204.

@aitap aitap dismissed their stale review December 31, 2024 08:33

let's not document internal implementation details

@venom1204
Copy link
Contributor Author

@aitap No worries! I’ve gained a lot of valuable knowledge throughout this process, and I truly appreciate the guidance and insights you’ve provided. Thank you!

@MichaelChirico
Copy link
Member

Thanks for your cooperation @venom1204! Glad you found the process nevertheless insightful & hope you could learn about data.table more deeply --> more contributions in the future :)

BTW, it can't hurt to check in on an issue's discussion, especially for older issues, e.g. "I am thinking of working on this". That might save effort if we're able to raise concerns about whether our minds have changed about actually fixing a given issue :) (the issue backlog is quite large and old, so certainly there are some issues in there that should just be closed 😢)

@venom1204 venom1204 closed this by deleting the head repository Jan 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Need a reference page for .internal.selfref

5 participants