Commit 99e7898

corrected the error
1 parent b3efcdb commit 99e7898

12 files changed, +61 -56 lines changed

NEWS.0.md

Lines changed: 1 addition & 1 deletion
@@ -1072,7 +1072,7 @@
 query once and will never have noticed these, but anyone looping calls to grouping (such as when running in parallel, or benchmarking) may have suffered. Tests added. Thanks to many including vc273 and Y T for reporting [here](https://stackoverflow.com/questions/20349159/memory-leak-in-data-table-grouped-assignment-by-reference) and [here](https://stackoverflow.com/questions/15651515/slow-memory-leak-in-data-table-when-returning-named-lists-in-j-trying-to-reshap) on SO.
 
 2. In long running computations where data.table is called many times repetitively the following error could sometimes occur, #2647: *"Internal error: .internal.selfref prot is not itself an extptr"*. Now fixed. Thanks to theEricStone, StevieP and JasonB for (difficult) reproducible examples [here](https://stackoverflow.com/questions/15342227/getting-a-random-internal-selfref-error-in-data-table-for-r).
-for more info about internal.selfref Refer to [internal.selfref](../man/internal.selfref.Rd) for additional information.
+for more info about internal.selfref.
 
 
 3. If `fread` returns a data error (such as no closing quote on a quoted field) it now closes the file first rather than holding a lock open, a Windows only problem.

NEWS.md

Lines changed: 1 addition & 1 deletion
@@ -744,7 +744,7 @@ rowwiseDT(
 27. `as.data.frame(DT)`, `setDF(DT)` and `as.list(DT)` now remove the `"index"` attribute which contains any indices (a.k.a. secondary keys), as they already did for other `data.table`-only attributes such as the primary key stored in the `"sorted"` attribute. When indices were left intact, a subsequent subset, assign, or reorder of the `data.frame` by `data.frame`-code in base R or other packages would not update the indices, causing incorrect results if then converted back to `data.table`, [#4889](https://github.com/Rdatatable/data.table/issues/4889). Thanks @OfekShilon for the report and the PR.
 
 28. `dplyr::arrange(DT)` uses `vctrs::vec_slice` which retains `data.table`'s class but uses C to bypass `[` method dispatch and does not adjust `data.table`'s attributes containing the index row numbers, [#5042](https://github.com/Rdatatable/data.table/issues/5042). `data.table`'s long-standing `.internal.selfref` mechanism to detect such operations by other packages was not being checked by `data.table` when using indexes, causing `data.table` filters and joins to use invalid indexes and return incorrect results after a `dplyr::arrange(DT)`. Thanks to @Waldi73 for reporting; @avimallu, @tlapak, @MichaelChirico, @jangorecki and @hadley for investigating and suggestions; and @mattdowle for the PR. The intended way to use `data.table` is `data.table::setkey(DT, col1, col2, ...)` which reorders `DT` by reference in parallel, sets the primary key for automatic use by subsequent `data.table` queries, and permits rowname-like usage such as `DT["foo",]` which returns the now-contiguous-in-memory block of rows where the first column of `DT`'s key contains `"foo"`. Multi-column-rownames (i.e. a primary key of more than one column) can be looked up using `DT[.("foo",20210728L), ]`. Using `==` in `i` is also optimized to use the key or indices, if you prefer using column names explicitly and `==`. An alternative to `setkey(DT)` is returning a new ordered result using `DT[order(col1, col2, ...), ]`.
-Refer to [internal.selfref](../man/internal.selfref.Rd) for additional information.
+
 
 29. A segfault occurred when `nrow/throttle < nthread`, [#5077](https://github.com/Rdatatable/data.table/issues/5077). With the default throttle of 1024 rows (see `?setDTthreads`), at least 64 threads would be needed to trigger the segfault since there needed to be more than 65,535 rows too. It occurred on a server with 256 logical cores where `data.table` uses 128 threads by default. Thanks to Bennet Becker for reporting, debugging at C level, and fixing. It also occurred when the throttle was increased so as to use fewer threads; e.g. at the limit `setDTthreads(throttle=nrow(DT))`.
 

man/assign.Rd

Lines changed: 2 additions & 2 deletions
@@ -92,12 +92,12 @@ Since \code{[.data.table} incurs overhead to check the existence and type of arg
 \value{
 \code{DT} is modified by reference and returned invisibly. If you require a copy, take a \code{\link{copy}} first (using \code{DT2 = copy(DT)}).
 }
-\seealso{ \code{\link{data.table}}, \code{\link{copy}}, \code{\link{setalloccol}}, \code{\link{truelength}}, \code{\link{set}}, \code{\link{.Last.updated}},\code{\link{internal.selfref}}
+\seealso{ \code{\link{data.table}}, \code{\link{copy}}, \code{\link{setalloccol}}, \code{\link{truelength}}, \code{\link{set}}, \code{\link{.Last.updated}},\code{\link{.internal.selfref}}
 }
 \examples{
 DT = data.table(a = LETTERS[c(3L,1:3)], b = 4:7)
 DT[, c := 8] # add a numeric column, 8 for all rows
-DT[, d := 9L] # add an integer column, 9L for all rows\code{\link{.Last.updated}}
+DT[, d := 9L] # add an integer column, 9L for all rows
 DT[, c := NULL] # remove column c
 DT[2, d := -8L] # subassign by reference to d; 2nd row is -8L now
 DT # DT changed by reference

man/copy.Rd

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ A \code{copy()} may be required when doing \code{dt_names = names(DT)}. Due to R
 Returns a copy of the object.
 }
 \seealso{
-\code{\link{data.table}}, \code{\link{address}}, \code{\link{setkey}}, \code{\link{setDT}}, \code{\link{setDF}}, \code{\link{set}} \code{\link{:=}}, \code{\link{setorder}}, \code{\link{setattr}}, \code{\link{setnames}},\code{\link{internal.selfref}}
+\code{\link{data.table}}, \code{\link{address}}, \code{\link{setkey}}, \code{\link{setDT}}, \code{\link{setDF}}, \code{\link{set}}, \code{\link{:=}}, \code{\link{setorder}}, \code{\link{setattr}}, \code{\link{setnames}}, \code{\link{.internal.selfref}}
 }
 \examples{
 # Type 'example(copy)' to run these at prompt and browse output

man/data.table-class.Rd

Lines changed: 1 addition & 2 deletions
@@ -17,9 +17,8 @@
 
 \author{ Steve Lianoglou }
 \seealso{
-\code{\link{data.table}},\code{\link{internal.selfref}}
+\code{\link{data.table}}, \code{\link{tables}}, \code{\link{J}}, \code{\link[base:order]{sort.list}}, \code{\link{copy}}, \code{\link{setDT}}, \code{\link{setDF}}, \code{\link{set}}, \code{\link{:=}}, \code{\link{setorder}}, \code{\link{setcolorder}}, \code{\link{setattr}}, \code{\link{setnames}}, \code{\link{chorder}}, \code{\link{setNumericRounding}}, \code{\link{.internal.selfref}}
 }
-
 \examples{
 ## Used in inheritance.
 setClass('SuperDataTable', contains='data.table')

man/datatable-optimize.Rd

Lines changed: 3 additions & 1 deletion
@@ -102,7 +102,9 @@ Auto indexing can be switched off with the global option
 \code{options(datatable.auto.index = FALSE)}. To switch off using existing
 indices set global option \code{options(datatable.use.index = FALSE)}.
 }
-\seealso{ \code{\link{setNumericRounding}}, \code{\link{getNumericRounding}},\code{\link{internal.selfref}} }
+\seealso{
+\code{\link{setNumericRounding}}, \code{\link{getNumericRounding}}, \code{\link{.internal.selfref}}
+}
 \examples{
 \dontrun{
 old = options(datatable.optimize = Inf)

man/internal.selfref.Rd

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+\name{.internal.selfref}
+\alias{.internal.selfref}
+\title{Internal Self-Reference Attribute in data.table}
+\description{
+The \code{.internal.selfref} attribute is an internal mechanism used by \code{data.table} to optimize memory management and performance. It acts as a pointer that allows \code{data.table} objects to reference their own memory location. While the \code{.internal.selfref} attribute may appear to always point to \code{NULL} when inspected directly, this is a result of its implementation in R's memory management system. The true significance of this attribute lies in its role in supporting reference semantics, which enables efficient in-place modification of \code{data.table} objects without unnecessary copying.
+
+The \code{.internal.selfref} attribute is deliberately structured so that \code{identical()} checks return \code{TRUE} for two \code{data.table} objects with identical contents, even when their attributes point to the same memory address. This behavior is achieved by storing the actual self-reference pointer in the \code{prot} part of an external pointer, wrapped in another external pointer to avoid creating visible reference loops. When a \code{data.table} is duplicated, its memory address changes, making it possible to detect the copy and handle it accordingly.
+}
+\details{
+The \code{.internal.selfref} attribute is a pointer that ensures that \code{data.table} objects can be modified by reference without redundant memory allocation. This avoids copying when performing in-place modifications such as adding or updating columns, filtering rows, or performing joins.
+
+Key details about the \code{.internal.selfref} attribute:
+\itemize{
+\item \code{p=NULL} is used instead of \code{R_NilValue}, allowing \code{data.table} to detect objects loaded from disk and ensure correct behavior.
+\item Wrapping the self-reference in another external pointer prevents infinite loops during \code{object.size} calculations.
+\item If the attribute is removed or corrupted, the next operation involving \code{:=} triggers a warning and creates a new self-reference after copying.
+}
+
+The \code{_selfrefok} function verifies the validity of the \code{.internal.selfref} attribute. It checks whether the attribute correctly references the current \code{data.table} object by comparing memory addresses. If the attribute is invalidated (e.g., due to duplication or corruption), \code{_selfrefok} triggers a repair mechanism to restore reference semantics, ensuring that in-place operations remain efficient.
+}
+\value{
+The \code{.internal.selfref} attribute is an internal implementation detail and does not produce a value that users would typically interact with. It is invisible during regular \code{data.table} operations.
+}
+\seealso{
+\code{\link{data.table}}, \code{\link{setkey}}, \code{\link{merge}}, \code{\link{[.data.table}}
+}
+\examples{
+# Create a data.table
+dt <- data.table(A = 1:5, B = letters[1:5])
+
+# Trace memory to check for reference semantics
+tracemem(dt) # Outputs the memory address of the data.table
+
+# Perform an in-place operation
+dt[, C := A * 2] # Add a new column in place
+
+# Verify no copying has occurred
+# (The output of tracemem should show no memory change)
+
+# Example of losing .internal.selfref (hypothetical, for illustration)
+dt_copy <- copy(dt) # Copy the data.table
+.Internal(inspect(dt_copy)) # Shows .internal.selfref attribute no longer matches
+}
+\keyword{internal}
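
The reference semantics described in the new man page can be observed from R using exported functions only. The following is a minimal sketch, not part of this commit, assuming the data.table package is installed; the object names `DT`, `DT2`, `ptr` and `before` are illustrative.

```r
library(data.table)

DT <- data.table(A = 1:5, B = letters[1:5])

# The self-reference is stored as an external pointer attribute.
ptr <- attr(DT, ".internal.selfref")
typeof(ptr)                      # "externalptr"

# := adds a column by reference, so the object's address is unchanged.
before <- address(DT)
DT[, C := A * 2L]
identical(before, address(DT))   # TRUE

# copy() returns a separate data.table; modifying the copy by reference
# leaves the original untouched.
DT2 <- copy(DT)
DT2[, D := 0L]
"D" %in% names(DT)               # FALSE
```

Comparing `address()` before and after the assignment is a simpler check than reading `tracemem()` output, and it does not depend on R having been built with memory profiling.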

man/internal.selfref.rd

Lines changed: 0 additions & 37 deletions
This file was deleted.

man/setDT.Rd

Lines changed: 2 additions & 1 deletion
@@ -25,7 +25,8 @@ setDT(x, keep.rownames=FALSE, key=NULL, check.names=FALSE)
 The input is modified by reference, and returned (invisibly) so it can be used in compound statements; e.g., \code{setDT(X)[, sum(B), by=A]}. If you require a copy, take a copy first (using \code{DT2 = copy(DT)}). See \code{?copy}.
 }
 
-\seealso{ \code{\link{data.table}}, \code{\link{as.data.table}}, \code{\link{setDF}}, \code{\link{copy}}, \code{\link{setkey}}, \code{\link{setcolorder}}, \code{\link{setattr}}, \code{\link{setnames}}, \code{\link{set}}, \code{\link{:=}}, \code{\link{setorder}},\code{\link{internal.selfref}}
+\seealso{
+\code{\link[base]{transform}}, \code{\link[base:with]{within}}, \code{\link{:=}}, \code{\link{.internal.selfref}}
 }
 \examples{
 

man/setkey.Rd

Lines changed: 2 additions & 5 deletions
@@ -107,11 +107,8 @@ reference.
 \url{https://cran.r-project.org/package=bit64}\cr
 \url{https://github.com/Rdatatable/data.table/wiki/Presentations}
 }
-\seealso{ \code{\link{data.table}}, \code{\link{tables}}, \code{\link{J}},
-\code{\link[base:order]{sort.list}}, \code{\link{copy}}, \code{\link{setDT}},
-\code{\link{setDF}}, \code{\link{set}} \code{\link{:=}}, \code{\link{setorder}},
-\code{\link{setcolorder}}, \code{\link{setattr}}, \code{\link{setnames}},
-\code{\link{chorder}}, \code{\link{setNumericRounding}},\code{\link{internal.selfref}}
+\seealso{
+\code{\link{setkey}}, \code{\link{setcolorder}}, \code{\link{setattr}}, \code{\link{setnames}}, \code{\link{set}}, \code{\link{:=}}, \code{\link{setDT}}, \code{\link{setDF}}, \code{\link{copy}}, \code{\link{setNumericRounding}}, \code{\link{.internal.selfref}}
 }
 \examples{
 # Type 'example(setkey)' to run these at the prompt and browse output
