Description
I ran into this when converting a nested tibble (i.e. one with a list column of tibbles) to a data.table. Normally, data.table prints list columns as something like <tibble[3x1]>; however, these columns printed as a string displaying the entire contents of each nested tibble (in my case, ~20k rows of data per tibble). This appears to be due to a check for the existence of a format() method for the column's class in data.table:::format_col(), which runs before the list check. In this case, the list columns had class vctrs_list_of, which implements its own format.vctrs_list_of() method. See the reprex below for an example.
Fixing this is easy enough (see the reprex below for a proposed solution), but it would change the default printing behavior of list-cols. Any list subclass that wants special treatment would then have to implement a format_list_item() method rather than rely on its format() method. On the other hand, any list subclass could implement that method and get special treatment.
To me, defaulting to the standard list print behavior is an improvement, given that format() methods for list-like classes are generally not going to produce output suitable for a column in a data.table (which is why format_list_item() is needed in the first place).
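As a rough illustration of the opt-in path described above, here is a minimal sketch of a custom format_list_item() method. It assumes format_col() and format_list_item() are exported S3 generics (as I believe they are in recent data.table releases); the `interval_item` class and the `new_interval_item()` helper are hypothetical names made up for this example.

```r
library(data.table)

# Hypothetical item class: a small list with `lo` and `hi` fields
new_interval_item = function(lo, hi) {
  structure(list(lo = lo, hi = hi), class = "interval_item")
}

# Compact one-line representation for each item of this class
format_list_item.interval_item = function(x, ...) {
  sprintf("[%s, %s]", x$lo, x$hi)
}

# Register the method explicitly so dispatch does not depend on where
# the function happens to be defined
registerS3method(
  "format_list_item", "interval_item", format_list_item.interval_item,
  envir = asNamespace("data.table")
)

# A plain list column whose items carry the custom class
dt = data.table(
  id = 1:2,
  rng = list(new_interval_item(0, 1), new_interval_item(2, 5))
)

# Each item is formatted by the method registered above
formatted = format_col(dt$rng)
print(formatted)
```

This works today for plain list columns, since they take the is.list() branch. With the proposed reordering, list subclasses such as vctrs_list_of would presumably take the same branch instead of falling back to their format() method.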
Reprex is below, along with proposed solution. I'm happy to file a PR but wanted to make sure the change in defaults is acceptable in principle first. Thanks!
# Setup -----------------------------------------------------------------------
# Use development version of {data.table}
data.table::update_dev_pkg()
#> R data.table package is up-to-date at 8f8ef9343dcabefa9e4cb0af4251cbb74dae9f55 (1.15.99)
# Attach internals
ns <- asNamespace("data.table")
name <- format(ns)
attach(ns, name = name, warn.conflicts = FALSE)
# Create example data
dt <- data.table(
  list_col = list(data.table(a = 1:3), data.table(a = 4:6)),
  list_of_col = vctrs::list_of(data.table(a = 1:3), data.table(a = 4:6))
)
# Problem ---------------------------------------------------------------------
# `format_col()` does not format `list_of` columns from {vctrs} as lists
# Print `data.table`
print(dt)
#> list_col list_of_col
#> <list> <vctrs_list_of>
#> 1: <data.table[3x1]> 1, 2, 3
#> 2: <data.table[3x1]> 4, 5, 6
# Format individually
format_col(dt$list_col)
#> [1] "<data.table[3x1]>" "<data.table[3x1]>"
format_col(dt$list_of_col)
#> [1] "1, 2, 3" "4, 5, 6"
# Solution --------------------------------------------------------------------
# Current function definition:
# format_col.default = function(x, ...) {
#   if (!is.null(dim(x)))
#     "<multi-column>"
#   else if (has_format_method(x) && length(formatted<-format(x, ...))==length(x))
#     formatted
#   else if (is.list(x))
#     vapply_1c(x, format_list_item, ...)
#   else
#     format(char.trunc(x), ...)
# }
# Swapping the order of the `else if` statements in `format_col()` will fix the issue
# `format_col_updated()` formats `list_of` columns from {vctrs} as lists
format_col_updated = function(x, ...) {
  if (!is.null(dim(x)))
    "<multi-column>"
  else if (is.list(x)) # Now comes before `has_format_method()` check
    vapply_1c(x, format_list_item, ...)
  else if (has_format_method(x) && length(formatted<-format(x, ...))==length(x))
    formatted
  else
    format(char.trunc(x), ...)
}
# Replace as default method
registerS3method("format_col", "default", format_col_updated)
# Print `data.table`
print(dt)
#> list_col list_of_col
#> <list> <vctrs_list_of>
#> 1: <data.table[3x1]> <data.table[3x1]>
#> 2: <data.table[3x1]> <data.table[3x1]>
# Print individually
# Still formatted as list
format_col_updated(dt$list_col)
#> [1] "<data.table[3x1]>" "<data.table[3x1]>"
# Now also formatted as list
format_col_updated(dt$list_of_col)
#> [1] "<data.table[3x1]>" "<data.table[3x1]>"
# Clean up --------------------------------------------------------------------
detach(name, character.only = TRUE)
rm(list = ls())
Created on 2024-02-21 with reprex v2.0.2
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.3.2 (2023-10-31 ucrt)
#> os Windows 11 x64 (build 22631)
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate English_United States.utf8
#> ctype English_United States.utf8
#> tz America/Chicago
#> date 2024-02-21
#> pandoc 3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> cli 3.6.2 2023-12-11 [1] CRAN (R 4.3.2)
#> data.table 1.15.99 2024-02-21 [1] local
#> digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.2)
#> evaluate 0.23 2023-11-01 [1] CRAN (R 4.3.2)
#> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.2)
#> fs 1.6.3 2023-07-20 [1] CRAN (R 4.3.2)
#> glue 1.7.0 2024-01-09 [1] CRAN (R 4.3.2)
#> htmltools 0.5.7 2023-11-03 [1] CRAN (R 4.3.2)
#> knitr 1.45 2023-10-30 [1] CRAN (R 4.3.2)
#> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.2)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.2)
#> purrr 1.0.2 2023-08-10 [1] CRAN (R 4.3.2)
#> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.3.2)
#> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.3.1)
#> R.oo 1.25.0 2022-06-12 [1] CRAN (R 4.3.1)
#> R.utils 2.12.3 2023-11-18 [1] CRAN (R 4.3.2)
#> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.3.2)
#> rlang 1.1.3 2024-01-10 [1] CRAN (R 4.3.2)
#> rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.2)
#> rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.2)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.2)
#> styler 1.10.2 2023-08-29 [1] CRAN (R 4.3.2)
#> vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.2)
#> withr 3.0.0 2024-01-16 [1] CRAN (R 4.3.2)
#> xfun 0.41 2023-11-01 [1] CRAN (R 4.3.2)
#> yaml 2.3.7 2023-01-23 [1] CRAN (R 4.3.2)
#>
#> [1] D:/ProgramFiles/R/R-4.3.2/library
#>
#> ──────────────────────────────────────────────────────────────────────────────