list sub-class with format() method prints full contents #5948

@jesse-smith

I ran into this when converting a nested tibble (i.e. one with a list column of tibbles) to a data.table. Normally, data.table prints list columns as something like <tibble[3x1]>; these columns instead printed as a string containing the entire contents of each nested tibble (in my case, ~20k rows of data per tibble). This appears to be caused by data.table:::format_col(), which checks whether the column's class has a format() method before it checks whether the column is a list. Here the list columns were of class vctrs_list_of, which implements its own format.vctrs_list_of() method. See the reprex below for an example.

Fixing this is easy enough (see below reprex for proposed solution), but it would change the default printing behavior of list-cols. Any list subclass would then have to implement a format_list_item() method to get special treatment. On the other hand, any list subclass could then implement that method and get special treatment.

To me, defaulting to the standard list print behavior is an improvement, given that format() methods for list-like classes are generally not going to produce output suitable for a column in a data.table (which is why format_list_item() is needed in the first place).
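
To make the branch-order effect concrete without data.table or {vctrs}, here is a minimal base-R sketch. Everything in it is a stand-in I made up for illustration: `my_list_of` is a hypothetical list subclass, `item_summary()` only loosely mimics what `format_list_item()` produces, and `format_first()` / `list_first()` are simplified versions of the two orderings, not the real `format_col()`.

```r
# Hypothetical list subclass standing in for vctrs_list_of
x <- structure(list(1:3, 4:6), class = c("my_list_of", "list"))

# Its format() method returns one string per element,
# so length(format(x)) == length(x), just like format.vctrs_list_of()
format.my_list_of <- function(x, ...) {
  vapply(unclass(x), toString, character(1L))
}

# Hypothetical per-item summary in a <class[n]> style (not data.table's code)
item_summary <- function(el) sprintf("<%s[%d]>", class(el)[1L], length(el))

# Simplified current order: use format() whenever it yields one string per element
format_first <- function(x) {
  formatted <- format(x)
  if (length(formatted) == length(x)) formatted
  else vapply(unclass(x), item_summary, character(1L))
}

# Simplified proposed order: treat anything that is a list as a list first
list_first <- function(x) {
  if (is.list(x)) vapply(unclass(x), item_summary, character(1L))
  else format(x)
}

format_first(x)
#> [1] "1, 2, 3" "4, 5, 6"
list_first(x)
#> [1] "<integer[3]>" "<integer[3]>"
```

With the format-first ordering, the subclass's `format()` method wins and the full contents are printed; with the list-first ordering, each element is summarised regardless of what `format()` methods the subclass defines.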

Reprex is below, along with proposed solution. I'm happy to file a PR but wanted to make sure the change in defaults is acceptable in principle first. Thanks!

# Setup -----------------------------------------------------------------------

# Use development version of {data.table}
data.table::update_dev_pkg()
#> R data.table package is up-to-date at 8f8ef9343dcabefa9e4cb0af4251cbb74dae9f55 (1.15.99)
# Attach internals
ns <- asNamespace("data.table")
name <- format(ns)
attach(ns, name = name, warn.conflicts = FALSE)

# Create example data
dt <- data.table(
  list_col = list(data.table(a = 1:3), data.table(a = 4:6)),
  list_of_col = vctrs::list_of(data.table(a = 1:3), data.table(a = 4:6))
)


# Problem ---------------------------------------------------------------------
# `format_col()` does not format `list_of` columns from {vctrs} as lists

# Print `data.table`
print(dt)
#>             list_col     list_of_col
#>               <list> <vctrs_list_of>
#> 1: <data.table[3x1]>         1, 2, 3
#> 2: <data.table[3x1]>         4, 5, 6

# Format individually
format_col(dt$list_col)
#> [1] "<data.table[3x1]>" "<data.table[3x1]>"
format_col(dt$list_of_col)
#> [1] "1, 2, 3" "4, 5, 6"


# Solution --------------------------------------------------------------------

# Current function definition:
# format_col.default = function(x, ...) {
#   if (!is.null(dim(x)))
#     "<multi-column>"
#   else if (has_format_method(x) && length(formatted<-format(x, ...))==length(x))
#     formatted
#   else if (is.list(x))
#     vapply_1c(x, format_list_item, ...)
#   else
#     format(char.trunc(x), ...)
# }

# Swapping the order of the `else if` statements in `format_col()` will fix the issue
# `format_col_updated()` formats `list_of` columns from {vctrs} as lists
format_col_updated = function(x, ...) {
  if (!is.null(dim(x)))
    "<multi-column>"
  else if (is.list(x)) # Now comes before `has_format_method()` check
    vapply_1c(x, format_list_item, ...)
  else if (has_format_method(x) && length(formatted<-format(x, ...))==length(x))
    formatted
  else
    format(char.trunc(x), ...)
}

# Replace as default method
registerS3method("format_col", "default", format_col_updated)

# Print `data.table`
print(dt)
#>             list_col       list_of_col
#>               <list>   <vctrs_list_of>
#> 1: <data.table[3x1]> <data.table[3x1]>
#> 2: <data.table[3x1]> <data.table[3x1]>

# Print individually
# Still formatted as list
format_col_updated(dt$list_col)
#> [1] "<data.table[3x1]>" "<data.table[3x1]>"
# Now also formatted as list
format_col_updated(dt$list_of_col)
#> [1] "<data.table[3x1]>" "<data.table[3x1]>"

# Clean up --------------------------------------------------------------------
detach(name, character.only = TRUE)
rm(list = ls())

Created on 2024-02-21 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.2 (2023-10-31 ucrt)
#>  os       Windows 11 x64 (build 22631)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_United States.utf8
#>  ctype    English_United States.utf8
#>  tz       America/Chicago
#>  date     2024-02-21
#>  pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.2   2023-12-11 [1] CRAN (R 4.3.2)
#>  data.table    1.15.99 2024-02-21 [1] local
#>  digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.2)
#>  evaluate      0.23    2023-11-01 [1] CRAN (R 4.3.2)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.2)
#>  fs            1.6.3   2023-07-20 [1] CRAN (R 4.3.2)
#>  glue          1.7.0   2024-01-09 [1] CRAN (R 4.3.2)
#>  htmltools     0.5.7   2023-11-03 [1] CRAN (R 4.3.2)
#>  knitr         1.45    2023-10-30 [1] CRAN (R 4.3.2)
#>  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.3.2)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.2)
#>  purrr         1.0.2   2023-08-10 [1] CRAN (R 4.3.2)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.3.2)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.3.1)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.3.1)
#>  R.utils       2.12.3  2023-11-18 [1] CRAN (R 4.3.2)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.3.2)
#>  rlang         1.1.3   2024-01-10 [1] CRAN (R 4.3.2)
#>  rmarkdown     2.25    2023-09-18 [1] CRAN (R 4.3.2)
#>  rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.2)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.2)
#>  styler        1.10.2  2023-08-29 [1] CRAN (R 4.3.2)
#>  vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.3.2)
#>  withr         3.0.0   2024-01-16 [1] CRAN (R 4.3.2)
#>  xfun          0.41    2023-11-01 [1] CRAN (R 4.3.2)
#>  yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.2)
#> 
#>  [1] D:/ProgramFiles/R/R-4.3.2/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
