Skip to content

Conversation

@DavisVaughan
Copy link
Member

@DavisVaughan DavisVaughan commented Oct 8, 2025

Closes #2055

This feels quite useful to me, and took enough time thinking about the edge cases that I feel like it deserves its own function, even if it is quite a short wrapper around list_interleave() and vec_chop() at this point.

Caution

I'm a little worried about the name clashing with purrr::list_transpose(), but I'm not entirely sure what to do about that. It has different invariants than the purrr function, and I think it is a little closer to purer vctrs principles.


For the problem of list(integer(), integer()) that we talked about over slack, I've realized that the correct thing for vctrs to do is to expose both size and ptype arguments. These control "the expected size and type of each element in the list".

I think these very elegantly let you recover the original object when you need to double-transpose in a generic way and have to worry about retaining the original size and type, even in the empty object case.

I think this is better than trying out some new rectangular_list_of object. I know it means that the object itself can't carry around the size and type info, but hopefully that's okay in terms of how people might do a double-transpose.

x <- list(integer(), integer())

# Input:
# - List size 2
# - Element size 0
# - Element type integer
# Output:
# - List size 0
# - Element size 2 (but no elements)
# - Element type integer (but no elements)
out <- list_transpose(x)
out
#> list()

# So then when you transpose again

# Input:
# - List size 0
# - Element size 0 (inferred)
# - Element type unspecified (inferred)
# Output:
# - List size 0
# - Element size 0
# - Element type unspecified
list_transpose(out)
#> list()

# Manually provide the original `size` and `ptype` to restore the original object

# Input:
# - List size 0
# - Element size 2 (provided)
# - Element type integer (provided)
# Output:
# - List size 2
# - Element size 0
# - Element type integer
list_transpose(out, size = vec_size(x), ptype = vec_ptype_common(!!!x))
#> [[1]]
#> integer(0)
#> 
#> [[2]]
#> integer(0)

The size argument is also nice when you have an expected element size, but each element is recyclable to that expected size and you happen to have all size 1 elements

x <- list(1, 2)

list_transpose(x, size = 3)
#> [[1]]
#> [1] 1 2
#> 
#> [[2]]
#> [1] 1 2
#> 
#> [[3]]
#> [1] 1 2

@DavisVaughan
Copy link
Member Author

I know that purrr::list_transpose() wasn't necessarily designed for performance, but man this vctrs one is fast! We might consider seeing if we can find some way to use this in purrr, wrapping it in whatever way we'd have to to retain purrr's semantics.

# Makes 100,000 size 3 list elements
x <- list(
  sample(1e5),
  sample(1e5),
  sample(1e5)
)

bench::mark(
  list_transpose(x),
  purrr::list_transpose(x)
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 × 6
#>   expression                    min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>               <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 list_transpose(x)          4.85ms   5.46ms   176.       3.43MB     0   
#> 2 purrr::list_transpose(x)    4.03s    4.03s     0.248    32.8MB     1.24

# Makes 10 million size 3 list elements
x <- list(
  sample(1e7),
  sample(1e7),
  sample(1e7)
)

# Takes way too long with purrr
bench::mark(list_transpose(x), iterations = 10)
#> # A tibble: 1 × 6
#>   expression             min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>        <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 list_transpose(x)    486ms    488ms      2.05     343MB     9.22

@DavisVaughan DavisVaughan requested a review from lionel- October 8, 2025 17:35
Copy link
Member

@lionel- lionel- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Approving but I would expect the ptype of the outer list to be preserved, for instance with list_of(). If you disagree, I'd like to talk this over together before merging.

Comment on lines 104 to 105
# Disallow `NULL` entirely. These would break `vec_size()` invariants of
# `list_transpose()` if we simply drop them via `list_interleave()`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an argument could be made for coercing to an unspecified vector, but I don't mind the stricter behaviour.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought long and hard about this today and decided to add a null argument

  • By default, we error on NULL elements
  • If null is supplied, it replaces every NULL element with null
    • It must be size 1
    • It participates in common type determination alongside the elements of x and ptype

The thing that took me the longest to figure out was the restriction that null must be size 1 (as opposed to recycling alongside all of the elements of x and size). Without that, I kept running into various scenarios that were hard to explain and didn't make any sense. I have added some tests that call these out to remind us of why we have this restriction. In the end I think it makes sense, you can think of it like:

null allows you to treat NULL as a size 1 element of some type

Which ends up being what you want for any use case I can think of.

library(vctrs)

# Error by default
try(list_transpose(list(1, NULL, 2:3, NULL)))
#> Error in list_transpose(list(1, NULL, 2:3, NULL)) : 
#>   `list(1, NULL, 2:3, NULL)[[2]]` must be a vector, not `NULL`.

list_transpose(list(1, NULL, 2:3, NULL), null = NA)
#> [[1]]
#> [1]  1 NA  2 NA
#> 
#> [[2]]
#> [1]  1 NA  3 NA

list_transpose(list(1, NULL, 2:3, NULL), null = 0L)
#> [[1]]
#> [1] 1 0 2 0
#> 
#> [[2]]
#> [1] 1 0 3 0

# Invariants on `null`:

# Must be size 1!
try(list_transpose(list(1, NULL, 2:3), null = 1:2))
#> Error in list_transpose(list(1, NULL, 2:3), null = 1:2) : 
#>   `null` must have size 1, not size 2.

# Does contribute to common type!
typeof(list_transpose(list(1L, NULL, 2L), null = 3)[[1]])
#> [1] "double"

# Some edge cases:

# With empty list
list_transpose(list(), null = 0L)
#> list()

# Input:
# - List size 0
# - Element size 2 (supplied)
# - Element type integer (from `null`)
# Output:
# - List size 2
# - Element size 0
# - Element type integer
list_transpose(list(), null = 0L, size = 2)
#> [[1]]
#> integer(0)
#> 
#> [[2]]
#> integer(0)

# Input:
# - List size 2
# - Element size 1 (`NULL` treated as size 1 when `null` is supplied)
# - Element type logical (from `null`)
# Output:
# - List size 1
# - Element size 2
# - Element type logical
list_transpose(list(NULL, NULL), null = NA)
#> [[1]]
#> [1] NA NA

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! I'm curious what are the problematic cases with null a size n vector, where n is the result of common size determination? Do these occur when null is taken into account for the common size?

Base automatically changed from feature/interleave-improvements to main October 14, 2025 15:07
@DavisVaughan DavisVaughan force-pushed the feature/list-transpose branch from 0ff20ff to d9b0bc2 Compare October 14, 2025 15:08
Comment on lines 127 to 134
if (is.object(x)) {
# The list input type should not affect the transposition process in any
# way. In particular, supplying a list subclass that doesn't have a
# `vec_cast.subclass.list` method shouldn't prevent the insertion of
# `list(null)` before the transposition. The fact that we must insert
# `list(null)` should be considered an internal detail.
x <- unclass(x)
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tests for this, but basically I want this to work

x <- structure(list(1, NULL), class = c("my_list", "list"))
list_transpose(x, null = 0)

But it doesn't due to null handling because we have to do this

vec_assign(x, vec_detect_missing(x), list(0))
#> Error in `vec_assign()`:
#> ! Can't convert <list> to <my_list>.

which doesn't work unless you bypass the class

We use obj_check_list() to check that it's a list, and after that we don't care about the class type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that suggests that a "list" class should have a coercion method from list?

Comment on lines 236 to 267
# Computes the `ptype` incorporating both `x` and `null`
#
# Like `ptype_finalize()` in `vec_recode_values()` and `vec_if_else()`
list_transpose_ptype_common <- function(
x,
null,
ptype,
x_arg,
error_call
) {
if (!is_null(ptype)) {
# Validate and return user specified `ptype`
ptype <- vec_ptype(ptype, x_arg = "ptype", call = error_call)
return(vec_ptype_finalise(ptype))
}

# Compute from `x`
ptype <- vec_ptype_common(!!!x, .arg = x_arg, .call = error_call)

if (!is_null(null)) {
# Layer in `null`
ptype <- vec_ptype2(
x = null,
y = ptype,
x_arg = "null",
y_arg = "",
call = error_call
)
}

ptype
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same trick used in vec_recode_values() and vec_if_else()

Basically I really want to avoid vec_ptype_common(!!!x, null = null) because that materializes a new list, and if x is really long then that is needlessly expensive.

Comment on lines 316 to 324
test_that("`null` can't result in recycle to size 0", {
# This is one reason we force `null` to be size 1.
# If it participated in common size determination it would result in `list()`
# by forcing the elements to recycle to size 0 first.
x <- list(1L, 2L)
expect_snapshot(error = TRUE, {
list_transpose(x, null = integer())
})
})
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A reason why null is size 1

Comment on lines 379 to 390
# This is one reason we force `null` to be size 1.
# If it participated in common size determination, it would result in an
# element size of 2, and then an output list size of 2, giving us
# `list(integer(), integer())` which would be very odd.
#
# Input
# - List size 0
# - Element size 0 (inferred from list)
# - Element type integer (inferred from `null`)
expect_snapshot(error = TRUE, {
list_transpose(list(), null = 1:2)
})
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A reason why null is size 1

@DavisVaughan DavisVaughan requested a review from lionel- October 14, 2025 21:53
Copy link
Member

@lionel- lionel- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss the list type preservation issue.

Comment on lines 104 to 105
# Disallow `NULL` entirely. These would break `vec_size()` invariants of
# `list_transpose()` if we simply drop them via `list_interleave()`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! I'm curious what are the problematic cases with null a size n vector, where n is the result of common size determination? Do these occur when null is taken into account for the common size?

Comment on lines 127 to 134
if (is.object(x)) {
# The list input type should not affect the transposition process in any
# way. In particular, supplying a list subclass that doesn't have a
# `vec_cast.subclass.list` method shouldn't prevent the insertion of
# `list(null)` before the transposition. The fact that we must insert
# `list(null)` should be considered an internal detail.
x <- unclass(x)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that suggests that a "list" class should have a coercion method from list?

@DavisVaughan DavisVaughan force-pushed the feature/list-transpose branch from 536209a to fbda296 Compare October 16, 2025 13:14
@DavisVaughan
Copy link
Member Author

DavisVaughan commented Oct 24, 2025

@lionel- and I created a nice plan from this that we both are on board with

  • list_of should be altered to be a generic container for restricting both the element size and the element ptype. We want it to handle all of these cases:
    • Known size, Known ptype
    • Known size, Unknown ptype
    • Unknown size, Known ptype
  • list is then
    • Unknown size, Unknown ptype

If we can do that, we can say that list_transpose() is really a function that wants to have a signature of enum(list, list_of<size, type>) -> list_of<size, type>.

When size or ptype information is provided by the list itself, we use that.

When size or ptype information is missing, we compute it as the common type and common size of the inputs, HOWEVER, we require that the list have at least 1 non-NULL element in it so that we don't have to "guess" a size of 0 or ptype of unspecified / NULL in those cases. This goes along with the fact that list_of() currently errors if it can't figure out the common ptype.

It would return a list_of<size, ptype> where both of those are known (because we restricted the inputs in such a way that these are known even in the empty list cases). That would allow perfect roundtripping of transposition.

I think we'd remove the size and ptype arguments of list_transpose() and instead say you need to go through list_of(size =, ptype =) if you want to lock down the size or type in some way.

The null argument then is straightforward. It recycles to the size and casts to the ptype. Since neither of those can be guessed, all ambiguities are removed.

We'd be treating list_tranpose() as a function that could live in a package outside of vctrs where list_of() would also live. i.e. this extends our idea that list_of itself should be able to be implemented in its own package, and this is just a utility that would also be implemented there. That's why treating list-of's specially (both in terms of plucking attributes off the inputs and returning one as an output) is okay here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Export list_transpose()?

2 participants