Commit 1556345

Improve handling for nested objects (#774)

* Let `convert_from_type()` create packed data frames
* Document that `type_object()` supports dynamic dots
* Add some advice
1 parent 5f66181 commit 1556345

6 files changed: +41 -17 lines changed

DESCRIPTION

Lines changed: 2 additions & 1 deletion
@@ -33,7 +33,8 @@ Imports:
     promises (>= 1.3.1),
     R6,
     rlang (>= 1.1.0),
-    S7 (>= 0.2.0)
+    S7 (>= 0.2.0),
+    vctrs
 Suggests:
     connectcreds,
     curl (>= 6.0.1),

R/chat-structured.R

Lines changed: 1 addition & 1 deletion
@@ -51,7 +51,7 @@ convert_from_type <- function(x, type) {
         convert_from_type(vals, type_array(type@items@properties[[name]]))
       })
       names(cols) <- names(type@items@properties)
-      list2DF(cols)
+      vctrs::new_data_frame(cols)
     }
   } else {
     x

R/types.R

Lines changed: 2 additions & 2 deletions
@@ -180,8 +180,8 @@ type_array <- function(items, description = NULL, required = TRUE) {
   TypeArray(items = items, description = description, required = required)
 }

-#' @param ... Name-type pairs defining the components that the object must
-#' possess.
+#' @param ... <[`dynamic-dots`][rlang::dyn-dots]> Name-type pairs defining
+#' the components that the object must possess.
 #' @param .additional_properties Can the object have arbitrary additional
 #' properties that are not explicitly listed? Only supported by Claude.
 #' @export

man/type_boolean.Rd

Lines changed: 2 additions & 2 deletions
Generated file; diff not rendered by default.

tests/testthat/test-chat-structured.R

Lines changed: 20 additions & 0 deletions
@@ -197,6 +197,26 @@ test_that("can convert arrays of objects to data frames", {
   )
 })

+test_that("array of object with nested objects becomes packed data frame", {
+  type <- type_array(
+    type_object(
+      x = type_object(a = type_integer()),
+      y = type_object(a = type_integer())
+    )
+  )
+
+  data <- list(
+    list(x = list(a = 1), y = list(a = 3)),
+    list(x = list(a = 5), y = list(a = 7))
+  )
+
+  out <- convert_from_type(data, type)
+  expect_equal(nrow(out), 2)
+  expect_named(out, c("x", "y"))
+  expect_equal(out$x, data.frame(a = c(1, 5)))
+  expect_equal(out$y, data.frame(a = c(3, 7)))
+})
+
 test_that("can recursively convert objects contents", {
   expect_equal(
     convert_from_type(

vignettes/structured-data.Rmd

Lines changed: 14 additions & 11 deletions
@@ -93,28 +93,28 @@ To define your desired type specification (also known as a schema), you use the
 * **Arrays** represent a vector of values of the same type. They are created with `type_array()` and require the `items` argument, which specifies the type of each element. Arrays of scalars are very similar to R's atomic vectors:

 ```{r}
-type_logical_vector <- type_array(type_boolean())
-type_integer_vector <- type_array(type_integer())
-type_double_vector <- type_array(type_number())
-type_character_vector <- type_array(type_string())
+type_logical_vector <- type_array(type_boolean())
+type_integer_vector <- type_array(type_integer())
+type_double_vector <- type_array(type_number())
+type_character_vector <- type_array(type_string())
 ```

 You can also have arrays of arrays, which resemble lists with well-defined structures:

 ```{r}
-list_of_integers <- type_array(type_integer_vector)
+list_of_integers <- type_array(type_integer_vector)
 ```

 Arrays of objects (described next) are equivalent to data frames.

 * **Objects** represent a collection of named values. They are created with `type_object()`. Objects can contain any number of scalars, arrays, and other objects. They are similar to named lists in R.

 ```{r}
-type_person2 <- type_object(
-name = type_string(),
-age = type_integer(),
-hobbies = type_array(type_string())
-)
+type_person2 <- type_object(
+name = type_string(),
+age = type_integer(),
+hobbies = type_array(type_string())
+)
 ```

 Under the hood, these type specifications ensure that the LLM returns correctly structured JSON. But ellmer goes one step further and converts the JSON to the closest R analog. This means:
@@ -223,7 +223,10 @@ chat$chat_structured(prompt, type = type_people)

 Now ellmer knows what you want and gives you a data frame.

-If you're familiar with the terms row-oriented and column-oriented data frames, this is the same idea. Since most languages don't possess vectorisation like R, row-oriented data frames are more common.
+If you're familiar with the terms row-oriented and column-oriented data frames, this is the same idea. Since most languages don't possess
+vectorisation like R, row-oriented data frames are more common.
+
+Note that you'll generally want to avoid nesting objects inside of objects, as this will generate a data frame where each column is itself a data frame. This is a valid data structure in R, but you'll find it easier to understand if you convert it to a tibble, since the tibble print method makes it clearer what's going on. You can also use `tidyr::unpack()` to unpack these df-columns back into a regular flat data frame.

 ## Examples

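To make the vignette's advice about nested objects concrete, here is a minimal sketch of what a packed data frame looks like and how `tidyr::unpack()` flattens it. It is not code from this commit: the input list is built by hand rather than through `convert_from_type()`, the values are borrowed from the new test case, and `names_sep = "_"` is an assumption chosen here so that the two inner `a` columns do not collide.

```r
# Illustrative sketch only; not part of the commit.
# Assumes the vctrs, tibble, and tidyr packages are installed.

# Each column is itself a one-column data frame (a "df-column").
cols <- list(
  x = data.frame(a = c(1, 5)),
  y = data.frame(a = c(3, 7))
)

# Build a data frame whose columns are data frames (a packed data frame).
packed <- vctrs::new_data_frame(cols)
nrow(packed)
names(packed)

# Printing as a tibble shows the nested columns more clearly.
tibble::as_tibble(packed)

# Flatten the df-columns back into ordinary columns. names_sep prefixes
# each inner column with its outer name so the two "a" columns stay unique.
flat <- tidyr::unpack(packed, c(x, y), names_sep = "_")
names(flat)
```

Without `names_sep`, both inner columns would be named `a`, so some form of name disambiguation is likely needed whenever nested objects share field names.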