vignettes/structured-data.Rmd
14 additions & 11 deletions
@@ -93,28 +93,28 @@ To define your desired type specification (also known as a schema), you use the
 * **Arrays** represent a vector of values of the same type. They are created with `type_array()` and require the `items` argument, which specifies the type of each element. Arrays of scalars are very similar to R's atomic vectors:
 Arrays of objects (described next) are equivalent to data frames.
 
 * **Objects** represent a collection of named values. They are created with `type_object()`. Objects can contain any number of scalars, arrays, and other objects. They are similar to named lists in R.
 
 ```{r}
-type_person2 <- type_object(
-name = type_string(),
-age = type_integer(),
-hobbies = type_array(type_string())
-)
+type_person2 <- type_object(
+name = type_string(),
+age = type_integer(),
+hobbies = type_array(type_string())
+)
 ```
 
 Under the hood, these type specifications ensure that the LLM returns correctly structured JSON. But ellmer goes one step further and converts the JSON to the closest R analog. This means:
@@ -223,7 +223,10 @@ chat$chat_structured(prompt, type = type_people)
 
 Now ellmer knows what you want and gives you a data frame.
 
-If you're familiar with the terms row-oriented and column-oriented data frames, this is the same idea. Since most languages don't possess vectorisation like R, row-oriented data frames are more common.
+If you're familiar with the terms row-oriented and column-oriented data frames, this is the same idea. Since most languages don't possess
+vectorisation like R, row-oriented data frames are more common.
+
+Note that you'll generally want to avoid nesting objects inside of objects, as this will generate a data frame in which each nested object becomes a column that is itself a data frame. This is a valid data structure in R, but you'll find it easier to understand if you convert it to a tibble, since the tibble print method makes it clearer what's going on. You can also use `tidyr::unpack()` to unpack these df-columns back into a regular flat data frame.
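
To illustrate the note on nested objects, here is a minimal sketch of what a df-column looks like and how `tidyr::unpack()` flattens it. The `people` data and its `address` column are hypothetical and built by hand; they only mimic the shape you might get from an object nested inside another object, and are not output produced by ellmer.

```r
library(tibble)
library(tidyr)

# Hypothetical data: `address` is a data frame stored as a single column
# (a "df-column"), similar to what a nested object would produce.
people <- data.frame(name = c("Alice", "Bob"), age = c(30L, 41L))
people$address <- data.frame(
  city = c("Paris", "Oslo"),
  country = c("France", "Norway")
)

# Three top-level columns, one of which is itself a data frame.
names(people)

# Converting to a tibble makes the nesting visible when printed.
people_tbl <- as_tibble(people)
people_tbl

# unpack() promotes the inner columns to regular top-level columns.
people_flat <- unpack(people_tbl, address)
names(people_flat)
```

If the inner column names might clash with existing ones, `unpack()` also accepts a `names_sep` argument that prefixes each inner column with the name of the df-column.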