@SeraphineM here's a thought, as I digest the rapidly developing materials at https://ellmer.tidyverse.org/index.html, and especially https://ellmer.tidyverse.org/articles/structured-data.html. What if we scaled this way back, on the view that our main contribution is twofold?
- Providing a wrapper around ellmer functions, especially `chat_structured()`, that works for the examples in the article above. These include:
  - sentiment analysis
  - scale questions
  - summaries
  - information extraction (named entities, for instance)
  - classification
  We could provide these through predefined schemas (as `type_object` types) that work directly with ellmer.
- Providing a library of tested system prompts that accompany each `type_object` wrapper. I think we could do a lot better than the very simple, generic one we use currently:
  Line 18 in 0fb5dc5:
  `global_system_prompt = "You are an expert tasked with reading the supplied documents carefully and objectively."`
- Making the LLM calls return a data.frame, whose columns can then be added as new variable(s) or as corpus docvars. This includes:
  - Providing a loop that works on character vectors, is robust to errors, can restart where it left off, etc. We do this through environments now, although it could be done in other ways. This is a big help and avoids some of the headaches that would confront anyone trying to apply a set of instructions to many documents and collect the results.
  - Flattening the list results into a rectangular object when the result length is fixed (it would vary for named entities, for instance).
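
To make the loop and flattening ideas concrete, here is a minimal base R sketch. It assumes nothing about ellmer: `run_resumable()`, `flatten_results()`, and the stand-in `fake_llm()` are hypothetical names used only for illustration, and the environment-based cache is one possible design, not necessarily how the package does it today.

```r
# Hypothetical helper: apply `fn` to each element of a character vector,
# caching results in an environment so an interrupted run can resume.
run_resumable <- function(x, fn, cache = new.env(parent = emptyenv())) {
  for (i in seq_along(x)) {
    key <- as.character(i)
    # Skip inputs that already have a cached result.
    if (exists(key, envir = cache, inherits = FALSE)) next
    # Turn an error into NULL so one bad input does not stop the loop.
    result <- tryCatch(fn(x[[i]]), error = function(e) NULL)
    if (!is.null(result)) assign(key, result, envir = cache)
  }
  cache
}

# Hypothetical helper: flatten fixed-length list results into a data.frame,
# one row per successful input. `id` records the position in the input.
flatten_results <- function(cache) {
  ids <- sort(as.integer(ls(cache)))
  rows <- lapply(ids, function(i) {
    data.frame(id = i, as.data.frame(get(as.character(i), envir = cache)))
  })
  do.call(rbind, rows)
}

# Stand-in for an LLM call (an assumption): fails on empty input.
fake_llm <- function(text) {
  if (!nzchar(text)) stop("empty document")
  list(n_chars = nchar(text), shouting = identical(text, toupper(text)))
}

docs <- c("Great product", "", "TERRIBLE")
cache <- run_resumable(docs, fake_llm)  # element 2 errors and is skipped
first_pass <- flatten_results(cache)    # rows for elements 1 and 3 only

docs[2] <- "fine"                       # fix the failing input
run_resumable(docs, fake_llm, cache)    # only element 2 is processed now
second_pass <- flatten_results(cache)   # all three rows
```

Because environments are modified in place, the second call picks up where the first left off without recomputing elements 1 and 3. Variable-length results such as named entities would need a different shape (a list column or a long-format table) rather than this one-row-per-document flattening.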
So perhaps we should have just one or two functions and a set of modules, each a predefined combination of a `type_object` and a system prompt? Users could then extend these with their own or modify the supplied ones.
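
As a rough sketch of what one such module could look like, assuming a recent ellmer that provides `chat_structured()` as in the article above: `sentiment_module` and `run_module()` are hypothetical names, the system prompt text is illustrative only and untested, and `chat_openai()` is just one possible backend (it needs an API key to actually run).

```r
library(ellmer)

# Hypothetical module: a schema paired with a system prompt written for it.
sentiment_module <- list(
  type = type_object(
    "Sentiment of a single document.",
    sentiment = type_enum(
      values = c("positive", "neutral", "negative"),
      description = "Overall sentiment of the text."
    ),
    confidence = type_number("Confidence in the label, from 0 to 1.")
  ),
  # Illustrative prompt only, not a tested one.
  system_prompt = paste(
    "You are an expert tasked with reading the supplied documents",
    "carefully and objectively. Rate the overall sentiment of each document."
  )
)

# Hypothetical single entry point: works with any module of this shape.
# `chat_fn` is any ellmer chat constructor accepting `system_prompt`.
run_module <- function(text, module, chat_fn = chat_openai) {
  chat <- chat_fn(system_prompt = module$system_prompt)
  chat$chat_structured(text, type = module$type)
}

# Usage (requires credentials for the chosen provider, so left commented out):
# run_module("The service was slow but the food was wonderful.", sentiment_module)
```

A user-defined module would then just be another list with a `type` and a `system_prompt`, which keeps the number of exported functions small.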