Consider a totally different approach to the package #12

@kbenoit

@SeraphineM here's a thought, as I digest the rapidly developing materials at https://ellmer.tidyverse.org/index.html, and especially https://ellmer.tidyverse.org/articles/structured-data.html. What if we scaled this way back, recognising that our main contribution is twofold?

  1. Providing a wrapper around ellmer functions, especially chat_structured(), that works for the examples in the article above. These include:
    • sentiment analysis
    • scale questions
    • summaries
    • information extraction (named entities, for instance)
    • classification

    We could provide these through predefined schemas (as type_object types) that work directly with ellmer.
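To make the idea of a predefined schema concrete, here is a minimal sketch of what one for sentiment analysis might look like. The object name `schema_sentiment` and its three fields are hypothetical choices for illustration, not something ellmer or this package defines; only `type_object()`, `type_enum()`, `type_number()` and `type_string()` come from ellmer.

```r
library(ellmer)

# Hypothetical schema: the name and the fields are illustrative assumptions.
schema_sentiment <- type_object(
  .description = "Sentiment of a single document.",
  label = type_enum(
    values = c("negative", "neutral", "positive"),
    description = "Overall sentiment of the text."
  ),
  score = type_number(
    description = "Sentiment from -1 (most negative) to 1 (most positive)."
  ),
  rationale = type_string(
    description = "One-sentence justification for the label."
  )
)

# Usage with any ellmer chat object (needs an API key, so not run here):
# chat <- chat_openai(system_prompt = "...")
# chat$chat_structured("The service was excellent.", type = schema_sentiment)
```

A user who wants a different scale or extra fields could copy this definition and change it, which is the kind of extension the proposal has in mind.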

  1. Providing a library of tested system prompts that accompany each type_object wrapper. I think we could do a lot better than the very simple, generic one we use currently:

    global_system_prompt = "You are an expert tasked with reading the supplied documents carefully and objectively."
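One way such a prompt library could be organised is as a named list keyed by task, with a small lookup helper. This is only a sketch: `system_prompts`, `get_system_prompt()` and the prompt wording are hypothetical placeholders, and the texts below have not been tested in the way the proposal calls for.

```r
# Hypothetical prompt library: task names and wording are placeholders.
system_prompts <- list(
  sentiment = paste(
    "You are an expert annotator.",
    "Rate the sentiment of the supplied document objectively,",
    "using only the text provided."
  ),
  summary = paste(
    "You are an expert editor.",
    "Summarise the supplied document faithfully,",
    "without adding information that is not in the text."
  ),
  entities = paste(
    "You are an expert in information extraction.",
    "List the named entities that appear in the supplied document."
  )
)

# Look up the prompt for a task, failing loudly on unknown task names.
get_system_prompt <- function(task, prompts = system_prompts) {
  if (!task %in% names(prompts)) {
    stop("No system prompt defined for task: ", task, call. = FALSE)
  }
  prompts[[task]]
}
```

Keeping the prompts in a plain list means users can add or override an entry with ordinary assignment, for example `system_prompts$sentiment <- "..."`.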

  2. Making the LLM calls return a data.frame, whose columns can be added as new variables or as corpus docvars. This includes:

    • Providing a loop that works on character vectors, is robust to errors, can restart where it left off, etc. We do this through environments now, although it could be done in other ways. This is a big help and avoids some of the headaches that would confront anyone trying to apply a set of instructions to many texts and collect the results.
    • Flattening the list results into a rectangular object when the result length is fixed (it would vary for named entities, for instance).
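The two points above can be sketched together in base R. This is an assumption-laden illustration, not the package's current implementation: `annotate_texts()` and `fake_llm()` are hypothetical names, the cache is a plain environment keyed by element position, and `fake_llm()` stands in for a real structured call so the example runs without an API key.

```r
# Apply `fn` to each text, caching successes in an environment so that a
# later call with the same cache only retries the elements that failed.
annotate_texts <- function(texts, fn, cache = new.env()) {
  for (i in seq_along(texts)) {
    key <- as.character(i)
    if (exists(key, envir = cache, inherits = FALSE)) next  # already done
    result <- tryCatch(fn(texts[[i]]), error = function(e) NULL)
    if (!is.null(result)) assign(key, result, envir = cache)
  }
  # Flatten the fixed-length list results into one row per completed text.
  done <- Filter(
    function(i) exists(as.character(i), envir = cache, inherits = FALSE),
    seq_along(texts)
  )
  rows <- lapply(done, function(i) {
    data.frame(id = i, get(as.character(i), envir = cache),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Stand-in for an LLM call: fails once on the text "bad", then succeeds.
failed_once <- FALSE
fake_llm <- function(text) {
  if (text == "bad" && !failed_once) {
    failed_once <<- TRUE
    stop("simulated API failure")
  }
  list(label = if (grepl("good", text)) "positive" else "negative",
       n_chars = nchar(text))
}

cache <- new.env()
texts <- c("good day", "bad", "good night")
first  <- annotate_texts(texts, fake_llm, cache)  # "bad" fails on this pass
second <- annotate_texts(texts, fake_llm, cache)  # only "bad" is retried
```

Because failed elements are simply absent from the cache, restarting is just calling the function again with the same `cache`; the `id` column lets the rows be joined back to the original vector or to corpus docvars.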

So perhaps we should have just one or two functions and a set of modules that consist of pre-defined combinations of type_object and system prompts? Users could then extend these with their own or modify the supplied ones.
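As a rough sketch of what a "module" could be, the snippet below pairs a system prompt with a type_object in a plain list and shows how a user might modify a supplied one. `new_module()`, `module_sentiment` and `apply_module()` are hypothetical names, and the choice of a plain list and of `chat_openai()` as the default backend are assumptions made only for illustration.

```r
library(ellmer)

# Hypothetical constructor: a module is just a prompt plus a schema.
new_module <- function(system_prompt, type) {
  list(system_prompt = system_prompt, type = type)
}

module_sentiment <- new_module(
  system_prompt = "You are an expert annotator rating document sentiment.",
  type = type_object(
    label = type_enum(
      values = c("negative", "neutral", "positive"),
      description = "Overall sentiment of the text."
    )
  )
)

# Users can modify a supplied module with ordinary assignment.
my_module <- module_sentiment
my_module$system_prompt <- "You are a financial analyst rating market sentiment."

# One generic function then serves every module. A fresh chat is created per
# call so earlier turns do not leak into later documents. Not run here,
# since it needs an API key.
apply_module <- function(text, module, chat_fn = chat_openai) {
  chat <- chat_fn(system_prompt = module$system_prompt)
  chat$chat_structured(text, type = module$type)
}
```

With this shape, adding a new task means writing one more `new_module()` call rather than a new function.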
