Commit 7cd5df2

karthink authored and khinshankhan committed
gptel: Add JSON response schema parsing and preprocessing
Add experimental support for structured outputs to `gptel-request' via
the :schema argument. It is now possible to force an LLM to respond in
JSON conforming to the provided schema.

Note: This commit only adds the infrastructure required for the
feature! No backend currently respects :schema. Support for all
backends will be added in the next commit.

There are several caveats with this feature in its current form:

1. Not all providers support it, but the major backends do: OpenAI,
   Anthropic, Gemini, llama-cpp and Ollama. Support for structured
   outputs among other "OpenAI-compatible" backends is flaky.

2. `gptel-send' does not yet support structured outputs, as it is
   intended to be a general chat command. Only the `gptel-request' API
   does. (Schema support for `gptel-send' can be added if there is
   sufficient demand.)

3. Schemas whose root elements are of type array are not supported by
   most APIs. In this case the schema is wrapped in an object with one
   field and it is the caller's responsibility to extract the array
   elements from it.

4. The JSON schema has to be supplied in one of two ways:
   - As an elisp object consisting of nested plists, similar to how
     arguments in gptel-tool definitions are provided.
   - As a JSON schema serialized to a string.
   While expressive, both formats are cumbersome for quick use, so
   support for other short hand specifications is planned.

* gptel.el (gptel--with-buffer-copy-internal): Copy `gptel--schema' as
well.
(gptel--schema, gptel-request): Add :schema argument, use the internal
variable `gptel--schema' to communicate this to the payload builders
(primarily `gptel--request-data'). The docstring for :schema is
inadequate, but it will require too many lines in an already long
description. This will be updated after adding other ways to specify
the schema.
(gptel--parse-schema): Generic function to parse a provided schema
into a backend-appropriate format.
(gptel--preprocess-schema, gptel--dispatch-schema-type): Utility
functions to sanitize provided schemas. The former is required to
convert all symbols in the spec to strings (see
`gptel--preprocess-tool-args' for why). The latter handles schemas
provided as serialized JSON, and wraps a root-level array
specification in an object. This wrapping is needed since most APIs
require an object type at the schema root.
(gptel--tool-use-p, gptel--tool-result-p): Don't check for `:tools' in
INFO, as the Anthropic API uses an ersatz tool to provide JSON output
as tool call arguments. This ersatz tool is not defined by the user
and not included in `:tools'.
(gptel--handle-tool-use): Handle JSON output masquerading as a tool
call. This is for the Anthropic API only.
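Caveats 3 and 4 can be illustrated with a small standalone sketch. It does not call gptel: the names `my/schema-plist', `my/schema-json', `my/wrap-root-array' and `my/unwrap-items' are hypothetical, the wrapper only mirrors the root-array wrapping described in the message, and the built-in `json-parse-string' (Emacs 27 or later with JSON support) is assumed for decoding.

```elisp
;; -*- lexical-binding: t; -*-
;; Illustrative sketch only; every `my/' name is hypothetical.
(require 'cl-lib)

;; Form 1: the schema as nested plists.  :type may be a symbol or a string.
(defvar my/schema-plist
  '(:type object
    :properties (:name (:type string)
                 :year (:type integer))))

;; Form 2: a comparable schema serialized as a JSON string.
(defvar my/schema-json
  "{\"type\": \"object\",
    \"properties\": {\"name\": {\"type\": \"string\"},
                     \"year\": {\"type\": \"integer\"}}}")

(defun my/wrap-root-array (schema)
  "Wrap plist SCHEMA in an object with one field, \"items\", if it is an array.
Schemas with any other root type are returned unchanged."
  (if (member (plist-get schema :type) '("array" array))
      (list :type "object"
            :properties (list :items schema)
            :required ["items"]
            :additionalProperties :json-false)
    schema))

(defun my/unwrap-items (response)
  "Return the array stored under \"items\" in the JSON string RESPONSE.
This is the extraction step that caveat 3 leaves to the caller."
  (plist-get (json-parse-string response :object-type 'plist) :items))
```

Assuming a backend that honors :schema (which, per the message, arrives only in a later commit), a caller that passed a root-level array schema to `gptel-request' would apply something like `my/unwrap-items' to the response string inside its callback.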
1 parent 6ab3baf commit 7cd5df2

1 file changed, +77 -7 lines changed
gptel.el

Lines changed: 77 additions & 7 deletions
@@ -914,6 +914,11 @@ These parameters are combined with model-specific and backend-specific
 incompatible with the active backend can break gptel. Do not use this
 variable unless you know what you're doing!")
 
+(defconst gptel--ersatz-json-tool "response_json"
+  "Name of ersatz tool used to force JSON output.
+
+Some APIs, like Anthropic, use a tool to produce structured JSON output.")
+
 
 ;;; Utility functions
 
@@ -1092,7 +1097,7 @@ For BUF, START, END and BODY-THUNK see `gptel--with-buffer-copy'."
       (with-current-buffer temp-buffer
         (dolist (sym '( gptel-backend gptel--system-message gptel-model
                         gptel-mode gptel-track-response gptel-track-media
-                        gptel-use-tools gptel-tools gptel-use-curl
+                        gptel-use-tools gptel-tools gptel-use-curl gptel--schema
                         gptel-use-context gptel--num-messages-to-send
                         gptel-stream gptel-include-reasoning gptel--request-params
                         gptel-temperature gptel-max-tokens gptel-cache))
@@ -1576,6 +1581,60 @@ file."
 
 (declare-function gptel-context--wrap "gptel-context")
 
+
+;;; Structured output
+(defvar gptel--schema nil
+  "Response output schema for backends that support it.")
+
+(cl-defgeneric gptel--parse-schema (_backend _schema)
+  "Parse JSON schema in a backend-appropriate way.")
+
+(defun gptel--dispatch-schema-type (schema)
+  "Convert SCHEMA to a valid elisp representation."
+  (when (stringp schema)
+    (setq schema (gptel--json-read-string schema)))
+  ;; The OpenAI and Anthropic APIs don't allow arrays at the root of the schema.
+  ;; Work around this by wrapping it in an object with the field "items".
+  ;; TODO(schema): Find some way to strip this extra layer from the response.
+  (if (member (plist-get schema :type) '("array" array))
+      (list :type "object"
+            :properties (list :items schema)
+            :required ["items"]
+            :additionalProperties :json-false)
+    schema))
+
+(defun gptel--preprocess-schema (spec)
+  "Set additionalProperties for objects in SPEC destructively.
+
+Convert symbol :types to strings."
+  ;; NOTE: Do not use `sequencep' here, as that covers strings too and breaks
+  ;; things.
+  (when (or (listp spec) (vectorp spec))
+    (cond
+     ((vectorp spec)
+      (cl-loop for element across spec
+               for idx upfrom 0
+               do (aset spec idx (gptel--preprocess-schema element))))
+     ((keywordp (car spec))
+      (let ((tail spec))
+        (while tail
+          (when (eq (car tail) :type)
+            (when (symbolp (cadr tail)) ;Convert symbol :type to string
+              (setcar (cdr tail) (symbol-name (cadr tail))))
+            (when (equal (cadr tail) "object") ;Add additional object fields
+              (plist-put tail :additionalProperties :json-false)
+              (let ((props
+                     (cl-loop for prop in (plist-get tail :properties) by #'cddr
+                              collect (substring (symbol-name prop) 1))))
+                (plist-put tail :required (vconcat props)))))
+          (when (or (listp (cadr tail)) (vectorp (cadr tail)))
+            (gptel--preprocess-schema (cadr tail)))
+          (setq tail (cddr tail)))))
+     ((listp spec) (dolist (element spec)
+                     (when (listp element)
+                       (gptel--preprocess-schema element))))))
+  spec)
+
 
 ;;; Tool use
 
@@ -2242,7 +2301,12 @@ Run post-response hooks."
                        (cons 'tool-result result-alist) info)
                       (gptel--fsm-transition fsm)))))
               (if (null tool-spec)
-                  (message "Unknown tool called by model: %s" name)
+                  (if (equal name gptel--ersatz-json-tool) ;Could be a JSON response
+                      ;; Handle structured JSON output supplied as tool call
+                      (funcall (plist-get info :callback)
+                               (gptel--json-encode (plist-get tool-call :args))
+                               info)
+                    (message "Unknown tool called by model: %s" name))
                 (setq arg-values
                       (mapcar
                        (lambda (arg)
@@ -2277,11 +2341,9 @@ Run post-response hooks."
 
 (defun gptel--error-p (info) (plist-get info :error))
 
-(defun gptel--tool-use-p (info)
-  (and (plist-get info :tools) (plist-get info :tool-use)))
+(defun gptel--tool-use-p (info) (plist-get info :tool-use))
 
-(defun gptel--tool-result-p (info)
-  (and (plist-get info :tools) (plist-get info :tool-success)))
+(defun gptel--tool-result-p (info) (plist-get info :tool-success))
 
 ;; TODO(prompt-list): Document new prompt input format to `gptel-request'.
 
@@ -2292,7 +2354,7 @@ Run post-response hooks."
                               position context dry-run
                               (stream nil) (in-place nil)
                               (system gptel--system-message)
-                              transforms (fsm (gptel-make-fsm)))
+                              schema transforms (fsm (gptel-make-fsm)))
   "Request a response from the `gptel-backend' for PROMPT.
 
 The request is asynchronous, this function returns immediately.
@@ -2442,6 +2504,13 @@ additional information (such as from a RAG engine).
 and the state machine. It should run the callback after finishing its
 transformation.
 
+If provided, SCHEMA forces the LLM to generate JSON output. Its value
+is a JSON schema, which can be provided as an elisp object, a nested
+plist structure. See the manual or the wiki for examples.
+
+Note: SCHEMA is presently experimental and subject to change, and not
+all providers support structured output.
+
 See `gptel-prompt-transform-functions' for more.
 
 FSM is the state machine driving the request. This can be used
@@ -2470,6 +2539,7 @@ be used to rerun or continue the request at a later time."
               ((markerp position) position)
               ((integerp position)
                (set-marker (make-marker) position buffer))))
+         (gptel--schema schema)
          (prompt-buffer
           (cond                       ;prompt from buffer or explicitly supplied
            ((null prompt)
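
The normalization rules applied by `gptel--preprocess-schema' in the structured-output hunk can be seen on a small input. The sketch below is not the committed function: `my/normalize-schema' is a hypothetical, non-destructive variant written for illustration. It returns a fresh plist instead of mutating its argument, but applies the same three rules: symbol :type values become strings, every object gets :additionalProperties set to false, and every object lists all of its properties under :required.

```elisp
;; -*- lexical-binding: t; -*-
;; Illustrative sketch only; `my/normalize-schema' is a hypothetical name.
(require 'cl-lib)

(defun my/normalize-schema (spec)
  "Return a normalized copy of the plist schema SPEC."
  (cond
   ;; Vectors: normalize each element and rebuild the vector.
   ((vectorp spec) (vconcat (mapcar #'my/normalize-schema spec)))
   ;; Plists: walk key/value pairs, building a fresh list.
   ((and (consp spec) (keywordp (car spec)))
    (let ((out (cl-loop for (key val) on spec by #'cddr
                        append (list key
                                     (if (and (eq key :type) (symbolp val))
                                         (symbol-name val) ;symbol -> string
                                       (my/normalize-schema val))))))
      (when (equal (plist-get out :type) "object")
        ;; Disallow extra fields and require every declared property.
        (setq out (plist-put out :additionalProperties :json-false))
        (setq out (plist-put
                   out :required
                   (vconcat
                    (cl-loop for prop in (plist-get out :properties) by #'cddr
                             ;; Drop the leading colon of the keyword.
                             collect (substring (symbol-name prop) 1))))))
      out))
   ;; Strings, numbers and other atoms pass through unchanged.
   (t spec)))

(my/normalize-schema
 '(:type object
   :properties (:name (:type string)
                :tags (:type array :items (:type string)))))
;; => (:type "object"
;;     :properties (:name (:type "string")
;;                  :tags (:type "array" :items (:type "string")))
;;     :additionalProperties :json-false
;;     :required ["name" "tags"])
```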
