# LLGuidance support in llama.cpp

[LLGuidance](https://github.com/guidance-ai/llguidance) is a library for constrained decoding (also called constrained sampling or structured outputs) for Large Language Models (LLMs).
It was developed as the backend for the [Guidance](https://github.com/guidance-ai/guidance) library, but it can also be used standalone.

LLGuidance supports JSON Schemas and arbitrary context-free grammars (CFGs) written in
a [variant](https://github.com/guidance-ai/llguidance/blob/main/parser/src/lark/README.md) of Lark syntax.
It is [very fast](https://github.com/guidance-ai/jsonschemabench/tree/main/maskbench)
and has [excellent](https://github.com/guidance-ai/llguidance/blob/main/parser/src/json/README.md) JSON Schema coverage.
It does, however, complicate the llama.cpp build process, as it requires the Rust compiler.

## Building

To enable LLGuidance support, build llama.cpp with the `LLAMA_LLGUIDANCE` option:

```sh
cmake -B build -DLLAMA_LLGUIDANCE=ON
make -C build -j
```

This requires the Rust compiler and the `cargo` tool to be [installed](https://www.rust-lang.org/tools/install).

## Interface

There are no new command-line arguments or `common_params`.
When enabled, any grammar starting with `%llguidance` is passed to LLGuidance instead of the [current](../grammars/README.md) llama.cpp Grammars.
Additionally, when a JSON Schema is requested (e.g., with the `-j` argument to `llama-cli`), it is also passed to LLGuidance.
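
For example, a sketch of invoking this from the command line (the model and grammar file names are placeholders; the binary path assumes the CMake build above):

```sh
# Constrain output with a JSON Schema, which is handed to LLGuidance:
./build/bin/llama-cli -m model.gguf \
    -j '{"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]}' \
    -p "Invent a character."

# Use a Lark-style grammar file; because its first line starts with %llguidance,
# it is routed to LLGuidance rather than to the GBNF parser:
./build/bin/llama-cli -m model.gguf --grammar-file my_grammar.lark -p "Write a short C function."
```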

## Performance

Computing a "token mask" (i.e., the set of all allowed tokens) for a llama3 tokenizer (with 128k tokens)
on [JSON Schema Bench](https://github.com/guidance-ai/jsonschemabench) takes on average 50μs of single-core CPU time. The p99 time is 0.5ms, and the p100 time is 20ms.

This is due to the lexer/parser split and a number of [optimizations](https://github.com/guidance-ai/llguidance/blob/main/docs/optimizations.md).


## JSON Schema

LLGuidance tries to be faithful to the JSON Schema specification where possible.
In particular, unlike in the current Grammars, `additionalProperties` defaults to `true` and any whitespace is allowed.
You can of course set `"additionalProperties": false` yourself.
LLGuidance also follows the definition order of properties in the `"properties": {}` object,
regardless of whether they are required (the current Grammars always put required properties first).

If a schema is not fully supported by LLGuidance, it will error out with a message.
That is, no JSON Schema keywords are silently ignored.
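
For instance, a sketch (the model name and prompt are placeholders): with the schema below, the generated object lists `name` before `age` because that is their definition order, and extra keys are rejected only because `additionalProperties` is set to `false` explicitly:

```sh
./build/bin/llama-cli -m model.gguf -p "Describe Ada Lovelace as JSON." -j '{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "age":  { "type": "integer" }
  },
  "required": ["age"],
  "additionalProperties": false
}'
```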

## Why not re-use GBNF format?

GBNF has no concept of a lexer.

For virtually all programming languages (including JSON), lexers, typically built from regular expressions, convert a stream of bytes into a stream of lexemes (also called tokens, though that name conflicts with LLM tokens).
The context-free grammar (CFG) parser then operates on lexemes, and there are far fewer of them than bytes.
Because regular expressions are cheaper to evaluate than context-free grammars, this two-step process is faster than parsing the whole input with a CFG.

Typically, LLM tokens are roughly aligned with lexemes, so when running the grammar against all tokens, the parser needs to be involved in only about 0.5% of cases or fewer, leaving the rest to the lexer.

However, the user has to specify the distinction between lexemes and CFG symbols.
In [Lark](https://github.com/lark-parser/lark) this is done by making lexeme names all uppercase,
while CFG symbols are all lowercase.

For example, this is a very simplified grammar for the C programming language:

```lark
start: program

program: (function_definition | declaration)*

function_definition: type ID "(" parameter_list? ")" "{" statement* "}"
parameter_list: parameter ("," parameter)*
parameter: type ID

declaration: type variable_list ";"
variable_list: ID ("," ID)*

type: "int" | "float" | "char" | "void"

statement: declaration
         | assignment ";"
         | "return" expr ";"
         | if_statement
         | while_statement
         | expr ";"

assignment: ID "=" expr
expr: term (("+" | "-") term)*
term: factor (("*" | "/") factor)*
factor: ID | NUMBER | "(" expr ")"

if_statement: "if" "(" expr ")" "{" statement* "}" ("else" "{" statement* "}")?
while_statement: "while" "(" expr ")" "{" statement* "}"

ID: /[a-zA-Z_][a-zA-Z0-9_]*/
NUMBER: /[0-9]+/

%ignore /[ \t\f\r\n]+/
```

The GBNF grammar would be very similar, except that `ID` and `NUMBER` would typically be
lowercase and would be internally translated to a CFG, instead of being kept as regular expressions.
Also, the last line declares that all whitespace should be ignored;
in GBNF this would have to be specified explicitly everywhere.

While it is possible to write a grammar with only lowercase symbols, it will be much slower than a grammar with lexemes.
You will also eventually get an error about 'single-byte lexemes' from LLGuidance.
Typically, renaming some symbols to uppercase will fix this.
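
Here is a minimal sketch of the difference, using the same Lark-style syntax as above (the rule names are made up for illustration):

```lark
// CFG-only version: every digit is a separate single-byte lexeme, so the
// parser is involved for each byte (slow, and may eventually trigger the
// 'single-byte lexemes' error):
digit: "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
number: digit+

// Lexer version: NUMBER is a single lexeme matched by a regular expression,
// so the parser sees one symbol per number (fast):
NUMBER: /[0-9]+/
```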

## Error handling

Currently, errors are just printed to stderr, and generation continues.
This will hopefully be improved in the future.