Commit b5399d4

add some docs
1 parent afb6cac commit b5399d4

1 file changed: +110 -0 lines changed

docs/llguidance.md

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
# LLGuidance support in llama.cpp

[LLGuidance](https://github.com/guidance-ai/llguidance) is a library for constrained decoding (also called constrained sampling or structured outputs) for Large Language Models (LLMs).
It was developed as the backend for the [Guidance](https://github.com/guidance-ai/guidance) library, but can also be used standalone.

LLGuidance supports JSON Schemas and arbitrary context-free grammars (CFGs) written in
a [variant](https://github.com/guidance-ai/llguidance/blob/main/parser/src/lark/README.md) of Lark syntax.
It is [very fast](https://github.com/guidance-ai/jsonschemabench/tree/main/maskbench)
and has [excellent](https://github.com/guidance-ai/llguidance/blob/main/parser/src/json/README.md) JSON Schema coverage.
It does, however, complicate the llama.cpp build process, as it requires the Rust compiler.

## Building

To enable LLGuidance support, build llama.cpp with the `LLAMA_LLGUIDANCE` option:

```sh
cmake -B build -DLLAMA_LLGUIDANCE=ON
make -C build -j
```

This requires the Rust compiler and the `cargo` tool to be [installed](https://www.rust-lang.org/tools/install).

## Interface

There are no new command-line arguments or `common_params`.
When enabled, any grammar starting with `%llguidance` is passed to LLGuidance instead of the [current](../grammars/README.md) llama.cpp Grammars.
Additionally, when a JSON Schema is requested (e.g., with the `-j` argument to `llama-cli`), it is also passed to LLGuidance.

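The routing rule described in this section can be sketched as a small Python function. This is only an illustration of the rule as stated in the text, not the llama.cpp implementation: the helper name `pick_backend`, its parameters, and the returned labels are hypothetical, and the `{}` after `%llguidance` in the sample grammar is an assumption about the header syntax.

```python
from typing import Optional


def pick_backend(grammar: str = "", json_schema: Optional[str] = None) -> str:
    """Return which engine would handle a request under the rule described above.

    Hypothetical helper: only the `%llguidance` prefix check and the
    JSON Schema case come from the text.
    """
    if json_schema is not None:
        # A requested JSON Schema is passed to LLGuidance.
        return "llguidance"
    if grammar.startswith("%llguidance"):
        # Grammars with this prefix are passed to LLGuidance.
        return "llguidance"
    # Everything else stays with the existing llama.cpp Grammars.
    return "gbnf"


# Sample inputs (assumed syntax for the LLGuidance header).
lark_grammar = '%llguidance {}\nstart: "yes" | "no"\n'
gbnf_grammar = 'root ::= "yes" | "no"\n'

print(pick_backend(grammar=lark_grammar))
print(pick_backend(grammar=gbnf_grammar))
print(pick_backend(json_schema='{"type": "object"}'))
```
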
## Performance

Computing a "token mask" (i.e., the set of all allowed tokens) for a llama3 tokenizer (with 128k tokens)
on [JSON Schema Bench](https://github.com/guidance-ai/jsonschemabench) takes on average 50μs of single-core CPU time. The p99 time is 0.5ms, and p100 (the maximum) is 20ms.

This is due to the lexer/parser split and a number of [optimizations](https://github.com/guidance-ai/llguidance/blob/main/docs/optimizations.md).

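To make the term "token mask" concrete, here is a deliberately naive Python sketch. It checks every vocabulary entry against a toy constraint (the output must be a string of decimal digits). The vocabulary, the constraint, and the function name `token_mask` are made up for illustration; LLGuidance does not work by scanning the vocabulary like this, which is exactly why its optimizations matter at 128k tokens.

```python
import re

# Toy vocabulary standing in for an LLM tokenizer (a real one has ~128k entries).
VOCAB = ["{", "}", '"', "name", "12", "3", " ", "abc", "1a"]

# Toy constraint: the whole output must be a non-empty string of decimal digits.
# For this constraint, every valid string is also a valid prefix, so a full
# match on `prefix + token` is enough to decide whether a token is allowed.
NUMBER = re.compile(r"[0-9]+")


def token_mask(prefix: str) -> list[bool]:
    """Return one boolean per vocabulary token: True if the token may follow `prefix`."""
    return [NUMBER.fullmatch(prefix + tok) is not None for tok in VOCAB]


mask = token_mask("4")
allowed = [tok for tok, ok in zip(VOCAB, mask) if ok]
print(allowed)  # ['12', '3']
```

During sampling, tokens whose mask entry is false would be excluded before the next token is chosen.
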
## JSON Schema

LLGuidance tries to be faithful to the JSON Schema specification where possible.
In particular, unlike in current Grammars, `additionalProperties` defaults to `true`, and any whitespace is allowed.
You can of course set `"additionalProperties": false` yourself.
LLGuidance also follows the definition order of properties in the `"properties": {}` object,
regardless of whether they are required or not (current Grammars always put required properties first).

If a schema is not fully supported by LLGuidance, it will error out with a message.
That is, no JSON Schema keywords are silently ignored.

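As a small illustration of the two points above, the following Python snippet builds a schema that closes the object explicitly and lists an optional property before a required one. The property names are invented for this example.

```python
import json

# Illustrative schema; the property names are made up for this example.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},   # optional, but defined first
        "age": {"type": "integer"},   # required, defined second
    },
    "required": ["age"],
    # Without this line, extra keys would be allowed (the JSON Schema default).
    "additionalProperties": False,
}

# json.dumps keeps the dict's insertion order, so "name" stays ahead of "age".
schema_text = json.dumps(schema, indent=2)
print(schema_text)
```

Going by the description above, LLGuidance would emit `name` before `age` when both are present, whereas the current Grammars would put the required `age` first. The printed text is the kind of schema that would be supplied through the `-j` argument to `llama-cli`.
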
## Why not re-use GBNF format?

GBNF has no concept of a lexer.

For virtually all programming languages (including JSON), lexers, typically built using regular expressions, are used to convert a stream of bytes into a stream of lexemes (also called tokens, but that name conflicts with LLM tokens).
The context-free grammar (CFG) parser can then operate on lexemes, and there are far fewer of them than bytes.
Because regular expressions are cheaper to evaluate than context-free grammars, this two-step process is faster than parsing the whole input with a CFG.

Typically, LLM tokens are somewhat aligned with lexemes, meaning that when executing the grammar against all tokens, the parser needs to be involved in 0.5% or less of cases, leaving the rest to the lexer.

However, the user has to specify the distinction between lexemes and CFG symbols.
In [Lark](https://github.com/lark-parser/lark) this is done by making the lexeme names all uppercase,
while CFG symbol names are all lowercase.

For example, this is a very simplified grammar for the C programming language:

```lark
start: program

program: (function_definition | declaration)*

function_definition: type ID "(" parameter_list? ")" "{" statement* "}"
parameter_list: parameter ("," parameter)*
parameter: type ID

declaration: type variable_list ";"
variable_list: ID ("," ID)*

type: "int" | "float" | "char" | "void"

statement: declaration
         | assignment ";"
         | "return" expr ";"
         | if_statement
         | while_statement
         | expr ";"

assignment: ID "=" expr
expr: term (("+" | "-") term)*
term: factor (("*" | "/") factor)*
factor: ID | NUMBER | "(" expr ")"

if_statement: "if" "(" expr ")" "{" statement* "}" ("else" "{" statement* "}")?
while_statement: "while" "(" expr ")" "{" statement* "}"

ID: /[a-zA-Z_][a-zA-Z0-9_]*/
NUMBER: /[0-9]+/

%ignore /[ \t\f\r\n]+/
```

The GBNF grammar would be very similar, but `ID` and `NUMBER` would typically be
lowercase, and would be internally translated to a CFG, instead of being kept as regular expressions.
Also, in the last line we define that all whitespace should be ignored.
This would have to be specified explicitly everywhere in the GBNF format.

While it is possible to write a grammar with only lowercase symbols, it will be much slower than a grammar with lexemes.
You will also eventually get an error about 'single-byte lexemes' from LLGuidance.
Typically, renaming some symbols to uppercase will fix this.

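The effect of the lexer/parser split can be seen in a minimal regex-based lexer written in Python. It reuses the `ID`, `NUMBER`, and whitespace patterns from the grammar above. The `PUNCT` class, the `lex` function, and the sample input are illustrative assumptions, and keywords such as `int` are lexed as `ID` here for brevity. This is a sketch of the general technique, not LLGuidance's lexer.

```python
import re

# Lexeme definitions mirroring the uppercase rules of the grammar above.
# PUNCT is an illustrative stand-in for the grammar's string literals.
TOKEN_SPEC = [
    ("WS", r"[ \t\f\r\n]+"),            # same pattern as the %ignore rule
    ("NUMBER", r"[0-9]+"),
    ("ID", r"[a-zA-Z_][a-zA-Z0-9_]*"),
    ("PUNCT", r"[(){};,=+\-*/]"),
]
LEXER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def lex(source: str) -> list[tuple[str, str]]:
    """Split `source` into (kind, text) lexemes, dropping whitespace."""
    lexemes = []
    pos = 0
    while pos < len(source):
        match = LEXER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "WS":
            lexemes.append((match.lastgroup, match.group()))
        pos = match.end()
    return lexemes


source = "int total = count + 42;"
lexemes = lex(source)
print(len(source), "characters ->", len(lexemes), "lexemes")
print(lexemes)
```

A CFG parser working on this output handles 7 lexemes instead of 23 characters, and whitespace never reaches it at all.
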
## Error handling

Currently, errors are just printed to stderr, and generation continues.
This can hopefully be improved in the future.
