Commit adc4aed

committed
clarify docs
1 parent b5399d4 commit adc4aed

1 file changed

+24
-43
lines changed

docs/llguidance.md

Lines changed: 24 additions & 43 deletions
@@ -1,13 +1,8 @@
1-
# LLGuidance support in llama.cpp
1+
# LLGuidance Support in llama.cpp
22

3-
[LLGuidance](https://github.com/guidance-ai/llguidance) is a library for constrained decoding (also called constrained sampling or structured outputs) for Large Langauge Models (LLMs).
4-
It was developed as the backend for [Guidance](https://github.com/guidance-ai/guidance) library, but can be also used standalone.
3+
[LLGuidance](https://github.com/guidance-ai/llguidance) is a library for constrained decoding (also called constrained sampling or structured outputs) for Large Language Models (LLMs). Initially developed as the backend for the [Guidance](https://github.com/guidance-ai/guidance) library, it can also be used independently.
54

6-
LLGuidance supports JSON Schemas or arbitrary context-free grammars (CFGs) in
7-
a [variant](https://github.com/guidance-ai/llguidance/blob/main/parser/src/lark/README.md) of Lark syntax.
8-
It is [very fast](https://github.com/guidance-ai/jsonschemabench/tree/main/maskbench)
9-
and has [excellent](https://github.com/guidance-ai/llguidance/blob/main/parser/src/json/README.md) JSON Schema coverage.
10-
It does, however, complicate llama.cpp build process, as it requires Rust compiler.
5+
LLGuidance supports JSON Schemas and arbitrary context-free grammars (CFGs) written in a [variant](https://github.com/guidance-ai/llguidance/blob/main/parser/src/lark/README.md) of Lark syntax. It is [very fast](https://github.com/guidance-ai/jsonschemabench/tree/main/maskbench) and has [excellent](https://github.com/guidance-ai/llguidance/blob/main/parser/src/json/README.md) JSON Schema coverage but requires the Rust compiler, which complicates the llama.cpp build process.
116

127
## Building
138

@@ -18,49 +13,41 @@ cmake -B build -DLLAMA_LLGUIDANCE=ON
1813
make -C build -j
1914
```
2015

21-
This requires the Rust compiler and `cargo` tool to be [installed](https://www.rust-lang.org/tools/install).
16+
This requires the Rust compiler and the `cargo` tool to be [installed](https://www.rust-lang.org/tools/install).
2217

2318
## Interface
2419

25-
There are no new command line arguments or `common_params`.
26-
When enabled, any grammar starting with `%llguidance` is passed to LLGuidance, not the [current](../grammars/README.md) llama.cpp Grammars.
27-
Additionally, when JSON Schema is requested (eg., with `-j` argument to `llama-cli`), it's also passed to LLGuidance.
20+
There are no new command-line arguments or modifications to `common_params`. When enabled, grammars starting with `%llguidance` are passed to LLGuidance instead of the [current](../grammars/README.md) llama.cpp grammars. Additionally, JSON Schema requests (e.g., using the `-j` argument in `llama-cli`) are also passed to LLGuidance.
2821

2922
## Performance
3023

31-
Computing "token mask" (ie., set of all allowed tokens), for a llama3 tokenizer (with 128k tokens),
32-
for [JSON Schema Bench](https://github.com/guidance-ai/jsonschemabench) takes on avarage 50μs of single-core CPU time. The p99 time is 0.5ms, and p100 is 20ms.
33-
34-
This is due to lexer/parser split and a bunch of [optimizations](https://github.com/guidance-ai/llguidance/blob/main/docs/optimizations.md).
24+
Computing a "token mask" (i.e., the set of allowed tokens) for a llama3 tokenizer with 128k tokens takes, on average, 50μs of single-core CPU time for the [JSON Schema Bench](https://github.com/guidance-ai/jsonschemabench). The p99 time is 0.5ms, and the p100 time is 20ms. These results are due to the lexer/parser split and several [optimizations](https://github.com/guidance-ai/llguidance/blob/main/docs/optimizations.md).
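
To make the term concrete, here is a minimal Python sketch of what a token mask is. It is not how LLGuidance computes masks: the tiny vocabulary and the digits-only constraint are assumptions chosen for illustration, and the brute-force loop over the vocabulary is exactly the cost that the optimizations linked above are meant to avoid.

```python
import re

# Hypothetical toy vocabulary; a real llama3 tokenizer has about 128k entries.
VOCAB = ["0", "12", "7a", "abc", " ", "42", "-", "9"]

# Illustrative constraint (an assumption): only decimal digits may be generated.
DIGITS = re.compile(r"[0-9]+")


def token_mask(vocab):
    """Return one boolean per token: True if the token is allowed next."""
    return [DIGITS.fullmatch(tok) is not None for tok in vocab]


mask = token_mask(VOCAB)
allowed = [tok for tok, ok in zip(VOCAB, mask) if ok]
print(allowed)
```

During sampling, tokens whose mask entry is false would be excluded before the next token is chosen.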
3525

3626
## JSON Schema
3727

38-
LLGuidance tries to be faithful to the JSON Schema specification where possible.
39-
In particular, unlike in current Grammars, `additionalProperties` defaults to `true`, and any whitespace is allowed.
40-
You can of course set `"additionalProperties": false` yourself.
41-
LLGuidance will also follow definition order of properties in the `"properties": {}` object,
42-
regardless if they are required or not (current Grammars always put required properties first).
28+
LLGuidance adheres closely to the JSON Schema specification. For example:
29+
30+
- `additionalProperties` defaults to `true`, unlike current grammars, though you can set `"additionalProperties": false` if needed.
31+
- Any whitespace is allowed.
32+
- The definition order in the `"properties": {}` object is maintained, regardless of whether properties are required (current grammars always put required properties first).
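
As an illustration of these points, the Python sketch below builds a small schema that opts out of the `additionalProperties` default. The property names `name` and `age` are hypothetical. The snippet only constructs and serializes the schema and does not call LLGuidance.

```python
import json

# Hypothetical schema: "age" is required, but "name" is defined first.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["age"],
    # Without this line, extra properties would be allowed by default.
    "additionalProperties": False,
}

# Serialized form, e.g. for use as a JSON Schema request.
schema_text = json.dumps(schema)

# Definition order of the properties, which is the order described above.
property_order = list(schema["properties"])
print(schema_text)
print(property_order)
```

With this schema, the expectation from the list above is that `name` is generated before `age`, even though only `age` is required.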
4333

44-
If a schema is not fully supported by LLGuidance, it will error out with a message.
45-
That is, no JSON Schema keywords are silently ignored.
34+
Schemas that are not fully supported produce an error message; no JSON Schema keywords are silently ignored.
4635

47-
## Why not re-use GBNF format?
36+
## Why Not Reuse GBNF Format?
4837

49-
GBNF has no concept of a lexer.
38+
GBNF lacks the concept of a lexer.
5039

51-
For virtually all programming languages (including JSON), lexers, typically built using regular expressions, are used to convert a stream of bytes into a stream of lexemes (also called tokens, but that name conflicts with LLM tokens).
52-
Then, the context-free grammar (CFG) parser can operate on lexemes, and there is way fewer of them than bytes.
53-
Because regular expressions are cheaper to evaluate than context-free grammars, this two-step process is faster than parsing the whole input with a CFG.
40+
Most programming languages, including JSON, use a two-step process: a lexer (built with regular expressions) converts a byte stream into lexemes, which are then processed by a CFG parser. This approach is faster because regular expressions are cheaper to evaluate than context-free grammars, and there are roughly 10x fewer lexemes than bytes.
5441

55-
Typically the LLM tokens are somewhat aligned with lexemes, meaning that when executing the grammar against all tokens, the parser needs to be involved in 0.5% or less of cases, leaving the rest to the lexer.
42+
LLM tokens often align with lexemes, so the parser is engaged in under 0.5% of tokens, with the lexer handling the rest.
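
The lexing step can be sketched in a few lines of Python. This is an illustration only, not LLGuidance's lexer: the `NUMBER` and whitespace patterns follow the grammar in this document, while the `ID` pattern and the `PUNCT` class are assumptions added so the example is self-contained.

```python
import re

# Lexeme definitions as (name, regular expression) pairs.
LEXEMES = [
    ("NUMBER", r"[0-9]+"),
    ("ID", r"[a-zA-Z_][a-zA-Z0-9_]*"),  # assumed pattern
    ("PUNCT", r"[(){};=+]"),            # hypothetical catch-all for punctuation
]
# Whitespace is skipped globally, like the %ignore rule in the Lark grammar.
IGNORE = re.compile(r"[ \t\f\r\n]+")
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in LEXEMES))


def lex(text):
    """Convert a string into a list of (lexeme_name, matched_text) pairs."""
    pos = 0
    out = []
    while pos < len(text):
        ws = IGNORE.match(text, pos)
        if ws:
            pos = ws.end()
            continue
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}: {text[pos]!r}")
        out.append((m.lastgroup, m.group()))
        pos = m.end()
    return out


source = "int total = count + 42;"
lexemes = lex(source)
print(len(source), "characters ->", len(lexemes), "lexemes")
```

A CFG parser working on the output of `lex` sees far fewer symbols than one working on raw bytes, which is the source of the speedup described above.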
5643

57-
However, the user has to specify the distinction between lexemes and CFG symbols.
58-
In [Lark](https://github.com/lark-parser/lark) this is done by making the lexemes names all uppercase,
59-
while CFG symbols are all lowercase.
44+
However, the user has to provide the distinction between lexemes and CFG symbols. In [Lark](https://github.com/lark-parser/lark), lexeme names are uppercase, while CFG symbols are lowercase.
6045

61-
For example, this is a very simplified grammar for the C programming language:
46+
For example, a simplified C grammar in Lark:
6247

6348
```lark
49+
%llguidance {}
50+
6451
start: program
6552
6653
program: (function_definition | declaration)*
@@ -95,16 +82,10 @@ NUMBER: /[0-9]+/
9582
%ignore /[ \t\f\r\n]+/
9683
```
9784

98-
The GBNF grammar would be very similar, but `ID` and `NUMBER` would typically be
99-
lowercase, and would be internally translated to a CFG, instead of being kept as regular expressions.
100-
Also, in the last line we define that all whitespace should be ignored.
101-
This would have to specified explicitly everywhere in the GBNF format.
85+
In GBNF, lexemes like `ID` and `NUMBER` are typically lowercase and converted to CFG rules instead of remaining regular expressions. Ignoring whitespace would need to be explicitly specified everywhere.
10286

103-
While it is possible to write a grammar with only lowercase symbols, it will be much slower than a grammar with lexemes.
104-
You will also eventually get an error about 'single-byte lexemes' from LLGuidance.
105-
Typically, renaming some symbols to uppercase will fix this.
87+
Writing grammars without lexemes would be slower and might result in "single-byte lexeme" errors in LLGuidance, fixable by renaming symbols to uppercase.
10688

107-
## Error handling
89+
## Error Handling
10890

109-
Currently, errors are just printed to stderr, and the generation continues.
110-
This can hopefully be improved in the future.
91+
Errors are currently printed to `stderr`, and generation continues. Improved error handling may be added in the future.
