Commit 5ec7117
RageLtMan
XML Matching Improvements and SpecialTokens
Reliance on single/guessed special tokens only goes so far when we
use constrained outputs because masking-out a potential EOS token
results in infinite generation: we have to account for all possible
candidates in the mask which could normally end generation.
Add SpecialTokens idiomatic extractor as a starting point for this
work and utilize it to feed all EOS tokens to the grammar-building
routines. Add the binary for this library element to examples/ for
@guoqingbao and other developers to have rapid access to what the
SpecialTokens struct actually extracts from any tokenizer.json
provided in ARGV0 or from ./tokenizer.json if none are provided.
Improve XML tool-sled generation. Remaining issue is potential of
XML content within the XML envelope and no ability to mask possibly
infinite strings as anything but infinite due to look-ahead and lazy
regex tricks from interpreted languages not actually compiling to a
finite mask. Use a simple matcher for now, enable env-override by
the user while this gets sorted out (if possible) and critically
enable the grammar generator to honor tool parser override at the
CLI such that `--enforce-parser qwen` produces JSON-constrained
schemas which the parser can then consume.
XML finite masking tracked under:
- guidance-ai/llguidance#306
Multiple EOS token concerns (handled in grammar) under:
- guidance-ai/llguidance#304
- guidance-ai/llguidance#3051 parent fd90c03 commit 5ec7117
File tree
7 files changed
+628
-469
lines changed- src
- core
- server
- tools
- utils
7 files changed
+628
-469
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
20 | 20 | | |
21 | 21 | | |
22 | 22 | | |
| 23 | + | |
23 | 24 | | |
24 | 25 | | |
25 | 26 | | |
| |||
100 | 101 | | |
101 | 102 | | |
102 | 103 | | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
103 | 107 | | |
104 | 108 | | |
105 | 109 | | |
| |||
466 | 470 | | |
467 | 471 | | |
468 | 472 | | |
| 473 | + | |
| 474 | + | |
| 475 | + | |
469 | 476 | | |
470 | 477 | | |
471 | 478 | | |
| |||
488 | 495 | | |
489 | 496 | | |
490 | 497 | | |
| 498 | + | |
491 | 499 | | |
492 | 500 | | |
493 | 501 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
525 | 525 | | |
526 | 526 | | |
527 | 527 | | |
528 | | - | |
| 528 | + | |
529 | 529 | | |
530 | | - | |
| 530 | + | |
531 | 531 | | |
532 | 532 | | |
533 | 533 | | |
534 | 534 | | |
535 | 535 | | |
536 | 536 | | |
537 | 537 | | |
538 | | - | |
| 538 | + | |
539 | 539 | | |
540 | 540 | | |
541 | 541 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
23 | 23 | | |
24 | 24 | | |
25 | 25 | | |
26 | | - | |
| 26 | + | |
27 | 27 | | |
28 | 28 | | |
29 | 29 | | |
| |||
0 commit comments