support for INDENT/DEDENT tokens by mmoskal · Pull Request #128 · guidance-ai/llguidance

mmoskal · 2025-02-24T21:47:03Z

Fixes #107

nchammas · 2025-03-17T03:20:09Z

Hello @mmoskal. I am trying to have Guidance generate code with significant indentation, so I am following your work here.

Do you intend to implement support for Lark's %declare statement as part of this work? It's a critical part of how Lark enables support for significant indentation, at least as far as I understand it from reviewing the relevant docs as well as Lark's Python grammar.

mmoskal · 2025-03-17T23:42:37Z

%declare in Lark just says that the definition of the token is provided elsewhere. For constrained generation we actually need to know what the INDENT and DEDENT tokens do. Unfortunately, it's far from simple.
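To illustrate why a bare declaration is not enough, here is a minimal sketch of what an indentation post-lexer has to do: keep a stack of open indentation widths and emit INDENT/DEDENT based on the leading whitespace of each line. This is a simplified stand-in written for illustration, not Lark's Indenter and not the code in this PR. The function name `indent_tokens` and the token names are assumptions, and the sketch handles spaces only (no tabs, brackets, or comments).

```python
def indent_tokens(text):
    """Yield (kind, value) tokens for an indentation-sensitive input.

    Hypothetical helper for illustration: spaces only, no bracket handling.
    """
    stack = [0]  # indentation widths of the currently open blocks
    for raw in text.splitlines():
        if not raw.strip():
            continue  # blank lines do not affect indentation
        width = len(raw) - len(raw.lstrip(" "))
        if width > stack[-1]:
            # deeper than the current block: open a new one
            stack.append(width)
            yield ("INDENT", width)
        while width < stack[-1]:
            # shallower: close blocks until we reach a matching level
            stack.pop()
            yield ("DEDENT", stack[-1])
        if width != stack[-1]:
            raise ValueError(f"inconsistent dedent to column {width}")
        yield ("LINE", raw.strip())
    # close every block still open at end of input
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", stack[-1])
```

Note that which tokens are legal after a newline depends on the contents of `stack`, which is state that a plain token declaration does not describe.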

nchammas · 2025-03-18T22:22:59Z

Yes, in Lark you need to provide an instance of lark.indenter.Indenter:

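from lark import Lark
from lark.indenter import Indenter

class TreeIndenter(Indenter):
    NL_type = '_NL'            # terminal that matches a newline
    OPEN_PAREN_types = []      # terminals that suspend indentation tracking
    CLOSE_PAREN_types = []     # terminals that resume it
    INDENT_type = '_INDENT'    # terminal emitted when indentation increases
    DEDENT_type = '_DEDENT'    # terminal emitted when indentation decreases
    tab_len = 8                # number of columns a tab counts for

# tree_grammar is the Lark grammar string, defined elsewhere
parser = Lark(tree_grammar, parser='lalr', postlex=TreeIndenter())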

I already have an indentation-significant DSL built using Lark, and I am hoping to use the same grammar mostly as-is with Guidance to have an LLM output queries in my DSL. To do that, Guidance's Lark interface would probably need to accept some bit of configuration equivalent to the above.

Is that something you are planning to do, or will the approach be very different? I know this is a work in progress, so I don't expect any definite answers. Just sharing my use case.

Separately, would it help at all if Lark itself provided some kind of API to help with next token prediction? I can see that you have built your own implementation of Lark in Rust (roughly speaking), but I wonder if direct support from Lark itself would also be useful somehow.

mmoskal · 2025-03-18T23:39:06Z

There is a similar setup in this PR.

When designing a new DSL to be written by LLMs I would suggest not using indentation. AFAIU it makes the LLM stupider, as it has to keep track of it, instead of simply following braces.

Unfortunately, any changes in Lark Python code cannot be used in LLGuidance.

nchammas · 2025-03-20T15:40:08Z

> When designing a new DSL to be written by LLMs I would suggest not using indentation. AFAIU it makes the LLM stupider, as it has to keep track of it, instead of simply following braces.

Oh, that's surprising to hear. I assumed that due to the popularity of Python and YAML, LLMs wouldn't have a particular problem with indentation-significant languages. I wonder why tracking indentation level would be much harder for LLMs than tracking braces.

The DSL I designed is primarily for humans to write, but I am exploring how practical it is to have an LLM assist non-technical users by converting the queries they write in English into DSL queries.

If indentation is such a problem, perhaps I should develop some kind of additional JSON format for my DSL just for LLMs to target. Then I would be able to use the much more mature support for JSON schema-constrained generation. Not sure if this would be a lot of work, or if it would work well, as I haven't worked with JSON schemas before. But that's what I'll research next, I think.
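As a rough sketch of that idea, the LLM could emit a nested JSON tree and a small renderer could turn it back into indented text for humans. Everything here is hypothetical: the node shape (`name` plus optional `children`), the `render` function, and the sample query are invented for illustration and do not reflect the actual DSL.

```python
import json

def render(node, depth=0, indent="    "):
    """Render a {"name": str, "children": [...]} node as indented lines.

    Hypothetical node shape; a real DSL would need its own schema.
    """
    lines = [indent * depth + node["name"]]
    for child in node.get("children", []):
        lines.extend(render(child, depth + 1, indent))
    return lines

# Example JSON as an LLM might produce it under a JSON schema constraint.
llm_output = """
{"name": "filter", "children": [
    {"name": "age > 30"},
    {"name": "any", "children": [{"name": "city = Paris"}]}
]}
"""

print("\n".join(render(json.loads(llm_output))))
```

With this split, nesting is carried by JSON brackets during generation, and indentation is produced deterministically afterwards.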

mmoskal added 7 commits February 23, 2025 10:32

start on INDENT/DEDENT tokens (043039c)
eager parsing of %llguidance (4aaa928)
add tests for indent syntax (75d411b)
easier syntax for parens (e315d6b)
add simple python-like tests (76191f9)
more work on indent (c07486a)
attribute source (971d1af)

mmoskal wants to merge 7 commits into main from indent
