Conversation
Hello @mmoskal. I am trying to have Guidance generate code with significant indentation, so I am following your work here. Do you intend as part of this work to implement support for Lark's postlexer (e.g. the `Indenter`)?
Yes, in Lark you need to provide an instance of an `Indenter` subclass:

```python
class TreeIndenter(Indenter):
    NL_type = '_NL'
    OPEN_PAREN_types = []
    CLOSE_PAREN_types = []
    INDENT_type = '_INDENT'
    DEDENT_type = '_DEDENT'
    tab_len = 8

parser = Lark(tree_grammar, parser='lalr', postlex=TreeIndenter())
```

I already have an indentation-significant DSL built using Lark, and I am hoping to use mostly the same grammar as-is with Guidance to have an LLM output queries in my DSL. To do that, Guidance's Lark interface would probably need to accept some configuration equivalent to the above. Is that something you are planning to do, or will the approach be very different? I know this is a work in progress, so I don't expect any definite answers; just sharing my use case.

Separately, would it help at all if Lark itself provided some kind of API to help with next-token prediction? I can see that you have built your own implementation of Lark in Rust (roughly speaking), but I wonder if direct support from Lark itself would also be useful somehow.
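For anyone unfamiliar with what the postlexer does, the idea can be sketched in plain Python: it scans leading whitespace and injects explicit indent/dedent events so a LALR parser can treat indentation like braces. This is an illustrative toy, not Lark's actual implementation; the function name and event tuples here are invented.

```python
# Toy sketch of what an Indenter-style postlexer does (names invented,
# not Lark's API): convert leading whitespace into explicit INDENT/DEDENT
# events so a LALR parser can treat blocks like brace-delimited ones.

def indent_tokens(text, tab_len=8):
    """Yield ('INDENT'|'DEDENT'|'LINE', value) events for each line."""
    stack = [0]  # indentation levels currently open
    for line in text.splitlines():
        stripped = line.lstrip(" \t")
        if not stripped:
            continue  # blank lines don't affect indentation
        ws = line[: len(line) - len(stripped)]
        level = ws.count(" ") + ws.count("\t") * tab_len
        if level > stack[-1]:
            stack.append(level)
            yield ("INDENT", level)
        while level < stack[-1]:
            stack.pop()
            yield ("DEDENT", level)
        yield ("LINE", stripped)
    while len(stack) > 1:  # close any still-open blocks at end of input
        stack.pop()
        yield ("DEDENT", 0)
```

The per-token state (the stack of open indentation levels) is exactly what a constrained-decoding engine would also have to track, which is why indentation support needs explicit configuration rather than falling out of the grammar alone.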
There is a similar setup in this PR. When designing a new DSL to be written by LLMs, I would suggest not using indentation. AFAIU it makes the LLM stupider, as it has to keep track of indentation levels instead of simply matching braces. Unfortunately, any changes in Lark's Python code cannot be used in LLGuidance.
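To make the suggestion concrete, here is a hypothetical grammar fragment (rule and terminal names invented) contrasting a brace-delimited block, which needs no postlexer, with the indentation-significant version:

```lark
// Brace-delimited: block boundaries are ordinary tokens the parser
// (and the LLM) can match directly, with no postlexer involved.
block: "{" stmt* "}"

// Indentation-significant: the grammar refers to _INDENT/_DEDENT
// terminals that only exist because a postlexer injects them.
// block: _NL _INDENT stmt+ _DEDENT
```

With braces, the entire language is visible in the grammar itself; with indentation, part of the language lives in out-of-band postlexer state, which is the part that does not carry over to LLGuidance.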
Oh, that's surprising to hear. I assumed that due to the popularity of Python and YAML, LLMs wouldn't have a particular problem with indentation-significant languages. I wonder why tracking indentation level would be much harder for LLMs than tracking braces. The DSL I designed is primarily for humans to write, but I am exploring how practical it is to have an LLM assist non-technical users by converting the queries they write in English into DSL queries. If indentation is such a problem, perhaps I should develop some kind of additional JSON format for my DSL just for LLMs to target. Then I would be able to use the much more mature support for JSON-schema-constrained generation. I'm not sure if this would be a lot of work, or if it would work well, as I haven't worked with JSON schemas before. But that's what I'll research next, I think.
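As a rough sketch of that JSON-format idea: the shape below is entirely invented (the thread never shows the real DSL), but it illustrates how nesting replaces indentation, and how a JSON Schema could then drive schema-constrained generation instead of a Lark grammar with a postlexer.

```python
import json

# Hypothetical JSON encoding of a DSL query -- field names and operators
# are invented for illustration only. A schema like this could be handed
# to a JSON-schema-constrained decoder instead of a Lark grammar.
QUERY_SCHEMA = {
    "type": "object",
    "required": ["select", "where"],
    "properties": {
        "select": {"type": "array", "items": {"type": "string"}},
        "where": {
            "type": "object",
            "required": ["field", "op", "value"],
            "properties": {
                "field": {"type": "string"},
                "op": {"enum": ["eq", "lt", "gt"]},
                "value": {},
            },
        },
    },
}

# An instance an LLM would be constrained to produce:
example_query = {
    "select": ["name", "age"],
    "where": {"field": "age", "op": "gt", "value": 30},
}

# Nesting replaces indentation, so serialization is lossless and
# structure never depends on whitespace.
assert json.loads(json.dumps(example_query)) == example_query
```

The trade-off is an extra translation step between the human-facing indented syntax and the LLM-facing JSON, but each side then uses the representation it handles best.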
Fixes #107