Langium AI #2008
-
I hope you’re doing well. I’m currently experimenting with the Langium AI tool, and my goal is to evaluate which large language model produces responses that correctly follow the Langium grammar I have defined. However, I’ve encountered a challenge: the example workspace available in the Langium AI repository focuses on grammar generation rather than on validating or parsing LLM-generated output against an existing grammar.

Could you please let me know if there are any existing examples or recommended approaches for this use case? Any pointers, code snippets, or references to relevant resources would be incredibly helpful. Thank you so much for your time and support!
-
Hi @RuanVanRooyenDSA, thanks for the question! It sounds like you're talking about the eval-langium example, which does focus on evaluating generated grammars. However, the same approach works for evaluating program output that corresponds to your own DSL as well; it just happens that in that example the program being parsed is a generated Langium grammar, since Langium parses itself.

For any other Langium DSL, you can extend the base evaluator class with one that invokes the services for your DSL instead, building a document and running validation on it (there's a rough sketch of what that could look like at the end of this reply). The base evaluator is very thin by design, so you can decide how you want to do that for your language. For reference, you can see how the langium-evaluator extends it to invoke the services for parsing and validating Langium grammars, and then retrieves those diagnostics for checking later. You can take a very similar approach for your own language using your own service set, which can be retrieved like so (as an example):

```typescript
const myDSLServices = createMyDSLServices(NodeFileSystem);
const myDSLEvaluator = new MyDSLEvaluator(myDSLServices.mydsl);
// invoke your custom evaluator like in the evaluator-langium example
```

This works nicely if your project is a mono-repo, but you can also link in your DSL project to get access to the services.

Generally, once you have this set up, we tend to see users consume that information as part of a larger Python workflow (a fine-tuning run, for example). This can be done by piping the output through a CLI, or by running a small server (like express) to make it available to other local systems (also sketched below); it's somewhat dependent on your preference and workflow constraints.

Since the evaluator is pretty lightweight, you can also compute additional metrics if needed and modify the evaluator result type to suit your requirements. This comes up fairly often, since there are additional metrics that others want to add which are sourced by some other means (a quality assessment, a code complexity heuristic, or some other grading, for example).

Hope that helps!
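To make the "extend the evaluator" step a bit more concrete, here's a rough, untested sketch written directly against plain Langium services. The class name, `createMyDSLServices`, the `mydsl` service key, and the `.mydsl` file extension are placeholders for whatever your generated DSL project actually provides, and in practice you'd extend the base evaluator class from Langium AI and match its real interface (check the repo for the exact signatures). `LangiumCoreServices` is the Langium 3.x type name; older versions call it `LangiumServices`.

```typescript
import { URI, type LangiumCoreServices } from 'langium';

// Sketch only: shows the "build a document & validate it" part that a custom
// evaluator for your DSL would perform on each LLM response.
export class MyDSLEvaluator {
    private docCounter = 0;

    // Pass in the language services for your DSL,
    // e.g. createMyDSLServices(NodeFileSystem).mydsl as in the snippet above.
    constructor(private readonly services: LangiumCoreServices) {}

    async evaluate(llmResponse: string) {
        // Wrap the raw LLM output in an in-memory document ('.mydsl' is a placeholder extension).
        const uri = URI.parse(`memory:///llm-response-${this.docCounter++}.mydsl`);
        const document = await this.services.shared.workspace.LangiumDocumentFactory.fromString(llmResponse, uri);
        this.services.shared.workspace.LangiumDocuments.addDocument(document);

        // Build with validation enabled so lexer/parser errors, linking errors and
        // custom validations all end up in document.diagnostics.
        await this.services.shared.workspace.DocumentBuilder.build([document], { validation: true });

        return { diagnostics: document.diagnostics ?? [] };
    }
}
```

Calling `evaluate` on each LLM response then gives you the diagnostics to score against, e.g. treating a response as grammar-conformant only if it produces no error-severity (severity 1) diagnostics.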
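And if the small-server route fits your workflow better, a minimal express wiring around that evaluator could look roughly like this (again just a sketch; the import paths for your generated services and the evaluator above are placeholders):

```typescript
import express from 'express';
import { NodeFileSystem } from 'langium/node';
// Placeholders: the services module generated by the Langium CLI for your DSL,
// plus the evaluator class sketched above.
import { createMyDSLServices } from './my-dsl-module.js';
import { MyDSLEvaluator } from './my-dsl-evaluator.js';

const myDSLEvaluator = new MyDSLEvaluator(createMyDSLServices(NodeFileSystem).mydsl);

const app = express();
// Accept the raw LLM output as plain text in the request body.
app.use(express.text({ type: '*/*' }));

app.post('/evaluate', async (req, res) => {
    const { diagnostics } = await myDSLEvaluator.evaluate(req.body);
    res.json({
        // DiagnosticSeverity.Error === 1
        errorCount: diagnostics.filter(d => d.severity === 1).length,
        diagnostics
    });
});

app.listen(3000, () => console.log('DSL evaluator listening on http://localhost:3000'));
```

A Python benchmarking or fine-tuning script can then POST each candidate model output to `/evaluate` and compare error counts across models.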