[FEATURE] LLM Token-Level Generation Supervision #370

@iwr-redmond

Description

Feature Description

Rescued from #368:

You may wish to consider implementing one of the token-level supervision options for LlamaCPP to deliver superior adherence during structured generation. It's the difference between asking "pretty please" and guaranteeing a correctly structured response.

As currently implemented by @xsxszab in nexa_inference_text.py, generation will simply fail if the model does not return a valid JSON response or does not follow the requested schema.

Options

LM Format Enforcer (Python)

LM Format Enforcer's llama-cpp-python integration code should be easy to adapt. The package is already used in Red Hat/IBM's enterprise-focused vLLM project (reference).

A demonstration workbook is available here. You may be able to run this workbook as-is by merely changing the imports, e.g.:

```diff
-from llama_cpp import LogitsProcessorList
+from nexa.gguf.llama import LogitsProcessorList
```
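
For orientation, the core of that integration looks roughly like the sketch below. It follows LM Format Enforcer's published llama-cpp-python sample; the `nexa.gguf.llama` import path is taken from the diff above, and the model path and schema are placeholders, so treat this as an illustration rather than tested SDK code.

```python
from lmformatenforcer import JsonSchemaParser
from lmformatenforcer.integrations.llamacpp import (
    build_llamacpp_logits_processor,
    build_token_enforcer_tokenizer_data,
)

# Assumption: the SDK re-exports llama-cpp-python's classes under nexa.gguf.llama
from nexa.gguf.llama import Llama, LogitsProcessorList

# Any JSON schema works; this one is a placeholder for illustration.
schema = {
    "type": "object",
    "properties": {
        "first_name": {"type": "string"},
        "last_name": {"type": "string"},
        "year_of_birth": {"type": "integer"},
    },
    "required": ["first_name", "last_name", "year_of_birth"],
}

llm = Llama(model_path="model.gguf")  # hypothetical local model path

# Build the tokenizer data once per model, then one logits processor per schema.
tokenizer_data = build_token_enforcer_tokenizer_data(llm)
logits_processors = LogitsProcessorList(
    [build_llamacpp_logits_processor(tokenizer_data, JsonSchemaParser(schema))]
)

prompt = "Please give me information about Michael Jordan as JSON: "
# The processor masks any token that would violate the schema at each decoding
# step, so the output parses against the schema instead of being validated
# (and potentially rejected) after generation.
output = llm(prompt, logits_processor=logits_processors, max_tokens=100)
print(output["choices"][0]["text"])
```

Because the constraint is applied while sampling, this delivers the "guarantee" described above rather than post-hoc validation.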

LLGuidance (upstream)

The LLGuidance Rust crate has recently been added to upstream llama.cpp.

Enabling this feature at compile time requires some fiddling with a Rust toolchain, and there are still some bug fixes to be finalized (pull 11644). However, these are transitional problems, and adopting this approach would probably make it easier for end-users to use structured generation through the SDK.

Labels: deprecated (Issues for nexaSDK v1, the version before July 23, 2025)
