Support for LLGuidance, which uses constrained sampling to facilitate valid JSON output, was added to llama.cpp and then enhanced earlier this year. Rather than asking the model "pretty please" or validating the output post-generation, constrained sampling guarantees valid output by supervising each token as it is generated, which makes working with Small Language Models much more reliable.
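For illustration, here is a minimal sketch of schema-constrained generation against a running llama-server instance; the endpoint URL, prompt, and schema are assumptions, but the `json_schema` field of the `/completion` endpoint is how the server requests grammar-constrained output.

```python
# Sketch: schema-constrained generation via a local llama-server.
# Assumes a server is already running at http://localhost:8080;
# the prompt and schema below are illustrative.
import json
import urllib.request

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

payload = {
    "prompt": "Extract the person mentioned: Alice is 34 years old.\n",
    "n_predict": 128,
    # The server converts this JSON Schema into a grammar; with
    # LLGuidance compiled in, that grammar constrains every sampled token.
    "json_schema": schema,
}

req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())

# The constrained output is guaranteed to parse as JSON matching the schema.
print(json.loads(result["content"]))
```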
Enabling this feature at compile time requires some fiddling with Rust, since LLGuidance itself is a Rust library that the build compiles via cargo, but it is probably the most effective implementation possible given the move away from the llama-cpp-python backend (see #370 for history).
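As a hedged sketch of that build step, wrapped in Python for consistency with the example above: this assumes the documented CMake option is `LLAMA_LLGUIDANCE` and that a Rust toolchain (cargo) is on PATH, since CMake invokes it to compile the LLGuidance crate.

```python
# Sketch: building llama.cpp with LLGuidance enabled.
# Assumes you are in a llama.cpp checkout and cargo is installed.
import subprocess

# Configure with LLGuidance support (pulls in the Rust build).
subprocess.run(["cmake", "-B", "build", "-DLLAMA_LLGUIDANCE=ON"], check=True)

# Compile; this step is where cargo is invoked under the hood.
subprocess.run(["cmake", "--build", "build", "--config", "Release"], check=True)
```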