Thank you very much for this!
Every llama.cpp build I installed to run LLMs in ComfyUI just loaded the GGUF into system RAM.
Custom llama.cpp versions that others made to be compatible with CUDA 12.8 and Python 13.2 always gave errors.
This works great, and with the latest version too. Hopefully newer versions will not break it.