Feasibility of integrating GGML into ComfyUI for text encoder models. #12783
Dampfinchen
started this conversation in Ideas
Replies: 1 comment
-
That's mostly because you are on an old 20-series card, and Comfy runs a lot of these models in fp32 on it because the 20 series doesn't support bf16. GGML is also not very good on the GPU side, and I think we can do a lot better if we put in a bit of effort.
-
From my new reddit thread:
"Hello,
yesterday I was comparing Ace Step 1.5 on ComfyUI vs. acestep.cpp on my RTX 2060 laptop. I want to share the results with you because they are nothing short of mind-boggling.
Let's start with the 16-bit 1.7B text encoder ComfyUI uses by default. If I hit generate and it starts the planning phase, it takes 4 minutes and 30 seconds (for a 120-second song) to finish and have the audio codecs ready for the diffusion model to work with. The generation speed is 2.1 it/s.
Now, in koboldcpp, which uses acestep.cpp and the 4B text encoder quantized to q6_k, the same work takes... 25 seconds at 31 tokens/s.
Yes, that is a speedup of roughly 10x for the text-encoding step. And it's in favor of the higher-quality 4B text encoder versus the standard 1.7B one!
Not only that, but I am running the higher-end text encoder through acestep.cpp. We know from the LLM world that q6_k quantization comes very close to the quality of the original bf16 model, and since the 4B model has far more parameters than the 1.7B text encoder ComfyUI usually uses, it should also be of much higher quality, on top of the speedup.
Why is that? Well, ComfyUI runs text encoders at 16-bit precision, which doesn't fit into my VRAM, so it has to fall back to CPU offloading, which is very slow. Meanwhile, the 4B model quantized to q6_k fits nicely. And remember: text models at q6_k show almost no perceptible loss in quality.
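A rough back-of-the-envelope calculation makes the VRAM argument concrete. This is a sketch using the parameter counts above; the ~6.56 bits/weight figure for q6_k (6-bit weights plus per-block scales) and the 6 GB VRAM typical of an RTX 2060 laptop are my assumptions:

```python
# Approximate weight storage for the text encoders discussed above.
# q6_k stores roughly 6.56 bits per weight once block scales are included.

GiB = 1024**3

def model_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Weight storage in GiB (ignores activations and runtime buffers)."""
    return n_params * bits_per_weight / 8 / GiB

encoders = {
    "1.7B @ fp16": (1.7e9, 16.0),
    "4B   @ fp16": (4.0e9, 16.0),
    "4B   @ q6_k": (4.0e9, 6.5625),
}

for name, (params, bpw) in encoders.items():
    print(f"{name}: ~{model_size_gib(params, bpw):.2f} GiB")
```

Notably, the quantized 4B encoder (~3.1 GiB) is actually smaller than the 1.7B one at fp16 (~3.2 GiB), while the 4B at fp16 (~7.5 GiB) blows past a 6 GB card entirely, which is why the fallback to CPU offloading kicks in.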
This doesn't just apply to Ace Step: today's image generation models also usually ship with huge text encoders that currently use a lot of VRAM. Given their size, it is highly likely that even on a higher-end system configuration you would benefit hugely from native GGML support in ComfyUI. And even if a text encoder wouldn't fit in VRAM, GGML's CPU inference is much faster, so you could still run much larger text encoders at decent speeds.
For diffusion models, however, Comfy's memory management and CPU offloading are efficient and fast; there's no difference in speed there.
Now, I have no clue how feasible it would be to integrate the GGML library into ComfyUI and have it interact with Comfy's diffusion engine. But if it could work, that would be a game changer."
Would it be a huge challenge to integrate the GGML library into ComfyUI?
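For what it's worth, the text-encoder side can already be driven from Python via llama-cpp-python, which wraps GGML/GGUF; a custom node would essentially need to do something like the sketch below. The model path is a placeholder, and whether the resulting embeddings are directly usable as conditioning for Ace Step's diffusion model is an open question:

```python
def encode_prompt(model_path: str, prompt: str):
    """Sketch: produce text embeddings from a GGUF-quantized encoder.

    Uses llama-cpp-python (pip install llama-cpp-python). The model path
    is hypothetical; matching the embeddings to what the diffusion model
    expects would be the real integration work.
    """
    # Imported lazily so the sketch stays importable without the library.
    from llama_cpp import Llama

    # n_gpu_layers=-1 keeps all quantized layers on the GPU.
    llm = Llama(model_path=model_path, embedding=True, n_gpu_layers=-1)
    return llm.embed(prompt)

# Example usage (path is a placeholder):
# emb = encode_prompt("models/text-encoder-q6_k.gguf", "an upbeat synthwave track")
```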