Conversation

@yousef-rafat (Contributor) commented Sep 5, 2025

[Screenshot attached: 2025-09-09 230818]

@yousef-rafat changed the title from "Add support to Higgsv2 + Autoregressive Generation" to "Add support for Higgsv2 + Autoregressive Generation" on Sep 5, 2025
@Kosinkadink added the "Good PR" label (this PR looks good to go, it needs comfy's final review) on Sep 18, 2025
@Kosinkadink added the "Core" label (core team dependency) on Sep 30, 2025
@Kosinkadink (Member) left a comment:

Thank you for the PR, and sorry it took so long to review! Comfy and I took a look today. I've added some inline comments, but here is a summary plus extras:

  1. CUDA Graph stuff should be removed from the code if possible.
  2. comfy would prefer that the caches from transformers.cache_utils not be used, as he wants to have as little dependency on transformers as possible.
  3. Check if the llama tokenizer .json could be reused for the higgsv2 tokenizer since they might be identical.
  4. Use torch over numpy wherever possible.

While testing after creating the combined checkpoint file, I found a bug - if you try to run a workflow a second time by incrementing the seed, the Autoregressive Generation node runs for a bit but then ultimately throws this error:

!!! Exception during processing !!! 'StaticCache' object has no attribute 'layers'
Traceback (most recent call last):
  File "C:\Users\Kosinkadink\ComfyUI\execution.py", line 496, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Kosinkadink\ComfyUI\execution.py", line 315, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Kosinkadink\ComfyUI\execution.py", line 289, in _async_map_node_over_list
    await process_inputs(input_dict, i)
  File "C:\Users\Kosinkadink\ComfyUI\execution.py", line 277, in process_inputs
    result = f(**inputs)
             ^^^^^^^^^^^
  File "C:\Users\Kosinkadink\ComfyUI\nodes.py", line 1588, in generate
    return (auto_sample(self, model, input_ids, max_new_length, min_new_length, top_k, top_p, temperature, do_sample, seed = seed),)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Kosinkadink\ComfyUI\comfy\autoregressive_sampling.py", line 678, in auto_sample
    samples = node._cached_autoregressive_sampler.generate(main_input_ids, max_new_length, min_new_length, top_k, top_p, temperature, do_sample, seed=seed, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Kosinkadink\ComfyUI\venv\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Kosinkadink\ComfyUI\comfy\autoregressive_sampling.py", line 393, in generate
    result = self.model._sample(
             ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Kosinkadink\ComfyUI\comfy\ldm\higgsv2\model.py", line 1115, in _sample
    past_key_values, self.current_past_key_values_bucket = self._prepare_kv_cache(
                                                           ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Kosinkadink\ComfyUI\comfy\ldm\higgsv2\model.py", line 1018, in _prepare_kv_cache
    self._copy_kv_cache(
  File "C:\Users\Kosinkadink\ComfyUI\comfy\ldm\higgsv2\model.py", line 983, in _copy_kv_cache
    from_layer = from_cache.layers[i]
                 ^^^^^^^^^^^^^^^^^
AttributeError: 'StaticCache' object has no attribute 'layers'
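
For context, transformers has shipped more than one StaticCache layout: older releases expose per-layer key_cache / value_cache lists, while newer ones expose a .layers list. If that turns out to be the cause here, a layout-agnostic copy might look like this sketch (the helper names are hypothetical, not the PR's code):

```python
def iter_layer_tensors(cache):
    """Yield (key, value) tensor pairs per layer for either cache layout."""
    if hasattr(cache, "layers"):  # newer transformers: list of layer objects
        for layer in cache.layers:
            yield layer.keys, layer.values
    else:  # older transformers: parallel lists of per-layer tensors
        for k, v in zip(cache.key_cache, cache.value_cache):
            yield k, v

def copy_kv_cache(from_cache, to_cache, valid_len):
    """Copy the first `valid_len` positions between two static caches."""
    for (fk, fv), (tk, tv) in zip(iter_layer_tensors(from_cache),
                                  iter_layer_tensors(to_cache)):
        # cache tensors are (batch, heads, max_len, head_dim); slice the seq dim
        tk[:, :, :valid_len].copy_(fk[:, :, :valid_len])
        tv[:, :, :valid_len].copy_(fv[:, :, :valid_len])
```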

Let me know if you have any questions/comments!


_NUM_WARMUP_ITERS = 2

class CUDAGraphRunner(nn.Module):

@Kosinkadink (Member):

Comfy wants all CUDA graph stuff removed from this PR - unless there is a clear performance benefit. If the torch.cuda.synchronize call is needed, something may be wrong.

@yousef-rafat (Contributor, Author):

In my tests there's a clear and noticeable performance boost from CUDA graphs. You can see that by forcibly enabling/disabling them in the init of the AutoRegressiveGeneration class.

The torch.cuda.synchronize calls were in the original implementation: https://github.com/boson-ai/higgs-audio/blob/main/boson_multimodal/model/higgs_audio/cuda_graph_runner.py
I think I could remove them.
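
For reference, the warmup/capture/replay pattern that torch.cuda.CUDAGraph requires looks roughly like this (a minimal sketch; the model and static input are placeholders, not the PR's CUDAGraphRunner):

```python
import torch

def make_graph_runner(model, static_input, num_warmup=2):
    # Warm up on a side stream so lazy kernel/allocator work does not
    # end up inside the capture (mirrors _NUM_WARMUP_ITERS above).
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s):
        for _ in range(num_warmup):
            model(static_input)
    torch.cuda.current_stream().wait_stream(s)

    # Capture a single forward pass into the graph.
    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph):
        static_output = model(static_input)

    def run(new_input):
        # Replay reuses the captured buffers: copy inputs in, replay, read out.
        static_input.copy_(new_input)
        graph.replay()
        return static_output

    return run
```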

@Kosinkadink (Member):

Gotcha, comfy says that we can eventually just make CUDA graphs a general comfy feature, so we shouldn't implement this for a specific model right now.

import warnings
from enum import Enum
from dataclasses import dataclass, fields
from transformers.cache_utils import StaticCache, DynamicCache, Cache

@Kosinkadink (Member):

Comfy would prefer if cache classes were not imported from transformers, so these likely need to use either some existing ComfyUI cache class or be rewritten.
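
One way to do that is a small self-contained static KV cache; a minimal sketch, assuming (batch, heads, seq, head_dim) cache tensors (the class and names here are placeholders, not an existing ComfyUI class):

```python
import torch

class SimpleStaticCache:
    """Preallocated per-layer KV storage, standing in for transformers' StaticCache."""

    def __init__(self, num_layers, batch, num_kv_heads, max_len, head_dim,
                 device=None, dtype=torch.float16):
        shape = (batch, num_kv_heads, max_len, head_dim)
        self.keys = [torch.zeros(shape, device=device, dtype=dtype)
                     for _ in range(num_layers)]
        self.values = [torch.zeros(shape, device=device, dtype=dtype)
                       for _ in range(num_layers)]

    def update(self, layer_idx, k, v, positions):
        # Write the new keys/values at their absolute positions, then
        # return the full buffers for attention to slice as needed.
        self.keys[layer_idx].index_copy_(2, positions, k)
        self.values[layer_idx].index_copy_(2, positions, v)
        return self.keys[layer_idx], self.values[layer_idx]
```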

return data

def apply_filter(self, data: torch.Tensor):
if data.is_cuda or self.use_fir:

@Kosinkadink (Member):

There shouldn't be separate code paths for CPU/GPU, if possible.

@yousef-rafat (Contributor, Author):

The FIR filter does an FFT convolution, which benefits greatly from the GPU, whereas a sequential algorithm like the IIR filter benefits more from the CPU.

@Kosinkadink (Member):

How big is the difference?

@yousef-rafat (Contributor, Author):

I have run some tests, and it seems that FIR does well on both GPU and CPU compared to IIR, so I will stick with that.
fir-vs-iir-performance_.ipynb
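
For context, FFT-based FIR filtering in torch looks roughly like this (a minimal sketch for a 1-D signal x and taps h; not the PR's exact code):

```python
import torch

def fir_filter_fft(x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
    # Zero-pad both signals to the full linear-convolution length so the
    # circular FFT convolution matches a direct FIR convolution.
    n = x.shape[-1] + h.shape[-1] - 1
    X = torch.fft.rfft(x, n=n)
    H = torch.fft.rfft(h, n=n)
    y = torch.fft.irfft(X * H, n=n)
    return y[..., : x.shape[-1]]  # trim back to the input length
```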

def generate_coefficients(self):

A = 10**(self.G/40.0)
w0 = 2.0 * np.pi * (self.fc / self.rate)

@Kosinkadink (Member):

The numpy code should be replaced with torch wherever possible.
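
For the snippet above, the torch equivalent is mostly mechanical; a sketch, assuming this class computes a peaking-EQ biquad (an assumption based on the A = 10**(G/40) form):

```python
import math
import torch

def generate_coefficients(G: float, fc: float, rate: float, Q: float = 0.707):
    A = 10.0 ** (G / 40.0)            # gain in dB -> linear amplitude
    w0 = 2.0 * math.pi * (fc / rate)  # normalized center frequency
    alpha = math.sin(w0) / (2.0 * Q)
    # Peaking-EQ biquad coefficients (Audio EQ Cookbook form).
    b = torch.tensor([1 + alpha * A, -2 * math.cos(w0), 1 - alpha * A])
    a = torch.tensor([1 + alpha / A, -2 * math.cos(w0), 1 - alpha / A])
    return b / a[0], a / a[0]         # normalize so a0 == 1
```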

@Kosinkadink (Member):

Another thing - when you create a checkpoint for these PRs, could you upload it to Hugging Face to make it simple to test?

@Kosinkadink (Member):

I'll review your changes in the next day or so!

return q_embed.to(org_dtype), k_embed.to(org_dtype)
return q_embed.to(org_dtype), k_embed.to(org_dtype), sin, cos

class LlamaRoPE(nn.Module):

@Kosinkadink (Member):

Can you move this out of this file?

@yousef-rafat (Contributor, Author):

Do you mean remove it, or move it into another specific file?

mlp_activation = "silu"
qkv_bias: bool = False
rope_type: str = "llama3"
rope_scaling: dict = field(

@Kosinkadink (Member):

Is this actually needed?


@Kosinkadink (Member):

Encountered an error trying to run:

!!! Exception during processing !!! 'str' object has no attribute 'shape'
Traceback (most recent call last):
  File "C:\Users\Kosinkadink\ComfyUI\execution.py", line 496, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Kosinkadink\ComfyUI\execution.py", line 315, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Kosinkadink\ComfyUI\execution.py", line 289, in _async_map_node_over_list
    await process_inputs(input_dict, i)
  File "C:\Users\Kosinkadink\ComfyUI\execution.py", line 277, in process_inputs
    result = f(**inputs)
             ^^^^^^^^^^^
  File "C:\Users\Kosinkadink\ComfyUI\comfy_extras\nodes_autoregressive.py", line 48, in decode
    return clip.cond_stage_model.decode_tokens(tokens)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Kosinkadink\ComfyUI\comfy\text_encoders\higgsv2.py", line 70, in decode_tokens
    vq_code = revert_delay_pattern_vectorized(audio).clip(0, self.audio_codebook_size - 1)[:, 1:-1]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Kosinkadink\ComfyUI\comfy\text_encoders\higgsv2.py", line 14, in revert_delay_pattern_vectorized
    num_codebooks, total_len = data.shape
                               ^^^^^^^^^^
AttributeError: 'str' object has no attribute 'shape'

Workflow:
higgsv2_bug.json
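
For context on what revert_delay_pattern_vectorized expects: with a delay pattern, codebook k's token stream is shifted right by k steps during generation, and reverting shifts it back. A generic sketch of the technique, assuming data is a (num_codebooks, total_len) tensor (not necessarily the PR's exact implementation):

```python
import torch

def revert_delay_pattern(data: torch.Tensor) -> torch.Tensor:
    num_codebooks, total_len = data.shape
    out_len = total_len - num_codebooks + 1
    # Codebook k's valid tokens live at columns k .. k + out_len - 1,
    # so gather column j + k for codebook k.
    cols = (torch.arange(out_len, device=data.device).unsqueeze(0)
            + torch.arange(num_codebooks, device=data.device).unsqueeze(1))
    return data.gather(1, cols)
```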

@Kosinkadink (Member):

The audio_tokens in this function is a dictionary instead of an iterable of tensors; I assume something was not pulled out properly?


@Kosinkadink (Member) commented Nov 19, 2025

Based on our conversation on Slack, the tokens need to go through the autoregressive sampler before becoming useful. Because the output completely changes form, the sampler should output a different type than its input; otherwise users could very easily make the same mistake and plug things in where they don't belong. Not sure if 'ENCODED_TOKENS' would be the best name for it, but something like that. If there is another way to do this, do let me know and we can review.
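
In ComfyUI terms that is just a distinct RETURN_TYPES string on the sampler node; a minimal sketch using the tentative 'ENCODED_TOKENS' name (class and input names here are placeholders, not the PR's actual node):

```python
class AutoregressiveGeneration:
    """Sampler node whose output type differs from its raw-token input type."""

    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {"model": ("MODEL",), "tokens": ("TOKENS",)}}

    # A distinct type string means the frontend will refuse to connect
    # this output into anything expecting raw "TOKENS".
    RETURN_TYPES = ("ENCODED_TOKENS",)
    FUNCTION = "generate"
    CATEGORY = "sampling"

    def generate(self, model, tokens):
        encoded = model.sample(tokens)  # placeholder for the real sampler call
        return (encoded,)
```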
