Add support for Higgsv2 + Autoregressive Generation #9736
base: master
Conversation
Kosinkadink left a comment
Thank you for the PR, and sorry it took so long to review! Comfy and I took a look today. There are some comments added, but here is a summary + extras:
- CUDA Graph stuff should be removed from the code if possible.
- Comfy would prefer that the caches from transformers.cache_utils not be used, as he wants as little dependency on transformers as possible.
- Check if the llama tokenizer .json could be reused for the higgsv2 tokenizer, since they might be identical.
- Use torch over numpy wherever possible.
While testing after creating the combined checkpoint file, I found a bug - if you try to run a workflow a second time by incrementing the seed, the Autoregressive Generation node does things for a bit but then ultimately throws this error:
```
!!! Exception during processing !!! 'StaticCache' object has no attribute 'layers'
Traceback (most recent call last):
File "C:\Users\Kosinkadink\ComfyUI\execution.py", line 496, in execute
output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Kosinkadink\ComfyUI\execution.py", line 315, in get_output_data
return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Kosinkadink\ComfyUI\execution.py", line 289, in _async_map_node_over_list
await process_inputs(input_dict, i)
File "C:\Users\Kosinkadink\ComfyUI\execution.py", line 277, in process_inputs
result = f(**inputs)
^^^^^^^^^^^
File "C:\Users\Kosinkadink\ComfyUI\nodes.py", line 1588, in generate
return (auto_sample(self, model, input_ids, max_new_length, min_new_length, top_k, top_p, temperature, do_sample, seed = seed),)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Kosinkadink\ComfyUI\comfy\autoregressive_sampling.py", line 678, in auto_sample
samples = node._cached_autoregressive_sampler.generate(main_input_ids, max_new_length, min_new_length, top_k, top_p, temperature, do_sample, seed=seed, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "c:\Users\Kosinkadink\ComfyUI\venv\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Kosinkadink\ComfyUI\comfy\autoregressive_sampling.py", line 393, in generate
result = self.model._sample(
^^^^^^^^^^^^^^^^^^^
File "C:\Users\Kosinkadink\ComfyUI\comfy\ldm\higgsv2\model.py", line 1115, in _sample
past_key_values, self.current_past_key_values_bucket = self._prepare_kv_cache(
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Kosinkadink\ComfyUI\comfy\ldm\higgsv2\model.py", line 1018, in _prepare_kv_cache
self._copy_kv_cache(
File "C:\Users\Kosinkadink\ComfyUI\comfy\ldm\higgsv2\model.py", line 983, in _copy_kv_cache
from_layer = from_cache.layers[i]
^^^^^^^^^^^^^^^^^
AttributeError: 'StaticCache' object has no attribute 'layers'
```
Let me know if you have any questions/comments!
```python
_NUM_WARMUP_ITERS = 2


class CUDAGraphRunner(nn.Module):
```
Comfy wants all CUDA graph stuff removed from this PR - unless there is a clear performance benefit. If the torch.cuda.synchronize call is needed, something may be wrong.
There's a noticeable and clear performance boost from CUDA graphs in my tests. You can see it by forcibly enabling/disabling them in the init of the AutoRegressiveGeneration class.
The torch.cuda.synchronize calls were in the original implementation: https://github.com/boson-ai/higgs-audio/blob/main/boson_multimodal/model/higgs_audio/cuda_graph_runner.py
I think I could remove them.
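For context on why the runner helps: CUDA graphs record a fixed sequence of kernel launches once and then replay it with a single launch, removing per-step Python and launch overhead in the tight decode loop. Below is a minimal sketch of that capture/replay pattern using a stand-in `torch.nn.Linear` model rather than the PR's `CUDAGraphRunner`, assuming a CUDA device is available:

```python
import torch

model = torch.nn.Linear(512, 512).cuda().eval()
static_input = torch.zeros(1, 512, device="cuda")

# Warmup on a side stream before capture (same idea as _NUM_WARMUP_ITERS above).
side = torch.cuda.Stream()
side.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(side):
    for _ in range(2):
        model(static_input)
torch.cuda.current_stream().wait_stream(side)

# Capture one forward pass into a graph.
graph = torch.cuda.CUDAGraph()
with torch.cuda.graph(graph):
    static_output = model(static_input)

# Replay: copy fresh data into the captured input buffer and relaunch the
# recorded kernels; static_output is overwritten in place.
static_input.copy_(torch.randn(1, 512, device="cuda"))
graph.replay()
result = static_output.clone()
```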
Gotcha. Comfy says we can eventually make CUDA graphs a general ComfyUI feature, so we shouldn't implement this for a specific model right now.
comfy/autoregressive_sampling.py
Outdated
```python
import warnings
from enum import Enum
from dataclasses import dataclass, fields
from transformers.cache_utils import StaticCache, DynamicCache, Cache
```
Comfy would prefer if cache classes were not imported from transformers, so these likely need to use either some existing ComfyUI cache class or be rewritten.
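If the transformers caches do get replaced, one option is a small preallocated cache kept entirely in torch. A rough sketch under that assumption (the class and method names here are hypothetical, not an existing ComfyUI API):

```python
import torch

class SimpleStaticKVCache:
    """Preallocated per-layer KV buffers, roughly the role StaticCache plays."""

    def __init__(self, num_layers, batch, num_kv_heads, max_len, head_dim, device, dtype):
        shape = (batch, num_kv_heads, max_len, head_dim)
        self.keys = [torch.zeros(shape, device=device, dtype=dtype) for _ in range(num_layers)]
        self.values = [torch.zeros(shape, device=device, dtype=dtype) for _ in range(num_layers)]
        self.seq_len = 0  # number of positions already written

    def update(self, layer_idx, k, v):
        # k, v: [batch, num_kv_heads, new_len, head_dim]
        end = self.seq_len + k.shape[2]
        self.keys[layer_idx][:, :, self.seq_len:end] = k
        self.values[layer_idx][:, :, self.seq_len:end] = v
        if layer_idx == len(self.keys) - 1:
            self.seq_len = end  # advance only once every layer has written this step
        return self.keys[layer_idx][:, :, :end], self.values[layer_idx][:, :, :end]

    def reset(self):
        self.seq_len = 0  # reuse the buffers across runs instead of reallocating
```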
comfy/ldm/higgsv2/loudness.py
Outdated
```python
        return data

    def apply_filter(self, data: torch.Tensor):
        if data.is_cuda or self.use_fir:
```
There shouldn't be separate code paths for CPU/GPU, if possible.
The FIR filter does an FFT convolution, which benefits a lot from the GPU, whereas a sequential algorithm like the IIR filter benefits more from the CPU.
How big is the difference?
I have run some tests, and it seems that FIR does well on both GPU and CPU compared to IIR, so I will stick with that:
fir-vs-iir-performance_.ipynb
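For reference, the FFT-based FIR path can stay device-agnostic in plain torch, so the same code serves CPU and GPU; a simplified sketch (not the PR's loudness implementation):

```python
import torch

def fir_filter_fft(data: torch.Tensor, taps: torch.Tensor) -> torch.Tensor:
    # Linear convolution via zero-padded rFFT; data: [..., n_samples], taps: [n_taps].
    n = data.shape[-1] + taps.shape[-1] - 1
    spec = torch.fft.rfft(data, n=n) * torch.fft.rfft(taps.to(data), n=n)
    out = torch.fft.irfft(spec, n=n)
    return out[..., : data.shape[-1]]  # trim back to the input length
```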
comfy/ldm/higgsv2/loudness.py
Outdated
```python
    def generate_coefficients(self):
        A = 10**(self.G/40.0)
        w0 = 2.0 * np.pi * (self.fc / self.rate)
```
The numpy code should be replaced with torch wherever possible
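For the scalar coefficient math specifically, the numpy calls can be dropped without pulling anything else in; a minimal sketch of the two lines above with Python's math module standing in for np (the rest of the coefficient computation is omitted):

```python
import math

def generate_coefficients(G: float, fc: float, rate: float):
    # Same intermediate terms as in the snippet above, without numpy; any later
    # array work would go through torch.tensor / torch.sin / torch.cos instead.
    A = 10 ** (G / 40.0)
    w0 = 2.0 * math.pi * (fc / rate)
    return A, w0
```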
Another thing: when you create a checkpoint for these PRs, could you upload it to Hugging Face to make it simple to test?
I'll review your changes in the next day or so!
```python
        return q_embed.to(org_dtype), k_embed.to(org_dtype)
        return q_embed.to(org_dtype), k_embed.to(org_dtype), sin, cos


class LlamaRoPE(nn.Module):
```
Can you move this out of this file?
Do you mean remove it, or move it into another specific file?
```python
    mlp_activation = "silu"
    qkv_bias: bool = False
    rope_type: str = "llama3"
    rope_scaling: dict = field(
```
Is this actually needed?
They are used in the llama3 rope calculations:
https://github.com/huggingface/transformers/blob/a43b36cf802f00616800e0bd4d748679236123ee/src/transformers/modeling_rope_utils.py#L532
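For what it's worth, the fields that the llama3 scaling reads can be declared with a default_factory so the mutable dict isn't shared across config instances; a sketch with example values (the class name is made up, and the real defaults would come from the checkpoint config):

```python
from dataclasses import dataclass, field

@dataclass
class RoPEConfigSketch:
    rope_type: str = "llama3"
    # Keys consumed by the llama3 frequency correction linked above; the
    # numbers are illustrative, not the model's actual values.
    rope_scaling: dict = field(default_factory=lambda: {
        "factor": 8.0,
        "low_freq_factor": 1.0,
        "high_freq_factor": 4.0,
        "original_max_position_embeddings": 8192,
    })
```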
Encountered an error trying to run: Workflow:
Based on our conversation on Slack, the tokens need to go through the autoregressive sampler before becoming useful. Because the output completely changes form, the sampler should output a different type than its input; otherwise users could very easily make the same mistake and plug things in where they don't belong. Not sure if 'ENCODED_TOKENS' would be the best name for it, but something like that. If there is another way to do this, do let me know and we can review.
