Commit 6d96d26

Merge branch 'comfyanonymous:master' into offloader-maifee
2 parents e07a32c + 6b573ae commit 6d96d26


62 files changed (+3570 / -721 lines)
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
<!-- API_NODE_PR_CHECKLIST: do not remove -->

## API Node PR Checklist

### Scope
- [ ] **Is API Node Change**

### Pricing & Billing
- [ ] **Need pricing update**
- [ ] **No pricing update**

If **Need pricing update**:
- [ ] Metronome rate cards updated
- [ ] Auto‑billing tests updated and passing

### QA
- [ ] **QA done**
- [ ] **QA not required**

### Comms
- [ ] Informed **Kosinkadink**
Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
```yaml
name: Append API Node PR template

on:
  pull_request_target:
    types: [opened, reopened, synchronize, ready_for_review]
    paths:
      - 'comfy_api_nodes/**'  # only run if these files changed

permissions:
  contents: read
  pull-requests: write

jobs:
  inject:
    runs-on: ubuntu-latest
    steps:
      - name: Ensure template exists and append to PR body
        uses: actions/github-script@v7
        with:
          script: |
            const { owner, repo } = context.repo;
            const number = context.payload.pull_request.number;
            const templatePath = '.github/PULL_REQUEST_TEMPLATE/api-node.md';
            const marker = '<!-- API_NODE_PR_CHECKLIST: do not remove -->';

            const { data: pr } = await github.rest.pulls.get({ owner, repo, pull_number: number });

            let templateText;
            try {
              const res = await github.rest.repos.getContent({
                owner,
                repo,
                path: templatePath,
                ref: pr.base.ref
              });
              const buf = Buffer.from(res.data.content, res.data.encoding || 'base64');
              templateText = buf.toString('utf8');
            } catch (e) {
              core.setFailed(`Required PR template not found at "${templatePath}" on ${pr.base.ref}. Please add it to the repo.`);
              return;
            }

            // Enforce the presence of the marker inside the template (for idempotence)
            if (!templateText.includes(marker)) {
              core.setFailed(`Template at "${templatePath}" does not contain the required marker:\n${marker}\nAdd it so we can detect duplicates safely.`);
              return;
            }

            // If the PR already contains the marker, do not append again.
            const body = pr.body || '';
            if (body.includes(marker)) {
              core.info('Template already present in PR body; nothing to inject.');
              return;
            }

            const newBody = (body ? body + '\n\n' : '') + templateText + '\n';
            await github.rest.pulls.update({ owner, repo, pull_number: number, body: newBody });
            core.notice('API Node template appended to PR description.');
```

.github/workflows/release-stable-all.yml

Lines changed: 18 additions & 1 deletion
```diff
@@ -14,7 +14,7 @@ jobs:
       contents: "write"
       packages: "write"
       pull-requests: "read"
-    name: "Release NVIDIA Default (cu129)"
+    name: "Release NVIDIA Default (cu130)"
     uses: ./.github/workflows/stable-release.yml
     with:
       git_tag: ${{ inputs.git_tag }}
@@ -43,6 +43,23 @@ jobs:
       test_release: true
     secrets: inherit
 
+  release_nvidia_cu126:
+    permissions:
+      contents: "write"
+      packages: "write"
+      pull-requests: "read"
+    name: "Release NVIDIA cu126"
+    uses: ./.github/workflows/stable-release.yml
+    with:
+      git_tag: ${{ inputs.git_tag }}
+      cache_tag: "cu126"
+      python_minor: "12"
+      python_patch: "10"
+      rel_name: "nvidia"
+      rel_extra_name: "_cu126"
+      test_release: true
+    secrets: inherit
+
   release_amd_rocm:
     permissions:
       contents: "write"
```

.github/workflows/test-ci.yml

Lines changed: 11 additions & 9 deletions
```diff
@@ -21,14 +21,15 @@ jobs:
       fail-fast: false
       matrix:
         # os: [macos, linux, windows]
-        os: [macos, linux]
-        python_version: ["3.9", "3.10", "3.11", "3.12"]
+        # os: [macos, linux]
+        os: [linux]
+        python_version: ["3.10", "3.11", "3.12"]
         cuda_version: ["12.1"]
         torch_version: ["stable"]
         include:
-          - os: macos
-            runner_label: [self-hosted, macOS]
-            flags: "--use-pytorch-cross-attention"
+          # - os: macos
+          #   runner_label: [self-hosted, macOS]
+          #   flags: "--use-pytorch-cross-attention"
           - os: linux
             runner_label: [self-hosted, Linux]
             flags: ""
@@ -73,14 +74,15 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        os: [macos, linux]
+        # os: [macos, linux]
+        os: [linux]
         python_version: ["3.11"]
         cuda_version: ["12.1"]
         torch_version: ["nightly"]
         include:
-          - os: macos
-            runner_label: [self-hosted, macOS]
-            flags: "--use-pytorch-cross-attention"
+          # - os: macos
+          #   runner_label: [self-hosted, macOS]
+          #   flags: "--use-pytorch-cross-attention"
           - os: linux
             runner_label: [self-hosted, Linux]
             flags: ""
```

QUANTIZATION.md

Lines changed: 168 additions & 0 deletions
@@ -0,0 +1,168 @@

# The Comfy guide to Quantization


## How does quantization work?

Quantization aims to map a high-precision value x_f to a lower-precision format with minimal loss in accuracy. These smaller formats then serve to reduce the model's memory footprint and increase throughput by using specialized hardware.

When simply converting a value from FP16 to FP8 using the round-to-nearest method, we might hit two issues:
- The dynamic range of FP16 (-65,504, 65,504) far exceeds FP8 formats like E4M3 (-448, 448) or E5M2 (-57,344, 57,344), potentially resulting in clipped values
- The original values are concentrated in a small range (e.g. -1, 1), leaving many FP8 bits "unused"

By using a scaling factor, we aim to map these values into the quantized-dtype range, making use of the full spectrum. One of the easiest and most common approaches is per-tensor absolute-maximum scaling.

```
absmax = max(abs(tensor))
scale = absmax / max_dynamic_range_low_precision

# Quantization
tensor_q = (tensor / scale).to(low_precision_dtype)

# De-Quantization
tensor_dq = tensor_q.to(fp16) * scale

tensor_dq ~ tensor
```
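
For concreteness, here is a minimal runnable sketch of that per-tensor absmax recipe using PyTorch's native `torch.float8_e4m3fn` dtype (448 being the E4M3 maximum). The function names are illustrative, not ComfyUI APIs:

```python
import torch

def quantize_absmax_fp8(tensor: torch.Tensor):
    # Per-tensor absolute-maximum scaling into the E4M3 range (about +-448).
    absmax = tensor.abs().max()
    scale = absmax / 448.0
    tensor_q = (tensor / scale).to(torch.float8_e4m3fn)
    return tensor_q, scale

def dequantize_fp8(tensor_q: torch.Tensor, scale: torch.Tensor, orig_dtype=torch.float16):
    # Recover an approximation of the original values.
    return tensor_q.to(orig_dtype) * scale

w = torch.randn(4, 4, dtype=torch.float16)
w_q, scale = quantize_absmax_fp8(w)
w_dq = dequantize_fp8(w_q, scale)
print((w - w_dq).abs().max())  # small but non-zero quantization error
```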

Given that additional information (a scaling factor) is needed to "interpret" the quantized values, we describe these as derived datatypes.


## Quantization in Comfy

```
QuantizedTensor (torch.Tensor subclass)
        ↓ __torch_dispatch__
Two-Level Registry (generic + layout handlers)

MixedPrecisionOps + Metadata Detection
```

### Representation

To represent these derived datatypes, ComfyUI subclasses torch.Tensor with the `QuantizedTensor` class found in `comfy/quant_ops.py`.

A `Layout` class defines how a specific quantization format behaves:
- Required parameters
- Quantize method
- De-Quantize method

```python
from comfy.quant_ops import QuantizedLayout

class MyLayout(QuantizedLayout):
    @classmethod
    def quantize(cls, tensor, **kwargs):
        # Convert to quantized format
        qdata = ...
        params = {'scale': ..., 'orig_dtype': tensor.dtype}
        return qdata, params

    @staticmethod
    def dequantize(qdata, scale, orig_dtype, **kwargs):
        return qdata.to(orig_dtype) * scale
```

To run operations on these QuantizedTensors, we use two registry systems that define the supported operations.
The first is a **generic registry** that handles operations common to all quantized formats (e.g., `.to()`, `.clone()`, `.reshape()`).

The second registry is layout-specific and allows implementing fast paths like nn.Linear.
```python
from comfy.quant_ops import register_layout_op

@register_layout_op(torch.ops.aten.linear.default, MyLayout)
def my_linear(func, args, kwargs):
    # Extract tensors, call optimized kernel
    ...
```
When `torch.nn.functional.linear()` is called with QuantizedTensor arguments, `__torch_dispatch__` automatically routes to the registered implementation.
For any unsupported operation, QuantizedTensor falls back to calling `dequantize` and dispatching to the high-precision implementation.
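
To make this dispatch-and-fallback behavior concrete, here is a small self-contained toy, not ComfyUI's actual implementation: `ToyScaledTensor`, `register_toy_op`, and the registry below are invented for illustration, but they follow the same pattern of routing registered aten ops to a fast path and dequantizing for everything else.

```python
import torch

# Toy registry mirroring the idea of layout-specific op handlers.
_TOY_OPS = {}

def register_toy_op(op):
    def decorator(fn):
        _TOY_OPS[op] = fn
        return fn
    return decorator

class ToyScaledTensor(torch.Tensor):
    """Wrapper holding low-precision data plus a scale (illustration only)."""

    # Rely on __torch_dispatch__ only; don't re-wrap results at the Python API level.
    __torch_function__ = torch._C._disabled_torch_function_impl

    @staticmethod
    def __new__(cls, qdata, scale, orig_dtype=torch.float32):
        return torch.Tensor._make_wrapper_subclass(cls, qdata.shape, dtype=orig_dtype, device=qdata.device)

    def __init__(self, qdata, scale, orig_dtype=torch.float32):
        self.qdata = qdata
        self.scale = scale

    def dequantize(self):
        return self.qdata.to(self.dtype) * self.scale

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func in _TOY_OPS:  # layout-specific fast path
            return _TOY_OPS[func](func, args, kwargs)
        # Fallback: dequantize every wrapper and re-dispatch in high precision.
        args = [a.dequantize() if isinstance(a, ToyScaledTensor) else a for a in args]
        return func(*args, **kwargs)

@register_toy_op(torch.ops.aten.mm.default)
def toy_mm(func, args, kwargs):
    x, w = args
    # A real layout would call an optimized low-precision kernel here.
    return torch.mm(x, w.dequantize())

w = torch.randn(8, 8)
wq = ToyScaledTensor((w / w.abs().max()).to(torch.float16), w.abs().max())
x = torch.randn(2, 8)
print(torch.mm(x, wq).shape)  # routed through toy_mm
print(wq.sum())               # unregistered op -> dequantize fallback
```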

### Mixed Precision

The `MixedPrecisionOps` class (lines 542-648 in `comfy/ops.py`) enables per-layer quantization decisions, allowing different layers in a model to use different precisions. This is activated when a model config contains a `layer_quant_config` dictionary that specifies which layers should be quantized and how.

**Architecture:**

```python
class MixedPrecisionOps(disable_weight_init):
    _layer_quant_config = {}             # Maps layer names to quantization configs
    _compute_dtype = torch.bfloat16      # Default compute / dequantize precision
```

**Key mechanism:**

The custom `Linear._load_from_state_dict()` method inspects each layer during model loading:
- If the layer name is **not** in `_layer_quant_config`: load the weight as a regular tensor in `_compute_dtype`
- If the layer name **is** in `_layer_quant_config`:
  - Load the weight as a `QuantizedTensor` with the specified layout (e.g., `TensorCoreFP8Layout`)
  - Load the associated quantization parameters (scales, block_size, etc.)

**Why it's needed:**

Not all layers tolerate quantization equally. Sensitive operations like final projections can be kept in higher precision, while compute-heavy matmuls are quantized. This provides most of the performance benefits while maintaining quality.

The system is selected in `pick_operations()` when `model_config.layer_quant_config` is present, making it the highest-priority operation mode.
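
As a rough illustration of the key mechanism, the sketch below mimics the per-layer decision at load time. The config schema, keys, and helper function are hypothetical, not ComfyUI's actual `layer_quant_config` format:

```python
import torch

# Hypothetical per-layer config; the real schema lives in the model config.
layer_quant_config = {
    "model.layers.0.mlp.up_proj": {"format": "float8_e4m3fn"},
}

COMPUTE_DTYPE = torch.bfloat16

def load_layer_weight(layer_name, state_dict):
    """Decide, per layer, whether to keep the weight quantized or cast it."""
    weight = state_dict[layer_name + ".weight"]
    cfg = layer_quant_config.get(layer_name)
    if cfg is None:
        # Regular path: plain tensor in the compute dtype.
        return weight.to(COMPUTE_DTYPE)
    # Quantized path: keep the stored low-precision payload and attach the
    # scaling parameters saved alongside it in the checkpoint.
    scale = state_dict[layer_name + ".weight_scale"]
    return {"qdata": weight, "scale": scale, "format": cfg["format"]}
```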

## Checkpoint Format

Quantized checkpoints are stored as standard safetensors files with quantized weight tensors and associated scaling parameters, plus a `_quantization_metadata` JSON entry describing the quantization scheme.

The quantized checkpoint will contain the same layers as the original checkpoint, but:
- The weights are stored as quantized values, sometimes using a different storage datatype (e.g. a uint8 container for FP8).
- For each quantized weight, a number of additional scaling parameters are stored alongside it, depending on the recipe.
- The `_quantization_metadata` entry, stored in the safetensors metadata, describes which layers are quantized and which layout has been used.

### Scaling Parameters details
We define 4 possible scaling parameters that should cover most recipes in the near future:
- **weight_scale**: quantization scalers for the weights
- **weight_scale_2**: global scalers in the context of double scaling
- **pre_quant_scale**: scalers used for smoothing salient weights
- **input_scale**: quantization scalers for the activations

| Format | Storage dtype | weight_scale | weight_scale_2 | pre_quant_scale | input_scale |
|--------|---------------|--------------|----------------|-----------------|-------------|
| float8_e4m3fn | float32 | float32 (scalar) | - | - | float32 (scalar) |

You can find the defined formats in `comfy/quant_ops.py` (`QUANT_ALGOS`).

### Quantization Metadata

The metadata stored alongside the checkpoint contains:
- **format_version**: String to define a version of the standard
- **layers**: A dictionary mapping layer names to their quantization format. The format string maps to the definitions found in `QUANT_ALGOS`.

Example:
```json
{
  "_quantization_metadata": {
    "format_version": "1.0",
    "layers": {
      "model.layers.0.mlp.up_proj": "float8_e4m3fn",
      "model.layers.0.mlp.down_proj": "float8_e4m3fn",
      "model.layers.1.mlp.up_proj": "float8_e4m3fn"
    }
  }
}
```
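
For example, the stored metadata can be read back with the `safetensors` library; the file name below is just a placeholder:

```python
import json
from safetensors import safe_open

# Inspect the _quantization_metadata entry of a quantized checkpoint.
with safe_open("model_fp8.safetensors", framework="pt") as f:
    meta = json.loads(f.metadata()["_quantization_metadata"])

print(meta["format_version"])
for layer, fmt in meta["layers"].items():
    print(layer, fmt)
```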

## Creating Quantized Checkpoints

To create compatible checkpoints, use any quantization tool, provided the output follows the checkpoint format described above and uses a layout defined in `QUANT_ALGOS`.

### Weight Quantization

Weight quantization is straightforward: compute the scaling factor directly from the weight tensor using the absolute-maximum method described earlier. Each layer's weights are quantized independently and stored with their corresponding `weight_scale` parameter.

### Calibration (for Activation Quantization)

Activation quantization (e.g., for FP8 Tensor Core operations) requires `input_scale` parameters that cannot be determined from the static weights alone. Since activation values depend on the actual inputs, we use **post-training quantization (PTQ) calibration**:

1. **Collect statistics**: Run inference on N representative samples
2. **Track activations**: Record the absolute maximum (`amax`) of the inputs to each quantized layer
3. **Compute scales**: Derive `input_scale` from the collected statistics
4. **Store in checkpoint**: Save `input_scale` parameters alongside the weights

The calibration dataset should be representative of your target use case. For diffusion models, this typically means a diverse set of prompts and generation parameters.
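
A minimal sketch of the amax-tracking step (2), assuming plain PyTorch forward pre-hooks on the layers to be quantized; the helper name is illustrative and 448 is the E4M3 maximum used earlier:

```python
import torch

def attach_amax_hooks(model, layer_names):
    """Track the running absolute maximum of the inputs to selected layers."""
    amax = {name: torch.tensor(0.0) for name in layer_names}

    def make_hook(name):
        def hook(module, inputs):
            x = inputs[0]
            amax[name] = torch.maximum(amax[name], x.detach().abs().max().cpu())
        return hook

    handles = [dict(model.named_modules())[n].register_forward_pre_hook(make_hook(n))
               for n in layer_names]
    return amax, handles

# After running inference on N representative samples:
#   input_scale = amax[name] / 448.0   # E4M3 max, stored alongside the weights
```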

README.md

Lines changed: 7 additions & 9 deletions
````diff
@@ -173,7 +173,7 @@ There is a portable standalone build for Windows that should work for running on
 
 ### [Direct link to download](https://github.com/comfyanonymous/ComfyUI/releases/latest/download/ComfyUI_windows_portable_nvidia.7z)
 
-Simply download, extract with [7-Zip](https://7-zip.org) and run. Make sure you put your Stable Diffusion checkpoints/models (the huge ckpt/safetensors files) in: ComfyUI\models\checkpoints
+Simply download, extract with [7-Zip](https://7-zip.org) or with the windows explorer on recent windows versions and run. For smaller models you normally only need to put the checkpoints (the huge ckpt/safetensors files) in: ComfyUI\models\checkpoints but many of the larger models have multiple files. Make sure to follow the instructions to know which subfolder to put them in ComfyUI\models\
 
 If you have trouble extracting it, right click the file -> properties -> unblock
 
@@ -183,7 +183,9 @@ Update your Nvidia drivers if it doesn't start.
 
 [Experimental portable for AMD GPUs](https://github.com/comfyanonymous/ComfyUI/releases/latest/download/ComfyUI_windows_portable_amd.7z)
 
-[Portable with pytorch cuda 12.8 and python 3.12](https://github.com/comfyanonymous/ComfyUI/releases/latest/download/ComfyUI_windows_portable_nvidia_cu128.7z) (Supports Nvidia 10 series and older GPUs).
+[Portable with pytorch cuda 12.8 and python 3.12](https://github.com/comfyanonymous/ComfyUI/releases/latest/download/ComfyUI_windows_portable_nvidia_cu128.7z).
+
+[Portable with pytorch cuda 12.6 and python 3.12](https://github.com/comfyanonymous/ComfyUI/releases/latest/download/ComfyUI_windows_portable_nvidia_cu126.7z) (Supports Nvidia 10 series and older GPUs).
 
 #### How do I share models between another UI and ComfyUI?
 
@@ -200,7 +202,7 @@ comfy install
 
 ## Manual Install (Windows, Linux)
 
-Python 3.14 will work if you comment out the `kornia` dependency in the requirements.txt file (breaks the canny node) but it is not recommended.
+Python 3.14 works but you may encounter issues with the torch compile node. The free threaded variant is still missing some dependencies.
 
 Python 3.13 is very well supported. If you have trouble with some custom node dependencies on 3.13 you can try 3.12
 
@@ -221,7 +223,7 @@ AMD users can install rocm and pytorch with pip if you don't have it already ins
 
 This is the command to install the nightly with ROCm 7.0 which might have some performance improvements:
 
-```pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm7.0```
+```pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm7.1```
 
 
 ### AMD GPUs (Experimental: Windows and Linux), RDNA 3, 3.5 and 4 only.
@@ -242,7 +244,7 @@ RDNA 4 (RX 9000 series):
 
 ### Intel GPUs (Windows and Linux)
 
-(Option 1) Intel Arc GPU users can install native PyTorch with torch.xpu support using pip. More information can be found [here](https://pytorch.org/docs/main/notes/get_start_xpu.html)
+Intel Arc GPU users can install native PyTorch with torch.xpu support using pip. More information can be found [here](https://pytorch.org/docs/main/notes/get_start_xpu.html)
 
 1. To install PyTorch xpu, use the following command:
 
@@ -252,10 +254,6 @@ This is the command to install the Pytorch xpu nightly which might have some per
 
 ```pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/xpu```
 
-(Option 2) Alternatively, Intel GPUs supported by Intel Extension for PyTorch (IPEX) can leverage IPEX for improved performance.
-
-1. visit [Installation](https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=gpu) for more information.
-
 ### NVIDIA
 
 Nvidia users should install stable pytorch using this command:
````
0 commit comments
