
Commit 558dbd5

push obscure models out of main model doc

1 parent 93696dc commit 558dbd5

3 files changed: +287 −232 lines

docs/Model Support.md

Lines changed: 27 additions & 163 deletions
@@ -13,25 +13,25 @@
[Chroma](#chroma) | MMDiT | 2025 | Lodestone Rock | 8.9B | No | Recent, Decent Quality |
[Chroma Radiance](#chroma-radiance) | Pixel MMDiT | 2025 | Lodestone Rock | 8.9B | No | Recent, Bad Quality (WIP) |
[Lumina 2.0](#lumina-2) | NextDiT | 2025 | Alpha-VLLM | 2.6B | Partial | Modern, Passable Quality |
-[OmniGen 2](#omnigen-2) | MLLM | 2025 | VectorSpaceLab | 7B | No | Modern, Decent Quality |
[Qwen Image](#qwen-image) | MMDiT | 2025 | Alibaba-Qwen | 20B | Minimal | Modern, Great Quality, very memory intense |
[Hunyuan Image 2.1](#hunyuan-image-21) | MMDiT | 2025 | Tencent | 17B | No | Modern, Great Quality, very memory intense |
[Z-Image](#z-image) | S3-DiT | 2025 | Tongyi MAI (Alibaba) | 6B | No | Modern, Great Quality, lightweight |
[Kandinsky 5](#kandinsky-5) | DiT | 2025 | Kandinsky Lab | 6B | No | Modern, Decent Quality |

-Old or bad options are also tracked, listed here:
+Old or bad options are also tracked, listed via [Obscure Model Support](/docs/Obscure%20Model%20Support.md):

| Model | Architecture | Year | Author | Scale | Censored? | Quality/Status |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- |
-[Stable Diffusion v1 and v2](#stable-diffusion-v1-and-v2) | unet | 2022 | Stability AI | 1B | No | Outdated |
-[Stable Diffusion v1 Inpainting Models](#stable-diffusion-v1-inpainting-models) | unet | 2022 | RunwayML | 1B | No | Outdated |
-[Segmind SSD 1B](#segmind-ssd-1b) | unet | 2023 | Segmind | 1B | Partial | Outdated |
-[Stable Cascade](#stable-cascade) | unet cascade | 2024 | Stability AI | 5B | Partial | Outdated |
-[PixArt Sigma](#pixart-sigma) | DiT | 2024 | PixArt | 1B | ? | Outdated |
-[Nvidia Sana](#nvidia-sana) | DiT | 2024 | NVIDIA | 1.6B | No | Just Bad |
-[Nvidia Cosmos Predict2](#cosmos-predict2) | DiT | 2025 | NVIDIA | 2B/14B | Partial | Just Bad |
-[HiDream i1](#hidream-i1) | MMDiT | 2025 | HiDream AI (Vivago) | 17B | Minimal | Good Quality, lost community attention |
-[Ovis](#ovis) | MMDiT | 2025 | AIDC-AI (Alibaba) | 7B | No | Passable quality, but outclassed on launch |
+[Stable Diffusion v1 and v2](/docs/Obscure%20Model%20Support.md#stable-diffusion-v1-and-v2) | unet | 2022 | Stability AI | 1B | No | Outdated |
+[Stable Diffusion v1 Inpainting Models](/docs/Obscure%20Model%20Support.md#stable-diffusion-v1-inpainting-models) | unet | 2022 | RunwayML | 1B | No | Outdated |
+[Segmind SSD 1B](/docs/Obscure%20Model%20Support.md#segmind-ssd-1b) | unet | 2023 | Segmind | 1B | Partial | Outdated |
+[Stable Cascade](/docs/Obscure%20Model%20Support.md#stable-cascade) | unet cascade | 2024 | Stability AI | 5B | Partial | Outdated |
+[PixArt Sigma](/docs/Obscure%20Model%20Support.md#pixart-sigma) | DiT | 2024 | PixArt | 1B | ? | Outdated |
+[Nvidia Sana](/docs/Obscure%20Model%20Support.md#nvidia-sana) | DiT | 2024 | NVIDIA | 1.6B | No | Just Bad |
+[Nvidia Cosmos Predict2](/docs/Obscure%20Model%20Support.md#cosmos-predict2) | DiT | 2025 | NVIDIA | 2B/14B | Partial | Just Bad |
+[HiDream i1](/docs/Obscure%20Model%20Support.md#hidream-i1) | MMDiT | 2025 | HiDream AI (Vivago) | 17B | Minimal | Good Quality, lost community attention |
+[OmniGen 2](/docs/Obscure%20Model%20Support.md#omnigen-2) | MLLM | 2025 | VectorSpaceLab | 7B | No | Modern, Decent Quality, quickly outclassed |
+[Ovis](/docs/Obscure%20Model%20Support.md#ovis) | MMDiT | 2025 | AIDC-AI (Alibaba) | 7B | No | Passable quality, but outclassed on launch |

- **Architecture** is the fundamental machine learning structure used for the model; UNets were used in the past, but DiT (Diffusion Transformers) are the modern choice
- **Scale** is how big the model is - "B" for "Billion", so for example "2B" means "Two billion parameters".
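
As a rough illustration of what those scale numbers mean for memory (my own sketch, not from the docs): weight size is approximately parameter count times bytes per parameter, before any activation, VAE, or text-encoder overhead.

```python
# Back-of-envelope weight size from parameter count and storage format.
# The GGUF bits-per-weight figures are approximations (assumption).
BYTES_PER_PARAM = {
    "bf16/fp16": 2.0,
    "fp8": 1.0,
    "GGUF Q6_K": 6.5625 / 8,
    "GGUF Q4_K_S": 4.5 / 8,
}

def weight_gib(params_billions: float, fmt: str) -> float:
    """Approximate weight size in GiB for a given precision."""
    return params_billions * 1e9 * BYTES_PER_PARAM[fmt] / 1024**3

# e.g. the 20B Qwen Image entry above: ~37 GiB at bf16, ~19 GiB at fp8.
for fmt in BYTES_PER_PARAM:
    print(f"20B @ {fmt}: {weight_gib(20, fmt):5.1f} GiB")
```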
@@ -85,40 +85,22 @@ Image model(s) most worth using, as of April 2025:
- You could make a point that maybe I should have set CFG differently, or used a sigma value, or changed up prompt phrasing, etc., and gotten better quality - this test intentionally uses very bland parameters to maximize identical comparison. Keep in mind that you can get better results out of a model by fiddling parameters.
- You'll note models started being able to do decently well on this test in late 2024. Older models noticeably fail at the basic requirements of this test.

-## Stable Diffusion v1 and v2
-
-![img](/docs/images/models/sd15.jpg)
-*(Above image is SDv1.5)*
-
-SDv1/SDv2 models work exactly as normal. Even legacy (pre-[ModelSpec](https://github.com/Stability-AI/ModelSpec)) models are supported.
-
-### Stable Diffusion v1 Inpainting Models
-
-SDv1 inpaint models (RunwayML) are supported, but will work best if you manually edit the Architecture ID to be `stable-diffusion-v1/inpaint`.
-
-Under the `Init Image` param group, checkmark `Use Inpainting Encode`.
-
-## Stable Diffusion XL
+# Stable Diffusion XL

![img](/docs/images/models/sdxl.jpg)

SDXL models work as normal, with the bonus that by default enhanced inference settings will be used (eg scaled up rescond).

Additionally, SDXL-Refiner architecture models can be inferenced, both as a refiner or even as a base (you must manually set res to 512x512, and it will generate weird results).

-## SD1 and SDXL Turbo Variants
+# SD1 and SDXL Turbo Variants

Turbo, LCM (Latent Consistency Models), Lightning, etc. models work the same as regular models, just set `CFG Scale` to `1` and:
- For Turbo, set `Steps` to `1`. Under the `Sampling` group, set `Scheduler` to `Turbo`.
- For LCM, set `Steps` to `4`. Under the `Sampling` group, set `Sampler` to `lcm`.
- For Lightning, (?)

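For illustration, the list above reduces to a small lookup: every distilled speed variant drops CFG to 1, and only steps and sampler/scheduler differ. A hypothetical sketch of those recommendations (not SwarmUI code):

```python
# Hypothetical lookup mirroring the guidance above (not SwarmUI code).
SPEED_VARIANT_SETTINGS = {
    "turbo": {"cfg_scale": 1, "steps": 1, "scheduler": "Turbo"},
    "lcm":   {"cfg_scale": 1, "steps": 4, "sampler": "lcm"},
    # "lightning" is left unspecified in the doc ("?").
}

def settings_for(variant: str) -> dict:
    """Recommended generation parameters for a distilled speed variant."""
    try:
        return SPEED_VARIANT_SETTINGS[variant]
    except KeyError:
        raise ValueError(f"no recommendation recorded for {variant!r}")

print(settings_for("turbo"))  # {'cfg_scale': 1, 'steps': 1, 'scheduler': 'Turbo'}
```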
-## SegMind SSD-1B
-
-SegMind SSD-1B models work the same as SD models.
-
-## Stable Diffusion 3
+# Stable Diffusion 3

![img](/docs/images/models/sd3m.jpg)

@@ -152,54 +134,6 @@ For upscaling with SD3, the `Refiner Do Tiling` parameter is highly recommended
- SD3.5 Medium <https://huggingface.co/city96/stable-diffusion-3.5-medium-gguf/tree/main>
- SD 3.5 Medium supports resolutions from 512x512 to 1440x1440, and the model metadata of the official model recommends 1440x1440. However, the official model is not good at this resolution. You will want to click the `☰` hamburger menu on a model, then `Edit Metadata`, then change the resolution to `1024x1024` for better results. You can of course set the `Aspect Ratio` parameter to `Custom` and then edit resolutions on the fly per-image.

-## Stable Cascade
-
-![img](/docs/images/models/cascade.jpg)
-
-Stable Cascade is supported if you use the "ComfyUI Format" models (aka "All In One") <https://huggingface.co/stabilityai/stable-cascade/tree/main/comfyui_checkpoints> that come as a pair of `stage_b` and `stage_c` models.
-
-You must keep the two in the same folder, named the same with the only difference being `stage_b` vs `stage_c` in the filename.
-
-Either model can be selected in the UI to use them; it will automatically use both.
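The pairing rule above is mechanical enough to sketch. A hypothetical helper (names mine, not Swarm code) that derives one stage's path from the other under that naming convention:

```python
from pathlib import Path

def cascade_counterpart(model_path: str) -> Path:
    """Derive the other half of a Stable Cascade stage_b/stage_c pair.

    Hypothetical helper: same folder, same filename,
    differing only in the stage_b / stage_c substring.
    """
    p = Path(model_path)
    if "stage_b" in p.name:
        return p.with_name(p.name.replace("stage_b", "stage_c"))
    if "stage_c" in p.name:
        return p.with_name(p.name.replace("stage_c", "stage_b"))
    raise ValueError("filename contains neither 'stage_b' nor 'stage_c'")

print(cascade_counterpart("Models/Stable-Cascade/sc_stage_c.safetensors"))
# Models/Stable-Cascade/sc_stage_b.safetensors
```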
-# PixArt Sigma
-
-![img](/docs/images/models/pixart-sigma-xl-2.jpg)
-*(above image is PixArt Sigma XL 2 1024 MS)*
-
-The [PixArt Sigma MS models](https://huggingface.co/PixArt-alpha/PixArt-Sigma/tree/main) are supported in Swarm with a few setup steps.
-
-These steps are not friendly to beginners (if PixArt gains popularity, likely more direct/automated/native support will be added), but advanced users can follow:
-
-- After downloading the model, run Swarm's **Utilities** -> **Pickle To Safetensors** -> `Convert Models`. You need a safetensors model for Swarm to accurately identify the model type.
-    - Or download a preconverted copy, like this one: <https://huggingface.co/HDiffusion/Pixart-Sigma-Safetensors>
-- After you have a safetensors model, find it in the Models tab, click the menu button on the model, and select "`Edit Metadata`"
-    - From the `Architecture` dropdown, select `PixArtMS Sigma XL 2` for 1024 or lower models, or `XL 2 (2K)` for the 2k
-    - In the `Standard Resolution` box, enter `1024x1024` for the 1024, `512x512` for the 512, or `2048x2048` for the 2k
-- The first time you run a PixArt model, it will prompt you to install [Extra Models by City96](https://github.com/city96/ComfyUI_ExtraModels). You must accept this for PixArt models to work.
-- Make sure in **User Settings** you have a `DefaultSDXLVae` selected. If not, Swarm will autodownload a valid SDXL VAE.
-- Swarm will autodownload T5XXL-EncoderOnly for you on first run (same as SD3-Medium T5-Only mode)
-- You can now use the model as easily as any other model. Some feature compatibility issues might arise.
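The `Pickle To Safetensors` utility referenced above is, in essence, a re-serialization of the checkpoint. A minimal standalone sketch of the same idea, assuming a plain or `state_dict`-wrapped PyTorch checkpoint (this is not Swarm's actual implementation):

```python
import torch
from safetensors.torch import save_file

def pickle_to_safetensors(src: str, dst: str) -> None:
    """Re-save a pickled .pth/.ckpt checkpoint as .safetensors."""
    # weights_only=True avoids executing arbitrary pickled code where possible.
    ckpt = torch.load(src, map_location="cpu", weights_only=True)
    # Many checkpoints nest their tensors under a "state_dict" key.
    state = ckpt.get("state_dict", ckpt)
    # safetensors wants a flat {name: tensor} dict of contiguous tensors.
    tensors = {k: v.contiguous() for k, v in state.items()
               if isinstance(v, torch.Tensor)}
    save_file(tensors, dst)

# pickle_to_safetensors("PixArt-Sigma-XL-2-1024-MS.pth",
#                       "PixArt-Sigma-XL-2-1024-MS.safetensors")
```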
-# NVIDIA Sana
-
-![img](/docs/images/models/sana-1600m.jpg)
-*(above image is Nvidia Sana 1600M 1024)*
-
-The [Nvidia Sana models](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px) are supported in Swarm with a few setup steps.
-
-These steps are not friendly to beginners (if Sana gains popularity, likely more direct/automated/native support will be added), but advanced users can follow:
-
-- Recommended: use the [preconverted Sana model](https://huggingface.co/mcmonkey/sana-models/blob/main/Sana_1600M_1024px.safetensors)
-    - Otherwise, if you use the original 'pth' version, after downloading the model, run Swarm's **Utilities** -> **Pickle To Safetensors** -> `Convert Models`. You need a safetensors model for Swarm to accurately identify the model type.
-- The first time you run a Sana model, it will prompt you to install [Extra Models by City96](https://github.com/city96/ComfyUI_ExtraModels). You must accept this for Sana models to work.
-    - You may need to manually install pip packages: `python -s -m pip install -U transformers`, and possibly also `bitsandbytes`
-- Swarm will autodownload the Sana DCAE VAE for you on the first run.
-    - The text encoder, Gemma 2B, will also be autodownloaded (in this case by the backing comfy nodes)
-- You can now use the model as easily as any other model. Some feature compatibility issues might arise.
-- Only Sana 1600M 1024 has been validated currently
-- Use a CFG around 4

# AuraFlow

![img](/docs/images/models/auraflow-02.jpg)
@@ -416,74 +350,6 @@ These steps are not friendly to beginners (if Sana gains popularity, likely more
- **Renorm CFG:** Lumina 2 reference code sets a new advanced parameter `Renorm CFG` to 1. This is available in Swarm under `Advanced Sampling`.
    - The practical difference is subjective and hard to predict, but enabling it seems to tend towards more fine detail

-# HiDream-i1
-
-![img](/docs/images/models/hidream-i1-dev.jpg)
-*(Generated with HiDream-i1 Dev, CFG=1, Steps=20, SigmaShift=3)*
-
-- HiDream-i1 models are supported in SwarmUI.
-- You can pick the Full, Dev, or Fast variant. Most users should prefer Dev or Fast.
-    - **Full:** Uses standard CFG and step counts, no distillation or other tricks. Slowest option, theoretically smartest model (in practice visual quality is poor, but prompt understanding is strong)
-    - **Dev:** Uses CFG=1 distillation but standard step counts, akin to Flux-Dev. Best middle ground option.
-    - **Fast:** Uses CFG=1 and low step count distillation, akin to Flux-Schnell. Best for speed focus, at cost of quality.
-- The models are 17B, which is massive, so you'll likely prefer a quantized version.
-    - Dev model gguf quant: <https://huggingface.co/city96/HiDream-I1-Dev-gguf/tree/main>
-    - Full model gguf quant: <https://huggingface.co/city96/HiDream-I1-Full-gguf/tree/main>
-    - `Q6_K` is best accuracy on high VRAM, but `Q4_K_S` cuts VRAM requirements while still being very close to original quality; other variants shouldn't be used normally
-    - Comfy Org's fp8 and fat bf16 versions: <https://huggingface.co/Comfy-Org/HiDream-I1_ComfyUI/tree/main/split_files/diffusion_models>
-    - Goes in `(Swarm)/Models/diffusion_models`
-- All models share the same architecture identifiers. Make sure to configure parameters appropriately for the specific variant you're using (CFG and Steps).
-- There's also "Edit", a version that does ip2p-style editing (give an init image, set creativity to 1, and prompt it with a change request, eg "draw a mustache on her")
-    - BF16 raw fat file here <https://huggingface.co/Comfy-Org/HiDream-I1_ComfyUI/blob/main/split_files/diffusion_models/hidream_e1_full_bf16.safetensors>
-    - This model class cannot be automatically detected, so you must manually click the `☰` hamburger menu on a model, then `Edit Metadata`, and set the `Architecture:` field to `HiDream i1 Edit`, otherwise it will not use the input image properly
-    - Also set `Resolution:` to `768x768`; the Edit model misbehaves at high res
-- HiDream uses the Flux VAE; it will be autodownloaded for you if not already present
-- HiDream uses a quad-textencoder of Long-CLIP L, Long-CLIP G, T5-XXL, and LLaMA-3.1-8B (this is unhinged, I'm so sorry for your RAM size)
-    - These will be autodownloaded for you if not already present
-- LoRAs cross-apply between the three variants, with best alignment between dev/fast; full tends to be more different
-- Parameters:
-    - **CFG Scale:** HiDream Full uses standard CFG ranges (eg 6), HiDream Dev and Fast use CFG=1
-    - **Steps:** HiDream Dev uses standard step counts (eg 20), HiDream Fast can use low counts (eg 8). HiDream Full requires higher than normal step counts (at least 30, maybe 50) for clean results.
-        - Official recommendation from the HiDream team is: Full=50, Dev=28, Fast=16.
-    - **Sampler and Scheduler:** Standard samplers/schedulers work. Defaults to `Euler` and `Normal`
-        - The dev model is more open to weirder samplers like `LCM`, and the official recommendation for Full is UniPC, but these are not needed
-    - **Sigma Shift:** Sigma shift defaults to 3 and does not need to be modified.
-        - Officially, HiDream Full and Fast recommend Shift of 3, but for Dev they recommend 6. That 6 on dev seems to look worse though, so I don't recommend it.
-# Cosmos Predict2
-
-![img](/docs/images/models/cosmos-predict2-14b.jpg)
-
-*(Nvidia Cosmos Predict2 14B Text2Image)*
-
-- Nvidia Cosmos Predict2 Text2Image models are natively supported in SwarmUI.
-    - Do not recommend, generally just worse than other contemporary models.
-- There is a 2B and a 14B variant.
-    - 2B: <https://huggingface.co/Comfy-Org/Cosmos_Predict2_repackaged/blob/main/cosmos_predict2_2B_t2i.safetensors>
-    - 14B: <https://huggingface.co/Comfy-Org/Cosmos_Predict2_repackaged/blob/main/cosmos_predict2_14B_t2i.safetensors>
-    - 14B GGUFs here <https://huggingface.co/city96/Cosmos-Predict2-14B-Text2Image-gguf/tree/main>
-- **Resolution:** ? 1024-ish.
-- **CFG and Steps:** Default recommends CFG=4 and Steps=35
-- **Performance:** Oddly slower than similar sized models by a fair margin. It does not make up for this in quality.
-- The text encoder is old T5-XXL v1, not the same T5-XXL used by other models.
-    - It will be automatically downloaded.
-- The VAE is the Wan VAE, and will be automatically downloaded.
-# OmniGen 2
-
-- [OmniGen 2](https://github.com/VectorSpaceLab/OmniGen2) is natively partially supported in SwarmUI.
-    - It is technically an LLM, and the LLM features are not supported, only the direct raw image features.
-- Download the model here <https://huggingface.co/Comfy-Org/Omnigen2_ComfyUI_repackaged/blob/main/split_files/diffusion_models/omnigen2_fp16.safetensors>
-    - Save it to `diffusion_models`
-- The text encoder is Qwen 2.5 VL 3B (LLM), and will be automatically downloaded.
-- The VAE is the Flux VAE, and will be automatically downloaded.
-- Add images to the prompt box to use them as input images for the model. If no input images are given, but you have an Init Image, that will be used as the input image.
-- **CFG:** Usual CFG rules, around 5 to 7 is a good baseline
-    - The reference workflows for comfy used dual-CFG guidance, IP2P style. If you want to do this, you can use the advanced param `IP2P CFG 2` to control the secondary CFG (defaults to 2), and set regular CFG to around 5.
-- **Steps:** normal ~20
-- **Resolution:** Normal 1024x1024-ish.
-- **Performance:** Pretty terribly slow. Incompatible with fp8, incompatible with sage attention.
-- **Prompts:** their demo page has some prompt tips and examples <https://huggingface.co/spaces/OmniGen2/OmniGen2>
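The dual-CFG guidance mentioned in the CFG bullet follows the standard InstructPix2Pix formulation: separate guidance scales over the image condition and the text condition. A sketch of the combination step (the mapping of `IP2P CFG 2` onto the image scale is my assumption, not confirmed Swarm internals):

```python
def ip2p_guidance(eps_uncond, eps_img, eps_full, cfg_text=5.0, cfg_image=2.0):
    """Combine three denoiser predictions InstructPix2Pix-style.

    eps_uncond: prediction with neither image nor text conditioning
    eps_img:    prediction with the image condition only
    eps_full:   prediction with both image and text conditions
    cfg_text:   regular CFG (the doc suggests ~5)
    cfg_image:  secondary scale, assumed to correspond to `IP2P CFG 2`
    """
    return (eps_uncond
            + cfg_image * (eps_img - eps_uncond)
            + cfg_text * (eps_full - eps_img))

# Scalar demo; in practice these are noise-prediction tensors.
print(ip2p_guidance(0.0, 0.2, 0.5))  # 0.4 + 1.5 = 1.9
```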

# Qwen Image

@@ -625,21 +491,6 @@ These steps are not friendly to beginners (if Sana gains popularity, likely more
- Despite being a Union controlnet, the Union Type parameter is not used.
- Because it is "Model Patch" based, the Start and End parameters also do not work.

-# Ovis
-
-- [Ovis](https://huggingface.co/AIDC-AI/Ovis-Image-7B) is supported in SwarmUI.
-    - It is a 7B-scale MMDiT image model from Alibaba's AIDC-AI, with image quality roughly a bit above base SDXL and a focus on strong text understanding.
-- Download the model from [Comfy-Org/Ovis-Image](https://huggingface.co/Comfy-Org/Ovis-Image/blob/main/split_files/diffusion_models/ovis_image_bf16.safetensors)
-    - Save in `diffusion_models`
-    - Uses the Flux.1 VAE
-- **Parameters:**
-    - **Prompt:** Supports general prompting in any format just fine. Speaks English and Chinese.
-    - **Sampler:** Default is fine (`Euler`)
-    - **Scheduler:** Default works, but `Beta` may be better
-    - **CFG Scale:** Normal CFG ranges, `5` is the official recommendation
-    - **Steps:** Normal step counts (eg `20`), but they recommend `50`
-    - **Resolution:** Side length `1024`. Quickly breaks above that.

# Kandinsky 5

- Kandinsky 5 Image Lite is supported in SwarmUI
@@ -709,3 +560,16 @@ These steps are not friendly to beginners (if Sana gains popularity, likely more
- You can generate TensorRT engines from the model menu. This includes a button on-page to autoinstall TRT support the first time you use it, and configuration of graph size limits and optimal scales. (TensorRT works fastest when you generate at the selected optimal resolution, and slightly less fast at any dynamic resolution outside the optimal setting.)
- Note that TensorRT is not compatible with LoRAs, ControlNets, etc.
- Note that you need to make a fresh TRT engine for any different model you want to use.
+
+# Obscure Model Redirection
+
+### Stable Diffusion v1 and v2
+### SegMind SSD-1B
+### Stable Cascade
+### PixArt Sigma
+### NVIDIA Sana
+### HiDream-i1
+### Cosmos Predict2
+### OmniGen 2
+### Ovis
+
+These obscure/old/bad/unpopular/etc. models have been moved to [Obscure Model Support](/docs/Obscure%20Model%20Support.md)
