# Change Log for SD.Next

## Update for 2026-03-23

### Highlights for 2026-03-23

This release brings massive code refactoring to modernize the codebase and removal of some obsolete features. Leaner & Faster!
And since it's a bit of a quieter period when it comes to new models, notable additions are: *FireRed-Image-Edit*, *SkyWorks-UniPic-3* and *Anima-Preview-2*

If you're on the Windows platform, we have a brand new [All-in-one Installer & Launcher](https://github.com/vladmandic/sdnext-launcher): simply download the [exe or zip](https://github.com/vladmandic/sdnext-launcher/releases) and done!

*What else*? Really a lot! New color grading module, updated localization with new languages and improved translations, new civitai integration module, new finetunes loader, several new upscalers, improvements to LLM/VLM in captioning and prompt enhance, a lot of new control preprocessors, new realtime server info panel, and some new UI themes
And major work on API hardening: security, rate limits, secrets handling, new endpoints, etc.
But also many smaller quality-of-life improvements - for full details, see the [ChangeLog](https://github.com/vladmandic/automatic/blob/master/CHANGELOG.md)

[ReadMe](https://github.com/vladmandic/automatic/blob/master/README.md) | [ChangeLog](https://github.com/vladmandic/automatic/blob/master/CHANGELOG.md) | [Docs](https://vladmandic.github.io/sdnext-docs/) | [WiKi](https://github.com/vladmandic/automatic/wiki) | [Discord](https://discord.com/invite/sd-next-federal-batch-inspectors-1101998836328697867) | [Sponsor](https://github.com/sponsors/vladmandic)

### Details for 2026-03-23

- **Models**
  - [Google Flash 3.1 Image](https://ai.google.dev/gemini-api/docs/models/gemini-3-flash-preview) a.k.a.
*Nano Banana 2*
  - [FireRed Image Edit](https://huggingface.co/FireRedTeam/FireRed-Image-Edit-1.0) *1.0 and 1.1*
    *note*: FireRed is a fine-tune of Qwen-Image-Edit despite its claim to be a new base model
  - [Skyworks UniPic-3](https://huggingface.co/Skywork/Unipic3), *Consistency and DMD* variants, added to the reference/community section
    *note*: UniPic-3 is a fine-tune of Qwen-Image-Edit with new distillation despite its claim of major changes
  - [Anima Preview-v2](https://huggingface.co/circlestone-labs/Anima)
- **Image manipulation**
  - new **color grading** module
    apply basic corrections to your images: brightness, contrast, saturation, shadows, highlights
    move to professional photo corrections: hue, gamma, sharpness, temperature
    correct tone: shadows, midtones, highlights
    add effects: vignette, grain
    apply a professional LUT using a `.cube` file
    *hint*: color grading is available as a step during generate or as a processing item for already existing images
  - update **latent corrections** *(formerly HDR Corrections)* and expand allowed models
  - add support for the [spandrel](https://github.com/chaiNNer-org/spandrel) **upscaling** engine with support for new upscaling model families
  - add two new AI upscalers: *RealPLKSR NomosWebPhoto* and *RealPLKSR AnimeSharpV2*
  - add two new **interpolation** methods: *HQX* and *ICB*
  - use the high-quality [sharpfin](https://github.com/drhead/Sharpfin) accelerated library when available (*cuda-only*)
  - **upscalers**: extend chainner support for additional models
- **Captioning / Prompt Enhance**
  - new models: **Qwen-3.5**, **Mistral-3** in multiple variations
  - new models: multiple *heretic* and *abliterated* finetunes for **Qwen, Gemma, Mistral**
  - **captioning** and **prompt enhance**: add support for all cloud-based Gemini models *3.1/3.0/2.5 pro/flash/flash-lite*
  - improve captioning and prompt enhance memory handling/offloading
- **Control**
  - new **pre-processors**: *anyline, depth_anything v2, dsine, lotus, marigold normals, oneformer, rtmlib pose, sam2,
stablenormal, teed, vitpose*
- **Features**
  - **secrets** handling: new `secrets.json` and special handling for tokens/keys/passwords
    these used to be treated like any other `config.json` param, which could cause security issues
  - pipelines: add **ZImageInpaint**
  - rewritten **civitai** module
    browse/discover mode with sort, period, type/base dropdowns; URL paste; subfolder sorting; auto-browse; dynamic dropdowns
  - **hires**: allow using a different lora in the refiner prompt
  - **nunchaku** models are now listed in the networks tab as reference models instead of being used implicitly via quantization
  - improve the image **metadata** parser for foreign metadata (e.g. XMP)
- **Compute**
  - **ROCm** support for additional AMD GPUs: `gfx103X`, thanks @crashingalexsan
  - **Cuda** `torch==2.10` removed support for the `rtx1000` series, use the following before first startup:
    > `set TORCH_COMMAND='torch==2.9.1 torchvision==0.24.1 torchaudio==2.9.1 --index-url https://download.pytorch.org/whl/cu126'`
- **UI**
  - new panel: **server info** with detailed runtime information
  - **networks**: add **UNet/DiT**
  - **localization**: improved translation quality and new translations
    locales: *en, en1, en2, en3, en4, hr, es, it, fr, de, pt, ru, zh, ja, ko, hi, ar, bn, ur, id, vi, tr, sr, po, he, xx, yy, qq, tlh*
    yes, this now includes stuff like *latin, esperanto, arabic, hebrew, klingon* and a lot more!
and also introduces some pseudo-locales such as: *techno-babbel*, *for-n00bs*
    *hint*: click on the locale icon in the bottom-left corner to cycle through available locales, or set the default in *settings -> ui*
  - **server settings**: new section in *settings*
  - **kanvas**: add paste image from clipboard
  - **themes**: add *CTD-NT64Light*, *CTD-NT64Medium* and *CTD-NT64Dark*, thanks @resonantsky
  - **themes**: add *Vlad-Neomorph*
  - **gallery**: add option to auto-refresh gallery, thanks @awsr
  - **token counters**: add per-section display for supported models, thanks @awsr
- **API**
  - **rate limiting**: global for all endpoints, guards against abuse and denial-of-service attacks
    configurable in *settings -> server settings*
  - new `/sdapi/v1/upload` endpoint with support for both POST with form-data and PUT using raw bytes
  - new `/sdapi/v1/torch` endpoint for torch info (backend, version, etc.)
  - new `/sdapi/v1/gpu` endpoint for GPU info
  - new `/sdapi/v1/rembg` endpoint for background removal
  - new `/sdapi/v1/unet` endpoint to list available unets/dits
  - use rate limiting for api logging
- **Internal**
  - `python==3.13` full support
  - `python==3.14` initial support
    see [docs](https://vladmandic.github.io/sdnext-docs/Python/) for details
  - remove hard dependencies: `clip, numba, skimage, torchsde, omegaconf, antlr, patch-ng, astunparse, addict, inflection, jsonmerge, kornia`, `resize-right, voluptuous, yapf, sqlalchemy, invisible-watermark, pi-heif, ftfy, blendmodes, PyWavelets, imp`
    these are now installed on-demand when needed
  - bump `huggingface_hub==1.5.0`
  - bump `transformers==5.3.0`
  - refactor to/from *image/tensor* logic
  - refactor: reorganize `cli` scripts
  - refactor: move tests to dedicated `/test/`
  - refactor: all image handling to `modules/image/`
  - refactor: many params that were server-global are now ui params that are handled per-request
    *schedulers, todo, tome, etc.*
  - refactor: error handling during `torch.compile`
  - refactor: move `rembg` to core instead of
extensions
  - remove face restoration
  - unified command line parsing
  - use explicit icon image references in `gallery`, thanks @awsr
  - launch: use threads to asynchronously execute non-critical tasks
  - switch from deprecated `pkg_resources` to `importlib`
  - modernize typing and type annotations
  - improve `pydantic==2.x` compatibility
  - refactor entire logging into separate `modules/logger`
  - replace `timestamp` based startup checks with state caching
  - split monolithic `shared` module and introduce `ui_definitions`
  - modularize all imports and avoid re-imports
  - use `threading` for deferrable operations
  - use `threading` for io-independent parallel operations
  - remove requirements: `clip`, `open-clip`
  - add new build of `insightface`, thanks @hameerabbasi
  - reduce use of generators with ui interactor
  - better subprocess execute, thanks @awsr
  - better wslopen handling, thanks @awsr
- **Obsolete**
  - remove `normalbae` pre-processor
  - remove `dwpose` pre-processor
  - remove `hdm` model support
  - remove `xadapter` script
  - remove `codeformer` and `gfpgan` face restorers
- **Checks**
  - switch to `pyproject.toml` for tool configs
  - update `lint` rules, thanks @awsr
  - add `ty` to optional lint tooling
  - add `pyright` to optional lint tooling
- **Fixes**
  - ui `gallery` cache recursive cleanup, thanks @awsr
  - ui main results pane sizing
  - ui connection monitor
  - handle `clip` installer doing unwanted `setuptools` update
  - cleanup for `uv` installer fallback
  - add `metadata` restore to always-on scripts
  - improve `wildcard` weights parsing, thanks @Tillerz
  - model detection for `anima`
  - handle `lora` unwanted unload
  - improve `preview` error handler
  - handle `gallery` over remote/insecure connections
  - fix `ltx2-i2v`
  - handle missing `preview` image
  - kandinsky 5 t2i/i2i model type detection
  - kanvas notify core on image size change
  - command arg `--reinstall` stricter enforcement
  - handle `api` state reset
  - processing upscaler refresh button
  - simplify and validate `rembg` dependencies
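The global API rate limiting described in this release is configurable per deployment; as an illustration only (this is not SD.Next's actual implementation), a guard of this kind is often a simple token bucket kept per client:

```python
import time

class TokenBucket:
    """Illustrative token-bucket limiter: allow `rate` requests/sec with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # refill proportionally to elapsed time, capped at bucket capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # a server would typically respond with HTTP 429 here
```

In practice one bucket would be kept per client (e.g. per IP), with the rate/capacity driven by the values exposed in *settings -> server settings*.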
  - improve video generation progress tracking
  - handle startup with bad `scripts` more gracefully
  - thread-safety for `error-limiter`, thanks @awsr
  - add `lora` support for flux2-klein

## Update for 2026-02-04

### Highlights for 2026-02-04

A refresh release two weeks after the prior one, yet we still somehow managed to pack in *~150 commits*!
Highlights are two new models: **Z-Image-Base** and **Anima**, *captioning* support for **tagger** models, and a massive addition of new **schedulers**
Also included are updates to `torch` and additional GPU arch support for `ROCm` backends, plus a lot of internal improvements and fixes.

[ReadMe](https://github.com/vladmandic/automatic/blob/master/README.md) | [ChangeLog](https://github.com/vladmandic/automatic/blob/master/CHANGELOG.md) | [Docs](https://vladmandic.github.io/sdnext-docs/) | [WiKi](https://github.com/vladmandic/automatic/wiki) | [Discord](https://discord.com/invite/sd-next-federal-batch-inspectors-1101998836328697867) | [Sponsor](https://github.com/sponsors/vladmandic)

### Details for 2026-02-04

- **Models**
  - [Tongyi-MAI Z-Image Base](https://tongyi-mai.github.io/Z-Image-blog/)
    yup, it's finally here, the full base model of **Z-Image**
  - [CircleStone Anima](https://huggingface.co/circlestone-labs/Anima)
    2B anime-optimized model based on a modified Cosmos-Predict, using Qwen3-0.6B as a text encoder
- **Features**
  - **caption** tab support for Booru tagger models, thanks @CalamitousFelicitousness
  - add SmilingWolf WD14/WaifuDiffusion tagger models, thanks @CalamitousFelicitousness
  - support comments in wildcard files, using `#`
  - support aliases in metadata skip params, thanks @CalamitousFelicitousness
  - ui gallery: improve cache cleanup and add manual option, thanks @awsr
  - selectable options to add system info to metadata, thanks @Athari
    see *settings -> image metadata*
- **Schedulers**
  - schedulers documentation has a new home:
  - add 13(!)
new scheduler families
    not a port, but more of an inspired-by implementation of the [res4lyf](https://github.com/ClownsharkBatwing/RES4LYF) library
    all schedulers should be compatible with both `epsilon` and `flow` prediction styles!
    *note*: each family may have multiple actual schedulers, so the list total is 56(!) new schedulers
    - core family: *RES*
    - exponential: *DEIS, ETD, Lawson, ABNorsett*
    - integrators: *Runge-Kutta, Linear-RK, Specialized-RK, Lobatto, Radau-IIA, Gauss-Legendre*
    - flow: *PEC, Riemannian, Euclidean, Hyperbolic, Lorentzian, Langevin-Dynamics*
  - add 3 additional schedulers: *CogXDDIM, DDIMParallel, DDPMParallel*
    not originally intended to be general-purpose schedulers, but they work quite nicely and produce good results
  - image metadata: always log the scheduler class used
- **API**
  - add `/sdapi/v1/xyz-grid` to enumerate xyz-grid axis options and their choices
    see `/cli/api-xyzenum.py` for example usage
  - add `/sdapi/v1/sampler` to get the current sampler config
  - modify `/sdapi/v1/samplers` to enumerate available samplers' possible options
    see `/cli/api-samplers.py` for example usage
- **Internal**
  - tagged release history: each major release for the past year is now tagged for easier reference
  - **torch** update
    *note*: may cause a slow first startup/generate
    **cuda**: update to `torch==2.10.0`
    **xpu**: update to `torch==2.10.0`
    **rocm**: update to `torch==2.10.0`
    **openvino**: update to `torch==2.10.0` and `openvino==2025.4.1`
  - rocm: expand available gfx archs, thanks @crashingalexsan
  - rocm: set `MIOPEN_FIND_MODE=2` by default, thanks @crashingalexsan
  - relocate all json data files to the `data/` folder
    existing data files are auto-migrated on startup
  - refactor and improve connection monitor, thanks @awsr
  - further work on type consistency and type checking, thanks @awsr
  - log captured exceptions
  - improve temp folder handling and cleanup
  - remove torch errors/warnings on fast server shutdown
  - add ui placeholders for future agent-scheduler work, thanks @ryanmeador
  - implement abort
system on repeated errors, thanks @awsr
    currently used by lora and textual-inversion loaders
  - update package requirements
- **Fixes**
  - add video ui elem_ids, thanks @ryanmeador
  - use base steps as-is for non sd/sdxl models
  - ui css fixes for modernui
  - support lora inside prompt selector
  - framepack video save
  - metadata save for manual saves

## Update for 2026-01-22

Bugfix refresh

- add `SD_DEVICE_DEBUG` env variable to trace rocm/xpu/directml init failures
- fix detailer double save
- fix lora load when using peft/diffusers loader
- fix rocm hipblaslt detection
- fix image delete, thanks @awsr
- fix `all_seeds` error
- fix qwen settings typo, thanks @liutyi
- improve `wrap_gradio` error handling
- use refiner/detail steps as-is for non sd/sdxl models

## Update for 2026-01-20

### Highlights for 2026-01-20

First release of 2026 brings quite a few new models: **Flux.2-Klein, Qwen-Image-2512, LTX-2-Dev, GLM-Image**
There are also improvements to the *SDNQ* quantization engine, updated *Prompt Enhance*, *Image Preview* and many others.
Plus some significant under-the-hood changes to improve code coverage and quality, which resulted in more-than-usual levels of bug fixes and some ~330 commits!
For the full list of changes, see the full changelog.
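The `SD_DEVICE_DEBUG` variable added in the 2026-01-22 refresh above is an environment flag read before device init; a minimal sketch of that kind of flag check (the truthy-value parsing here is our assumption, not the project's exact logic):

```python
import os

def env_flag(name: str) -> bool:
    """Treat common truthy strings as enabling the flag; unset or '0' disables it."""
    return os.environ.get(name, "").strip().lower() in ("1", "true", "yes", "on")

if env_flag("SD_DEVICE_DEBUG"):
    # hypothetical trace hook: log each backend probe (rocm/xpu/directml) as it runs
    print("device-init tracing enabled")
```

Set the variable before launching the server (e.g. `SD_DEVICE_DEBUG=1 ./webui.sh`) so the init path can pick it up.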
[ReadMe](https://github.com/vladmandic/automatic/blob/master/README.md) | [ChangeLog](https://github.com/vladmandic/automatic/blob/master/CHANGELOG.md) | [Docs](https://vladmandic.github.io/sdnext-docs/) | [WiKi](https://github.com/vladmandic/automatic/wiki) | [Discord](https://discord.com/invite/sd-next-federal-batch-inspectors-1101998836328697867) | [Sponsor](https://github.com/sponsors/vladmandic)

### Details for 2026-01-20

- **Models**
  - [Flux.2 Klein](https://bfl.ai/blog/flux2-klein-towards-interactive-visual-intelligence)
    Flux.2-Klein is a new family of compact models from BFL in *4B and 9B sizes*, available as *distilled and base* variants
    also included are *sdnq prequantized variants*
    *note*: the 9B variant is [gated](https://vladmandic.github.io/sdnext-docs/Gated/)
  - [Qwen-Image-2512](https://qwen.ai/blog?id=qwen-image-2512)
    Qwen-Image successor; significantly reduces the AI-generated look and adds finer natural details and improved text rendering
    available in *original*, *sdnq-svd prequantized* and *sdnq-dynamic prequantized* variants, thanks @CalamitousFelicitousness
  - [LTX-2 19B Dev](https://ltx.io/model/ltx-2)
    LTX-2 is a new very large 19B-parameter video generation model from Lightricks using the Gemma-3 text encoder
    available for T2I/I2I workflows in original and sdnq prequantized variants
    *note*: the model is very sensitive to input params and will error otherwise
  - [GLM-Image](https://z.ai/blog/glm-image)
    GLM-Image is a new image generation model that adopts a hybrid autoregressive-with-diffusion-decoder architecture
    available in both *original* and *sdnq-dynamic prequantized* variants, thanks @CalamitousFelicitousness
    *note*: the model requires pre-release versions of the `transformers` package:
    > pip install --upgrade git+
    > ./webui.sh --experimental
  - [Nunchaku Z-Image Turbo](https://huggingface.co/nunchaku-tech/nunchaku-z-image-turbo)
    nunchaku-optimized z-image turbo
- **Features**
  - **SDNQ**: add *dynamic* quantization method
    sdnq can
dynamically determine the best quantization method for each module layer
    slower to quantize on-the-fly, but results in better quality with minimal resource usage
  - **SDNQ** now has *19 int*-based and *69 float*-based quantization types
    *note*: not all are exposed via ui, purely for simplicity, but all are available via api and scripts
  - **wildcards**: allow weights, thanks @Tillerz
  - **sampler**: add laplace beta schedule
    results in better prompt adherence and smoother infills
  - **prompt enhance**: improve handling and refresh ui, thanks @CalamitousFelicitousness
    new models such as moondream-3 and xiaomo-mimo
    add support for *thinking* mode where the model can reason about the prompt
    add support for *vision* processing where prompt enhance can also optionally analyze the input image
    add support for *pre-fill* mode where prompt enhance can continue from an existing caption
  - **chroma**: add inpaint pipeline support
  - **taesd preview**: support for more models, thanks @alerikaisattera
  - **image output paths**: better handling of relative/absolute paths, thanks @CalamitousFelicitousness
- **UI**
  - kanvas: add send-to functionality
  - kanvas: improve support for standardui
  - improve extensions tab layout and behavior, thanks @awsr
  - indicate collapsed/hidden sections
  - persistent panel minimize/maximize state
  - gallery: improve sorting behavior
  - gallery: implement prev/next navigation in full screen viewer, thanks @ryanmeador
- **Internal**
  - **lora** native support by default
    will now skip text-encoder; can be enabled in *settings -> networks*
  - update core js linting to `eslint9`, thanks @awsr
  - update modernui js linting to `eslint9`, thanks @awsr
  - update kanvas js linting to `eslint9`, thanks @awsr
  - update strong typing checks, thanks @awsr
  - update reference models previews, thanks @liutyi
  - update models specs page, thanks @alerikaisattera
  - sdnq improvements
  - startup sequence optimizations
  - rocm/hip/hipblaslt detection and initialization improvements
  - zluda detection and
initialization improvements
  - new env variable `SD_VAE_DEFAULT` to force default vae processing
  - update `nunchaku==1.1.0`
  - lora: switch logic from force-diffusers to allow-native
  - split `reference.json`
  - print system env on startup
  - disable fallback on models with custom loaders
  - refactor triggering of prompt parser and set secondary prompts when needed
  - refactor handling of seeds
  - allow unsafe ssl context for downloads
- **Fixes**
  - controlnet: controlnet with non-english ui locales
  - core: add skip_keys to offloading logic, fixes wan frames mismatch, thanks @ryanmeador
  - core: force model move on offload=none
  - core: hidiffusion tracing
  - core: hip device name detection
  - core: reduce triton test verbosity
  - core: switch processing class not restoring params
  - extension tab: update checker, date handling, formatting, etc., thanks @awsr
  - lora: force unapply on change
  - lora: handle null description, thanks @CalamitousFelicitousness
  - lora: loading when using torch without distributed support
  - lora: skip with strength zero
  - lora: generate slowdown when consecutive lora-diffusers enabled
  - model: google-genai auth, thanks @CalamitousFelicitousness
  - model: improve qwen i2i handling
  - model: kandinsky-5 image and video on non-cuda platforms
  - model: meituan-longcat-image-edit missing image param
  - model: wan 2.2 i2v
  - model: z-image single-file loader
  - other: update civitai base models, thanks @trojaner
  - ui: gallery save/delete
  - ui: mobile auto-collapse when using side panel, thanks @awsr
  - ui: networks filter by model type
  - ui: networks icon/list view type switch, thanks @awsr
  - vae: force align width/height to vae scale factor
  - wildcards with folder specification

## Update for 2025-12-26

### Highlights for 2025-12-26

End-of-year release update, just two weeks after the previous one, with several new models and features:

- Several new models including the highly anticipated **Qwen-Image-Edit 2511** as well as **Qwen-Image-Layered**, **LongCat Image** and **Ovis Image**
- New features including support for **Z-Image** *ControlNets* and *fine-tunes* and **Detailer** segmentation support

[ReadMe](https://github.com/vladmandic/automatic/blob/master/README.md) | [ChangeLog](https://github.com/vladmandic/automatic/blob/master/CHANGELOG.md) | [Docs](https://vladmandic.github.io/sdnext-docs/) | [WiKi](https://github.com/vladmandic/automatic/wiki) | [Discord](https://discord.com/invite/sd-next-federal-batch-inspectors-1101998836328697867) | [Sponsor](https://github.com/sponsors/vladmandic)

### Details for 2025-12-26

- **Models**
  - [LongCat Image](https://github.com/meituan-longcat/LongCat-Image) in *Image* and *Image Edit* variants
    LongCat is a new 8B diffusion base model using Qwen-2.5 as text encoder
  - [Qwen-Image-Edit 2511](https://huggingface.co/Qwen/Qwen-Image-Edit-2511) in *base* and *pre-quantized* variants
    Key enhancements: mitigated image drift, improved character consistency, enhanced industrial design generation, and strengthened geometric reasoning ability
  - [Qwen-Image-Layered](https://huggingface.co/Qwen/Qwen-Image-Layered) in *base* and *pre-quantized* variants
    Qwen-Image-Layered is a model capable of decomposing an image into multiple RGBA layers
    *note*: set the number of desired output layers in *settings -> model options*
  - [Ovis Image 7B](https://huggingface.co/AIDC-AI/Ovis-Image-7B)
    Ovis Image is a new text-to-image base model based on the Qwen3 text-encoder and optimized for text rendering
- **Features**
  - Google **Gemini** and **Veo** models: support for both *Dev* and *Vertex* access methods
    see [docs](https://vladmandic.github.io/sdnext-docs/Google-GenAI/) for details
  - **Z-Image Turbo**: support loading transformer fine-tunes in safetensors format
    as with any transformers/unet finetunes, place them in `models/unet` and use **UNET Model** to load the safetensors file, as they are not complete models
  - **Z-Image Turbo**: support for **ControlNet Union**
    includes 1.0, 2.0 and 2.1 variants
  - **Detailer** support for
segmentation models
    some detection models can produce an exact segmentation mask and not just a box
    to enable, set the `use segmentation` option
    added segmentation models: *anzhc-eyes-seg*, *anzhc-face-1024-seg-8n*, *anzhc-head-seg-8n*
- **Internal**
  - update nightlies to `rocm==7.1`
  - mark `python==3.9` as deprecated
  - extensions: improved status indicators, thanks @awsr
  - additional type-safety checks, thanks @awsr
  - add model info to ui overlay
- **Wiki/Docs/Illustrations**
  - update models page, thanks @alerikaisattera
  - update reference models samples, thanks @liutyi
- **Fixes**
  - generate forever: fix loop checks, thanks @awsr
  - tokenizer: explicit use for flux2, thanks @CalamitousFelicitousness
  - torch.compile: skip offloading steps
  - kanvas css with standardui
  - control input media with non-english locales
  - handle embeds when on meta device
  - improve offloading when model has manual modules
  - ui section collapsible state, thanks @awsr
  - ui filter by model type

## Update for 2025-12-11

### Highlights for 2025-12-11

*What's new?*
New native [kanvas](https://vladmandic.github.io/sdnext-docs/Kanvas/) module for image manipulation that fully replaces the *img2img*, *inpaint* and *outpaint* controls, and a massive update to **Captioning/VQA** models and features
New generation **Flux.2** large image model, new **Z-Image** model that is creating a lot of buzz, new **Kandinsky 5 Lite** image model and new **Photoroom PRX** model
And the first cloud models with **Google Nano Banana** *2.5 Flash and 3.0 Pro* and the **Google Veo** *3.1* video model
Also new are **HunyuanVideo 1.5** and **Kandinsky 5 Pro** video models
Plus a lot of internal improvements and fixes

![Screenshot](https://github.com/user-attachments/assets/54b25586-b611-4d70-a28f-ee3360944034)

[ReadMe](https://github.com/vladmandic/automatic/blob/master/README.md) | [ChangeLog](https://github.com/vladmandic/automatic/blob/master/CHANGELOG.md) | [Docs](https://vladmandic.github.io/sdnext-docs/) |
[WiKi](https://github.com/vladmandic/automatic/wiki) | [Discord](https://discord.com/invite/sd-next-federal-batch-inspectors-1101998836328697867) | [Sponsor](https://github.com/sponsors/vladmandic)

### Details for 2025-12-11

- **Models**
  - [Black Forest Labs FLUX.2 Dev](https://bfl.ai/blog/flux-2) and prequantized variation [SDNQ-SVD-Uint4](https://huggingface.co/Disty0/FLUX.2-dev-SDNQ-uint4-svd-r32)
    **FLUX.2-Dev** is a brand new model from BFL and uses a large 32B DiT together with Mistral 24B as text encoder
    the model is available for text, image and edit tasks and can optionally use a control input as a second input image
    this is a very large model at ~100GB, so use of the prequantized model at ~32GB is strongly advised
    using the prequant version and default offloading, the model runs on GPUs with ~20GB
    *note*: the model is [gated](https://vladmandic.github.io/sdnext-docs/Gated/)
  - [Z-Image Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) and prequantized variation [SDNQ-SVD-Uint4](https://huggingface.co/Disty0/Z-Image-Turbo-SDNQ-uint4-svd-r32)
    **Z-Image** is a powerful and highly efficient image generation model with 6B parameters, using Qwen-3 as text encoder
    unlike most new models that are far larger, the Z-Image architecture allows it to run with good performance even on mid-range hardware
    *note*: the initial release is the *Turbo* variant only, with *Base* and *Edit* variants to follow
  - [Kandinsky 5.0 Lite](https://huggingface.co/kandinskylab/Kandinsky-5.0-I2V-Lite-5s-Diffusers) is a new 6B model using Qwen-2.5 as text encoder
    it comes in text-to-image and image-edit variants
  - **Google Gemini Nano Banana** [2.5 Flash](https://blog.google/products/gemini/gemini-nano-banana-examples/) and [3.0 Pro](https://deepmind.google/models/gemini-image/pro/)
    first cloud-based model directly supported in the SD.Next UI
    *note*: you need to set the `GOOGLE_API_KEY` environment variable with your key to use this model
  - [Photoroom PRX 1024 Beta](https://huggingface.co/Photoroom/prx-1024-t2i-beta)
    PRX
(Photoroom Experimental) is a small 1.3B-parameter t2i model trained entirely from scratch; it uses the T5-Gemma text-encoder
- **Video**
  - [HunyuanVideo 1.5](https://huggingface.co/tencent/HunyuanVideo-1.5) in T2V and I2V variants, both standard and distilled, and both 720p and 480p resolutions
    **HunyuanVideo 1.5** improves upon the previous 1.0 version with better quality and higher-resolution outputs; it uses the Qwen2.5-VL text-encoder
    distilled variants provide faster generation with slightly reduced quality
  - [Kandinsky 5.0 Pro Video](https://huggingface.co/kandinskylab/Kandinsky-5.0-T2V-Pro-sft-5s-Diffusers) in T2V and I2V variants
    larger 19B (and more powerful) version of the previously released Lite 2B models
  - [Google Veo 3.1](https://gemini.google/us/overview/video-generation/) for T2V and I2V workflows
    *note*: you need to set the `GOOGLE_API_KEY` environment variable with your key to use this model
- **Kanvas**: new module for native canvas-based image manipulation
  kanvas is a full replacement for the *img2img, inpaint and outpaint* controls
  see [docs](https://vladmandic.github.io/sdnext-docs/Kanvas/) for details
  *experimental*: report any feedback in the master [issue](https://github.com/vladmandic/sdnext/issues/4358)
- **Captioning** and **VQA: Visual Question & Answer**
  massive update to both features and supported models, thanks @CalamitousFelicitousness
  models:
  - additional `moondream-2` features
  - support for `moondream-3-preview`
  - support for `qwen3-vl` with thinking
  - additional `gemma-3-vl` finetunes
  - support for `XiaomiMiMo`
  ui:
  - ability to annotate the actual image, not just generate captions/answers
    e.g.
actually mark detected regions/points
  features:
  - ui indicator of model capabilities
  - support for *prefill* style of prompting/answering
  - support for *reasoning* mode for supported models, with option to output answer-only or the reasoning process
  - additional debug logging
- **Other Features**
  - **wildcards**: allow recursive inline wildcards using curly braces syntax
  - **sdnq**: simplify pre-quantization saved config
  - **attention**: additional torch attention settings
  - **lora**: separate fuse setting for native-vs-diffusers implementations
  - **auth**: strongly enforce auth check on all api endpoints
  - **amdgpu**: prefer rocm-on-windows over zluda
  - **amdgpu**: improve rocm-on-windows installer
  - **sdnq**: improve dequant logic
  - **gallery**: significant performance improvements, thanks @awsr
- **API**
  - `/control` endpoint is now fully compatible with scripts
  - `/control`: additional params to control *xyz grid*
    see `cli/api-xyz.py` for a simple example
  - `/detailers`: new endpoint to list available detailers, both built-in and any custom downloaded
  - `/face-restorers`: expanded to list model folders
- **Internal**
  - python: set 3.10 as minimum supported version
  - sdnq: multiple improvements to quantization and dequantization logic
  - torch: update to `torch==2.9.1` for *cuda, ipex, openvino, rocm* backends
  - attention: refactor attention handling
  - scripts: remove obsolete video scripts
  - lint: update global lint rules
  - chrono: switch to official pipeline
  - pipeline: add optional preprocess and postprocess hooks
  - auth: wrap all internal api calls with auth check and use token when possible
  - installer: reduce requirements
  - installer: auto-restart on self-update
  - server: set correct mime-types
  - sdnq: unconditional register on startup
  - python: start work on future-proofing for modern python versions, thanks @awsr
  - nunchaku: update to `1.0.2`
  - lint: add rules for run-on-windows
  - gallery: setting to enable/disable client-side caching, thanks @awsr
  - gallery: faster
thumbnail generation, thanks @awsr
  - gallery: purge old thumbnails, thanks @awsr
- **Docs**
  - update supported models table with VAE information, thanks @alerikaisattera
- **Fixes**
  - xyz-grid: improve parsing of axis lists, thanks @awsr
  - hires: strength save/load in metadata, thanks @awsr
  - img2img: fix initial scale tab, thanks @awsr
  - img2img: fix restoring refine sampler from metadata, thanks @awsr
  - log: client log formatting, thanks @awsr
  - rocm: check if installed before forcing install
  - pony-v7: fix text-encoder
  - detailer: with face-restorers
  - detailer: using lora in detailer prompt
  - detailer: fail on unsupported models instead of corrupting results
  - ui: fix collapsible panels
  - svd: fix stable-video-diffusion dtype mismatch
  - animatediff: disable sdnq if used
  - lora: restore pipeline type if reload/recompile needed
  - process: improve send-to functionality
  - control: safe load non-sparse controlnet
  - control: fix marigold preprocessor with bfloat16
  - auth: fix password being shown in clear text during login
  - github: better handling of forks
  - firefox: remove obsolete checks, thanks @awsr
  - runai streamer: cleanup logging, thanks @CalamitousFelicitousness
  - gradio: event handlers, thanks @awsr
  - seedvr: handle non-cuda environments, thanks @resonantsky

## Update for 2025-11-06

### Highlights for 2025-11-06

Service pack release that handles critical issues and improvements for the **ROCm-on-Windows** and **ZLUDA** backends
Also included are several new features, notably improvements to the **detailer** and the ability to run [SD.Next](https://github.com/vladmandic/sdnext) with specific modules disabled
And a new video model, **nVidia SANA 2B**

![Screenshot](https://github.com/user-attachments/assets/d6119a63-6ee5-4597-95f6-29ed0701d3b5)

[ReadMe](https://github.com/vladmandic/automatic/blob/master/README.md) | [ChangeLog](https://github.com/vladmandic/automatic/blob/master/CHANGELOG.md) | [Docs](https://vladmandic.github.io/sdnext-docs/) |
[WiKi](https://github.com/vladmandic/automatic/wiki) | [Discord](https://discord.com/invite/sd-next-federal-batch-inspectors-1101998836328697867) | [Sponsor](https://github.com/sponsors/vladmandic)

### Details for 2025-11-06

- **Models**
  - [SANA Video_2B_480p T2V](https://huggingface.co/Efficient-Large-Model/SANA-Video_2B_480p_diffusers) is a small 2B ultra-efficient diffusion model designed for rapid generation of high-quality videos; it uses the Gemma2 text encoder
- **Features**
  - **ROCm for Windows**: switch to using **TheRock** `torch` builds when available
    recommended to run: `webui --use-rocm --reinstall`
  - **ZLUDA**: improve detection and handling of unsupported GPUs
    recommended to run: `webui --use-zluda --reinstall`
  - **detailer**:
    optionally include the detection image in output results
    optionally sort detection objects left-to-right for improved prompt consistency
    enable multi-subject and multi-model prompts
  - **disable modules**: ability to disable parts of the app
    useful for custom deployments where some features are not desired
    *note*: this doesn't just hide it from the user, it completely disables the code paths
    use `--disable x,y,z` possible values:
    - main tabs: *control,txt2img,img2img,video,extras,caption,gallery*
    - aside tabs: *extensions,models,info,update,history,monitor,onnx,system,networks,logs*
    - special: *settings,config* (hidden instead of disabled)
  - **wildcards**: add inline processing using curly braces syntax
  - add setting to control `cudnn` enable/disable
    *note*: this can also be used to enable/disable `MIOpen` on ROCm backends
  - change `vlm` beams to 1 by default for faster response
  - **controlnet**: allow processor to keep aspect-ratio for override images based on i2i or t2i resolution
  - **networks**: info details now display image metadata from the preview image
  - **networks**: new model previews, thanks @liutyi
- **Fixes**
  - zluda: test and disable MIOpen as needed
  - qwen: improve lora compatibility
  - chrono: transformers handling
  - chrono: extract last
frame - chrono: add vae scale override, thanks @CalamitousFelicitousness - runai: improve streamer integration - transformers: `dtype` use new syntax - rocm: possible endless loop during hip detection - rocm: auto-disable `miopen` for gfx120x - detailer: better handling of settings, thanks @awsr - installer: cleanup `--optional` - hires: guard against multi-controlnet - inpaint: fix init - version: detection when cloned with .git suffix, thanks @awsr - sdnq: init on video model load - model type: detection - model type: add tracing to model detection - settings: guard against non-string values, thanks @awsr - ui: wait for server options to be ready before initializing ui - ui: fix full-screen image viewer buttons with non-standard ui theme - ui: control tab show override section - ui: mobile layout for video tab - ui: increase init timeout - video: save to subfolder - taesd: warn on long decode times - metadata: keep exif on thumbnail generation - wildcard: obey seed for reproducible results - sageattention: handle possible triton issues on some nvidia gpus, thanks @CalamitousFelicitousness ## Update for 2025-10-31 ### Highlights for 2025-10-31 Less than 2 weeks since last release, here's a service-pack style update with a lot of fixes and improvements: - Reorganization of **Reference Models** into *Base, Quantized, Distilled and Community* sections for easier navigation and introduction of optimized **pre-quantized** variants for many popular models - use this as your quick start! 
- New models: **HunyuanImage 2.1** capable of 2K images natively, **HunyuanImage 3.0** large unified multimodal autoregressive model, **ChronoEdit** that re-purposes temporal consistency of generation for image editing, **Pony 7** based on AuraFlow architecture, **Kandinsky 5** 10s video models - New **offline mode** to use previously downloaded models without internet connection - Optimizations to **WAN-2.2** given its popularity plus addition of native **VAE Upscaler** and optimized **pre-quantized** variants - New SOTA model loader using **Run:ai streamer** - Updates to `rocm` and `xpu` backends - Fixes, fixes, fixes... too many to list here! ![Screenshot](https://github.com/user-attachments/assets/d6119a63-6ee5-4597-95f6-29ed0701d3b5) [ReadMe](https://github.com/vladmandic/automatic/blob/master/README.md) | [ChangeLog](https://github.com/vladmandic/automatic/blob/master/CHANGELOG.md) | [Docs](https://vladmandic.github.io/sdnext-docs/) | [WiKi](https://github.com/vladmandic/automatic/wiki) | [Discord](https://discord.com/invite/sd-next-federal-batch-inspectors-1101998836328697867) | [Sponsor](https://github.com/sponsors/vladmandic) ### Details for 2025-10-31 - the **Reference** networks section is now split into actual *Base* models plus: - **Quantized**: pre-quantized variants of the base models using SDNQ-SVD quantization for optimal quality and smallest possible resource usage examples: *FLUX.1-Dev/Krea/Kontext/Schnell, Qwen-Image/Edit/2509, Chroma1-HD, WAN-2.2-A44B, etc.* *note*: pre-quantized *WAN-2.2-14B* is also available in video models and runs with only 12GB VRAM!
- **Distilled**: distilled variants of base models examples: *Turbo, Lightning, Lite, SRPO, Distill, Pruning, etc.* - **Community**: community highlights examples: *Tempest, Juggernaut, Illustrious, Pony, NoobAI, etc.* and all reference models have new preview images, thanks @liutyi - **Models Reference** - [Tencent HunyuanImage 2.1](https://huggingface.co/tencent/HunyuanImage-2.1) in *full*, *distilled* and *refiner* variants *HunyuanImage-2.1* is a large (51GB) T2I model capable of natively generating 2K images and uses Qwen2.5 + T5 text-encoders and 32x VAE - [Tencent HunyuanImage 3.0](https://huggingface.co/tencent/HunyuanImage-3.0) in [pre-quant](https://huggingface.co/Disty0/HunyuanImage3-SDNQ-uint4-svd-r32) only variant due to massive size *HunyuanImage 3.0* is very large at 47GB pre-quantized (otherwise it is 157GB) and unifies multimodal understanding and generation within an autoregressive framework - [nVidia ChronoEdit](https://huggingface.co/nvidia/ChronoEdit-14B-Diffusers) *ChronoEdit* is a 14B image editing model based on *WAN* this model reframes image editing as a video generation task, using input and edited images as start/end frames to leverage pretrained video models with temporal consistency to extend temporal consistency for image editing, set *settings -> model options -> chrono temporal steps* to the desired number of temporal reasoning steps - [Kandinsky 5 Lite 10s](https://huggingface.co/ai-forever/Kandinsky-5.0-T2V-Lite-sft-10s-Diffusers) in *SFT, CFG-distilled and Steps-distilled* variants the second release in the *Kandinsky5* series is a T2V model optimized for 10sec videos and uses the Qwen2.5 text encoder - [Pony 7](https://huggingface.co/purplesmartai/pony-v7-base) Pony 7 steps in a different direction from previous Pony models and is based on the AuraFlow architecture and UMT5 encoder - **Models Auxiliary** - [Qwen 3-VL](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct) VLM for interrogate and prompt enhance, thanks @CalamitousFelicitousness
this includes *2B, 4B and 8B* variants - [WAN Asymmetric Upscale](https://huggingface.co/spacepxl/Wan2.1-VAE-upscale2x) available as a general purpose upscaler that can be used during standard workflow or process tab available as VAE for compatible video models: *WAN-2.x-14B, SkyReels-v2* models - [Apple DepthPro](https://huggingface.co/apple/DepthPro) controlnet processor, thanks @nolbert82 - [LibreFlux controlnet](https://huggingface.co/neuralvfx/LibreFlux-ControlNet) segmentation controlnet for FLUX.1 - **Features** - **offline mode**: enable in *settings -> huggingface* enables a fully offline mode where previously downloaded models can be used as-is *note*: must be enabled only after all packages have been installed and the model has been run online at least once - **model load**: SOTA method using nVidia's [Run:ai streamer](https://github.com/run-ai/runai-model-streamer) enable in *settings -> model options -> runai streamer* applies to *diffusers, transformers and sdnq* loaders, note: this is a linux-only feature *experimental* but shows significant model load speedups, 20-40% depending on model and hardware - **Backend** - switch to `torch==2.9` for *ipex, rocm and openvino* - switch to `rocm==7.0` for nightlies - log `triton` availability on startup - add `xpu` stats in gpu monitor - **Other** - improved **SDNQ SVD** and low-bit matmul performance - reduce RAM usage on model load using **SDNQ SVD** - change default **schedulers** for sdxl - warn on `python==3.9` end-of-life and `python==3.10` not actively supported - **scheduler** add base and max shift parameters for flow-matching samplers - enhance `--optional` flag to pre-install optional packages - add `[lora]` to recognized filename patterns - when using **shared-t5** *(default)*, it will load standard or pre-quant depending on model - enhanced LoRA support for **Wan-2.2-14B** - log available attention mechanisms on startup - support for switching back-and-forth between **t2i** and **t2v** for *wan-2.x* models - control
`api` cache controlnets - additional model modules **deduplication** for both normal and pre-quant models: *umt5, qwen25-vl* - **Fixes** - startup error with `--profile` enabled if using `--skip` - restore orig init image for each batch sequence - fix modernui hints layout - fix `wan-2.2-a14b` stage selection - fix `wan-2.2-5b` vae decode - disabling live preview should not disable progress updates - video tab create `params.txt` with metadata - fix full-screen image-viewer toolbar actions with control tab - improve filename sanitization - lora auto-detect low/high stage if not specified - lora disable fuse on partially applied network - fix networks display with extended characters, thanks @awsr - installer handle different `opencv` package variants - fix using pre-quantized shared-t5 - fix `wan-2.2-14b-vace` single-stage execution - fix `wan-2.2-5b` tiled vae decode - fix `controlnet` loading with quantization - video use pre-quantized text-encoder if selected model is pre-quantized - handle sparse `controlnet` models - catch `xet` warnings - avoid unnecessary pipe variant switching - validate pipelines on import - fix `nudenet` process tab operations - `controlnet` input validation - log metadata keys that cannot be applied - fix `framepack` with image input ## Update for 2025-10-18 - **Models** [Kandinsky 5 Lite](https://huggingface.co/ai-forever/Kandinsky-5.0-T2V-Lite-sft-5s-Diffusers) in *SFT, CFG-distilled and Steps-distilled* variants the first model in the Kandinsky5 series is a T2V model optimized for 5sec videos and uses the Qwen2.5 text encoder - **Fixes** - ROCm-on-Windows additional checks - SDNQ-SVD fallback on incompatible layers - Huggingface model download - Video implement dynamic and manual sampler shift - Fix interrupt batch processing - Delay import of control processors until used - Fix tiny VAE with batched results - Fix CFG scale not added to metadata and set valid range to >=1.0 - **Other** - Optimized Video tab layout - Video enable VAE slicing and
framewise decoding when possible - Detect and log `flash-attn` and `sageattention` if installed - Remove unused UI settings ## Update for 2025-10-17 ### Highlights for 2025-10-17 It's been a month since the last release and the number of changes is yet again massive with over 300 commits! Highlights are: - **Torch**: ROCm on Windows for AMD GPUs if you have a compatible GPU, performance gains are significant! - **Models**: a lot of new stuff with **Qwen-Image-Edit** including multi-image edits and distilled variants, new **Flux**, **WAN**, **LTX**, **HiDream** variants, expanded **Nunchaku** support and a new SOTA upscaler with **SeedVR2** plus improved video support in general, including new methods of video encoding - **Quantization**: new **SVD**-style quantization using SDNQ offers almost zero-loss even with **4bit** quantization and now you can also test your favorite quantization on-the-fly and then save/load the model for future use - Other: support for **Huggingface** mirrors, changes to installer to prevent unwanted `torch-cpu` operations, improved VAE previews, etc.
![Screenshot](https://github.com/user-attachments/assets/d6119a63-6ee5-4597-95f6-29ed0701d3b5) [ReadMe](https://github.com/vladmandic/automatic/blob/master/README.md) | [ChangeLog](https://github.com/vladmandic/automatic/blob/master/CHANGELOG.md) | [Docs](https://vladmandic.github.io/sdnext-docs/) | [WiKi](https://github.com/vladmandic/automatic/wiki) | [Discord](https://discord.com/invite/sd-next-federal-batch-inspectors-1101998836328697867) | [Sponsor](https://github.com/sponsors/vladmandic) ### Details for 2025-10-17 - **Models** - [WAN 2.2 14B VACE](https://huggingface.co/alibaba-pai/Wan2.2-VACE-Fun-A14B) available for *text-to-image*, *text-to-video* and *image-to-video* workflows - [Qwen Image Edit 2509](https://huggingface.co/Qwen/Qwen-Image-Edit-2509) and [Nunchaku Qwen Image Edit 2509](https://huggingface.co/nunchaku-tech/nunchaku-qwen-image-edit-2509) updated version of Qwen Image Edit with improved image consistency - [Qwen Image Pruning](https://huggingface.co/OPPOer/Qwen-Image-Pruning) and [Qwen Image Edit Pruning](https://huggingface.co/OPPOer/Qwen-Image-Edit-Pruning) pruned versions of Qwen with 13B params instead of 20B, with some quality tradeoff - [Tencent FLUX.1 Dev SRPO](https://huggingface.co/tencent/SRPO) SRPO is trained by Tencent with a specific technique: directly aligning the full diffusion trajectory with fine-grained human preference - [Nunchaku SDXL](https://huggingface.co/nunchaku-tech/nunchaku-sdxl) and [Nunchaku SDXL Turbo](https://huggingface.co/nunchaku-tech/nunchaku-sdxl-turbo) the impact of the nunchaku engine on a unet-based model such as sdxl is much smaller than on dit-based models, but it is still significantly faster than baseline note that the nunchaku optimized and pre-quantized unet is a replacement for the base unet, so it is only applicable to base models, not fine-tunes *how to use*: enable nunchaku in settings -> quantization and then load either sdxl-base or sdxl-base-turbo reference models - [HiDream
E1.1](https://huggingface.co/HiDream-ai/HiDream-E1-1) updated version of HiDream-E1 image editing model - [LTXVideo 0.9.8](https://huggingface.co/Lightricks/LTX-Video-0.9.8-13B-distilled) updated version of LTXVideo t2v/i2v model - [SeedVR2](https://iceclear.github.io/projects/seedvr/) originally designed for video restoration, seedvr works great for image detailing and upscaling! available in 3B, 7B and 7B-sharp variants, use as any other upscaler! note: seedvr is a very large model (6.4GB and 16GB respectively) and not designed for lower-end hardware, quantization is highly recommended note: seedvr is highly sensitive to its cfg scale, set in *settings -> postprocessing* lower values will result in smoother output while higher values add details - [X-Omni SFT](https://x-omni-team.github.io/) *experimental*: X-omni is a transformer-only discrete auto-regressive image generative model trained with reinforcement learning - **Features** - **Model save**: ability to save currently loaded model as a new standalone model why?
SD.Next always prefers to start with a full model and quantize on-demand during load however, when you find your exact preferred quantization settings that work well for you, saving such a model as a new model allows for faster loads and reduced disk space usage so it is the best of both worlds: you can experiment and test different quantization methods and once you find the one that works for you, save it as a new model saved models appear in the network tab as normal models and can be loaded as such available in *models* tab - [Qwen Image-Edit](https://huggingface.co/Qwen/Qwen-Image-Edit-2509) multi-image editing requires qwen-image-edit-2509 or its variant as multi-image edits are not available in the original qwen-image in ui control tab: inputs -> separate init image add image for *input media* and *control media* can be - [Cache-DiT](https://github.com/vipshop/cache-dit) cache-dit is a unified, flexible and training-free cache acceleration framework compatible with many dit-based models such as FLUX.1, Qwen, HunyuanImage, Wan2.2, Chroma, etc.
enable in *settings -> pipeline modifiers -> cache-dit* - [Nunchaku Flux.1 PulID](https://nunchaku.tech/docs/nunchaku/python_api/nunchaku.pipeline.pipeline_flux_pulid.html) automatically enabled if the loaded model is FLUX.1 with Nunchaku engine enabled and when PulID script is enabled - **Huggingface mirror** in *settings -> huggingface* if you're working from a location with limited access to huggingface, you can now specify a mirror site, for example enter `https://hf-mirror.com` - **Compute** - **ROCm** for Windows support for both the official torch preview release of `torch-rocm` for windows and **TheRock** unofficial `torch-rocm` builds for windows note that rocm for windows is still in preview and has limited gpu support, please check rocm docs for details - **DirectML** warn as *end-of-life* `torch-directml` received no updates in over 1 year and it is currently superseded by `rocm` or `zluda` - command line params `--use-zluda` and `--use-rocm` will attempt the desired operation or fail if not possible previously sdnext was performing a fallback to `torch-cpu` which is not desired - **installer** if `--use-cuda` or `--use-rocm` are specified and `torch-cpu` is installed, installer will attempt to reinstall the correct torch package - **installer** warn if *cuda* or *rocm* are available and `torch-cpu` is installed - support for `torch==2.10-nightly` with `cuda==13.0` - **Extensions** - [Agent-Scheduler](https://github.com/SipherAGI/sd-webui-agent-scheduler) was a high-value built-in extension, but it has not been maintained for 1.5 years; it also does not work with the control and video tabs which are the core of sdnext nowadays, so it has been removed from built-in extensions: manual installation is still possible - [DAAM: Diffusion Attentive Attribution Maps](https://github.com/castorini/daam) create heatmap visualizations of which parts of the prompt influenced which parts of the image available in scripts for sdxl text-to-image workflows - **Offloading** - improve offloading
for pipelines with multiple stages such as *wan-2.2-14b* - add timers to measure onload/offload times during generate - experimental offloading using `torch.streams` enable in settings -> model offloading - new feature to specify which model types not to offload in *settings -> model offloading -> model types not to offload* - **UI** - **connection monitor** main logo in top-left corner now indicates server connection status and hovering over it shows connection details - separate guidance and detail sections - networks ability to filter lora by base model version - add interrogate button to input images - disable spellchecks on all text inputs - **SDNQ** - add `SVDQuant` quantization method support - make sdnq scales compatible with balanced offload - add int8 `matmul` support for RDNA2 GPUs via triton - improve int8 `matmul` performance on Intel GPUs - **Other** - server will note when restart is recommended due to package updates - **interrupt** will now show last known preview image *keep incomplete* setting is now *save interrupted* - **logging** enable `debug`, `docs` and `api-docs` by default - **logging** add detailed ram/vram utilization info to log logging frequency can be specified using `--monitor x` command line param, where x is the number of seconds - **ipex** simplify internal implementation - refactor to use new libraries - styles and wildcards now use the same seed as main generate for reproducible results - **api** new endpoint POST `/sdapi/v1/civitai` to trigger civitai models metadata update accepts optional `page` parameter to search specific networks page - **reference models** additional example images, thanks @liutyi - **reference models** add model size and release date, thanks @alerikaisattera - **video** support for configurable multi-stage models such as WAN-2.2-14B - **video** new LTX model selection - replace `pynvml` with `nvidia-ml-py` for gpu monitoring - update **loopback** script with random seed option, thanks @rabanti - **vae** slicing
enable for *lowvram/medvram*, tiling for *lowvram*, both disabled otherwise - **attention** remove split-attention and add an explicit attention slicing enable/disable option enable in *settings -> compute settings* can be combined with sdp, enabling may improve stability when used on iGPU or shared memory systems - **nunchaku** update to `1.0.1` and enhance installer - **xyz-grid** add guidance section - **preview** implement configurable layers for WAN, Qwen, HV - update swagger `/docs` endpoint style - add `[epoch]` to filename template - starting `[seq]` for the filename template is now the higher of the largest previous sequence or the number of files in the folder - **Video** - use shared **T5** text encoder for video models when possible - use shared **LLama** text encoder for video models when possible - unified video save code across all video models also avoids creation of temporary files for each frame unless the user wants to save them - unified prompt enhance code across all video models - add job state tracking for video generation - fix quantization not being applied on load for some models - improve offloading for **ltx** and **wan** - fix model selection in **ltx** tab - **Experimental** - `new` command line flag enables new `pydantic` and `albumentations` packages - **modular pipelines**: enable in *settings -> model options* only compatible with some pipelines, invalidates preview generation - **modular guiders**: automatically used for compatible pipelines when *modular pipelines* is enabled allows for using many different guidance methods: *CFG, CFGZero, PAG, APG, SLG, SEG, TCFG, FDG* - **Wiki** - updates to *AMD-ROCm, ZLUDA, LoRA, DirectML, SDNQ, Quantization, Prompting* pages - new *Stability-Matrix* page - **Fixes** - **Microsoft Florence 2** both base and large variants *note* this will trigger download of the new variant of the model, feel free to delete the older variant in `huggingface` folder - **MiaoshouAI PromptGen** 1.5/2.0 in both base and large variants
- fix prompt scheduling, thanks @nolbert82 - ui: fix image metadata display when switching selected image in control tab - framepack: add explicit hf-login before framepack load - framepack: patch solver for unsupported gpus - benchmark: remove forced sampler from system info benchmark - xyz-grid: fix xyz grid with random seeds - reference: fix download for sd15/sdxl reference models - fix checks in init/mask image decode - fix hf token with extra chars - image viewer refocus on gallery after returning from full screen mode - fix attention guidance metadata save/restore - vae preview add explicity cuda.sync ## Update for 2025-09-15 ### Highlights for 2025-09-15 *What's new*? Big one is that we're (*finally*) switching the default UI to **ModernUI**, for both desktop and mobile use! **StandardUI** is still available and can be selected in settings, but ModernUI is now the default for new installs *What's else*? **Chroma** is in its final form, there are several new **Qwen-Image** variants and **Nunchaku** hit version 1.0! Also, there are quite a few offloading improvements and many quality-of-life changes to UI and overall workflows And check out new **history** tab in the right panel, it now shows visualization of entire processing timeline! 
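The offloading improvements mentioned in these highlights are not documented in detail here. As a hedged sketch of the general balanced-offload idea (keep modules on the GPU up to a memory budget, evicting the least recently used ones back to system RAM when a new module must be loaded), here is a toy Python model; the class name, sizes and eviction policy are illustrative assumptions, not SD.Next's actual implementation:

```python
from collections import OrderedDict

class BalancedOffloader:
    """Toy LRU-style offloader: keeps at most `budget` units of module
    weights on the 'gpu'; everything else stays offloaded to the 'cpu'."""

    def __init__(self, budget):
        self.budget = budget
        self.on_gpu = OrderedDict()  # name -> size, oldest (LRU) first

    def onload(self, name, size):
        """Bring a module onto the gpu, evicting LRU modules if needed.
        Returns the list of module names that were offloaded to make room."""
        if name in self.on_gpu:
            self.on_gpu.move_to_end(name)  # mark as most recently used
            return []
        evicted = []
        while self.on_gpu and sum(self.on_gpu.values()) + size > self.budget:
            victim, _ = self.on_gpu.popitem(last=False)  # evict LRU module
            evicted.append(victim)
        self.on_gpu[name] = size
        return evicted

off = BalancedOffloader(budget=10)
off.onload("text_encoder", 4)
off.onload("transformer", 6)
evicted = off.onload("vae", 3)  # over budget: oldest module is offloaded
assert evicted == ["text_encoder"]
assert list(off.on_gpu) == ["transformer", "vae"]
```

A real implementation would move `torch` modules between devices and track actual VRAM, but the budget-plus-eviction loop captures the trade-off the changelog alludes to: more modules resident means fewer transfers but higher VRAM pressure.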
![Screenshot](https://github.com/user-attachments/assets/d6119a63-6ee5-4597-95f6-29ed0701d3b5) [ReadMe](https://github.com/vladmandic/automatic/blob/master/README.md) | [ChangeLog](https://github.com/vladmandic/automatic/blob/master/CHANGELOG.md) | [Docs](https://vladmandic.github.io/sdnext-docs/) | [WiKi](https://github.com/vladmandic/automatic/wiki) | [Discord](https://discord.com/invite/sd-next-federal-batch-inspectors-1101998836328697867) | [Sponsor](https://github.com/sponsors/vladmandic) ### Details for 2025-09-15 - **Models** - **Chroma** final versions: [Chroma1-HD](https://huggingface.co/lodestones/Chroma1-HD), [Chroma1-Base](https://huggingface.co/lodestones/Chroma1-Base) and [Chroma1-Flash](https://huggingface.co/lodestones/Chroma1-Flash) - **Qwen-Image** [InstantX ControlNet Union](https://huggingface.co/InstantX/Qwen-Image-ControlNet-Union) support *note* qwen-image is already a very large model and controlnet adds 3.5GB on top of that so quantization and offloading are highly recommended! 
- [Qwen-Lightning-Edit](https://huggingface.co/vladmandic/Qwen-Lightning-Edit) and [Qwen-Image-Distill](https://huggingface.co/SahilCarterr/Qwen-Image-Distill-Full) variants - **Nunchaku** variants of [Qwen-Image-Lightning](https://huggingface.co/nunchaku-tech/nunchaku-qwen-image), [Qwen-Image-Edit](https://huggingface.co/nunchaku-tech/nunchaku-qwen-image-edit), [Nunchaku-Qwen-Image-Edit-Lightning](https://huggingface.co/nunchaku-tech/nunchaku-qwen-image-edit) - **Nunchaku** variant of [Flux.1-Krea-Dev](https://huggingface.co/nunchaku-tech/nunchaku-flux.1-krea-dev) if you have a compatible nVidia GPU, Nunchaku is the fastest quantization & inference engine - [HunyuanDiT ControlNet](https://huggingface.co/Tencent-Hunyuan/HYDiT-ControlNet-v1.2) Canny, Depth, Pose - [KBlueLeaf/HDM-xut-340M-anime](https://huggingface.co/KBlueLeaf/HDM-xut-340M-anime) highly experimental: HDM *Home-made-Diffusion-Model* is a project to investigate a specialized training recipe/scheme for pre-training a T2I model at home, based on a super-light architecture *requires*: generator=cpu, dtype=float16, offload=none, both positive and negative prompts are required and must be long & detailed - [Apple FastVLM](https://huggingface.co/apple/FastVLM-0.5B) in 0.5B, 1.5B and 7B variants available in captioning tab - updated [SD.Next Model Samples Gallery](https://vladmandic.github.io/sd-samples/compare.html) - **UI** - default to **ModernUI** standard ui is still available via *settings -> user interface -> theme type* - mobile-friendly!
- new **History** section in the right panel shows detailed job history plus a timeline of the execution - make hints touch-friendly: hold touch to display hint - improved image scaling in img2img and control interfaces - add base model type to networks display, thanks @Artheriax - additional hints to ui, thanks @Artheriax - add video support to gallery, thanks @CalamitousFelicitousness - additional artwork for reference models in networks, thanks @liutyi - improve ui hints display - restyled all toolbuttons to be modernui native - reordered system settings - dynamic direction of dropdowns - improve process tab layout - improve detection of active tab - configurable horizontal vs vertical panel layout in settings -> user interface -> panel min width *example*: if panel width is less than the specified value, layout switches to vertical - configurable grid images size in *settings -> user interface -> grid image size* - gallery now includes reference model images - reference models now include indicator if they are *ready* or *need download* - **Offloading** - **balanced** - enable offload during pre-forward by default - improve offloading of models with multiple dits - improve offloading of models with implicit vae processing - improve offloading of models with controlnet - more aggressive offloading of controlnet with lowvram flag - **group** - new offloading method, using *type=leaf* works on a similar level as sequential offloading and can present significant savings on low-vram gpus, but comes at a higher performance cost - **Quantization** - option to specify model types not to quantize: *settings -> quantization* allows for having quantization enabled, but skipping specific model types that do not need it *example*: `sd, sdxl` - **sdnq** - add quantized matmul support for all quantization types and group sizes - improve the performance of low bit quants - **nunchaku**: update to `nunchaku==1.0.0` *note*: nunchaku updated the repo which will trigger re-download
of nunchaku models when first used nunchaku is currently available for: *Flux.1 Dev/Schnell/Kontext/Krea/Depth/Fill*, *Qwen-Image/Qwen-Lightning*, *SANA-1.6B* - **tensorrt**: new quantization engine from nvidia *experimental*: requires new pydantic package which *may* break other things, to enable start sdnext with `--new` flag *note*: this is model quantization only, no support for tensorRT inference yet - **Other** - **LoRA** allow specifying module to apply lora on *example*: `` would apply lora *only* on unet regardless of lora content this is particularly useful when you have multiple loras and you want to apply them on different parts of the model *example*: `` and `` *note*: `low` is shorthand for `module=transformer_2` and `high` is shorthand for `module=transformer` - **Detailer** allow manually setting processing resolution *note*: this does not impact the actual image resolution, only the resolution at which detailer internally operates - refactor reuse-seed and add functionality to all tabs - refactor modernui js codebase - move zluda flash attention to *Triton Flash attention* option - remove samplers filtering - allow both flow-matching and discrete samplers for sdxl models - cleanup command line parameters - add `--new` command line flag to enable testing of new packages without breaking existing installs - downgrade rocm to `torch==2.7.1` - set the minimum supported rocm version on linux to `rocm==6.0` - disallow `zluda` and `directml` on non-windows platforms - update openvino to `openvino==2025.3.0` - add deprecation warning for `python==3.9` - allow setting denoise strength to 0 in control/img2img this allows running workflows which only refine or detail an existing image without changing it - **Fixes** - normalize path handling when deleting images - unified compile upscalers - fix OpenVINO with ControlNet - fix hidden model tags in networks display - fix networks reference models display on windows - fix handling of pre-quantized `flux` models - fix
`wan` use correct pipeline for i2v models - fix `qwen-image` with hires - fix `omnigen-2` failure - fix `auraflow` quantization - fix `kandinsky-3` noise - fix `infiniteyou` pipeline offloading - fix `skyreels-v2` image-to-video - fix `flex2` img2img denoising strength - fix `flex2` controlnet vs inpaint image selection, thanks @alerikaisattera - fix some use cases with access via reverse-proxy - fix segfault on startup with `rocm==6.4.3` and `torch==2.8` - fix wildcards folders traversal, thanks @dymil - fix zluda flash attention with enable_gqa - fix `wan a14b` quantization - fix reprocess workflow for control with hires - fix samplers set timesteps vs sigmas - fix `detailer` missing metadata - fix `infiniteyou` lora load with ## Update for 2025-08-20 A quick service release with several important hotfixes, improved localization support and adding new **Qwen** model variants... [ReadMe](https://github.com/vladmandic/automatic/blob/master/README.md) | [ChangeLog](https://github.com/vladmandic/automatic/blob/master/CHANGELOG.md) | [Docs](https://vladmandic.github.io/sdnext-docs/) | [WiKi](https://github.com/vladmandic/automatic/wiki) | [Discord](https://discord.com/invite/sd-next-federal-batch-inspectors-1101998836328697867) - **Models** - [Qwen-Image-Edit](https://huggingface.co/Qwen/Qwen-Image-Edit) Image editing using natural language prompting, similar to `Flux.1-Kontext`, but based on the larger 20B `Qwen-Image` model - [Nunchaku-Qwen-Image](https://huggingface.co/nunchaku-tech/nunchaku-qwen-image) if you have a compatible nVidia GPU, Nunchaku is the fastest quantization engine, currently available for Flux.1, SANA and Qwen-Image models *note*: release version of `nunchaku==0.3.2` does NOT include support, so you need to build [nunchaku](https://nunchaku.tech/docs/nunchaku/installation/installation.html) from source - [SD.Next Model Samples Gallery](https://vladmandic.github.io/sd-samples/compare.html) - updated with new models - **Features** - new *setting ->
huggingface -> download method* default is `rust` as the new `xet` is known to cause issues - support for `flux.1-kontext` lora - support for `qwen-image` lora - new *setting -> quantization -> modules dtype dict* used to manually override quant types per module - **UI** - new artwork for reference models in networks thanks @liutyi - updated [localization](https://vladmandic.github.io/sdnext-docs/Locale/) for all 8 languages - localization support for ModernUI - single-click on locale rotates current locale double-click on locale resets locale to `en` - exclude ModernUI from list of extensions ModernUI is enabled in settings, not by manually enabling the extension - **Docs** - Models and Video pages updated with links to original model repos, model licenses and original release dates thanks @alerikaisattera - **Fixes** - nunchaku use new download links and default to `0.3.2` nunchaku wheels: - fix OpenVINO with offloading - add explicit offload calls on prompt encode - error reporting on model load failure - fix torch version checks - remove extra cache clear - enable explicit sync calls for `rocm` on windows - note if restart-needed on initial startup import error - bypass diffusers-lora-fuse on quantized models - monkey-patch diffusers to use original weights shape when loading lora - guard against null prompt - install `hf_transfer` and `hf_xet` when needed - fix ui cropped network tags - enum reference models on startup - don't report errors if agent scheduler is disabled ## Update for 2025-08-15 ### Highlights for 2025-08-15 New release two weeks after the last one and it's a big one with over 150 commits!
- Several new models: [Qwen-Image](https://qwenlm.github.io/blog/qwen-image/) (plus *Lightning* variant) and [FLUX.1-Krea-Dev](https://www.krea.ai/blog/flux-krea-open-source-release) - Several updated models: [Chroma](https://huggingface.co/lodestones/Chroma), [SkyReels-V2](https://huggingface.co/Skywork/SkyReels-V2-DF-14B-720P-Diffusers), [Wan-VACE](https://huggingface.co/Wan-AI/Wan2.1-VACE-14B-diffusers), [HunyuanDiT](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT-v1.2-Diffusers-Distilled) - Plus continuing with major **UI** work with new embedded **Docs/Wiki** search, redesigned real-time **hints**, **wildcards** UI selector, built-in **GPU monitor**, **CivitAI** integration and more! - On the compute side, new profiles for high-vram GPUs, offloading improvements, parallel-load for large models, support for new `torch` release and improved quality when using low-bit quantization! - [SD.Next Model Samples Gallery](https://vladmandic.github.io/sd-samples/compare.html): pre-generated image gallery with 60 models (45 base and 15 finetunes) and 40 different styles resulting in 2,400 high resolution images! gallery additionally includes model details such as typical load and inference times as well as sizes and types of each model component (*e.g. unet, transformer, text-encoder, vae*) - And (*as always*) many bugfixes and improvements to existing features! 
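The "parallel-load for large models" mentioned above is not described further in this changelog. As a hedged illustration of the general technique (read a sharded checkpoint's pieces concurrently and merge them into one state dict, so disk I/O overlaps instead of running serially), here is a minimal stdlib-only sketch; the shard names and `load_shard` stand-in are hypothetical, and a real loader would deserialize safetensors files:

```python
from concurrent.futures import ThreadPoolExecutor

def load_shard(path):
    # stand-in for reading one checkpoint shard from disk;
    # here it just returns a fake single-entry state-dict fragment
    return {f"{path}.weight": b"\x00" * 4}

def parallel_load(shard_paths, workers=4):
    """Load all shards concurrently and merge into one state dict."""
    state = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so merge order is deterministic
        for fragment in pool.map(load_shard, shard_paths):
            state.update(fragment)
    return state

shards = [f"model-{i:05d}-of-00004" for i in range(1, 5)]
state = parallel_load(shards)
assert len(state) == 4
```

Threads work well here because shard loading is I/O-bound; the speedup depends on storage bandwidth, which is consistent with the changelog describing this as a benefit for large models.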
![sd-samples](https://github.com/user-attachments/assets/3efc8603-0766-4e4e-a4cb-d8c9b13d1e1d)

[ReadMe](https://github.com/vladmandic/automatic/blob/master/README.md) | [ChangeLog](https://github.com/vladmandic/automatic/blob/master/CHANGELOG.md) | [Docs](https://vladmandic.github.io/sdnext-docs/) | [WiKi](https://github.com/vladmandic/automatic/wiki) | [Discord](https://discord.com/invite/sd-next-federal-batch-inspectors-1101998836328697867)

*Note*: Change-in-behavior: locations of downloaded HuggingFace models and components have changed to allow de-duplication of common modules, switching from the system default cache folder to `models/huggingface`
SD.Next will warn on startup about unused cache entries that can be removed
Also, to take advantage of de-duplication, you'll need to delete models from your `models/Diffusers` folder and let SD.Next re-download them!

### Details for 2025-08-15

- **Models**
  - [Qwen-Image](https://qwenlm.github.io/blog/qwen-image/) new foundational image model with a *20B*-param DiT, using *Qwen2.5-VL-7B* as the text encoder!
    available via *networks -> models -> reference*
    *note*: this model is almost 2x the size of Flux; quantization and offloading are highly recommended!
    *recommended* params: *steps=50, attention-guidance=4*
    also available is the pre-packaged [Qwen-Lightning](https://huggingface.co/vladmandic/Qwen-Lightning), an unofficial merge of [Qwen-Image](https://qwenlm.github.io/blog/qwen-image/) with [Qwen-Lightning-LoRA](https://github.com/ModelTC/Qwen-Image-Lightning/) to improve quality and allow generating in 8 steps!
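The change-in-behavior note above exists because many models share identical components (text encoders, VAEs), so a common cache stores each only once. SD.Next relies on the HuggingFace cache layout for this; purely as a toy illustration of why shared modules save space (the in-memory `files` dict below is hypothetical), duplicates can be found by hashing content:

```python
import hashlib

def file_digest(data: bytes) -> str:
    # identical component files hash identically
    return hashlib.sha256(data).hexdigest()

def dedup(files: dict[str, bytes]) -> dict[str, list[str]]:
    # group file names by content hash; any group with >1 entry
    # is a duplicate that a shared cache would store only once
    groups: dict[str, list[str]] = {}
    for name, data in files.items():
        groups.setdefault(file_digest(data), []).append(name)
    return groups

files = {
    "qwen/text_encoder/model.safetensors": b"shared-t5-weights",
    "flux/text_encoder/model.safetensors": b"shared-t5-weights",
    "flux/transformer/model.safetensors": b"flux-weights",
}
dupes = [names for names in dedup(files).values() if len(names) > 1]
```

This is why already-downloaded per-model copies must be deleted and re-downloaded before the savings apply.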
  - [FLUX.1-Krea-Dev](https://www.krea.ai/blog/flux-krea-open-source-release) new 12B base model from *Black Forest Labs*, compatible with FLUX.1-Dev and built with opinionated aesthetic preferences in mind
    available via *networks -> models -> reference*
  - [Chroma](https://huggingface.co/lodestones/Chroma) great model based on FLUX.1, then redesigned and retrained by *lodestones*
    updated with the latest **HD**, **HD Flash** and **HD Annealed** variants, which are based on the *v50* release
    available via *networks -> models -> reference*
  - [SkyReels-V2](https://huggingface.co/Skywork/SkyReels-V2-DF-14B-720P-Diffusers) generative video model based on Wan-2.1, but with heavily modified execution to allow infinite-length video generation
    supported variants:
    - diffusion-forcing: *T2V DF 1.3B* for 540p videos, *T2V DF 14B* for 720p videos, *I2V DF 14B* for 720p videos
    - standard: *T2V 14B* for 720p videos and *I2V 14B* for 720p videos
  - [Wan-VACE](https://huggingface.co/Wan-AI/Wan2.1-VACE-14B-diffusers) basic support for *Wan 2.1 VACE 1.3B* and *14B* variants
    optimized support with granular guidance control will follow soon
  - [HunyuanDiT-Distilled](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT-v1.2-Diffusers-Distilled) variant of HunyuanDiT with reduced steps and improved performance
- **Torch**
  - set default to `torch==2.8.0` for *CUDA, ROCm and OpenVINO*
  - add support for `torch==2.9.0-nightly`
- **UI**
  - new embedded docs/wiki search!
    **Docs** search: fully local and works in real-time on all document pages
    **Wiki** search: uses the GitHub API to search online wiki pages
  - updated real-time hints, thanks @CalamitousFelicitousness
  - add **Wildcards** UI in networks display
  - every heading element is collapsible!
  - quicksettings reset button to restore all quicksettings to default values
    because things do sometimes go wrong...
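The fully-local **Docs** search above works without a server round-trip. SD.Next implements it in the browser; purely as a conceptual sketch (the function names and sample pages here are hypothetical), the core idea is a small inverted index intersected per query term:

```python
def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    # map each lowercased token to the set of pages containing it
    index: dict[str, set[str]] = {}
    for name, text in pages.items():
        for token in text.lower().split():
            index.setdefault(token, set()).add(name)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    # intersect per-token page sets so every query term must match
    results = [index.get(t, set()) for t in query.lower().split()]
    return set.intersection(*results) if results else set()

pages = {
    "Offload": "balanced offload moves modules between gpu and cpu",
    "Styles": "styles can include generation params and server settings",
}
hits = search(build_index(pages), "offload modules")
```

Building the index once at page load is what makes per-keystroke search feel instantaneous.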
  - configurable image fit in all image views
  - rewritten **CivitAI downloader** in *models -> civitai*
    *hint*: you can enter a model id in the search bar to pull information on a specific model directly
    *hint*: you can download individual versions or batch-download all at once!
  - redesigned **GPU monitor**
    - standard-ui: *system -> gpu monitor*
    - modern-ui: *aside -> console -> gpu monitor*
    - supported on *nVidia CUDA* and *AMD ROCm* platforms
    - configurable interval in *settings -> user interface*
  - updated *models* tab
    - updated *models -> current* tab
    - updated *models -> list models* tab
    - updated *models -> metadata* tab
  - updated *extensions* tab
  - redesigned *settings -> user interface*
  - gallery: bypass browser cache for thumbnails
  - gallery: safer delete operation
  - networks: display indicator for currently active items
    applies to: *styles, loras*
  - apply privacy blur to hf and civitai tokens
  - image download will now use the actual image filename
  - increase default and maximum UI request timeout to 2min/5min
  - *hint*: card layout
    card layout is used by networks, gallery, civitai search, etc.
you can change card size in *settings -> user interface*
- **Offloading**
  - changed **default** values for offloading based on detected gpu memory
    see [offloading docs](https://vladmandic.github.io/sdnext-docs/Offload/) for details
  - new feature to specify which modules to offload always or never
    in *settings -> model offloading -> offload always/never*
  - new `highvram` profile
    provides a significant performance boost on gpus with more than 24gb
  - new `offload during pre-forward` option in *settings -> model offloading*
    switches from explicit offloading to implicit offloading on module execution change
  - new `diffusers_offload_nonblocking` experimental setting
    instructs torch to use non-blocking move operations when possible
- **Features**
  - new `T5: Use shared instance of text encoder` option in *settings -> text encoder*
    since a lot of new models use the T5 text encoder, this option allows sharing the same instance across all models without duplicate downloads
    *note*: this will not reduce the size of your already downloaded models, but will reduce the size of future downloads
  - **Wan** select which stage to run: *first/second/both*, with configurable *boundary ratio* when running both stages
    in *settings -> model options*
  - prompt parser: allow explicit `BOS` and `EOS` tokens in prompt
  - **Nunchaku** support for *FLUX.1-Fill* and *FLUX.1-Depth* models
  - update requirements/packages
  - use model vae scale-factor for image width/height calculations
  - **SDNQ** add `modules_dtype_dict` to quantize *Qwen Image* with mixed dtype
  - **prompt enhance** add `allura-org/Gemma-3-Glitter-4B`, `Qwen/Qwen3-4B-Instruct-2507`, `Qwen/Qwen2.5-VL-3B-Instruct` model support
    improve system prompt
  - **schedulers** add **Flash FlowMatch**
  - **model loader** add parallel loader option
    enabled by default, selectable in *settings -> model loading*
  - **filename namegen** use exact sequence number instead of next available
    this allows for more predictable and consistent filename generation
  - **network delete**
new feature that allows deleting a network from disk
    in *networks -> show details -> delete*
    this will also delete the description, metadata and previews associated with the network
    only applicable to safetensors networks, not downloaded diffuser models
- **Wiki**
  - Models page updated with links to original model repos and model licenses, thanks @alerikaisattera
  - updated Model-Support with newly supported models
  - updated Offload, Prompting, API pages
- **API**
  - add `/sdapi/v1/checkpoint` POST endpoint to simply load a model
  - add `/sdapi/v1/modules` GET endpoint to get info on model components/modules
  - all generate endpoints now support the `sd_model_checkpoint` parameter
    this allows specifying which model to use for generation without needing additional endpoints
- **Refactor**
  - change default huggingface cache folder from system default to `models/huggingface`
    sd.next will warn on startup about unused cache entries
  - new unified pipeline component loader in `pipelines/generic`
  - remove **LDSR**
  - remove `api-only` cli option
- **Docker**
  - update cuda base image: `pytorch/pytorch:2.8.0-cuda12.8-cudnn9-runtime`
  - update official builds
- **Fixes**
  - refactor legacy processing loop
  - fix settings components mismatch
  - fix *Wan 2.2-5B I2V* workflow
  - fix *Wan* T2I workflow
  - fix OpenVINO
  - fix video model vs pipeline mismatch
  - fix video generic save frames
  - fix inpaint image metadata
  - fix processing image save loop
  - fix progress bar with refine/detailer
  - fix api progress reporting endpoint
  - fix `openvino` backend failing to compile
  - fix `zluda` with hip-sdk==6.4
  - fix `nunchaku` fallback on unsupported model
  - fix `nunchaku` windows download links
  - fix *Flux.1-Kontext-Dev* with variable resolution
  - use `utf_16_be` as primary metadata decoding
  - fix `sd35` width/height alignment
  - fix `nudenet` api
  - fix global state tracking
  - fix ui tab detection for networks
  - fix ui checkbox/radio styling for non-default themes
  - fix loading custom transformers and t5
safetensors tunes
  - add mtime to reference models
  - patch torch version so 3rd-party libraries can use the expected format
  - unified stat size/mtime calls
  - reapply offloading on ipadapter load
  - api: set default script-name
  - avoid forced gc and rely on thresholds
  - add missing interrogate in output panel

## Update for 2025-07-29

### Highlights for 2025-07-29

This is a big one: simply looking at the number of changes, probably the biggest release since the project started!

Feature highlights include:
- [ModernUI](https://github.com/user-attachments/assets/6f156154-0b0a-4be2-94f0-979e9f679501) has had quite a redesign, which should make it more user-friendly and easier to navigate, plus several new UI themes
  If you're still using **StandardUI**, give [ModernUI](https://vladmandic.github.io/sdnext-docs/Themes/) a try!
- New models such as [WanAI 2.2](https://wan.video/) in 5B and A14B variants for *text-to-video* and *image-to-video* workflows, as well as a *text-to-image* workflow!
  and also [FreePik F-Lite](https://huggingface.co/Freepik/F-Lite), [Bria 3.2](https://huggingface.co/briaai/BRIA-3.2) and [bigASP 2.5](https://civitai.com/models/1789765?modelVersionId=2025412)
- Redesigned [Video](https://vladmandic.github.io/sdnext-docs/Video) interface with support for general video models plus optimized [FramePack](https://vladmandic.github.io/sdnext-docs/FramePack) and [LTXVideo](https://vladmandic.github.io/sdnext-docs/LTX) support
- Fully integrated nudity detection and optional censorship with [NudeNet](https://vladmandic.github.io/sdnext-docs/NudeNet)
- New background replacement and relighting methods using **Latent Bridge Matching** and a new **PixelArt** processing filter
- Enhanced auto-detection of default sampler types/settings, which avoids common mistakes
- Additional **LLM/VLM** models available for captioning and prompt enhance
- A number of workflow and general quality-of-life improvements, especially around **Styles**, **Detailer**, **Preview**, **Batch**,
**Control**
- Compute improvements
- [Wiki](https://github.com/vladmandic/automatic/wiki) & [Docs](https://vladmandic.github.io/sdnext-docs/) updates, especially the new end-to-end [Parameters](https://vladmandic.github.io/sdnext-docs/Parameters/) page

In this release we finally break with legacy with the removal of the original [A1111](https://github.com/AUTOMATIC1111/stable-diffusion-webui/) codebase, which has not been maintained for a while now
This, plus a major cleanup of the codebase and external dependencies, resulted in a ~55k LoC (*lines-of-code*) reduction spread over [~750 files](https://github.com/vladmandic/sdnext/pull/4017) in ~200 commits!

We also switched the project license to [Apache-2.0](https://github.com/vladmandic/sdnext/blob/dev/LICENSE.txt), which means that SD.Next is now fully compatible with commercial and non-commercial use and redistribution, regardless of modifications!

And (*as always*) many bugfixes and improvements to existing features! For details, see [ChangeLog](https://github.com/vladmandic/automatic/blob/master/CHANGELOG.md)

> [!NOTE]
> We recommend a clean install for this release due to the sheer size of changes
> Although upgrades and existing installations are tested and should work fine!

![Screenshot](https://github.com/user-attachments/assets/6f156154-0b0a-4be2-94f0-979e9f679501)

[ReadMe](https://github.com/vladmandic/automatic/blob/master/README.md) | [ChangeLog](https://github.com/vladmandic/automatic/blob/master/CHANGELOG.md) | [Docs](https://vladmandic.github.io/sdnext-docs/) | [WiKi](https://github.com/vladmandic/automatic/wiki) | [Discord](https://discord.com/invite/sd-next-federal-batch-inspectors-1101998836328697867)

### Details for 2025-07-29

- **License**
  - SD.Next [license](https://github.com/vladmandic/sdnext/blob/dev/LICENSE.txt) switched from **aGPL-v3.0** to **Apache-v2.0**
    this means that SD.Next is now fully compatible with commercial and non-commercial use and redistribution, regardless of modifications!
- **Models**
  - [WanAI Wan 2.2](https://github.com/Wan-Video/Wan2.2) both 5B and A14B variants, for both T2V and I2V support
    go to: *video -> generic -> wan -> pick variant*
    optimized support with *VACE*, etc. will follow soon
    *caution*: Wan2.2 on its own is ~68GB, but it also includes an optional second stage for later low-noise processing, which is absolutely massive at an additional ~54GB
    you can enable second-stage processing in *settings -> model options*; it's disabled by default
    *note*: quantization and offloading are highly recommended, regardless of first-stage-only or both stages!
  - [WanAI Wan](https://wan.video/) T2V models for T2I workflows
    Wan is originally designed for *video* workflows, but can now also be used for *text-to-image* workflows!
    supports *Wan-2.1* in *1.3B* and *14B* variants and *Wan-2.2* in *5B* and *A14B* variants
    supports all standard features such as quantization, offloading, TAESD preview generation, LoRA support, etc.
    can also load unet/transformer fine-tunes in safetensors format using the UNET loader
    simply select in *networks -> models -> reference*
    *note*: the 1.3B model is a bit too small for good results, and 14B is very large at 78GB even without the second stage, so aggressive quantization and offloading are recommended
  - [FreePik F-Lite](https://huggingface.co/Freepik/F-Lite) in *7B, 10B and Texture* variants
    F-Lite is a 7B/10B model trained exclusively on copyright-safe and SFW content, using an internal dataset comprising approximately 80 million copyright-safe images
    available via *networks -> models -> reference*
  - [Bria 3.2](https://huggingface.co/briaai/BRIA-3.2)
    Bria is a smaller 4B-parameter model built entirely on licensed data and safe for commercial use
    *note*: this is a gated model; you need to [accept terms](https://huggingface.co/briaai/BRIA-3.2) and set your [huggingface token](https://vladmandic.github.io/sdnext-docs/Gated/)
    available via *networks -> models -> reference*
  - [bigASP 2.5](https://civitai.com/models/1789765) bigASP is an
experimental SDXL finetune using the flow-matching method
    load as usual and leave the sampler set to *Default*, or use one of the following samplers: *UniPC, DPM, DEIS, SA*
    required sampler settings: *prediction-method=flow-prediction*, *sigma-method=flowmatch*
    recommended sampler settings: *flow-shift=1.0*
  - [LBM: Latent Bridge Matching](https://github.com/gojasper/LBM)
    very fast automatic image background replacement methods, with relighting!
    *simple*: automatic background replacement using [BiRefNet](https://github.com/ZhengPeng7/BiRefNet)
    *relighting*: automatic background replacement with relighting so the source image fits the desired background, with optional composite blending
    available in *img2img or control -> scripts*
  - add **FLUX.1-Kontext-Dev** inpaint workflow
  - add **FLUX.1-Kontext-Dev** **Nunchaku** support
    *note*: FLUX.1 Kontext is about 2-3x faster with Nunchaku vs standard execution!
  - support **FLUX.1** all-in-one safetensors
  - support for [Google Gemma 3n](https://huggingface.co/google/gemma-3n-E4B-it) E2B and E4B LLM/VLM models
    available in **prompt enhance** and process **captioning**
  - support for [HuggingFace SmolLM3](https://huggingface.co/HuggingFaceTB/SmolLM3-3B) 3B LLM model
    available in **prompt enhance**
  - add [fal AuraFlow 0.2](https://huggingface.co/fal/AuraFlow-v0.2) in addition to the existing [fal AuraFlow 0.3](https://huggingface.co/fal/AuraFlow-v0.3), due to large differences in model behavior
    available via *networks -> models -> reference*
  - add integrated [NudeNet](https://vladmandic.github.io/sdnext-docs/NudeNet) as built-in functionality
    *note*: used to be available as a separate [extension](https://github.com/vladmandic/sd-extension-nudenet)
- **Video**
  - redesigned **Video** interface
  - support for **Generic** video models
    includes support for many video models without specific per-model optimizations
    included: *Hunyuan, LTX, WAN, Mochi, Latte, Allegro, Cog*
    supports quantization, offloading, frame interpolation, etc.
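Frame interpolation, mentioned above for generic video models, inserts synthesized frames between generated ones to raise the effective frame rate. SD.Next uses dedicated interpolation models for this; the linear blend below is only a conceptual sketch of the idea (frames are simplified to flat lists of pixel values), not the method actually used:

```python
def blend(a: list[float], b: list[float], t: float) -> list[float]:
    # linear blend of two frames (flattened pixel values), t in [0, 1]
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

def interpolate(frames: list[list[float]], factor: int = 2) -> list[list[float]]:
    # insert factor-1 blended frames between each consecutive pair,
    # multiplying the frame rate by roughly `factor`
    out: list[list[float]] = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        for k in range(1, factor):
            out.append(blend(a, b, k / factor))
    out.append(frames[-1])
    return out

frames = [[0.0, 0.0], [1.0, 1.0]]
result = interpolate(frames, factor=2)
# -> [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```

Learned interpolators estimate motion instead of blending, which avoids the ghosting a naive cross-fade produces on moving content.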
  - support for optimized [FramePack](https://vladmandic.github.io/sdnext-docs/FramePack) with *t2i, i2i, flf2v* workflows
    LoRA support, prompt enhance, etc.
    now fully integrated instead of being a separate extension
  - support for optimized [LTXVideo](https://vladmandic.github.io/sdnext-docs/LTX) with *t2i, i2i, v2v* workflows
    optional native upsampling and video refine workflows
    LoRA support with different conditioning types such as Canny/Depth/Pose, etc.
  - support for post-load quantization
- **UI**
  - major update to the ModernUI layout
  - add new Windows-like *Blocks* UI theme
  - redesign of the *Flat* UI theme
  - enhanced look & feel for the *Gallery* tab with better search and collapsible sections, thanks to @CalamitousFelicitousness
- **WIKI**
  - new [Parameters](https://vladmandic.github.io/sdnext-docs/Parameters/) page that lists and explains all generation parameters
    massive thanks to @CalamitousFelicitousness for bringing this to life!
  - updated *Models, Video, LTX, FramePack, Styles*, etc.
- **Compute**
  - support for [SageAttention2++](https://github.com/thu-ml/SageAttention)
    provides a 10-15% performance improvement over default SDPA for transformer-based models!
    enable in *settings -> compute settings -> sdp options*
    *note*: SD.Next will use SageAttention v1/v2/v2++, depending on which one is installed
    until the authors provide pre-built wheels for v2++, you need to install it manually, or SD.Next will auto-install v1
  - support for `torch.compile` for LLM captioning/prompt-enhance
  - support for `torch.compile` with repeated blocks
    reduces time-to-compile 5x without loss of performance!
enable in *settings -> model compile -> repeated*
    *note*: torch.compile is not compatible with balanced offload
- **Other**
  - **Styles** can now include both generation params and server settings
    see [Styles docs](https://vladmandic.github.io/sdnext-docs/Styles/) for details
  - **TAESD** is now the default preview type
    since it's the only one that supports most new models
  - support **TAESD** preview and remote VAE for **HunyuanDit**
  - support **TAESD** preview and remote VAE for **AuraFlow**
  - support **TAESD** preview for **WanAI**
  - SD.Next now starts in a *locked* state, preventing model loading until startup is complete
  - warn when modifying legacy settings that are no longer supported but remain available for compatibility
  - warn on incompatible sampler and automatically restore the default sampler
  - **XYZ grid** can now work with the control tab:
    if a controlnet or processor is selected in the xyz grid, it will overwrite settings from the first unit in the control tab
    when using a controlnet/processor selected in the xyz grid, behavior is forced as control-only
    control strength, start and end values are also freely selectable
  - **Batch** warn on unprocessable images and skip operations on errors so that other images can still be processed
  - **Metadata** improved parsing and detection of foreign metadata
    detect ComfyUI images
    detect InvokeAI images
  - **Detailer** add `expert` mode where the list of detailer models can be converted to a textbox for manual editing
    see [docs](https://vladmandic.github.io/sdnext-docs/Detailer/) for more information
  - **Detailer** add option to merge multiple results from each detailer model
    for example, a hands model can result in two hands each being processed separately, or both hands can be merged into one composite job
  - **Control** auto-update width/height on image upload
  - **Control** auto-determine image save path depending on operations performed
  - autodetect **V-prediction** models and override the default sampler prediction type as needed
  - **SDNQ**
    - use inference context
during quantization
    - use static compile
    - rename the text-encoder quantization type `default` option to `Same as model`
- **API**
  - add `/sdapi/v1/lock-checkpoint` endpoint that can be used to lock/unlock model changes
    if the model is locked, it cannot be changed using normal load or unload methods
- **Fixes**
  - allow theme type `None` to be set in config
  - installer: don't cache installed state
  - fix Cosmos-Predict2 retrying TAESD download
  - better handle startup import errors
  - fix traceback width preventing copy&paste
  - fix ansi control output from scripts/extensions
  - fix diffusers models non-unique hash
  - fix loading of manually downloaded diffuser models
  - fix api `/sdapi/v1/embeddings` endpoint
  - fix incorrect reporting of deleted and modified files
  - fix SD3.x loader and TAESD preview
  - fix xyz with control enabled
  - fix control order of image save operations
  - fix control batch-input processing
  - fix modules merge save model
  - fix torchvision bicubic upsample with ipex
  - fix instantir pipeline
  - fix prompt encoding if prompts within a batch have different segment counts
  - fix detailer min/max size
  - fix loopback script
  - fix networks tags display
  - fix yolo refresh models
  - cleanup control infotext
  - allow upscaling with models that have implicit VAE processing
  - framepack: improve offloading
  - improve prompt parser tokenizer loader
  - improve scripts error handling
  - improve infotext param parsing
  - improve extensions ui search
  - improve model type autodetection
  - improve model auth check for hf repos
  - improve Chroma prompt padding as per recommendations
  - lock directml torch to `torch-directml==0.2.4.dev240913`
  - lock directml transformers to `transformers==4.52.4`
  - improve install of `sentencepiece` tokenizer
  - add int8 matmul fallback for ipex with onednn qlinear
- **Refactoring**
  *note*: none of the removals result in loss of functionality since all those features are already re-implemented
  the goal here is to remove legacy code and code duplication, and to reduce
code complexity
  - obsolete **original backend**
  - remove majority of legacy **a1111** codebase
  - remove legacy ldm codebase: `/repositories/ldm`
  - remove legacy blip codebase: `/repositories/blip`
  - remove legacy codeformer codebase: `/repositories/codeformer`
  - remove legacy clip patch model: `/models/karlo`
  - remove legacy model configs: `/configs/*.yaml`
  - remove legacy submodule: `/modules/k-diffusion`
  - remove legacy hypernetworks support: `/modules/hypernetworks`
  - remove legacy lora support: `/extensions-builtin/Lora`
  - remove legacy clip/blip interrogate module
  - modern-ui: remove `only-original` vs `only-diffusers` code paths
  - refactor control processing and separate preprocessing and image save ops
  - refactor modernui layouts to rely on accordions more than individual controls
  - refactor pipeline apply/unapply of optional components & features
  - split monolithic `shared.py`
  - cleanup `/modules`: move pipeline loaders to `/pipelines` root
  - cleanup `/modules`: move code folders used by pipelines to `/pipelines/` folder
  - cleanup `/modules`: move code folders used by scripts to `/scripts/