
Multi-resolution dataset for SD1/SDXL #2269

Open

woct0rdho wants to merge 1 commit into kohya-ss:main from woct0rdho:multi-reso-sd

Conversation

@woct0rdho
Contributor

I think multi-resolution training is something we should encourage people to do more. I'm still using SDXL as a lightweight model when I need to upscale images to 4K.

In sd-scripts, multi-resolution datasets are already documented: by creating multiple `[[datasets]]` entries with different resolutions and the same `image_dir`, the same images are trained at several sizes (see the config sketch below). This is already enabled for all newer models (Anima, Flux, Hunyuan, Lumina, SD3), but not for SD1/SDXL. This PR enables it.
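For reference, a minimal sketch of such a config, following the usual sd-scripts dataset TOML layout; the paths, resolutions, and batch sizes here are placeholders:

```toml
# Two datasets share the same image_dir but use different resolutions,
# so each image is bucketed and cached at both sizes.
[general]
caption_extension = ".txt"

[[datasets]]
resolution = 1024
batch_size = 4

  [[datasets.subsets]]
  image_dir = "/path/to/images"

[[datasets]]
resolution = 768
batch_size = 8

  [[datasets.subsets]]
  image_dir = "/path/to/images"
```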

However, this is a breaking change for people who have already cached a lot of latents; they may need a script to migrate the cache.

@kohya-ss
Owner

Thank you, this is great!

Sorry, despite what the documentation says, I think SD1/SDXL currently doesn't handle caching correctly when the image directory is the same, even if the datasets are different.

For existing caches, it would be a good idea to prepare a migration script. Alternatively, as a temporary measure, we could fall back to key names without resolution suffixes.

I'll review and merge this soon, probably tomorrow.

@woct0rdho
Contributor Author

woct0rdho commented Feb 17, 2026

Falling back to key names without resolution suffixes is not always safe. For example, if a user trains at multiple resolutions (768, 1024, 1280) without re-caching the latents after this PR, and we fall back to the old keys, then all three datasets will load the same latents.

Currently I don't do any fallback, so when the user starts training after this PR, all latents will be cached again. The only downside is that the old latents remain in the same npz files. If the user runs out of disk space, they can simply delete the old npz files and cache the latents again.

I guess people who have already cached TBs of latents should know how to write such a migration script themselves...
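For the disk-space case, a throwaway sketch along these lines could prune the stale entries in place, assuming the new cache keys carry a `_{W}x{H}` resolution suffix and the pre-PR keys don't (the actual key scheme is whatever the cache code writes):

```python
import re
import numpy as np

# Keys ending in a "_{W}x{H}" resolution suffix are assumed to be the
# new-style cache entries; everything else is treated as stale.
SUFFIXED = re.compile(r"_\d+x\d+$")

def prune_unsuffixed(npz_path: str) -> None:
    # Load every array, keep only the suffixed ones, and rewrite the file.
    with np.load(npz_path) as npz:
        kept = {k: npz[k] for k in npz.files if SUFFIXED.search(k)}
    np.savez(npz_path, **kept)
```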

@kohya-ss
Owner

Hmm, that certainly could be a problem...

One idea might be to add a guard to the fallback: if the shape of the previously saved latent we would fall back to doesn't match the target resolution, raise an error. I think this would prevent unintended fallbacks.
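For illustration, a minimal sketch of such a guard; the `latents` key and the (..., H/8, W/8) layout are assumptions, while the factor of 8 is SD/SDXL's VAE downscale:

```python
import numpy as np

def checked_fallback(npz_path: str, width: int, height: int) -> np.ndarray:
    # Only accept the old unsuffixed cache entry if its spatial shape
    # matches the target bucket resolution (the VAE downscales by 8).
    expected = (height // 8, width // 8)
    with np.load(npz_path) as npz:
        latents = npz["latents"]  # note: this loads the full array
    if latents.shape[-2:] != expected:
        raise ValueError(
            f"cached latent shape {latents.shape} does not match "
            f"{expected} for resolution {width}x{height}; refusing fallback"
        )
    return latents
```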

@woct0rdho
Contributor Author

woct0rdho commented Feb 17, 2026

If we check the array shape using npz[key].shape, it will load the array data (rather than just the metadata) when checking the cache before training, which is fine for GBs of cache but not so fine for TBs of cache.

It's possible to read only the metadata, but that requires numpy's private API (see the sketch below). Do you think we should implement this? (BTW, reading metadata is easy in safetensors.)
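For reference, a sketch of both approaches; the key names are placeholders, and numpy.lib.format is the semi-private module in question:

```python
import zipfile
from numpy.lib import format as npy_format
from safetensors import safe_open

def npz_shape(npz_path: str, key: str):
    # An .npz is a zip of .npy members; parse only the header of one
    # member to get its shape without decompressing the array data.
    with zipfile.ZipFile(npz_path) as zf, zf.open(key + ".npy") as fp:
        version = npy_format.read_magic(fp)
        if version == (1, 0):
            shape, _, _ = npy_format.read_array_header_1_0(fp)
        else:
            shape, _, _ = npy_format.read_array_header_2_0(fp)
    return shape

def safetensors_shape(st_path: str, key: str):
    # safetensors keeps shapes in its JSON header, so nothing below
    # touches the tensor data itself.
    with safe_open(st_path, framework="np") as f:
        return f.get_slice(key).get_shape()
```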

@kohya-ss
Owner

Thank you, I didn't realize that fallbacks would also need to be considered when checking the cache.

It might be a good idea to release this PR at the same time as the safetensors-format cache feature, together with a script for migrating the cache (adding the resolution suffix and converting to safetensors).
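Something along these lines, perhaps; a sketch only, since the suffix scheme, key names, and the safetensors cache layout are all assumptions until that feature is settled:

```python
import re
import numpy as np
from safetensors.numpy import save_file

def migrate_cache(npz_path: str, st_path: str, width: int, height: int) -> None:
    # Add a "_{W}x{H}" resolution suffix to each old unsuffixed key and
    # write the result as a safetensors file. The suffix must match
    # whatever the new cache code actually expects.
    tensors = {}
    with np.load(npz_path) as npz:
        for key in npz.files:
            new_key = key if re.search(r"_\d+x\d+$", key) else f"{key}_{width}x{height}"
            tensors[new_key] = np.ascontiguousarray(npz[key])
    save_file(tensors, st_path)
```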

