Multi-resolution dataset for SD1/SDXL #2269
|
Thank you, this is great! Sorry, despite what the documentation says, I think SD/SDXL currently doesn't handle caching correctly when different datasets share the same image directory. For existing caches, it would be a good idea to prepare a migration script. Alternatively, as a temporary solution, we could fall back to key names without resolution suffixes. I'll review and merge this soon, probably tomorrow. |
|
Falling back to key names without resolution suffixes is not always safe. For example, if a user trains with multiple resolutions (768, 1024, 1280) without re-caching the latents after this PR, and we fall back to the old keys, then all 3 datasets will load the same latents. Currently I do not do any fallback, so when the user starts training after this PR, all latents will be cached again. The only downside is that the old latents are still stored in the same npz files. If the user is out of disk space, they can just delete the old npz files and cache the latents again. I guess people who have already cached TBs of latents should know how to write a script and migrate them... |
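To make the collision concrete, here is a minimal sketch of why an unguarded fallback is unsafe. The key scheme (`latents_<resolution>` for new caches, plain `latents` for old ones) and the `lookup` helper are hypothetical, made up for illustration; they are not the actual key names or code in sd-scripts.

```python
from typing import Optional


def suffixed_key(base: str, resolution: int) -> str:
    # Hypothetical naming scheme: one cache entry per dataset resolution.
    return f"{base}_{resolution}"


def lookup(cache: dict, base: str, resolution: int, allow_fallback: bool) -> Optional[str]:
    """Return the cached entry for `resolution`, optionally falling back to the old key."""
    key = suffixed_key(base, resolution)
    if key in cache:
        return cache[key]
    if allow_fallback and base in cache:
        # Old-style key without a resolution suffix.
        return cache[base]
    return None  # cache miss: the caller has to re-encode the image


# An old cache file written before the change: a single un-suffixed entry.
old_cache = {"latents": "latents cached at 1024"}

# With fallback enabled, every dataset resolution resolves to the same entry.
loaded = {r: lookup(old_cache, "latents", r, allow_fallback=True) for r in (768, 1024, 1280)}
for resolution, value in loaded.items():
    print(resolution, "->", value)

# Without fallback, every resolution is a miss and gets cached again.
print(lookup(old_cache, "latents", 768, allow_fallback=False))
```

With fallback on, the 768 and 1280 datasets silently receive latents that were encoded for 1024, which is the failure described above. With fallback off, the cost is a full re-cache, but nothing is loaded at the wrong resolution.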
|
Hmm, that certainly could be a problem... One option is to guard the fallback: if the shape of the previously saved latent we would fall back to does not match the resolution, raise an error. I think this should prevent unintended fallbacks. |
|
If we check the array shape using It's possible to only read the metadata but we need some private API of numpy. Do you think we should implement this? (BTW, it's easy to read metadata in safetensors) |
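As a rough sketch of a header-only shape check: an .npz file is a zip archive of .npy members, and `numpy.lib.format` exposes public `read_magic`, `read_array_header_1_0` and `read_array_header_2_0` functions that parse only the header. That may be enough for files in format versions 1.0 and 2.0; other versions are rejected here rather than handled. The helper names `read_npz_shape` and `check_fallback_shape` and the `latents` key are assumptions for illustration, not sd-scripts code. The guard also assumes the SD1/SDXL VAE downscale factor of 8 and that the caller knows the pixel size the image is bucketed to.

```python
import zipfile

from numpy.lib import format as npy_format


def read_npz_shape(npz_path: str, key: str) -> tuple:
    """Return the shape of array `key` in an .npz without reading its data."""
    with zipfile.ZipFile(npz_path) as zf:
        # Each array in an .npz is stored as a member named "<key>.npy".
        with zf.open(key + ".npy") as fp:
            version = npy_format.read_magic(fp)
            if version == (1, 0):
                shape, _fortran_order, _dtype = npy_format.read_array_header_1_0(fp)
            elif version == (2, 0):
                shape, _fortran_order, _dtype = npy_format.read_array_header_2_0(fp)
            else:
                raise ValueError(f"unsupported .npy format version: {version}")
    return shape


def check_fallback_shape(latent_shape: tuple, image_hw: tuple, vae_scale: int = 8) -> None:
    """Raise if a cached latent does not match the expected pixel size.

    `image_hw` is the (height, width) in pixels that the image would be
    resized to for this dataset (an assumption of this sketch).
    """
    expected = (image_hw[0] // vae_scale, image_hw[1] // vae_scale)
    if tuple(latent_shape[-2:]) != expected:
        raise ValueError(
            f"cached latent has spatial shape {tuple(latent_shape[-2:])}, expected {expected}"
        )
```

A caller could run `check_fallback_shape(read_npz_shape(path, "latents"), (height, width))` before accepting an old key, so a 1024 latent is never reused for a 768 dataset.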
|
Thank you, I didn't realize that fallbacks would also need to be considered when checking the cache. It might be a good idea to release this PR at the same time as the safetensors format cache feature and provide a script for migrating the cache (adding the resolution suffix and converting to safetensors). |
I think multi-resolution training is something we should encourage people to do more. I'm still using SDXL as a lightweight model when I need to upscale images to 4K.
In sd-scripts, multi-resolution dataset is already documented in sd-scripts/docs/config_README-en.md (line 244 in 48d368f), where we can create multiple datasets with different resolutions and the same image_dir. It's already enabled for all newer models (Anima, Flux, Hunyuan, Lumina, SD3), but not for SD1/SDXL. This PR enables it.
However, this is a breaking change for people who already cached a lot of images. They may use a script to migrate the cache.