Replies: 71 comments 58 replies
- Nice list! The alternate img2img script is a Reverse Euler method of modifying an image, similar to cross attention control.
- There is also a (supposedly) more efficient take on prompt-to-prompt.
- I just posted a discussion about Search Operators. It seems vaguely similar to "Composable diffusion" listed under Partly Implemented and might be covered by it, but I'm not sure.
- Think it'd be possible to also have a list for optimization/performance-boost methods? Two of the unimplemented ones that I know of: Implemented ones, AFAIK:
- Added Imagic (similar to cycle diffusion)
- Added RunwayML Inpainting
- Wow, that StyleCLIP looks useful.
- Hope they can come soon :)
- Added some history on specific subjects, because that is interesting to me :-)
- Can't make sense anymore of what is prompt-to-prompt and what is cross-attention control :-/ They seem to be very closely linked. Shall I put it all under cross-attention control, or prompt-to-prompt? Or keep them separate?
- Should this include img2animation? If that's the case, here are some:
- Added "Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis"
- Textual inversion embedding advanced prompt tuning
- I saw this at the Adobe MAX conference. Is there a similar technology now?
- Hopefully either a better img2img is added or the alternative script is updated. Since the RunwayML inpainting update, I get a "TypeError: 'NoneType' object is not subscriptable" error, and it doesn't support unlimited tokens.
- Text2image adapter (fine-tune section), based on SD. Current status: not implemented in A1111.
- Latent Blending, a technique that achieves smooth interpolation between generated images.
- Extension for Dynamic thresholding -> https://github.com/mcmonkeyprojects/sd-dynamic-thresholding
- Hard prompts made easy (fine-tuning the text prompt)
- SD Leap Booster (faster TI)
- Various additions about Justin Pinkney's work
- Added some new models in the "competing" section
- Added info about NovelAI's two newest samplers : SMEA and SMEA Dyn
- 2023 research (MIT x Google DeepMind x Google Brain x INRIA) on diffusion models : Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC
- Multi-UNet cond guidance (in the "crazy ideas - alternating models" section for the time being)
- DIS V2.0 to remove backgrounds; not released yet, it seems:
- Image guidance incoming : #8064
- Tuning encoder - single-image fast fine-tune
- New way of mixing concepts with per-layer prompting (no code implementation yet)
- DeepFloyd IF is now out. Would love to see this as a plugin. https://deepfloyd.ai/deepfloyd-if
-
Precious newsfeed : https://rentry.org/sdupdates3
img2img
"basic" img2img "using a gaussian-diffusion denoising mechanism as first proposed by SDEdit"
img2img Variations
img2img alternative script - identical to bare img2img but using other noise-encoding samplers that can't encode in a single jump, so multi-step encoding is necessary.
Prompt manipulation (i.e. prompt-to-prompt, but NO Cross Attention Control; see the unimplemented section for further links on C.A.C.)
Prompt editing
Alternating words
Composable diffusion
Google's Prompt-to-prompt with Cross Attention Control
InstructPix2Pix
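Composable diffusion (listed above, the webui's AND syntax) generalizes guidance to several prompts : each sub-prompt gets its own conditional noise prediction, and the guided directions are summed with per-prompt weights. A hedged numpy sketch of that combination rule; the function name and toy shapes are mine, not the A1111 implementation :

```python
import numpy as np

def composable_guidance(eps_uncond, eps_conds, weights):
    """Combine noise predictions from several prompts:
    eps = eps_uncond + sum_i w_i * (eps_cond_i - eps_uncond)."""
    out = eps_uncond.copy()
    for eps_c, w in zip(eps_conds, weights):
        out += w * (eps_c - eps_uncond)
    return out

# toy predictions standing in for U-Net outputs on a tiny "latent"
rng = np.random.default_rng(1)
eps_u = rng.normal(size=(4, 4))
preds = [rng.normal(size=(4, 4)) for _ in range(2)]

eps = composable_guidance(eps_u, preds, weights=[7.5, 7.5])
# with a single prompt this reduces to plain classifier-free guidance
single = composable_guidance(eps_u, preds[:1], [7.5])
assert np.allclose(single, eps_u + 7.5 * (preds[0] - eps_u))
```

Negative weights are also meaningful here: they push the sample away from a concept rather than toward it.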
Attention manipulation
Latents manipulation
Operations on latents, conditionings and sigmas mid-sampling by @dfaker - merged PR Add mid-kdiffusion cfgdenoiser script callback - access latents, conditionings and sigmas mid-sampling #4021
Latent upscaling
Scale Latent
for improved sharpness, details and color science. #2668
Image blending / Latent Interpolation / Multi-image prompt :
Special section about Justin Pinkney's image variations model "fine-tuned from CompVis/stable-diffusion-v1-3-original to accept CLIP image embedding rather than text embeddings" ([model card](https://huggingface.co/lambdalabs/stable-diffusion-image-conditioned))
A1111 extension for image variation via a finetuned model, similar to Justin Pinkney's Image Variation PoC, incoming! Add cond and uncond hidden states to CFGDenoiserParams #8064
SD Remix using SD2.1 unCLIP : https://github.com/unishift/stable-diffusion-remix
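The simplest form of the latent interpolation / image blending listed above is a plain linear interpolation between two image latents before decoding; the linked approaches refine this with spherical interpolation, mid-sampling blending, or interpolating the conditioning as well. A minimal sketch (names and shapes are illustrative) :

```python
import numpy as np

def blend_latents(lat_a, lat_b, t):
    """Linear interpolation between two image latents; decoding the
    blend (or resuming sampling from it) yields an in-between image."""
    return (1 - t) * lat_a + t * lat_b

# two toy (4, 64, 64) latents, the shape SD uses for a 512px image
rng = np.random.default_rng(0)
a = rng.normal(size=(4, 64, 64))
b = rng.normal(size=(4, 64, 64))

mid = blend_latents(a, b, 0.5)   # halfway blend
assert np.allclose(blend_latents(a, b, 0.0), a)
```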
CFG manipulation
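All CFG manipulation starts from the standard classifier-free guidance combination of the two noise predictions the model makes each step. A minimal sketch, assuming `eps_uncond` and `eps_cond` are the unconditional and conditional predictions (the names are mine) :

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: move the prediction `scale` times along
    the direction from the unconditional to the conditional branch."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

# toy predictions for a 4x4 "latent"
rng = np.random.default_rng(0)
eps_u = rng.normal(size=(4, 4))
eps_c = rng.normal(size=(4, 4))

eps = cfg_combine(eps_u, eps_c, scale=7.5)
# scale=1 reduces to the plain conditional prediction
assert np.allclose(cfg_combine(eps_u, eps_c, 1.0), eps_c)
```

Everything under "CFG manipulation" (rescaling, scheduling the scale over steps, per-region scales) is a variation on where and how `scale` enters this formula.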
Inpainting
Outpainting
Artistic img2img
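The SDEdit mechanism behind "basic" img2img, quoted at the top of this section, can be sketched in a few lines : forward-noise the input image to an intermediate timestep chosen by the denoising strength, then run the normal reverse process from there. This is a toy numpy illustration with a dummy denoiser standing in for the U-Net; the linear schedule and all names are illustrative, not the webui's code :

```python
import numpy as np

def sdedit_img2img(x0, strength, num_steps, denoise_step, rng):
    """SDEdit: forward-noise x0 to step t = strength * num_steps,
    then iteratively denoise back to step 0."""
    t_start = int(strength * num_steps)
    alpha_bar = 1.0 - t_start / num_steps   # toy variance-preserving schedule
    x = (np.sqrt(alpha_bar) * x0
         + np.sqrt(1 - alpha_bar) * rng.normal(size=x0.shape))
    for t in range(t_start, 0, -1):
        x = denoise_step(x, t)              # stand-in for the real sampler
    return x

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))
# dummy "denoiser" that just shrinks the signal slightly each step
out = sdedit_img2img(image, strength=0.75, num_steps=20,
                     denoise_step=lambda x, t: 0.99 * x, rng=rng)
assert out.shape == image.shape
# strength=0 adds no noise and runs no steps, returning the input
same = sdedit_img2img(image, 0.0, 20, lambda x, t: x, rng)
assert np.allclose(same, image)
```

This is why the webui's "denoising strength" slider trades faithfulness for creativity: it literally sets how far into the noise schedule the input image is pushed before denoising.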
Partly implemented
WIP
CLIP guidance
MagicMix: Semantic Mixing with Diffusion Models : https://magicmix.github.io/
Paint with words SD (similar to Nvidia's eDiffi functionality)
Dynamic thresholding - better images at high cfg
Update and rescale CFG denoising scale
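Dynamic thresholding (the "better images at high cfg" item above) follows the recipe from Google's Imagen paper : after predicting the denoised sample, clamp it to a per-image percentile of its absolute values and rescale, instead of statically clipping to [-1, 1]. A hedged numpy sketch; the linked extension implements a more configurable variant :

```python
import numpy as np

def dynamic_threshold(x, percentile=99.5):
    """Imagen-style dynamic thresholding: clamp the predicted sample to
    the per-image `percentile` of its absolute values, then rescale so
    high-CFG outputs stay in range instead of saturating to +/-1."""
    s = np.percentile(np.abs(x), percentile)
    s = max(s, 1.0)            # never tighten below the static [-1, 1] clip
    return np.clip(x, -s, s) / s

# over-saturated toy prediction, as happens at high CFG scales
rng = np.random.default_rng(0)
x = rng.normal(scale=5.0, size=(16, 16))
y = dynamic_threshold(x)
assert np.abs(y).max() <= 1.0
```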
Noise & Seed
Seed combination : [Feature Request]: Stacking or mixing variation seeds to refine a result #3745
Noise scaling : variable-scale noise and noise operations #2163
Other noise is possible and can be combined => an example with perlin noise :
Latent perturbation : [Feature Request]: Latent Perturbation #4164
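Seed combination as requested in #3745 amounts to interpolating the initial noise tensors of two seeds. Spherical interpolation (slerp) is usually preferred over plain lerp because it keeps the mixed tensor at a Gaussian-like norm, which the sampler expects. A sketch under those assumptions (function names are mine) :

```python
import numpy as np

def slerp(t, a, b):
    """Spherical interpolation between two noise tensors."""
    a_flat, b_flat = a.ravel(), b.ravel()
    omega = np.arccos(np.clip(
        np.dot(a_flat, b_flat)
        / (np.linalg.norm(a_flat) * np.linalg.norm(b_flat)), -1.0, 1.0))
    if np.sin(omega) < 1e-6:   # nearly parallel vectors: fall back to lerp
        return (1 - t) * a + t * b
    return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

# initial noise for two seeds, SD latent shape for a 512px image
noise_a = np.random.default_rng(1234).normal(size=(4, 64, 64))
noise_b = np.random.default_rng(5678).normal(size=(4, 64, 64))

mixed = slerp(0.3, noise_a, noise_b)   # 30% toward the variation seed
assert mixed.shape == noise_a.shape
```

This is essentially what the webui's "variation seed" strength already does internally; #3745 asks for stacking more than two seeds.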
Samplers
Schedulers
Fine-tuning methods
Can be combined to enhance results :
Dreambooth * Aesthetic Gradient : Using Aesthetic Images Embeddings to improve Dreambooth or TI results #3350
HyperNetwork * Aesthetic Gradient : Hypernetwork Style Training, a tiny guide #2670
Hypernetwork * TI : interesting question
Textual Inversion (word embedding optimization) - style/object/person integration
Faster Textual Inversion via specific "TI-model" : SD Leap Booster
Hard Prompts made easy :
Text prompt inversion from 1-n images
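Textual Inversion above optimizes only one thing: the embedding vector of a new token, by gradient descent against the frozen model's reconstruction loss. A toy sketch of that loop with a quadratic stand-in for the diffusion loss; the names, dimensions and learning rate are illustrative, not the A1111 trainer :

```python
import numpy as np

def textual_inversion_step(embedding, grad_fn, lr=5e-3):
    """One optimization step: only the new token's embedding vector is
    updated; the model weights stay frozen throughout."""
    return embedding - lr * grad_fn(embedding)

# toy objective standing in for the diffusion reconstruction loss:
# pull the embedding toward a "target" direction the training images imply
target = np.ones(768) * 0.1
grad_fn = lambda e: 2 * (e - target)

emb = np.zeros(768)            # init, e.g. copied from a nearby word's embedding
for _ in range(500):
    emb = textual_inversion_step(emb, grad_fn)
assert np.linalg.norm(emb - target) < 1.0   # converging toward the target
```

This also explains why TI artifacts are tiny (one vector, a few KB) and why SD Leap Booster can speed the process up: it only has to predict a good embedding, not touch the model.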
Dreambooth - whole model fine tuning
Aesthetics gradient - style refining
HyperNetworks (NAI-version) - (mostly) style transfer (?)
multi-concepts partial fine-tuning (75MB) (Adobe Research)
Low Rank Adaptation (LoRA) : dreambooth-like results but <5MB
ControlNet : Conditioning on anything through fine-tuned "helper" models
T2I-Adapter : "simple and small (~70M parameters, ~300M storage space) network that can provide extra guidance to pre-trained text-to-image models while freezing the original large text-to-image models"
Whole model fine-tuning :
Tuning encoder : single image fine tuning : https://tuning-encoder.github.io/
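The "<5MB" claim for LoRA above follows directly from its structure : each frozen weight matrix W gets a trainable low-rank update B @ A scaled by alpha / rank, and only the small A and B matrices are saved, never W. A minimal numpy sketch (shapes are illustrative; real LoRAs target the attention projections) :

```python
import numpy as np

def lora_forward(x, W, A, B, alpha, rank):
    """LoRA: the frozen weight W is augmented by a low-rank update
    (alpha / rank) * B @ A, trained in place of full fine-tuning."""
    return x @ (W + (alpha / rank) * (B @ A)).T

d_out, d_in, rank, alpha = 320, 768, 4, 4.0
rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))        # frozen pretrained weight
A = rng.normal(size=(rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))               # trainable up-projection, zero-init
x = rng.normal(size=(1, d_in))

# with B zero-initialized, LoRA starts as an exact no-op on the model
assert np.allclose(lora_forward(x, W, A, B, alpha, rank), x @ W.T)
```

Storing A and B here costs rank * (d_in + d_out) floats versus d_in * d_out for the full matrix, which is where the dreambooth-like-results-in-a-few-MB figure comes from.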
Pre/Post-processors
Not implemented - to my knowledge
"Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC "
Cycle diffusion
IMAGIC - "complex (e.g., non-rigid) text-guided semantic edits to a single real image" : combines (if I understood this comment well : How to make image inversion more precise? bloc97/CrossAttentionControl#20 (comment)) latent inversion with inversion of the prompt embeddings, plus a specific model "fine-tuned on the inverted embeddings" that helps reconstruct the image better...
Diffusion CLIP
Training-Free Structured Diffusion Guidance (TFSDG)
LPIPS guidance (Learned Perceptual Image Patch Similarity)
StyleCLIP - based on GANs, but 3 methods can be drawn from it (some may already be present in A1111, IDK) :
Image segmentation for inpainting
faster incremental inpainting : "get 3x to 7.5x faster inpainting with this one weird trick" #4266
patch-batch init mode for outpainting / inpainting : [Feature Request]: PatchMatch init mode for inpainting / outpainting #4681
fourier-shaped noise INpainting (similar to mk2 outpainting but for inpainting) : [Feature Request]: fourier-shaped noise IN-painting ? (mk2 inpainting) #4739
Who knows ?
Depth-map and transparent background
DiffEdit: Diffusion-based semantic image editing with mask guidance
Pose transfer
Hand-fixer ? Brainstorming: ideas on how to better control subjects and contexts #3615 (comment)
Crazy ideas ?
Best (subjective) Competing models
OpenAI
NVIDIA
Midjourney (MJ)
Google's unreleased text-2-image hype models :
BlueWillow (free)
DeepFloyd (highly anticipated / mega hype) - a Stability AI team (Originating from ShonenkovAI, linked to ruDALL-E)
SD models and fine-tunes / embeddings repositories
Feel free to add what is missing and/or correct the list if necessary
Kind of related but not really :
text-to-3D :
DreamFields :
DreamFusion (Google AI) :
Point-E (OpenAI) : https://github.com/openai/point-e
Magic3D (NVIDIA) : https://research.nvidia.com/labs/dir/magic3d/
3DFuse "Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation" : https://ku-cvlab.github.io/3DFuse/
text-to-4D (3D coherent video) :
text-to-video (2D animation) :
AI video inpainting :
- Video object removal : https://runwayml.com/inpainting/
text-driven video editing :
Image-driven video editing : "paint on video"
text/image-driven video editing :