Replies: 71 comments 58 replies
- Nice list! The alternate img2img script is a Reverse Euler method of modifying an image, similar to cross attention control.
- There is also a (supposedly) more efficient take on prompt-to-prompt.
- I just posted a discussion about Search Operators. It seems vaguely similar to "Composable diffusion" listed under Partly Implemented and might be covered by it, but I'm not sure.
- Think it'd be possible to also have a list for optimization/performance-boost methods? Two of the unimplemented ones that I know of: Implemented ones, AFAIK:
- Added Imagic (similar to cycle diffusion)
- Added RunwayML Inpainting
- Wow, that StyleCLIP looks useful.
- Hope they can come soon :)
- Added some history on specific subjects, because that is interesting to me :-)
- Can't make sense anymore of what is prompt-to-prompt and what is cross-attention control :-/ They seem to be very closely linked. Shall I put it all under cross-attention control, or prompt-to-prompt? Or keep them separate?
- Should this include img2animation? If that's the case, here are some:
- Added "Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis"
- Textual inversion embedding advanced prompt tuning
- I saw this at the Adobe MAX conference. Is there a similar technology now?
- Hopefully either a better img2img is added or the alternative script is updated. Since the RunwayML inpainting update, I get a "TypeError: 'NoneType' object is not subscriptable" error, and it doesn't support unlimited tokens.
- Text2image adapter (fine-tune section), based on SD. Current status: not implemented in A1111.
- Latent Blending, a technique that achieves smooth interpolation between generated images.
- Extension for Dynamic thresholding -> https://github.com/mcmonkeyprojects/sd-dynamic-thresholding
- Hard prompts made easy (fine-tuning the text prompt)
- SD Leap Booster (faster TI)
- Various additions about Justin Pinkney's work
- Added some new models in the "competing" section
- Added info about NovelAI's two newest samplers : SMEA and SMEA Dyn
- 2023 research (MIT x Google DeepMind x Google Brain x INRIA) on diffusion models : Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC
- Multi-UNet cond guidance (in the "crazy ideas - alternating models" section for the time being)
- DIS V2.0 to remove backgrounds; not released yet, it seems:
- Image guidance incoming : #8064
- Tuning encoder - single-image fast fine-tune
- New way of mixing concepts with per-layer prompting (no code implementation yet)
- DeepFloyd IF is now out. Would love to see this as a plugin. https://deepfloyd.ai/deepfloyd-if
-
Precious newsfeed : https://rentry.org/sdupdates3
img2img
"basic" img2img "using a gaussian-diffusion denoising mechanism as first proposed by SDEdit"
img2img Variations
img2img alternative script - identical to bare img2img but using other noise-encoding samplers that can't encode in a single jump, so multi-step encoding is necessary.
Prompt manipulation (i.e. prompt-to-prompt, but NO Cross Attention Control; see the unimplemented section for further links on C.A.C.)
Prompt editing
Alternating words
Composable diffusion
Google's Prompt-to-prompt with Cross Attention Control
InstructPix2Pix
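Composable diffusion (listed above, the webui's AND syntax) generalizes guidance to several prompts : each sub-prompt gets its own conditional noise prediction, and the guided directions are summed with per-prompt weights. A hedged numpy sketch of that combination rule; the function name and toy shapes are mine, not the A1111 implementation :

```python
import numpy as np

def composable_guidance(eps_uncond, eps_conds, weights):
    """Combine noise predictions from several prompts:
    eps = eps_uncond + sum_i w_i * (eps_cond_i - eps_uncond)."""
    out = eps_uncond.copy()
    for eps_c, w in zip(eps_conds, weights):
        out += w * (eps_c - eps_uncond)
    return out

# toy predictions standing in for U-Net outputs on a tiny "latent"
rng = np.random.default_rng(1)
eps_u = rng.normal(size=(4, 4))
preds = [rng.normal(size=(4, 4)) for _ in range(2)]

eps = composable_guidance(eps_u, preds, weights=[7.5, 7.5])
# with a single prompt this reduces to plain classifier-free guidance
single = composable_guidance(eps_u, preds[:1], [7.5])
assert np.allclose(single, eps_u + 7.5 * (preds[0] - eps_u))
```

Negative weights are also meaningful here: they push the sample away from a concept rather than toward it.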
Attention manipulation
Latents manipulation
Operations on latents, conditionings and sigmas mid-sampling by @dfaker - merged PR Add mid-kdiffusion cfgdenoiser script callback - access latents, conditionings and sigmas mid-sampling #4021
Latent upscaling
Scale Latent
for improved sharpness, details and color science. #2668
Image blending / Latent Interpolation / Multi-image prompt :
Special section about Justin Pinkney's image variations model "fine-tuned from CompVis/stable-diffusion-v1-3-original to accept CLIP image embedding rather than text embeddings" ([model card](https://huggingface.co/lambdalabs/stable-diffusion-image-conditioned))
A1111 extension for image variation via a finetuned model, similar to Justin Pinkney's Image Variation PoC, incoming! Add cond and uncond hidden states to CFGDenoiserParams #8064
SD Remix using SD2.1 unCLIP : https://github.com/unishift/stable-diffusion-remix
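The simplest form of the latent interpolation / image blending listed above is a plain linear interpolation between two image latents before decoding; the linked approaches refine this with spherical interpolation, mid-sampling blending, or interpolating the conditioning as well. A minimal sketch (names and shapes are illustrative) :

```python
import numpy as np

def blend_latents(lat_a, lat_b, t):
    """Linear interpolation between two image latents; decoding the
    blend (or resuming sampling from it) yields an in-between image."""
    return (1 - t) * lat_a + t * lat_b

# two toy (4, 64, 64) latents, the shape SD uses for a 512px image
rng = np.random.default_rng(0)
a = rng.normal(size=(4, 64, 64))
b = rng.normal(size=(4, 64, 64))

mid = blend_latents(a, b, 0.5)   # halfway blend
assert np.allclose(blend_latents(a, b, 0.0), a)
```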
CFG manipulation
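All CFG manipulation starts from the standard classifier-free guidance combination of the two noise predictions the model makes each step. A minimal sketch, assuming `eps_uncond` and `eps_cond` are the unconditional and conditional predictions (the names are mine) :

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: move the prediction `scale` times along
    the direction from the unconditional to the conditional branch."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

# toy predictions for a 4x4 "latent"
rng = np.random.default_rng(0)
eps_u = rng.normal(size=(4, 4))
eps_c = rng.normal(size=(4, 4))

eps = cfg_combine(eps_u, eps_c, scale=7.5)
# scale=1 reduces to the plain conditional prediction
assert np.allclose(cfg_combine(eps_u, eps_c, 1.0), eps_c)
```

Everything under "CFG manipulation" (rescaling, scheduling the scale over steps, per-region scales) is a variation on where and how `scale` enters this formula.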
Inpainting
Outpainting
Artistic img2img
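The SDEdit mechanism behind "basic" img2img, quoted at the top of this section, can be sketched in a few lines : forward-noise the input image to an intermediate timestep chosen by the denoising strength, then run the normal reverse process from there. This is a toy numpy illustration with a dummy denoiser standing in for the U-Net; the linear schedule and all names are illustrative, not the webui's code :

```python
import numpy as np

def sdedit_img2img(x0, strength, num_steps, denoise_step, rng):
    """SDEdit: forward-noise x0 to step t = strength * num_steps,
    then iteratively denoise back to step 0."""
    t_start = int(strength * num_steps)
    alpha_bar = 1.0 - t_start / num_steps   # toy variance-preserving schedule
    x = (np.sqrt(alpha_bar) * x0
         + np.sqrt(1 - alpha_bar) * rng.normal(size=x0.shape))
    for t in range(t_start, 0, -1):
        x = denoise_step(x, t)              # stand-in for the real sampler
    return x

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))
# dummy "denoiser" that just shrinks the signal slightly each step
out = sdedit_img2img(image, strength=0.75, num_steps=20,
                     denoise_step=lambda x, t: 0.99 * x, rng=rng)
assert out.shape == image.shape
# strength=0 adds no noise and runs no steps, returning the input
same = sdedit_img2img(image, 0.0, 20, lambda x, t: x, rng)
assert np.allclose(same, image)
```

This is why the webui's "denoising strength" slider trades faithfulness for creativity: it literally sets how far into the noise schedule the input image is pushed before denoising.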
Partly implemented
WIP
CLIP guidance
MagicMix: Semantic Mixing with Diffusion Models : https://magicmix.github.io/
Paint with words SD (similar to Nvidia's eDiffi functionality)
Dynamic thresholding - better images at high cfg
Update and rescale CFG denoising scale
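Dynamic thresholding (the "better images at high cfg" item above) follows the recipe from Google's Imagen paper : after predicting the denoised sample, clamp it to a per-image percentile of its absolute values and rescale, instead of statically clipping to [-1, 1]. A hedged numpy sketch; the linked extension implements a more configurable variant :

```python
import numpy as np

def dynamic_threshold(x, percentile=99.5):
    """Imagen-style dynamic thresholding: clamp the predicted sample to
    the per-image `percentile` of its absolute values, then rescale so
    high-CFG outputs stay in range instead of saturating to +/-1."""
    s = np.percentile(np.abs(x), percentile)
    s = max(s, 1.0)            # never tighten below the static [-1, 1] clip
    return np.clip(x, -s, s) / s

# over-saturated toy prediction, as happens at high CFG scales
rng = np.random.default_rng(0)
x = rng.normal(scale=5.0, size=(16, 16))
y = dynamic_threshold(x)
assert np.abs(y).max() <= 1.0
```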
Noise & Seed
Seed combination : [Feature Request]: Stacking or mixing variation seeds to refine a result #3745
Noise scaling : variable-scale noise and noise operations #2163
Other noise is possible and can be combined => an example with perlin noise :
Latent perturbation : [Feature Request]: Latent Perturbation #4164
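Seed combination as requested in #3745 amounts to interpolating the initial noise tensors of two seeds. Spherical interpolation (slerp) is usually preferred over plain lerp because it keeps the mixed tensor at a Gaussian-like norm, which the sampler expects. A sketch under those assumptions (function names are mine) :

```python
import numpy as np

def slerp(t, a, b):
    """Spherical interpolation between two noise tensors."""
    a_flat, b_flat = a.ravel(), b.ravel()
    omega = np.arccos(np.clip(
        np.dot(a_flat, b_flat)
        / (np.linalg.norm(a_flat) * np.linalg.norm(b_flat)), -1.0, 1.0))
    if np.sin(omega) < 1e-6:   # nearly parallel vectors: fall back to lerp
        return (1 - t) * a + t * b
    return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

# initial noise for two seeds, SD latent shape for a 512px image
noise_a = np.random.default_rng(1234).normal(size=(4, 64, 64))
noise_b = np.random.default_rng(5678).normal(size=(4, 64, 64))

mixed = slerp(0.3, noise_a, noise_b)   # 30% toward the variation seed
assert mixed.shape == noise_a.shape
```

This is essentially what the webui's "variation seed" strength already does internally; #3745 asks for stacking more than two seeds.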
Samplers
Schedulers
Fine-tuning methods
Can be combined to enhance results :
Dreambooth * Aesthetic Gradient : Using Aesthetic Images Embeddings to improve Dreambooth or TI results #3350
HyperNetwork * Aesthetic Gradient : Hypernetwork Style Training, a tiny guide #2670
Hypernetwork * TI : interesting question
Textual Inversion (word embedding optimization) - style/object/person integration
Faster Textual Inversion via specific "TI-model" : SD Leap Booster
Hard Prompts made easy :
Text prompt inversion from 1-n images
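Textual Inversion above optimizes only one thing: the embedding vector of a new token, by gradient descent against the frozen model's reconstruction loss. A toy sketch of that loop with a quadratic stand-in for the diffusion loss; the names, dimensions and learning rate are illustrative, not the A1111 trainer :

```python
import numpy as np

def textual_inversion_step(embedding, grad_fn, lr=5e-3):
    """One optimization step: only the new token's embedding vector is
    updated; the model weights stay frozen throughout."""
    return embedding - lr * grad_fn(embedding)

# toy objective standing in for the diffusion reconstruction loss:
# pull the embedding toward a "target" direction the training images imply
target = np.ones(768) * 0.1
grad_fn = lambda e: 2 * (e - target)

emb = np.zeros(768)            # init, e.g. copied from a nearby word's embedding
for _ in range(500):
    emb = textual_inversion_step(emb, grad_fn)
assert np.linalg.norm(emb - target) < 1.0   # converging toward the target
```

This also explains why TI artifacts are tiny (one vector, a few KB) and why SD Leap Booster can speed the process up: it only has to predict a good embedding, not touch the model.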
Dreambooth - whole model fine tuning
Aesthetics gradient - style refining
HyperNetworks (NAI-version) - (mostly) style transfer (?)
multi-concepts partial fine-tuning (75MB) (Adobe Research)
Low Rank Adaptation (LoRA) : dreambooth-like results but <5MB
ControlNet : Conditioning on anything through fine-tuned "helper" models
T2I-Adapter : "simple and small (~70M parameters, ~300M storage space) network that can provide extra guidance to pre-trained text-to-image models while freezing the original large text-to-image models"
Whole model fine-tuning :
Tuning encoder : single image fine tuning : https://tuning-encoder.github.io/
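The "<5MB" claim for LoRA above follows directly from its structure : each frozen weight matrix W gets a trainable low-rank update B @ A scaled by alpha / rank, and only the small A and B matrices are saved, never W. A minimal numpy sketch (shapes are illustrative; real LoRAs target the attention projections) :

```python
import numpy as np

def lora_forward(x, W, A, B, alpha, rank):
    """LoRA: the frozen weight W is augmented by a low-rank update
    (alpha / rank) * B @ A, trained in place of full fine-tuning."""
    return x @ (W + (alpha / rank) * (B @ A)).T

d_out, d_in, rank, alpha = 320, 768, 4, 4.0
rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))        # frozen pretrained weight
A = rng.normal(size=(rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))               # trainable up-projection, zero-init
x = rng.normal(size=(1, d_in))

# with B zero-initialized, LoRA starts as an exact no-op on the model
assert np.allclose(lora_forward(x, W, A, B, alpha, rank), x @ W.T)
```

Storing A and B here costs rank * (d_in + d_out) floats versus d_in * d_out for the full matrix, which is where the dreambooth-like-results-in-a-few-MB figure comes from.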
Pre/Post-processors
Not implemented - to my knowledge
"Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC "
Cycle diffusion
IMAGIC - "complex (e.g., non-rigid) text-guided semantic edits to a single real image" : combines (if I understood this comment well : How to make image inversion more precise? bloc97/CrossAttentionControl#20 (comment)) latent inversion with inversion of the prompt embeddings, plus a specific model "fine-tuned on the inverted embeddings" that helps reconstruct the image better...
Diffusion CLIP
Training-Free Structured Diffusion Guidance (TFSDG)
LPIPS guidance (Learned Perceptual Image Patch Similarity)
StyleCLIP - based on GANs, but 3 methods can be drawn from it (some may already be present in A1111, IDK) :
Image segmentation for inpainting
faster incremental inpainting : "get 3x to 7.5x faster inpainting with this one weird trick" #4266
patch-batch init mode for outpainting / inpainting : [Feature Request]: PatchMatch init mode for inpainting / outpainting #4681
fourier-shaped noise INpainting (similar to mk2 outpainting but for inpainting) : [Feature Request]: fourier-shaped noise IN-painting ? (mk2 inpainting) #4739
Who knows ?
Depth-map and transparent background
DiffEdit: Diffusion-based semantic image editing with mask guidance
Pose transfer
Hand-fixer ? Brainstorming: ideas on how to better control subjects and contexts #3615 (comment)
Crazy ideas ?
Best (subjective) Competing models
OpenAI
NVIDIA
Midjourney (MJ)
Google's unreleased text-2-image hype models :
BlueWillow (free)
DeepFloyd (highly anticipated / mega hype) - a Stability AI team (Originating from ShonenkovAI, linked to ruDALL-E)
SD models and fine-tunes / embeddings repositories
Feel free to add what is missing and/or correct the list if necessary
Kind of related but not really :
text-to-3D :
DreamFields :
DreamFusion (Google AI) :
Point-E (OpenAI) : https://github.com/openai/point-e
Magic3D (NVIDIA) : https://research.nvidia.com/labs/dir/magic3d/
3DFuse "Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation" : https://ku-cvlab.github.io/3DFuse/
text-to-4D (3D coherent video) :
text-to-video (2D animation) :
AI video inpainting :
- Video object removal : https://runwayml.com/inpainting/
text-driven video editing :
Image-driven video editing : "paint on video"
text/image-driven video editing :