Replies: 13 comments 21 replies
-
@AUTOMATIC1111 I can make the pull request for this feature as I would also be trialing it with my own animation plugin, but I'll need to know first if we're all on board with this new architecture!
-
Great idea, an extension ecosystem is always good for an open-source project.
-
A way to control the execution order of plugins that are activated on the same trigger would probably be a good idea; possibly even for individual events that may need to be interleaved with other plugins or may need a different order than the rest of the script.
-
Excellent proposal! A middleware-style system of addons is what first came to mind when I saw how scripts are currently implemented.
-
I have edited this proposal with an update; things have evolved quite a bit more drastically. I am suggesting a brand new approach to this whole thing, refined to an amazing core architecture which is efficient for UIs, coding, and CLI use alike. There are certainly challenges; I think the plugin installation code will be tricky. Let me know how this feels to you guys.
-
I like your proposal, but given the scope, another approach to implementing it could be:
-
I'm working on a modular refactor/rewrite called "Instability". Big changes include (almost) no global variables, easy usage in other scripts, and simple installation with
-
I think it would be more valuable for the pre- and post-processing to be converted into a pipeline-type architecture, similar to what you see in data processing tools like scrapy, jina.ai or bonobo. This would in fact support "plugins" for processing steps as python modules. An added bonus would be to allow parallel processing on certain steps, and directing which should happen on CPU and which on GPU, or even remotely on things like huggingface.
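A minimal sketch of what such a pipeline could look like, assuming a plain sequential runner and hypothetical step functions (none of these names exist in the codebase; scrapy/jina/bonobo would bring their own, richer abstractions):

```python
# Hypothetical sketch of a pipeline-style processing chain; names are illustrative only.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Step:
    fn: Callable          # processing function, e.g. generate, upscale, extract depth
    device: str = "cuda"  # hint for where the step should run: "cpu", "cuda", or "remote"

@dataclass
class Pipeline:
    steps: List[Step] = field(default_factory=list)

    def run(self, payload):
        # A real pipeline framework would schedule steps in parallel and move data
        # between devices/hosts; this sketch just runs them in order.
        for step in self.steps:
            payload = step.fn(payload)
        return payload

# e.g. Pipeline([Step(txt2img), Step(upscale), Step(save_image, device="cpu")]).run(params)
```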
-
It seems there are a few projects currently attempting something like this, but none of them has its priorities in the right place, which is the plugin ecosystem. Everyone is wasting time here; the plugin architecture is the only thing that matters, and once we have it we can all improve parts of the project as a community. Some are warning that the repository is a mess and can't be saved, but I looked at it quickly and I disagree. Essentially there's no architecture; it's a classic singleton problem grown out of control. All of the code is still good, it just needs to be called in the right order:
Well, with that being said, I'm about halfway through a refactor at https://github.com/oxysoft/stable-diffusion-webui; check the README.md to see progress and how you can help. I'm out of free time, but I suspect another day or two will yield results. I will implement the shittiest UI that can be made around this architecture and then relay it to the community so we can all build on it. I will need urgent help (even just comments and recommendations) with these:
The sooner we have the core/plugin architecture, the sooner we can have everyone contributing to build the most powerful AI core for both coders and artists. @AUTOMATIC1111 I hope you will accept this development; it will unlock the full power of AI. We must continue moving forward as a community and with a permissive license.
-
Interesting concept. I've often thought about how there should be a modifier stack similar to Blender's to line up processes (not just to stack custom scripts, but also for pipelines into img2img or upscaling), but this is on a much larger scale. Honestly, I think even the input should be a plugin: I dream of an advanced text editor with highlighting, autocomplete and snippets. Alternatively, I think there could be other means of creating prompts, far more intuitive than writing plain text, that we can't even imagine yet.
-
They have announced their animation API for release next week. https://twitter.com/Plinz/status/1582200052801359872 But they don't have an open-source plugin ecosystem, so ours will be better. The same mistakes every single time. Corporate software cannot beat open-source movements. Ours will be a hundred times more feature-rich thanks to the community.
-
Major Progress
https://github.com/oxysoft/stable-diffusion-webui
After this I am making a simple UI to demo how this all works, and once that's working I will port the remaining plugin code (upscalers, face restoration, etc.).
DreamStudio Pro
We've seen it on the big screen now and I'm not convinced: too many fancy bells and whistles like nodes and a 3D scene view (wtf was that even for???) and not enough actual software being shown. I'm not optimistic about it. Their existing DreamStudio web interface already sucks, but the real meat is in embedding these features into existing software!! Blender, Photoshop, Kdenlive; no actual artists or animators want to work in some crappy web interface. The only reason I use DreamStudio at all is to iterate on a prompt idea without any resource constraints. I will make a minimal UI in Dear ImGui (powerful though) only so that 1. devs can test plugins easily, and 2. people who don't know any real art software can also have fun and enjoy a powerful workflow. The focus should be on existing software.
Well, it occurred to me that we can bake cloud deployment into the core. E.g. you rent an instance on Vast.ai, then you run a deploy command in the stable-core shell and enter your SSH details. Automatically, it clones
Still no comment from @AUTOMATIC1111. Having seen and refactored the code behind this webui, and the way the issues are piling up, I think the project won't be maintainable for much longer. This could be the greatest learning experience of your entire life: the entire codebase is refactored and I left comments everywhere detailing my confusion, suggestions, even frustration. I hope you'll endorse this core and transition to it; we could use your expertise in maintaining the StableDiffusion plugin and continuing to support new research papers, techniques and optimizations.
-
Hi everyone, this is the last time I will update this discussion. Development is now well under way at https://github.com/distable/core. The plugin system is functional, we have an interactive shell to run SD from the command line, and I've begun work on a GUI. The API is settling down, so I feel comfortable taking contributions from people now if you wish to help. Let's make this the best AI art ecosystem, much bigger than Stable Diffusion.
-
Plugin Proposal
With this project's buzzing community and very active team of collaborators, always on top of the latest optimizations and new techniques to implement, it is quickly becoming the de-facto implementation for Stable Diffusion.
I am noticing that most colab notebooks currently in use can be ported to a custom script and harness the power of the webui, even notebooks like Deforum. Custom scripts are great because they allow sub-communities to establish themselves and contribute to the project in their own spaces. This gives me the idea, along with noticing that many features are currently unrelated to one another and use different models, that we can restructure the project around external plugins instead of a single bloated repository. Then we massively gain from the combined power of open source, the same power we've seen optimizing the shit out of SD in the first weeks following its open release.
We rename the project to stable-core: everyone's first stop for AI art, usable out of the box with official plugins, or even better with external UI implementations.
Development Fork
https://github.com/oxysoft/stable-diffusion-webui/
From stable-diffusion-webui to stable-core ...
We split the project down to a very solid backend core, not even StableDiffusion specific. Plugins can be just some functions, a technique, or a library, or they can load models and implement a well-defined job pipeline (jobs like `txt2img`, `img2img`, `img2txt`, etc.).
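As a rough illustration of how the core could stay StableDiffusion-agnostic (everything here is a hypothetical sketch, not existing code), plugins might simply declare which jobs they provide and the core dispatches by name:

```python
# Hypothetical sketch: plugins advertise named jobs and the core dispatches by name.
class StableDiffusionPlugin:
    def jobs(self):
        # Which jobs this plugin can handle; the core is unaware of what they do.
        return {"txt2img": self.txt2img, "img2img": self.img2img}

    def txt2img(self, params):
        ...  # run the sampler, return images

    def img2img(self, params):
        ...

def run_job(plugins, name, params):
    # The core looks up the job by name; no StableDiffusion-specific knowledge needed.
    for plugin in plugins:
        handler = plugin.jobs().get(name)
        if handler is not None:
            return handler(params)
    raise KeyError(f"no plugin provides job '{name}'")
```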
Workflow
The entire UI is designed like a dashboard of features, each of which exposes its own UI: a workflow where you're jumping between generation plugins (stablediffusion, vqgan+clip, guided diffusion, soon txt2vid like phenaki) and postprocessing plugins (upscalers, 2D and 3D transforms, MiDaS depth-map extraction, img2img) to get work done and string features together. Each plugin implements models and discrete features which can be used in a manual workflow, and which larger plugins like Deforum can depend on and invoke as an API.
Eventually, someone will try to make a fancy timeline sequencer or some node-based editor. Or they'll integrate it into major software like Blender, Kdenlive, Premiere, etc.
Community Decentralization
We can automatically collect plugin repositories from GitHub and enable them much like you would with Vim-Plug and similar tools. Thus, we unlock the full power of open-source contributions and condense all manpower into a buzzing ecosystem of UIs and plugins, much like the marketplaces for VS Code and Sublime Text once upon a time. Anyone can create a plugin to bring some image synthesis or transformation technique into the ecosystem, even ebsynth or imagemagick, supporting them in all supported UIs automatically and handling installation for each platform. This becomes the npm of the creative community.
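Purely as an illustration, fetching a plugin could be as simple as cloning its repository into a local plugins directory and then running its install hook; the directory layout and function below are assumptions, not existing code:

```python
# Hypothetical Vim-Plug-style plugin fetch; paths and names are illustrative only.
import subprocess
from pathlib import Path

PLUGINS_DIR = Path("plugins")

def fetch_plugin(repo_url: str) -> Path:
    """Clone a plugin repository from GitHub into the local plugins directory."""
    PLUGINS_DIR.mkdir(exist_ok=True)
    name = repo_url.rstrip("/").split("/")[-1].removesuffix(".git")
    target = PLUGINS_DIR / name
    if not target.exists():
        subprocess.run(["git", "clone", "--depth", "1", repo_url, str(target)], check=True)
    return target

# e.g. fetch_plugin("https://github.com/<user>/<plugin-repo>")
# The core would then import the package and call its install() hook so the
# plugin can pull in its own pip/apt requirements for the current platform.
```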
Concrete use-cases
Plugin Ideas
With the macro editor, you could easily reconstruct a Deforum frame by stringing together just a few plugins, making animation available out-of-the-box without any special animation plugin necessary.
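To make the idea concrete, here is a sketch, with made-up plugin names and signatures, of how a single Deforum-style animation frame could be expressed as a short chain of plugin jobs:

```python
# Hypothetical sketch: one Deforum-style frame as a chain of plugin jobs.
# The job names and the core.run() call are illustrative, not an existing API.
def deforum_frame(core, prev_image, p):
    img = core.run("transform2d", image=prev_image,   # pan/zoom/rotate the previous frame
                   zoom=p.zoom, angle=p.angle)
    img = core.run("img2img", image=img,              # re-diffuse with low denoising strength
                   prompt=p.prompt, strength=0.4)
    img = core.run("upscale", image=img, scale=2)     # optional upscaler plugin
    return img
```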
API Specs
Plugin API
- `title()`
- `describe(page)`
- `ui()`
- `install()`: code to run during the core's startup to install the requirements to run it (through pip, apt, etc.), the same way you would on colab.
- `init()`: runs once per script upon first startup; allows loading resources required for enabling the script, e.g. reading from files into memory if it's required to display in the UI for selections.
- `load()`: instantiate resources/models required for processing, e.g. lpips, midas, ...
- `unload()`: frees resources from VRAM.
- `generate_cost()`: attempt to return the estimated VRAM cost to run this generator.
- `postprocess_cost()`: attempt to return the estimated VRAM cost to run this postprocess.
- `generate(params)`
- `postprocess(img)`: takes in the current image as input, cv2 RGB. An extensive utility API provides conversions like `cv2_to_pil`, `pil_to_cv2`, `pil_to_latent`, `latent_to_pil`, `cv2_to_latent`, `latent_to_cv2`, etc., painless to use no matter what.
- `postprocess_prompt(prompt)`: any generator prompt will be passed through; this is where we can implement wildcards, etc.

Notes:
`load()`/`unload()` are managed by the core; it will usually call load before processing and unload at the end. Users can configure certain models to load on startup or to remain loaded, to tailor to their performance needs.
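To give a feel for how these hooks fit together, here is a rough sketch of a plugin subclass; the `Plugin` base class, the signatures, and the `load_esrgan_model` helper are my assumptions about how the spec above could look in code, not a finished implementation:

```python
# Rough sketch of the lifecycle hooks described above; everything here is assumed, not final.
class Plugin:
    def title(self): ...
    def describe(self, page): ...
    def ui(self): ...
    def install(self): ...           # core startup: install requirements (pip, apt, ...)
    def init(self): ...              # once per startup: cheap resources needed for the UI
    def load(self): ...              # heavy models, called by the core before processing
    def unload(self): ...            # free VRAM, called by the core afterwards
    def generate_cost(self): ...     # estimated VRAM cost of generate()
    def postprocess_cost(self): ...  # estimated VRAM cost of postprocess()
    def generate(self, params): ...
    def postprocess(self, img): ...
    def postprocess_prompt(self, prompt): return prompt

class ExampleUpscalerPlugin(Plugin):
    def title(self):
        return "Example Upscaler"

    def load(self):
        self.model = load_esrgan_model()  # hypothetical helper, provided elsewhere

    def unload(self):
        del self.model

    def postprocess(self, img):
        # img is a cv2 RGB array per the spec; return the upscaled image.
        return self.model.upscale(img)
```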
Generators emit...
- `onGenStart(parameters)`
- `onGenEnd(outputs)`
- `onGenInterrupted()`: when the user manually interrupts the generator, or a plugin requests it.
`StableDiffusion:Plugin` emits...
- `onPostprocessParameters(params)`: take in the full dictionary of render parameters and return the modified parameters. (I actually have not really looked at the codebase yet, I'm assuming there must be one.) Runs only once before the first batch, but not subsequent ones.
- `onStepStart(latent)`: on the very first call this would be the raw noise, no denoising yet.
- `onStepCondFn()`: allow implementing new loss terms to guide diffusion. I'm not sure how it's been done in this repository, but this would be the place in k-diffusion. Uses for this include CLIP conditioning, lpips to preserve perceptual similarity (as in Disco Diffusion), or preserving shapes as in PyTTI with convolutions ("edge stabilization"). (A sketch follows after this list.)
- `onStepEnd(latent)`

Extend specific plugin features...
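As an example of what `onStepCondFn` could enable, here is a sketch of a plugin adding a CLIP-style guidance term during sampling. It builds on the `Plugin` base class sketched earlier; the returned `cond_fn` signature and the `decode_latent`/`clip_similarity` stubs are placeholders for whatever utilities the core would actually expose:

```python
# Hypothetical sketch of a plugin hooking onStepCondFn to add a guidance loss term.
import torch

# Placeholder stubs: a real core would provide these through its utility/CLIP APIs.
def decode_latent(latent): ...
def clip_similarity(image, text): ...

class ClipGuidancePlugin(Plugin):  # Plugin base class as sketched above
    def __init__(self, target_text, scale=500.0):
        self.target_text = target_text
        self.scale = scale

    def onStepCondFn(self):
        # Return a cond_fn that nudges the latent toward the target text at each step,
        # in the spirit of Disco Diffusion-style CLIP guidance.
        def cond_fn(latent, sigma):
            with torch.enable_grad():
                latent = latent.detach().requires_grad_()
                image = decode_latent(latent)
                loss = -clip_similarity(image, self.target_text) * self.scale
                return torch.autograd.grad(loss, latent)[0]
        return cond_fn
```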
API design thoughts:
- The `on` prefix establishes a clear boundary between plugin event handlers and any extra functions written by the plugin developer.
- `start` events are positioned before running any of the code relevant to that event's description, and `end` events come after. Otherwise, we specify with the past tense to avoid ambiguity, e.g. `onImageSaved`, `onRunInterrupted`.