Summary
This RFC proposes a major UI and architectural direction for Invoke.
- Top-level tabs become a primary area of work (like files in Photoshop)
- Each top-level tab owns its own graph/workflow context
- Existing "pages" like Generate, Canvas and Workflow should be rethought as:
- tab-local panels
- center-area views
- contextual tools over the same tab state
- The application moves towards a unified workspace shell rather than separate pages with partially shared behaviors
- To support this properly, the application will have a proper dockview implementation where panels can be moved into various regions.
- Panels will need to be responsive to fit within width/height constraints of vertical/horizontal regions.
- Add a bottom status panel with its own dockview region.
Problem Statement
Invoke has multiple major surfaces that all map to workflows/graphs in different ways, which currently leads to unpredictable UI behavior when switching pages to perform new tasks.
Examples:
- Some state is shared or influences generations between the pages in ways that aren't obvious
- Generate, Canvas and Workflow are treated as distinct pages/modes, even though they all ultimately produce or operate on graphs.
- The workflow page is separated from the rest of the pages, despite that being the underlying execution model
- The current dockview implementation is not built around its originally intended purpose, so the layout is currently static
- Discussions around tabbed canvases and workflows are already happening as separate concerns, even though they appear to want the same thing: a top-level tab with its own graph-backed working context
This creates UX issues:
- Hidden or surprising user workflow influence
- Unclear sources of truth
- Too much conceptual separation between pages and surfaces that are actually related
- Difficulty scaling the application without adding new pages/modes
- Reduced flexibility for future UI composition and features
Proposal
Core Idea
Invoke should adopt a unified workspace model in which each top-level tab represents a graph-backed workspace, similar to how Photoshop gives each working document its own tab. Panels and center regions operate on that tab's state. The graph becomes the underlying model of both the UI and the backend. Most users will continue to interact with it through hard-coded panels rather than through direct graph edits.
In Practice
A top-level tab should own:
- prompt/generation state
- parameters state
- canvas/layers state
- a graph
The broader UI state shared across tabs:
- generation queue
- dockview state
- UI settings
- gallery
- models
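The ownership split above can be sketched as a pair of types. This is only an illustration under assumptions: every name here (`TabState`, `SharedState`, `AppState`, `createTab`, `createAppState`, `updateTab`) and every field shape is hypothetical and not taken from the Invoke codebase.

```typescript
// Hypothetical shapes for illustration; not the Invoke codebase.
type TabId = string;

interface GraphNode { id: string; type: string; inputs: Record<string, unknown>; }
interface GraphEdge { from: string; to: string; }
interface Graph { nodes: GraphNode[]; edges: GraphEdge[]; }

// State owned by exactly one top-level tab.
interface TabState {
  id: TabId;
  prompt: { positive: string; negative: string };
  parameters: { steps: number; seed: number };
  canvas: { layers: string[] };
  graph: Graph;
}

// State shared by every tab.
interface SharedState {
  queue: TabId[];          // tabs with pending generations
  dockviewLayout: unknown; // serialized layout; shape left open on purpose
  settings: Record<string, unknown>;
  gallery: string[];
  models: string[];
}

interface AppState {
  shared: SharedState;
  tabs: Record<TabId, TabState>;
}

function createTab(id: TabId): TabState {
  return {
    id,
    prompt: { positive: "", negative: "" },
    parameters: { steps: 30, seed: 0 }, // placeholder defaults
    canvas: { layers: [] },
    graph: { nodes: [], edges: [] },
  };
}

function createAppState(tabIds: TabId[]): AppState {
  const tabs: Record<TabId, TabState> = {};
  for (const id of tabIds) tabs[id] = createTab(id);
  return {
    shared: { queue: [], dockviewLayout: null, settings: {}, gallery: [], models: [] },
    tabs,
  };
}

// Returns a new AppState in which only the targeted tab is replaced.
// Other tabs and the shared state keep their original references.
function updateTab(
  state: AppState,
  id: TabId,
  patch: Partial<Omit<TabState, "id">>,
): AppState {
  const tab = state.tabs[id];
  if (!tab) throw new Error(`Unknown tab: ${id}`);
  return { ...state, tabs: { ...state.tabs, [id]: { ...tab, ...patch } } };
}
```

With this shape, an edit in one tab cannot reach another tab's prompt, parameters, canvas, or graph, because every update is addressed by tab id.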
Before any user dockview preferences are applied, the default UI layout will be:
- Top-level tabs = workspaces
- Left rail = graph-authoring panels/controls
- Center area = views into current graph/tab state
- Right rail = contextual utilities and inspectors
Center Views
The center region will be able to switch between several views and working areas over the tab's state and outputs:
- Viewer
- Canvas
- Workflow
This means Canvas is no longer a separate page/mode. It becomes another view over the tab's state.
Panels over Pages
Instead of separate top-level pages that inconsistently overlap, panels become the primary way to edit the tab state.
Examples:
Left-side panels:
- Parameters
- Upscaler
- Linear UI for custom graphs
Right-side panels:
- Queue
- Gallery
- Layers
- Model Manager
Some panels are always available; others will need to be contextual.
Example:
- Layers may only be fully interactive when within a canvas view
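One way to express contextual panels is a small lookup from panel to the center views in which it is interactive. This is a sketch under assumptions: the identifiers `CenterView`, `PanelId`, `panelViews`, and `isPanelInteractive` are hypothetical, and only the Layers rule comes from this RFC.

```typescript
// Hypothetical identifiers; only the "layers needs canvas" rule is from the RFC.
type CenterView = "viewer" | "canvas" | "workflow";
type PanelId =
  | "parameters"
  | "upscaler"
  | "queue"
  | "gallery"
  | "layers"
  | "modelManager";

// Panels missing from this map are treated as always available.
const panelViews: Partial<Record<PanelId, CenterView[]>> = {
  layers: ["canvas"],
};

function isPanelInteractive(panel: PanelId, view: CenterView): boolean {
  const allowed = panelViews[panel];
  return allowed === undefined || allowed.includes(view);
}
```

Keeping the rule in data rather than in each panel component means new contextual panels only need a new map entry.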
Why top-level tabs should own their own graph
This is the key architectural shift. Tying each tab to a graph lets Invoke solve several problems at once.
1. It matches the backend model
Invoke already executes graphs. With that in mind, the mental model becomes easier to reason about while working in Invoke, and (hopefully) not too difficult to implement.
2. Creates a clean unit of work
A tab becomes a real, consistent working context, not just a unit of navigation.
This allows:
- true parallel exploration
- tab-specific histories
- tab-specific canvas states
- tab-specific workflows/graph state
- easier branching and explorations of ideas
3. Removes non-obvious cross-page influence
A significant point of UX friction present within the UI today is that different pages influence one another in ways that are not obvious.
If a tab owns its own graph context, then any image, mask, layer, ControlNet or other control input affecting generation belongs to that tab and can be made clearly visible there.
4. Unifies currently separate roadmap items
Tabbed canvases should not share the same generation parameters and inputs; if they did, users would spend more time than necessary switching settings back and forth. This is easily solved by including canvas state in each tab's workflow context.
5. Scales better than adding new pages
As Invoke grows and adopts new features and technologies, this model will be easier to extend than adding a page for each one.
Source of truth: structured vs workflow panels
To keep the app approachable and predictable while preserving power-user potential, this RFC proposes a distinction between structured panels and workflow-dependent authored tabs (linear UI).
Structured panels
Examples:
- Parameters
- Canvas
- Upscale
Behavior:
- hard-coded panels are the source of truth
- directly edit top-level tab graph state
- workflow/graph is read-only
This keeps the basic tools simple, reliable and beginner-friendly.
Linear UI Panel
Examples:
- imported or custom workflows
- graph-first workflow with optional linear UI inputs
Behavior:
- graph is source of truth
- optional linear UI elements can be bound to workflow inputs
- workflow/graph can be directly edited
Why the split matters
Trying to keep an arbitrarily edited graph working with hard-coded panels that expect certain behaviors and inputs to be defined is not an effort worth considering at this time.
Keeping structured/hard-coded panels as the source of truth for the tab graph avoids this trap entirely, while still clearly exposing the graph so it can be edited and played with later in a linear UI panel.
Power users will still have a first-class path:
- inspect the generated graph
- fork/duplicate to a workflow panel
- continue editing there
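The split and the fork path can be modelled as a tagged union in which only the workflow variant accepts direct graph edits. This is a minimal sketch under assumptions: `Workspace`, `forkToWorkflow`, and `setNodeInput` are hypothetical names, and the graph shape is a stand-in for whatever Invoke actually uses.

```typescript
// Hypothetical model of the structured vs. workflow split.
interface GraphNode { id: string; type: string; inputs: Record<string, unknown>; }
interface GraphEdge { from: string; to: string; }
interface Graph { nodes: GraphNode[]; edges: GraphEdge[]; }

type Workspace =
  | { kind: "structured"; graph: Graph } // panels are the source of truth
  | { kind: "workflow"; graph: Graph; linearInputs: string[] }; // graph is the source of truth

// Copies the graph so later edits never touch the structured original.
// Input values are copied one level deep, which is enough for this sketch.
function forkToWorkflow(ws: Workspace): Workspace {
  return {
    kind: "workflow",
    graph: {
      nodes: ws.graph.nodes.map((n) => ({ ...n, inputs: { ...n.inputs } })),
      edges: ws.graph.edges.map((e) => ({ ...e })),
    },
    linearInputs: [],
  };
}

// Direct node edits are only allowed on workflow workspaces.
function setNodeInput(
  ws: Workspace,
  nodeId: string,
  key: string,
  value: unknown,
): Workspace {
  if (ws.kind !== "workflow") {
    throw new Error("Structured graphs are read-only; fork to a workflow first");
  }
  const nodes = ws.graph.nodes.map((n) =>
    n.id === nodeId ? { ...n, inputs: { ...n.inputs, [key]: value } } : n,
  );
  return { ...ws, graph: { ...ws.graph, nodes } };
}
```

A structured tab stays safe because the only way to reach `setNodeInput` successfully is through an explicit fork.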
How panels influence the tab graph
Structured/hard-coded panels should not edit individual nodes or edges directly.
Instead:
- panels edit tab graph state as a whole
- new graph state is what is queued
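A rough sketch of that flow: panels write to plain structured state, and the whole graph is recompiled from it at queue time. Everything here is an assumption made for illustration, including `StructuredState`, `compileGraph`, `enqueue`, and the node type names `prompt`, `denoise`, and `layer`; none of it reflects Invoke's real node set.

```typescript
// Hypothetical node types and state shape, for illustration only.
interface GraphNode { id: string; type: string; inputs: Record<string, unknown>; }
interface GraphEdge { from: string; to: string; }
interface Graph { nodes: GraphNode[]; edges: GraphEdge[]; }

// What the structured panels actually edit.
interface StructuredState {
  prompt: string;
  steps: number;
  layers: string[];
}

// Rebuilds the entire graph from panel state; no node is patched in place.
function compileGraph(state: StructuredState): Graph {
  const layerNodes: GraphNode[] = state.layers.map((name, i) => ({
    id: `layer-${i}`,
    type: "layer",
    inputs: { name },
  }));
  const nodes: GraphNode[] = [
    { id: "prompt", type: "prompt", inputs: { text: state.prompt } },
    { id: "denoise", type: "denoise", inputs: { steps: state.steps } },
    ...layerNodes,
  ];
  const edges: GraphEdge[] = [
    { from: "prompt", to: "denoise" },
    ...layerNodes.map((n) => ({ from: n.id, to: "denoise" })),
  ];
  return { nodes, edges };
}

// The freshly compiled graph is what gets queued.
function enqueue(queue: Graph[], state: StructuredState): Graph[] {
  return [...queue, compileGraph(state)];
}
```

Because the graph is derived, the workflow view for a structured tab can always show exactly what will run without becoming a second source of truth.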
Workflow visibility
To make workflows more understandable and accessible in all panel contexts, there will be a workflow view that sits beside the viewer and launchpad.
Why this implies moving away from separate pages
If Generate, Canvas, Workflow, Upscale, etc. all influence or reflect the same graph-backed tab state, then keeping them fully separate becomes increasingly artificial.
A more coherent model:
- one application shell
- tab-local state
- center-area view switching
- contextual panels
- graph-based execution
This does not mean everything must be visible at once; all panels will be collapsible.
"One page" should mean:
- one unified workspace architecture
- one consistent tab model
- fewer conceptual jumps
It should not mean:
- one giant always-open mega-screen
Contextual panels, view switches, and layout rules will be what make the unified shell usable and not overwhelming.
Addressing dockview
Dockview currently controls the tab and panel UI state within Invoke. The original intent of adopting dockview was to allow tabs and panels to be moved around freely, reorganized, pulled out into separate windows, and floated above other panels. Unfortunately, due to technical complications at the time, the UI uses only its basic functions to keep a static layout.
If this RFC is taken seriously, we need to address dockview and implement it properly for the concept of "one page, many panels" to work, with the following regions:
- Left panel region
- Right panel region
- Center region
- Bottom status bar region (similar to code editors)
This will not be a side concern; it is pivotal infrastructure for the entire layout.
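To make the region model concrete, here is a library-agnostic sketch of a layout as plain data, with a function that moves a panel between regions. It deliberately does not use dockview's own API; `Region`, `Layout`, and `movePanel` are hypothetical names, and a real implementation would map this onto whatever dockview exposes.

```typescript
// Library-agnostic sketch; not dockview's API.
type Region = "left" | "right" | "center" | "bottom";
type Layout = Record<Region, string[]>;

// Returns a new layout with `panel` placed in `target`.
// The panel is first removed from every region, so it can never appear twice.
// A panel that is not docked anywhere is simply added to the target region.
function movePanel(
  layout: Layout,
  panel: string,
  target: Region,
  index?: number,
): Layout {
  const without = (panels: string[]) => panels.filter((p) => p !== panel);
  const next: Layout = {
    left: without(layout.left),
    right: without(layout.right),
    center: without(layout.center),
    bottom: without(layout.bottom),
  };
  const dest = next[target];
  // Clamp the insert position; default is the end of the region.
  const at =
    index === undefined ? dest.length : Math.max(0, Math.min(index, dest.length));
  dest.splice(at, 0, panel); // safe: `dest` is a fresh array created above
  return next;
}
```

Treating the layout as serializable data like this would also make it straightforward to persist it globally or per tab, whichever the open question below settles on.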
Impact on users
Beginners
Benefits:
- simpler mental model
- less surprising shared state
- easier understanding of "what will run"
- hard-coded/structured tabs remain safe and guided
- no need to understand graphs
Potential concerns:
- a unified application could feel dense if not carefully designed
- panel/view/tab terminology needs to be very clear
Recommendation:
- keep default layouts simple and reminiscent of current Invoke defaults
- expose the workflow as inspectable and non-intimidating
- avoid overloading the workspace
Creatives
Benefits:
- can safely work on multiple images at once with different contexts
- less mode switching
- closer behaviors to familiar pro tools
Potential concerns:
- canvas tooling needs to remain fast and focused
- layers must continue to feel natural, not like a separate "bolted on" panel
Recommendation:
- treat layers as a contextual but important panel
Power users
Benefits:
- first-class graph model
- easily inspect generated workflows in fewer steps
- custom workflow tabs are now seamlessly possible
- custom linear UI becomes more accessible
Potential concerns:
- read-only workflow preview from structured panels may feel restrictive
Recommendation:
- expose a "fork/duplicate" button on workflows
- store a linear UI configuration in panel graphs, so a duplicate feels similar to the panel it was duplicated from
UX principles
This proposal should follow a few clear principles:
1. One tab, one workflow context
No hidden state leakage between tabs.
2. Graph influence should be easily visible
If canvas, parameters, layers, etc. influence the generated graph, the UI should communicate that: subtly enough not to annoy power users, yet clearly enough to inform everyone else.
3. Structured first, graph second
Casual users and creatives generally treat panels as the primary interface. Viewing and interacting with the graph should be visible and learnable, but not required.
4. Power should be available
Linear UI panels should be as first-class as the rest.
5. One page doesn't mean all panels all the time
Use smart defaults, collapsed states, contextuality and view-based behavior to keep the interface clean and approachable.
Example user flows
Structured Panel
- User opens a new tab
- Default center view is viewer
- Left panel shows parameters
- User switches to canvas view
- Layers panel becomes open/available
- User creates a new layer and mask
- User switches view to viewer and clicks "invoke"
- The workspace tab's compiled graph reflects the prompt + canvas layers + mask
- User switches to workflow view to inspect the generated graph
- If they want to customize nodes directly, they choose:
- Edit workflow (opens linear panel and makes workflow editable)
Workflow panel
- User opens/imports a custom workflow
- Center view is workflow by default
- Customizable linear UI section within panel
- User edits nodes directly
- Tab does not pretend to be a structured generate/canvas tab
Open questions
- Should layout persistence be configurable to be global or per-tab?
- Should top-level tab state be exportable?
- Should the center viewing area become a dockable region? If so, which panels should be allowed there?
- Should workflows/custom nodes have panels available to the UI?
- Should the left/right/center/bottom regions allow a panel to be hidden completely and added again later (e.g. "remove from dock")?
- How flexible should we make the dockview? Should we allow panels to be extracted into new windows? Should we restrict certain panels to certain places?
- Should the layers panel auto-open when the canvas view is opened?
- Should we take this time in this major frontend rewrite to also implement a new web canvas, split off from @invoke-ai/ui-library, or use a new UI library (Ark UI has been discussed)?
Request for feedback
I'd especially like feedback on:
- What concerns this raises for beginners, creatives and power users
- What contributors think the right path is for the docking/panels experience
TL;DR
Invoke should move toward a unified, graph-backed workspace model, where each top-level tab owns its workflow context, existing pages become contextual panels, and center views display tab-local state.
