SDXL / ComfyUI Experiment: Resolution & Aspect Ratio Alter Semantic Output Even with Fixed Seeds #11302
DerekGrover started this conversation in Show and tell
Summary
I ran a controlled SDXL experiment in ComfyUI to test how resolution and aspect ratio affect semantic output when all other variables are held constant.
Result:
Changing resolution and/or aspect ratio materially changes what the model decides the subject is, not just framing, crop, or level of detail — even with fixed seeds, identical prompts, and identical sampler settings.
This behavior persists across cache purges and is fully reproducible.
Key Finding (TL;DR)
Resolution and aspect ratio are semantic inputs in SDXL, not just output formatting.
A fixed seed does not guarantee semantic consistency across resolutions or aspect ratios.
Test Setup
Environment
ComfyUI
SDXL Base + SDXL Refiner
Identical workflow graph across all runs
Controls (held constant)
Prompt (identical text)
Negative prompt (minimal / unchanged)
Seed (fixed)
Sampler type
Scheduler
CFG
Steps
Base → Refiner handoff
VAE
Hardware
Variables tested
Latent resolution
Aspect ratio
Cache state (purged vs non-purged)
Resolutions Tested
1024 × 1024
768 × 1024
(Same prompt, same seed)
Prompt (Architectural Test Case)
Ultra high resolution studio architectural photograph.
Neutral gray seamless background, infinite backdrop,
no environment, no landscape, no interior context.
Small standalone gardening shed, exact scale 10 feet wide by 15 feet long,
single-story non-residential utility structure,
solid brick masonry construction…
Straight-on frontal elevation, eye-level camera height,
orthographic architectural view,
perfectly straight vertical and horizontal lines.
Photorealistic materials, true-to-life scale, architectural documentation style.
(Full prompt omitted for brevity; identical across all runs.)
Observations
At 1024×1024, the model consistently interprets the subject as a small standalone structure
At 768×1024, the same seed and prompt frequently produce a multi-story building facade
This is not a crop or zoom difference — the entire subject identity changes
Running back-to-back generations without cache purge introduces subtle drift
Purging caches restores internal consistency at that resolution
However, cache purging does not make outputs consistent across resolutions; the resolution-dependent semantic shift remains.
A “sweet seed” is only stable within the same resolution / aspect ratio
Reusing a seed at a different resolution yields a different latent trajectory
This explains why "head-to-toe / full body" prompts often fail when users change the image size
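The mechanism behind the different latent trajectory is easy to demonstrate outside ComfyUI: a seed initializes a noise stream, but that stream is reshaped into a latent tensor whose dimensions depend on the resolution (SDXL latents are 1/8 of the pixel dimensions). A minimal NumPy sketch — the function name and 4-channel layout are illustrative assumptions, not ComfyUI internals:

```python
import numpy as np

def initial_latent(seed, width, height, channels=4, vae_scale=8):
    """Sketch of seed -> initial noise; SDXL latents are 1/8 of pixel size."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((channels, height // vae_scale, width // vae_scale))

square = initial_latent(42, 1024, 1024)   # shape (4, 128, 128)
portrait = initial_latent(42, 768, 1024)  # shape (4, 128, 96)

# Same seed, same underlying noise stream, but once the row width changes
# the values land at different spatial positions, so the denoising
# trajectory diverges from the very first step.
```

Even the noise values the two runs share end up at different pixel locations, which is why the same seed cannot anchor the same composition across resolutions.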
Implications for Training & Usage
For Model Trainers
Training at a single resolution or AR creates semantic bias
Models should be trained across multiple resolutions and aspect ratios if consistency is required
Otherwise, resolution changes will surface as semantic drift
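A common mitigation is aspect-ratio bucketing: group training images into a fixed set of resolution buckets at a similar pixel budget, so the model sees many aspect ratios during training. A hypothetical sketch — the bucket list below is illustrative, not SDXL's actual training buckets:

```python
# Illustrative buckets near ~1 megapixel; not SDXL's exact training set.
BUCKETS = [(1024, 1024), (896, 1152), (1152, 896), (768, 1344), (1344, 768)]

def nearest_bucket(width, height):
    """Assign an image to the bucket with the closest aspect ratio."""
    ar = width / height
    return min(BUCKETS, key=lambda wh: abs(wh[0] / wh[1] - ar))
```

Each training batch is then drawn from a single bucket, so every resolution the model may later be asked to generate is also one it has actually seen.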
For Workflow Builders / Artists
Treat resolution as a creative and semantic control, not a final formatting step
Lock resolution early if character or subject consistency matters
Find and reuse resolution-specific “sweet seeds”
Expect minor variation even under ideal controls — this is intrinsic, not user error
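When you do need a different aspect ratio, it helps to stay near the model's training pixel budget with dimensions divisible by 64. A small helper sketch — the name `sdxl_resolution` and the 64-pixel snap are my assumptions, not a ComfyUI API:

```python
import math

def sdxl_resolution(aspect, target_pixels=1024 * 1024, multiple=64):
    """Pick a width/height near target_pixels with the given w/h aspect,
    snapped to a multiple of 64 (safe for the SDXL VAE and UNet)."""
    height = math.sqrt(target_pixels / aspect)
    width = height * aspect
    snap = lambda v: max(multiple, round(v / multiple) * multiple)
    return snap(width), snap(height)
```

Note that per the findings above, a "sweet seed" found at one of these resolutions will not transfer to another; re-search seeds after any resolution change.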
Practical Takeaway
If you change resolution, you are not asking the model the same question anymore.
This experiment demonstrates why:
“Same seed, different result” is expected
Character consistency breaks across sizes
Aspect ratio tuning can radically alter output meaning
Reproducibility
This experiment is:
Fully reproducible
Performed with identical ComfyUI graphs
Verified with repeated runs and cache purges
Screenshots of the workflow and outputs are included for reference.
Closing
This is not a bug, and not noise — it is a structural property of how SDXL maps latent space.
I’m sharing this in the hope it helps others:
Debug “inconsistent” outputs
Design better workflows
Train more robust models
Happy to answer questions or compare results.
— Derek (derekHWD)