RTX 4090 benchmarks - FLUX model #4571
-
RTX 4080 / torch==2.5.0.dev20240821+cu124 / python 3.12 / Ubuntu 24.04
-
Flux in Q8_0 format looks very similar to FP16 (better quality than FP8) and may even be faster while generating; it will probably get faster still in the future ( #4538 (comment) ). https://github.com/city96/ComfyUI-GGUF
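For anyone wondering why Q8_0 stays so close to FP16: each block of 32 weights is stored as one FP16 scale plus 32 int8 values, so the per-weight rounding error is tiny compared to FP8's much coarser value grid. A minimal sketch of the round trip (plain NumPy, for illustration only, not the actual ComfyUI-GGUF kernels):

```python
# Illustrative sketch of GGUF Q8_0 quantize/dequantize (not the real ComfyUI-GGUF code).
import numpy as np

BLOCK = 32  # Q8_0 quantizes weights in blocks of 32

def q8_0_quantize(block: np.ndarray):
    # One scale per block, chosen so the largest magnitude maps to +/-127.
    amax = np.abs(block).max()
    scale = amax / 127.0 if amax > 0 else 1.0
    q = np.clip(np.round(block / scale), -127, 127).astype(np.int8)
    return np.float16(scale), q

def q8_0_dequantize(scale, q):
    return q.astype(np.float32) * np.float32(scale)

weights = np.random.randn(BLOCK).astype(np.float32)
scale, q = q8_0_quantize(weights)
restored = q8_0_dequantize(scale, q)
print("max abs error:", np.abs(weights - restored).max())
```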
-
Updating from PyTorch 2.4.0+cu121 to the latest 2.5.0.dev+cu124 boosted generation speed by about 10% at 1024x1024, 20 steps, both with 2.52 GHz at 875 mV drawing ~290 W and with 2.8 GHz at 1000 mV drawing 400 W. With PyTorch 2.4 I was at about 1.9 t/s and 260-270 W with the heavily undervolted 2.52 GHz clock speed I normally use. All results above are with FP8 and --fast, but GGUF Q8 seems to have gotten a similar ~10% speed bump with 2.5.0.dev+cu124.
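If you want to confirm the upgrade actually landed on the cu124 nightly and that the FP8 path is usable on your card, a quick sanity check like the one below is enough (assumes a working CUDA install; this is my suggestion, not part of the original post):

```python
# Quick sanity check after upgrading PyTorch (assumes CUDA is available).
import torch

print("torch:", torch.__version__)                    # e.g. 2.5.0.dev...+cu124
print("built against CUDA:", torch.version.cuda)
print("GPU:", torch.cuda.get_device_name(0))
print("compute capability:", torch.cuda.get_device_capability(0))  # (8, 9) on Ada cards
print("has float8_e4m3fn dtype:", hasattr(torch, "float8_e4m3fn"))
```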
-
3090 using 400W :(
-
The problem is that everyone has different configurations, and my ComfyUI setup was a mess. The FLUX model took a long time to load, but I was able to fix it.
My PC Specifications:
Processor: Intel i9-12900K @ 3.20 GHz
Memory: 64.0 GB (63.7 GB usable)
GPU: NVIDIA RTX 4090
Comfy log:

Goal:
I want to see if my setup is one of the fastest in the community, using the same workflows and models.
Workflow:

Workflow Screenshot:

My RTX 4090 Results:
I ran the workflow more than once; the first run took longer because of model loading and other initialization:
Run 1: Prompt executed in 41.14 seconds
Run 2 and onwards: Prompt executed in 18.42 seconds
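To compare against the it/s figures posted above, you can back out a rough step rate from the warm-run time. This is only a lower bound on the sampler's actual it/s, since the 18.42 s also includes text encoding and VAE decode (a quick illustration I added, not part of the original post):

```python
# Rough throughput estimate from the reported timings (illustrative only).
first_run = 41.14   # seconds, includes model loading
warm_run = 18.42    # seconds, models already cached
steps = 20          # sampling steps in the workflow

print(f"model load / init overhead: ~{first_run - warm_run:.2f} s")
print(f"approx. sampling speed: >= {steps / warm_run:.2f} it/s (lower bound)")
```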