RTX 4090 benchmarks - FLUX model #4571

belenkoksas · 2024-08-23T10:12:37Z

belenkoksas
Aug 23, 2024

The problem is that everyone has different configurations, and my ComfyUI setup was a mess. The FLUX model took a long time to load, but I was able to fix it.

My PC Specifications:
Processor: Intel i9-12900K @ 3.20 GHz
Memory: 64.0 GB (63.7 GB usable)
GPU: NVIDIA RTX 4090

Comfy log:

Goal:
I want to see if my setup is one of the fastest in the community, using the same workflows and models.

Workflow:

Workflow Screenshot:

My RTX 4090 Results:
I ran the process twice each time. The first run took longer due to loading models and other initializations:

1 run: Prompt executed in 41.14 seconds
2 etc... runs: Prompt executed in 18.42 seconds

ltdrdata · 2024-08-23T22:55:11Z

ltdrdata
Aug 23, 2024
Collaborator

RTX 4080 / torch==2.5.0.dev20240821+cu124 / python 3.12 / Ubuntu 24.04
ComfyUI (7df42b9)

weight_dtype (default) (28sec)

weight_dtype (fp8_e4m3fn) without --fast (19sec)

weight_dtype (fp8_e4m3fn) with --fast (13sec)

0 replies

JorgeR81 · 2024-08-24T11:35:23Z

JorgeR81
Aug 24, 2024

Flux in Q8_0 format looks very similar to FP16 ( better quality than FP8 ) and may be even faster.
Definitely faster, while loading, for me.

Probably will be even faster in the future, while generating ( #4538 (comment) )

https://github.com/city96/ComfyUI-GGUF
https://huggingface.co/city96/FLUX.1-dev-gguf/tree/main

0 replies

scpedicini · 2024-08-25T20:23:57Z

scpedicini
Aug 25, 2024

Using the Q8 GGUF I see 15 seconds at the fastest to 17 seconds at the slowest on my RTX 4090 with Euler 20 Steps for 1024x1024 images.

It's around 17 seconds if I'm continually changing up the prompt (I think because its having to unload/reload text encoders), for continual generation its about 15 seconds.

Previously I was using Forge with the FP8 model where I was seeing 14 seconds.

0 replies

jepjoo · 2024-08-26T07:40:48Z

jepjoo
Aug 26, 2024

Updating from pytorch 2.4.0+cu121 to latest 2.5.0.dev+cu124 boosted generation speed by about 10%.

1024x1024 20step now:

With 2.52 GHz at 875mV, drawing ~290W:
100%|██████████████| 20/20 [00:09<00:00, 2.09it/s]
Prompt executed in 10.01 seconds

And 2.8 GHZ at 1000mV, drawing 400W:
100%|██████████████| 20/20 [00:08<00:00, 2.29it/s]
Prompt executed in 9.07 seconds

With pytorch 2.4 I was at about 1.9t/s and 260-270W with the heavily undervolted 2.52GHz clockspeed that I normally use.

All above results are with FP8 --fast, but it seems GGUF Q8 got a similar ~10% speed bump with 2.5.0.dev+cu124.

0 replies

metamountain · 2024-09-01T13:13:02Z

Run GPU at 70-75% power you'll get nearly the same performance , GPU will run cooler too.

Can you tell me how you do that please ?

l33tx0 Sep 16, 2024

you can use msi afterburner , and set the power limit to 75 percent then apply

baekdoosixt Sep 16, 2024

Thanks a lot !

RTX 4090 benchmarks - FLUX model #4571

Uh oh!

Uh oh!

Replies: 5 comments · 4 replies

Uh oh!

Uh oh!

ltdrdata Aug 23, 2024 Collaborator

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Replies: 5 comments 4 replies

ltdrdata
Aug 23, 2024
Collaborator