Conversation

@lovedheart

./build/bin/Release/test-backend-ops.exe perf -o MUL_MAT -p type_a=iq1_m

Tested on AMD 8845HS 780M iGPU

| n   | PR: μs/run | PR: GFLOPS | Main: μs/run | Main: GFLOPS | Speedup vs Main |
|----:|-----------:|-----------:|-------------:|-------------:|----------------:|
| 1   |     224.28 |     523.63 |       282.44 |       415.80 |           1.26x |
| 2   |     310.53 |     756.38 |       385.04 |       610.01 |           1.24x |
| 3   |     408.65 |     862.15 |       515.79 |       683.08 |           1.26x |
| 4   |     589.40 |     797.02 |      1244.08 |       377.60 |           2.11x |
| 5   |    1075.96 |     545.75 |      4427.85 |       132.62 |           4.11x |
| 8   |    2576.61 |     364.64 |      4985.43 |       188.45 |           1.94x |
| 512 |   11601.05 |    5180.00 |     11948.15 |      5030.00 |           1.03x |
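
For reference, the GFLOPS figures follow directly from the problem size: each run performs $2\,m\,n\,k$ floating-point operations (here m=4096, k=14336, as in the MUL_MAT cases below), and throughput is that count divided by the measured time per run. For the n=1 row:

$$
\text{FLOP/run} = 2 \cdot 4096 \cdot 1 \cdot 14336 \approx 117.44\ \text{MFLOP},
\qquad
\frac{117.44\ \text{MFLOP}}{224.28\ \mu\text{s}} \approx 523.6\ \text{GFLOPS}.
$$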

@lovedheart requested a review from 0cc4m as a code owner on November 1, 2025 00:03
github-actions bot added the Vulkan (issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels on Nov 1, 2025
return;

// Number of rows to process for this workgroup
const uint rows_to_process = min(NUM_ROWS, p.stride_d - first_row);

Collaborator:
I'm pretty surprised if it helped to make the changes in this function - this will prevent the compiler from unrolling loops.
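
For illustration, a minimal sketch of the two loop shapes being discussed (identifiers borrowed from the shader; the surrounding code is abridged):

```glsl
// Compile-time bound: [[unroll]] can fully unroll this loop.
[[unroll]] for (uint i = 0; i < NUM_ROWS; ++i) {
    temp[j][i] = FLOAT_TYPE(0);
}

// Runtime-clamped bound: the trip count is only known at runtime,
// so the compiler generally cannot fully unroll the loop anymore.
const uint rows_to_process = min(NUM_ROWS, p.stride_d - first_row);
for (uint i = 0; i < rows_to_process; ++i) {
    temp[j][i] = FLOAT_TYPE(0);
}
```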

Collaborator:
I don't see a difference from adding this, so I would prefer to keep it as it was.

@lovedheart Can you benchmark if the changes in the main function make a difference for you?

@0cc4m (Collaborator) commented on Nov 7, 2025

I don't see much of a difference either way. Maybe a slight improvement on RDNA3 for n=1, maybe slightly negative on GCN, Nvidia, and Intel. Hard to tell; it's close to run-to-run variance.

AMD RX 8060S

ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat

before:
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):               13632 runs -    74.80 us/run - 117.44 MFLOP/run -   1.57 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):               11928 runs -    85.84 us/run - 234.88 MFLOP/run -   2.74 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                8804 runs -   113.87 us/run - 352.32 MFLOP/run -   3.09 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                7029 runs -   144.38 us/run - 469.76 MFLOP/run -   3.25 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                3420 runs -   300.60 us/run - 587.20 MFLOP/run -   1.95 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                 856 runs -  1247.46 us/run - 939.52 MFLOP/run - 753.15 GFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):               182 runs -  5539.74 us/run -  60.13 GFLOP/run -  10.85 TFLOPS

after:
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):               14484 runs -    71.63 us/run - 117.44 MFLOP/run -   1.64 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):               11076 runs -    92.69 us/run - 234.88 MFLOP/run -   2.53 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                9088 runs -   113.53 us/run - 352.32 MFLOP/run -   3.10 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                7242 runs -   140.27 us/run - 469.76 MFLOP/run -   3.35 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                6156 runs -   165.65 us/run - 587.20 MFLOP/run -   3.54 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                 749 runs -  1423.48 us/run - 939.52 MFLOP/run - 660.02 GFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):               174 runs -  5764.40 us/run -  60.13 GFLOP/run -  10.43 TFLOPS

AMD Radeon Pro VII

ggml_vulkan: 0 = AMD Radeon (TM) Pro VII (RADV VEGA20) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: none

before:
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):               11076 runs -    97.17 us/run - 117.44 MFLOP/run -   1.21 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                7242 runs -   144.39 us/run - 234.88 MFLOP/run -   1.63 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                4260 runs -   249.32 us/run - 352.32 MFLOP/run -   1.41 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                3195 runs -   313.11 us/run - 469.76 MFLOP/run -   1.50 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                2736 runs -   368.32 us/run - 587.20 MFLOP/run -   1.59 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                 428 runs -  3086.22 us/run - 939.52 MFLOP/run - 304.43 GFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                78 runs - 12824.58 us/run -  60.13 GFLOP/run -   4.69 TFLOPS

after:
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                9372 runs -   110.16 us/run - 117.44 MFLOP/run -   1.07 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                6816 runs -   151.09 us/run - 234.88 MFLOP/run -   1.55 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                4260 runs -   236.26 us/run - 352.32 MFLOP/run -   1.49 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                3621 runs -   281.72 us/run - 469.76 MFLOP/run -   1.67 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                3078 runs -   325.78 us/run - 587.20 MFLOP/run -   1.80 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                 321 runs -  3558.65 us/run - 939.52 MFLOP/run - 264.01 GFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                78 runs - 12860.79 us/run -  60.13 GFLOP/run -   4.68 TFLOPS

Intel A770

ggml_vulkan: 0 = Intel(R) Arc(tm) A770 Graphics (DG2) (Intel open-source Mesa driver) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 65536 | int dot: 1 | matrix cores: none

before:
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                7668 runs -   139.32 us/run - 117.44 MFLOP/run - 842.96 GFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                2556 runs -   405.05 us/run - 234.88 MFLOP/run - 579.89 GFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                1136 runs -   956.64 us/run - 352.32 MFLOP/run - 368.29 GFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                 426 runs -  3181.79 us/run - 469.76 MFLOP/run - 147.64 GFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                 342 runs -  5578.36 us/run - 587.20 MFLOP/run - 105.26 GFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                 107 runs -  9632.67 us/run - 939.52 MFLOP/run -  97.54 GFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                66 runs - 15407.65 us/run -  60.13 GFLOP/run -   3.90 TFLOPS

after:
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                7668 runs -   143.76 us/run - 117.44 MFLOP/run - 816.93 GFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                2982 runs -   377.97 us/run - 234.88 MFLOP/run - 621.42 GFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                1420 runs -   747.32 us/run - 352.32 MFLOP/run - 471.45 GFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                 639 runs -  1968.56 us/run - 469.76 MFLOP/run - 238.63 GFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                 513 runs -  2413.24 us/run - 587.20 MFLOP/run - 243.33 GFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                 107 runs -  9919.79 us/run - 939.52 MFLOP/run -  94.71 GFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                66 runs - 15310.42 us/run -  60.13 GFLOP/run -   3.93 TFLOPS

Nvidia RTX 3090

ggml_vulkan: 0 = NVIDIA GeForce RTX 3090 (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2

before:
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):               11928 runs -    83.89 us/run - 117.44 MFLOP/run -   1.40 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                9798 runs -   103.71 us/run - 234.88 MFLOP/run -   2.26 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                4828 runs -   208.74 us/run - 352.32 MFLOP/run -   1.69 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                5112 runs -   201.51 us/run - 469.76 MFLOP/run -   2.33 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                2907 runs -   358.87 us/run - 587.20 MFLOP/run -   1.64 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                2247 runs -   448.54 us/run - 939.52 MFLOP/run -   2.09 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):               682 runs -  1467.35 us/run -  60.13 GFLOP/run -  40.98 TFLOPS

after:
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):               11928 runs -    85.92 us/run - 117.44 MFLOP/run -   1.37 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                6816 runs -   148.84 us/run - 234.88 MFLOP/run -   1.58 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                5112 runs -   198.05 us/run - 352.32 MFLOP/run -   1.78 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                3621 runs -   286.45 us/run - 469.76 MFLOP/run -   1.64 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                3249 runs -   311.52 us/run - 587.20 MFLOP/run -   1.88 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                 749 runs -  1439.02 us/run - 939.52 MFLOP/run - 652.89 GFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):               706 runs -  1418.96 us/run -  60.13 GFLOP/run -  42.38 TFLOPS

@lovedheart (Author)

The code seems to fix the performance only on Windows; on Linux, I cannot see the improvement.

For comparison, ROCm produced:

D:\llama_latest>build\bin\test-backend-ops.exe perf -o MUL_MAT -p iq1_m
HIP Library Path: C:\WINDOWS\SYSTEM32\amdhip64_7.dll
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon 780M Graphics, gfx1103 (0x1103), VMM: no, Wave Size: 32
Testing 2 devices

Backend 1/2: ROCm0
  Device description: AMD Radeon 780M Graphics
  Device memory: 59327 MB (59175 MB free)

  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                5112 runs -   210.56 us/run - 117.44 MFLOP/run - 557.76 GFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                4260 runs -   257.42 us/run - 234.88 MFLOP/run - 912.44 GFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                3124 runs -   323.46 us/run - 352.32 MFLOP/run -   1.09 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                2556 runs -   414.22 us/run - 469.76 MFLOP/run -   1.13 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                2052 runs -   495.51 us/run - 587.20 MFLOP/run -   1.19 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                1284 runs -   808.00 us/run - 939.52 MFLOP/run -   1.16 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1):                88 runs - 11595.92 us/run -  60.13 GFLOP/run -   5.19 TFLOPS
  Backend ROCm0: OK
Backend 2/2: CPU
  Skipping CPU backend
2/2 backends passed
OK

@0cc4m (Collaborator) left a comment:

Overall it's fine, but please clean up the purely cosmetic code changes and check whether the main function changes are necessary.

[[unroll]] for (uint i = 0; i < NUM_ROWS; ++i)
temp[j][i] = FLOAT_TYPE(0);
}
}

Collaborator:
All of the above changes in this function are just code style; please revert them. It's okay to improve readability and style of code you're touching anyway, but that doesn't apply here. I also prefer to keep curly brackets after loops or ifs.
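
A small sketch of the style point (hypothetical lines, not the actual diff):

```glsl
// Brace-less single-statement body (the form being asked to revert):
[[unroll]] for (uint i = 0; i < NUM_ROWS; ++i)
    temp[j][i] = FLOAT_TYPE(0);

// Preferred: keep the curly brackets even for a one-line body.
[[unroll]] for (uint i = 0; i < NUM_ROWS; ++i) {
    temp[j][i] = FLOAT_TYPE(0);
}
```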

FLOAT_TYPE temp[NUM_COLS][NUM_ROWS];

void calc_superblock(const uint a_offset, const uint b_offset, const uint ib32, const uint i, const uint num_blocks_per_row, const uint first_row, const uint num_rows) {
// ------------------ calc_superblock (final optimized version) ------------------

Collaborator:
The comment isn't necessary.
