Low FP16 performance for Turing w/o tensor cores (GTX 16 series)? #15203
Unanswered · pt13762104 asked this question in Q&A · 0 replies
I have a GTX 1660 Ti. When I benchmark it with `test-backend-ops`, FP16 performance is only 0.7 TFLOPS, about 5x slower than FP32 and 3x slower than the other types. When I enable `GGML_CUDA_FORCE_CUBLAS`, every type except BF16 and FP32 drops to 0.7 TFLOPS. This is far below the roughly 10 TFLOPS of FP16 throughput my GPU is rated for.
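For reference, here is a minimal standalone cuBLAS microbenchmark sketch for measuring raw FP16 GEMM throughput outside of ggml, to check whether the slowdown is in the card or in the backend. The matrix size, iteration count, and file name are arbitrary choices, and error checking is trimmed for brevity:

```cuda
// fp16_gemm_bench.cu -- rough FP16 GEMM throughput check via cuBLAS.
// Build: nvcc -O2 fp16_gemm_bench.cu -lcublas -o fp16_gemm_bench
#include <cstdio>
#include <cuda_fp16.h>
#include <cublas_v2.h>

int main() {
    const int n = 4096;    // square matrices; arbitrary size
    const int iters = 50;  // arbitrary iteration count

    // Uninitialized device buffers are fine for a pure throughput test.
    __half *A, *B, *C;
    cudaMalloc(&A, sizeof(__half) * n * n);
    cudaMalloc(&B, sizeof(__half) * n * n);
    cudaMalloc(&C, sizeof(__half) * n * n);

    cublasHandle_t handle;
    cublasCreate(&handle);

    const __half alpha = __float2half(1.0f);
    const __half beta  = __float2half(0.0f);

    // Warm-up call so one-time setup cost is not timed.
    cublasHgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, A, n, B, n, &beta, C, n);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i)
        cublasHgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                    &alpha, A, n, B, n, &beta, C, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    // A GEMM performs 2*n^3 floating-point operations per call.
    double tflops = 2.0 * n * n * n * iters / (ms * 1e-3) / 1e12;
    printf("FP16 GEMM: %.2f TFLOPS\n", tflops);

    cublasDestroy(handle);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```

If the FP16 path itself is healthy, this should report something much closer to the rated figure than 0.7 TFLOPS, which would point at the ggml CUDA backend rather than the hardware.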