Low FP16 performance for Turing w/o tensor cores (GTX 16 series)? #15203
Unanswered · pt13762104 asked this question in Q&A · 0 replies
I have a GTX 1660 Ti. When I benchmark it with `test-backend-ops`, FP16 performance is only 0.7 TFLOPS, about 5x slower than FP32 and 3x slower than the other types. When I enable `GGML_CUDA_FORCE_CUBLAS`, every type except BF16 and FP32 drops to 0.7 TFLOPS. This is far below the roughly 10 TFLOPS of FP16 throughput my GPU is rated for.
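For reference, here is a minimal standalone cuBLAS microbenchmark sketch for measuring raw FP16 GEMM throughput outside of ggml, to check whether the slowdown is in the card or in the backend. The matrix size, iteration count, and file name are arbitrary choices, and error checking is trimmed for brevity:

```cuda
// fp16_gemm_bench.cu -- rough FP16 GEMM throughput check via cuBLAS.
// Build: nvcc -O2 fp16_gemm_bench.cu -lcublas -o fp16_gemm_bench
#include <cstdio>
#include <cuda_fp16.h>
#include <cublas_v2.h>

int main() {
    const int n = 4096;    // square matrices; arbitrary size
    const int iters = 50;  // arbitrary iteration count

    // Uninitialized device buffers are fine for a pure throughput test.
    __half *A, *B, *C;
    cudaMalloc(&A, sizeof(__half) * n * n);
    cudaMalloc(&B, sizeof(__half) * n * n);
    cudaMalloc(&C, sizeof(__half) * n * n);

    cublasHandle_t handle;
    cublasCreate(&handle);

    const __half alpha = __float2half(1.0f);
    const __half beta  = __float2half(0.0f);

    // Warm-up call so one-time setup cost is not timed.
    cublasHgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, A, n, B, n, &beta, C, n);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i)
        cublasHgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                    &alpha, A, n, B, n, &beta, C, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    // A GEMM performs 2*n^3 floating-point operations per call.
    double tflops = 2.0 * n * n * n * iters / (ms * 1e-3) / 1e12;
    printf("FP16 GEMM: %.2f TFLOPS\n", tflops);

    cublasDestroy(handle);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```

If the FP16 path itself is healthy, this should report something much closer to the rated figure than 0.7 TFLOPS, which would point at the ggml CUDA backend rather than the hardware.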