Hey everyone, I need some help. I got my hands on two Titan RTX 24GB cards and two RTX 3090 24GB cards. As far as I know, the biggest differentiating factor between them is that the RTX 3090 runs FP16 tensor ops with FP32 accumulate at half rate, which puts it at roughly half the Titan RTX's throughput in that specific area. In every other metric it wipes the floor with the Titan RTX.
The RTX 3090 is also Ampere-based, so it supports Flash Attention 2 (and therefore sample packing) as well as BFloat16, while on the Titan RTX I had to run xformers with no sample packing.
In my testing, with this yaml configuration:
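(A minimal axolotl-style sketch to keep the post self-contained; the base model, dataset, and hyperparameter values below are placeholders rather than my exact setup. The relevant part is the attention/packing flags at the bottom.)

```yaml
# Illustrative axolotl config; model, dataset, and hyperparameters are placeholders.
base_model: NousResearch/Llama-2-7b-hf
datasets:
  - path: mhenrichsen/alpaca_2k_test   # small test dataset
    type: alpaca
sequence_len: 2048
micro_batch_size: 2
gradient_accumulation_steps: 1
num_epochs: 1
learning_rate: 0.0002
optimizer: adamw_bnb_8bit

# Titan RTX (Turing, sm_75): no Flash Attention 2, so sample packing stays off;
# no BFloat16 either, so fp16 instead of bf16.
sample_packing: false
xformers_attention: true
fp16: true
output_dir: ./outputs
```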
Without sample packing, this runs as 24 steps, with these training times:
Titan RTX: 248 seconds
RTX 3090: 325 seconds
But if I enable sample packing on the RTX 3090, it completes in a single step, resulting in:
RTX 3090 sample packing on: 28 seconds
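(For reference, the only change for that run, again in the assumed axolotl-style config above, is flipping the packing and attention flags, since sample packing rides on Flash Attention 2 and Ampere also unlocks bf16:)

```yaml
# RTX 3090 (Ampere, sm_86) run: packing enabled via Flash Attention 2.
sample_packing: true
flash_attention: true     # replaces xformers_attention
bf16: true                # replaces fp16
```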
I understand this happens because the dataset is tiny enough to be packed into a single step instead of the original 24. But is the Titan RTX inherently faster without this optimization? And is there a way to turn on sample packing with the Titan RTX? I'm trying to decide which pair of cards to keep and which to sell. Thanks!