-
Prerequisite
💬 Describe the reimplementation questions
Enabled fp16 in the config: fp16 = dict(loss_scale=dict(init_scale=512))

Environment
mmdet 2.x

Expected results
No response

Additional information
Why does inference with fp16 take the same time as with fp32? Is it because my model was trained in fp32? Also, why does fp16 training not seem to reduce GPU memory usage, while also becoming harder to converge? I am using the HTC network with the backbone replaced by SwinV2. The GPU is a Tesla A100.
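For context, a minimal sketch of how fp16 inference can be enabled on top of that config in mmdet 2.x, assuming mmcv's `wrap_fp16_model` utility; the config/checkpoint paths are placeholders. Without an explicit cast like this, inference may still run in fp32 regardless of the training-time fp16 setting:

```python
# Hedged sketch: fp16 inference with an mmdet 2.x model.
# Assumes mmcv provides wrap_fp16_model; paths below are placeholders.
from mmcv.runner import wrap_fp16_model
from mmdet.apis import inference_detector, init_detector

model = init_detector('my_htc_swinv2.py', 'latest.pth', device='cuda:0')
wrap_fp16_model(model)  # converts eligible layers to half precision
result = inference_detector(model, 'demo.jpg')
```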
-
We recommend using English, or both English and Chinese, for issues so that we can have a broader discussion.
-
The A100 uses the TF32 format in computation by default; it is already faster than FP32 and close to FP16 in throughput, so the speed may not improve in some cases. In MMDetection, due to the complexity of the models, many parts are explicitly kept in FP32 to avoid numerical failures.
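One way to verify this on an A100 is to toggle PyTorch's TF32 flags and re-time the model. These are standard `torch.backends` switches (note that their defaults have varied across PyTorch releases):

```python
import torch

# On Ampere GPUs (e.g., A100), matmul/conv may run in TF32 even for
# "FP32" tensors, so the FP32 baseline is already accelerated.
print(torch.backends.cuda.matmul.allow_tf32)
print(torch.backends.cudnn.allow_tf32)

# Disable TF32 to benchmark true FP32 against FP16:
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False
```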
-
Then does the same apply to the Tesla T4? My inference runs on a T4.
-
The T4 does not support TF32, so inference there runs in true FP32 unless FP16 is enabled.
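A quick way to check whether the current GPU can use TF32 at all: it requires an Ampere-class device (compute capability 8.0 or higher), while the T4 is Turing (7.5):

```python
import torch

# TF32 needs compute capability >= 8.0 (Ampere, e.g., A100).
# A Tesla T4 reports (7, 5), so it falls back to plain FP32.
major, minor = torch.cuda.get_device_capability()
print(f"compute capability {major}.{minor}; TF32 supported: {major >= 8}")
```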