SyncBatchNorm problems #12001
Replies: 19 comments
The |
@nswamy Can you please add label: Question |
@zhanghang1989 It seems unsuitable for non-gluon users. Any suggestions? |
I am not familiar with the Symbol API. Can someone help with this?
@kaleidoscopical I haven't personally used SyncBatchNorm, but the operator is available in both |
@safrooze I have tried both of them. While the |
If using Training example of using SyncBatchNorm can be found at https://github.com/dmlc/gluon-cv/blob/master/scripts/segmentation/train.py |
If using standard |
@kaleidoscopical Were you able to get a suitable answer to your question?
@kaleidoscopical Did you find a way to use SyncBN with the Symbol API?
You can't call |
According to a quick test, the failure described in this issue no longer occurs in the newest version of MXNet. I will post a further report on efficiency and performance if time allows. @zhanghang1989 Thanks for your kind help. The |
@kaleidoscopical Which version of MXNet did you use? I am using MXNet 1.3.1, but it still fails at asnumpy().
@tranvanhoa533 Which error message does it show? Hi @zhanghang1989!
The error message disappears when I add an explicit type cast right before (16 to 32) and right after (32 to 16). This is tedious and costs considerably more memory and speed. Any suggestions for solving it?
SyncBN does not support fp16 training yet. |
I hit the same problem with MXNet 1.5.0 and found a solution from L_xiaoming at https://discuss.gluon.ai/t/topic/7842. The solution is to specify the 'key' parameter of the SyncBatchNorm layer; you can simply use the same string as the layer's name. I don't know why it works, though.
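For anyone trying this workaround, here is a minimal sketch of the idea of giving every SyncBatchNorm layer a 'key' equal to its name. The helpers `sync_bn_kwargs` and `build_layers` are hypothetical and not part of MXNet; they only build the keyword arguments, so the snippet runs without MXNet installed. The `ndev` argument (number of devices) and the final call shown in the comment are assumptions about the contrib operator's signature and have not been tested against this issue.

```python
# Hypothetical helpers (not part of MXNet): they only build keyword
# arguments so that `key` always matches the layer name, as suggested above.

def sync_bn_kwargs(name, ndev):
    """Return kwargs for one SyncBatchNorm layer with key == name."""
    if ndev < 1:
        raise ValueError("ndev must be at least 1")
    return {"name": name, "key": name, "ndev": ndev}


def build_layers(prefix, count, ndev):
    """Return one kwargs dict per layer, each with a unique name and key."""
    return [sync_bn_kwargs("%s_bn%d" % (prefix, i), ndev) for i in range(count)]


if __name__ == "__main__":
    for kwargs in build_layers("stage1", 3, ndev=2):
        print(kwargs)
    # With MXNet installed, each dict would be passed along these lines
    # (assumed usage, untested here):
    #   import mxnet as mx
    #   bn = mx.sym.contrib.SyncBatchNorm(data=data, **kwargs)
```

Generating the name and the key from the same string keeps the keys unique per layer, which is what the linked solution relies on.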
I tried all of the solutions, and none of them works. It is really weird. When I only use
The problem still exists after setting key. There is no matching error; however, the program hangs and never continues. The maximum number of SyncBN layers I can use is 2. I don't know why.
Gluon-CV uses SyncBN to train all of its segmentation models and YOLOv3. Please take a look at how it is used there.
It fails at asnumpy() when I simply replace BatchNorm() with contrib.SyncBatchNorm(). Could anyone explain how to use the new function?