Should I wrap both "model" and "lemniscate" in nn.parallel.DistributedDataParallel and manually sync each lemniscate memory bank once per epoch?
Or would a better solution be to keep the "lemniscate" memory and its computation on a single GPU, while the other GPUs run the "model" part with data parallelism?
I am working on a comics image retrieval task and find this project very useful. Thank you for your help.