Should I wrap both "model" and "lemniscate" in nn.parallel.DistributedDataParallel and manually sync each lemniscate memory bank once per epoch?
Or would a better solution be to keep the "lemniscate" memory and its computation on a single GPU, while the other GPUs run the "model" part with data parallelism?
I am working on a comics image retrieval task and find this project very useful. Thank you for your help.