PyTorch optimizations
This is just an overview and collection of references.
Potential optimizations (speed and/or memory):
- Distributed / multi-GPU training (see the DDP sketch below)
- PyTorch scripting and tracing (https://github.com/rwth-i6/returnn/issues/1436; see the TorchScript sketch below)
- `torch.optim._multi_tensor.AdamW`
- `apex.optimizers.FusedAdam` (might be integrated into PyTorch? https://github.com/pytorch/pytorch/issues/71274; see the optimizer sketch below)
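For distributed / multi-GPU training, here is a minimal sketch of the standard PyTorch `DistributedDataParallel` (DDP) setup. The model and training loop are placeholders, not RETURNN code:

```python
# Minimal DistributedDataParallel (DDP) sketch.
# Launch with: torchrun --nproc_per_node=<num_gpus> train.py
# The model and loss here are placeholders, not RETURNN code.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")  # torchrun sets the rank/world-size env vars
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(512, 512).cuda(local_rank)  # placeholder model
model = DDP(model, device_ids=[local_rank])  # gradients are all-reduced across ranks

opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
for step in range(10):
    x = torch.randn(8, 512, device=local_rank)
    loss = model(x).square().mean()  # placeholder loss
    opt.zero_grad()
    loss.backward()  # DDP overlaps the gradient all-reduce with backprop
    opt.step()

dist.destroy_process_group()
```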
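For scripting and tracing (see the RETURNN issue linked above), a minimal sketch contrasting `torch.jit.script` and `torch.jit.trace` on a toy module (the module is just an illustration):

```python
# Minimal TorchScript sketch: scripting vs. tracing a toy module.
import torch


class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 16)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.linear(x))


net = Net()

# Scripting compiles the Python code itself, so data-dependent control flow is preserved.
scripted = torch.jit.script(net)

# Tracing records the ops executed for one example input; control flow is baked in.
traced = torch.jit.trace(net, torch.randn(2, 16))

x = torch.randn(4, 16)
assert torch.allclose(scripted(x), traced(x))
```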
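On the optimizer variants: recent PyTorch versions expose the multi-tensor and fused implementations directly on `torch.optim.AdamW` via the `foreach` and `fused` flags (the fused path requires CUDA parameters), so the two items above may not require `torch.optim._multi_tensor` or apex anymore. A minimal sketch:

```python
# Minimal sketch: multi-tensor and fused AdamW in stock PyTorch.
# These flags exist in recent PyTorch versions; fused=True requires CUDA params.
import torch

model = torch.nn.Linear(512, 512).cuda()  # placeholder model

# foreach=True selects the multi-tensor implementation,
# the integrated successor of torch.optim._multi_tensor.AdamW.
opt_foreach = torch.optim.AdamW(model.parameters(), lr=1e-4, foreach=True)

# fused=True selects a fused CUDA kernel,
# similar in spirit to apex.optimizers.FusedAdam.
opt_fused = torch.optim.AdamW(model.parameters(), lr=1e-4, fused=True)
```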
References:
- [Benchmark] HF Trainer on RTX-3090: many interesting benchmarks covering fp16, bf16, tf32, gradient accumulation, gradient checkpointing, batch size, and apex fused AdamW