PSGD 2.0 supports both the older triangular-solver-based update formulae for Q and several inverse-free, matmul-only methods for updating Q, including online Newton-Schulz iterations. Main files:
- psgd.py: functional APIs that provide full flexibility.
- wrapped_as_torch_optimizer_for_ddp.py: a basic momentum whitening torch.optim.Optimizer wrapping example for DDP training.
- wrapped_as_torch_optimizer_for_dtensor.py: another basic momentum whitening torch.optim.Optimizer wrapping example, for DTensor-based distributed training.
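
To give a feel for what an inverse-free, matmul-only computation looks like, here is a minimal sketch of the classical (offline) coupled Newton-Schulz iteration for the inverse square root of a symmetric positive definite matrix. It is a textbook illustration only and is not the update rule implemented in psgd.py. The helper names `matmul`, `eye` and `newton_schulz_inv_sqrt` are hypothetical, and plain Python lists are used so the snippet runs without any dependencies.

```python
# Illustrative sketch only: classical coupled Newton-Schulz iteration for A^{-1/2}.
# This is NOT the online update for Q used in psgd.py; all names are hypothetical.

def matmul(a, b):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def eye(n):
    """n x n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def newton_schulz_inv_sqrt(a, num_iters=30):
    """Approximate A^{-1/2} for a symmetric positive definite A using only matmuls."""
    n = len(a)
    # Scale by the Frobenius norm so all eigenvalues lie in (0, 1],
    # which is sufficient for the iteration to converge.
    norm = sum(v * v for row in a for v in row) ** 0.5
    y = [[v / norm for v in row] for row in a]  # Y_0 = A / ||A||_F
    z = eye(n)                                  # Z_0 = I
    for _ in range(num_iters):
        zy = matmul(z, y)
        # T = (3 I - Z Y) / 2
        t = [[(3.0 * (i == j) - zy[i][j]) / 2.0 for j in range(n)] for i in range(n)]
        y = matmul(y, t)  # Y_k converges to (A / norm)^{1/2}
        z = matmul(t, z)  # Z_k converges to (A / norm)^{-1/2}
    scale = norm ** -0.5  # undo the initial scaling
    return [[v * scale for v in row] for row in z]


if __name__ == "__main__":
    a = [[4.0, 1.0], [1.0, 3.0]]
    p = newton_schulz_inv_sqrt(a)
    # If P approximates A^{-1/2}, then P A P should be close to the identity.
    print(matmul(matmul(p, a), p))
```

No matrix inverse or linear solve appears anywhere in the loop. The only operations are matrix products and additions, which is the property the matmul-only methods above rely on.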