I am trying to run FSDP with FP16 precision. Are there any limitations that prevent using FSDP with FP16? How can I convert my existing code to use FSDP with FP16? I believe the ShardedGradScaler from FSDP should be used. How does its implementation differ from the regular GradScaler? It would be great if someone could share a concise example.