https://github.com/bigcode-project/Megatron-LM/blob/multi-query-attention/tools/checkpoint_loader_megatron.py#L121