What is the difference between running `pretrain_gpt.py` with and without the mcore model? [pretrain_gpt.py#L33](https://github.com/NVIDIA/Megatron-LM/blob/5f9c870f9f24b482509699d206a9dbb00958f6fc/pretrain_gpt.py#L33)