This seems like one of the best quantization options for the new LLaMA-3 70B model, so that it can be run on 1-2 consumer-grade GPUs. However, it looks like support for MQA is not present in llama.py, so I don't think it will work as-is.
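For reference, here is a minimal sketch of how one might check which attention variant the checkpoint actually uses by inspecting its Hugging Face config (this assumes the gated `meta-llama/Meta-Llama-3-70B` repo name and that `transformers` is installed):

```python
# Sketch: inspect the model config to see whether the 70B checkpoint uses
# MQA, GQA, or plain multi-head attention, since llama.py support may hinge
# on how key/value heads are handled. The repo name below is an assumption
# and the checkpoint is gated (requires accepting the license on HF).
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3-70B")
print("attention heads:", cfg.num_attention_heads)
print("key/value heads:", cfg.num_key_value_heads)

if cfg.num_key_value_heads == 1:
    print("-> multi-query attention (MQA)")
elif cfg.num_key_value_heads < cfg.num_attention_heads:
    print("-> grouped-query attention (GQA)")
else:
    print("-> standard multi-head attention")
```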
Are you planning to add support for LLaMA-3?