Posted in the main llama.cpp repository as https://github.com/ggml-org/llama.cpp/issues/14727, but linking here in case AMD's team has any ideas.