Reproducing HELM Arabic

Hey, I want to reproduce the results posted at https://crfm.stanford.edu/helm/arabic/latest/. However, the leading model, LLM-X, doesn't seem to have a public API. The website says the results are reproducible. Does the HELM benchmark also cover private models?