NVIDIA's Triton Inference Server might be useful for serving the open-source models: https://github.com/triton-inference-server
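For reference, a minimal sketch of querying a running Triton server with its official Python HTTP client (`pip install tritonclient[http]`). It assumes a server already listening on localhost:8000; the model name `my_model` and the tensor names, shape, and dtype are placeholders that would have to match the model's actual `config.pbtxt`:

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a Triton server assumed to be running locally on the default HTTP port.
client = httpclient.InferenceServerClient(url="localhost:8000")

# Placeholder input tensor; name/shape/dtype depend on the deployed model's config.
inp = httpclient.InferInput("INPUT0", [1, 16], "FP32")
inp.set_data_from_numpy(np.random.rand(1, 16).astype(np.float32))

# Request a placeholder output tensor by name.
out = httpclient.InferRequestedOutput("OUTPUT0")

# Run inference against the (hypothetical) model and read the result back as a numpy array.
result = client.infer(model_name="my_model", inputs=[inp], outputs=[out])
print(result.as_numpy("OUTPUT0"))
```

The same request could also go over gRPC (`tritonclient.grpc`, default port 8001) if lower overhead matters.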