Problem : We are currently using endpoint from our inference provider directly from their "AI studio"
Our teams dealing with rationalizing inference costs and infrastructure have built a proxy endpoint.
In the long term, this will help us with redundancy and make them do all the work if some infratructure fails. On the short term us switching would help them to see real time load.