Running multiple instances (replicas) of any API ML component has not been tested and may not work.
The architecture still needs to be verified for multi-replica operation. Considerations include:
- Currently, all components are deployed as Kubernetes Deployment resources. When a client accesses a component, the ClusterIP service load balances requests across its instances, and it is not yet clear whether this is acceptable. If it is not, a StatefulSet may be more appropriate.
- There is no implementation of the caching service or of shared / common storage among replicas. This could cause several issues. For example, if a user logs on to one instance of the Gateway Service and is then routed to another replica, are they still logged in? Other issues are likely.
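
The login concern can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a description of how the Gateway Service actually stores authentication state: the `Replica` class, the in-memory session dictionary, and the round-robin routing are all hypothetical stand-ins chosen to show why state held per replica does not survive being routed to a different instance, while state held in a shared store does.

```python
import itertools
import secrets


class Replica:
    """Hypothetical Gateway replica (illustration only, not API ML code)."""

    def __init__(self, name, shared_store=None):
        self.name = name
        # Without a shared store, each replica only knows its own sessions.
        self.sessions = shared_store if shared_store is not None else {}

    def login(self, user):
        # Issue an opaque token and remember it in this replica's store.
        token = secrets.token_hex(8)
        self.sessions[token] = user
        return token

    def is_logged_in(self, token):
        return token in self.sessions


def round_robin(replicas):
    """Simplified stand-in for a Service spreading requests across Pods."""
    return itertools.cycle(replicas)


# Case 1: per-replica memory; the second request lands on another replica.
isolated = round_robin([Replica("gateway-0"), Replica("gateway-1")])
token = next(isolated).login("alice")
isolated_result = next(isolated).is_logged_in(token)

# Case 2: both replicas read and write the same (hypothetical) shared store.
store = {}
shared = round_robin([Replica("gateway-0", store), Replica("gateway-1", store)])
token = next(shared).login("alice")
shared_result = next(shared).is_logged_in(token)

print(isolated_result, shared_result)  # prints "False True"
```

In the first case the user appears logged out on the second request. In the second case the shared store makes the replicas interchangeable. Whether the real components behave like either case is exactly what needs to be verified. Kubernetes Services also support `sessionAffinity: ClientIP`, which may reduce how often a client switches replicas but does not remove the need for shared state when a replica restarts.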