Fault Tolerance Testing
To test fault tolerance, we need to check how the system behaves when one of its services crashes.
One option was Kube-Monkey, a tool based on Netflix's Chaos Monkey that randomly crashes services according to configurable selection and scheduling parameters. We explored it, but although the tool is simple, it required a fair amount of configuration that was not justified for our setup. Kube-Monkey is an excellent option for complex architectures with intricate dependencies between microservices. Our architecture, however, is simple and its dependencies are easy to identify, so we decided to manually reduce the number of pods of a given service and study how the system behaves.
We used kubectl's scale command to bring a service's pods down (simulating a crash) and Postman to check whether the system remained available.
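The scale-down step can also be scripted so that each scenario is repeatable. The sketch below is an illustration, not the project's actual tooling. It assumes each service runs as a Kubernetes Deployment named after the service (for example `auth-service`) in the `default` namespace; those names are hypothetical. The `probe` helper is a rough stand-in for a Postman request and treats any non-5xx response as "available", which is an assumption: the real checks sent proper credentials and payloads.

```python
import subprocess
import urllib.error
import urllib.request


def scale_command(deployment: str, replicas: int, namespace: str = "default") -> list[str]:
    """Build the kubectl invocation that sets a Deployment's replica count."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return [
        "kubectl", "scale", f"deployment/{deployment}",
        f"--replicas={replicas}", "-n", namespace,
    ]


def crash_service(deployment: str, namespace: str = "default") -> None:
    """Simulate a crash of every replica by scaling the Deployment to zero."""
    subprocess.run(scale_command(deployment, 0, namespace), check=True)


def restore_service(deployment: str, replicas: int, namespace: str = "default") -> None:
    """Bring the Deployment back to its original replica count."""
    subprocess.run(scale_command(deployment, replicas, namespace), check=True)


def probe(method: str, url: str, timeout: float = 5.0) -> bool:
    """Rough Postman stand-in: True if the endpoint answers with a non-5xx status."""
    req = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # HTTPError must be caught before URLError, since it is a subclass.
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout: treat as unavailable.
        return False
```

A typical run would call `crash_service("auth-service")`, probe a few endpoints through the gateway (its URL is deployment-specific), and finish with `restore_service("auth-service", 2)`, where 2 stands for whatever replica count the Deployment originally had.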
The following observations were made regarding the fault tolerance of the microservice architecture:
- When a service has multiple replicas and one pod is removed, the remaining replicas take over subsequent requests. This works seamlessly because all the services are stateless.
- If all replicas of the gateway or of the React UI crash, the application as a whole becomes unavailable, because these two components are the single points of failure of the system. This limitation is to be addressed in subsequent milestones using blue-green deployment.
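Why statelessness makes pod loss transparent, and why losing every replica is not, can be shown with a toy model. This is a simplified sketch: the class `ReplicatedService` is hypothetical, and it routes requests round-robin only for clarity. The real kube-proxy does not necessarily balance round-robin, but the key property is the same: requests go only to pods that are still running.

```python
from itertools import count


class ReplicatedService:
    """Toy model of a Kubernetes Service routing requests to the pods still running."""

    def __init__(self, name: str, replicas: int) -> None:
        self.name = name
        self.pods = [f"{name}-{i}" for i in range(replicas)]
        self._next = count()

    def kill(self, pod: str) -> None:
        """Remove a pod, as if it crashed or was scaled away."""
        self.pods.remove(pod)

    def handle(self, request: str) -> tuple[str, str]:
        """Route a request round-robin and return (serving pod, response)."""
        if not self.pods:
            # No replica left: the single-point-of-failure case.
            raise RuntimeError(f"{self.name} unavailable: no running replicas")
        pod = self.pods[next(self._next) % len(self.pods)]
        # Stateless service: the response depends only on the request,
        # so whichever pod survives can serve it.
        response = f"200 OK: {request}"
        return pod, response
```

With two replicas, killing one leaves every request served by the survivor with an identical response. With a single gateway replica, killing it makes `handle` raise, which mirrors the whole application becoming unavailable.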
The following behavior was observed when all replicas of a single service were crashed:
- If all replicas of auth-service crash, login and sign-up stop working; however, users who are already logged in can still upload and download images.
  - GET /imageList: works
  - POST /image: works
  - GET /image: works
- If all replicas of the user service crash, the sign-up functionality stops working.
  - GET /imageList: works
  - POST /signUp: fails
- If all replicas of the image service crash, image upload, image download, and the list of all images stop working; login and sign-up, however, keep working independently of the failure.
  - POST /signin: works
  - POST /image: fails
- If all replicas of the session service are killed, the landing, sign-in, and sign-up pages still load correctly; however, the application as a whole cannot provide its service because session validation fails.
  - GET /image: fails
- If all replicas of the session log service are killed, the application's functionality is unaffected, except that no session logs are available to re-create a lost session.
  - GET /imageList: works
  - POST /signin: works
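The results above can be recorded as data so that later runs (for example after the planned blue-green changes) can be checked against them. The sketch below is hypothetical: apart from `auth-service`, the keys `user-service`, `image-service`, `session-service`, and `session-log-service` are assumed Deployment names, and the `observed` outcomes would come from Postman or any other HTTP check.

```python
# Recorded outcome per crashed service: endpoint -> True if it still works.
EXPECTED = {
    "auth-service": {"GET /imageList": True, "POST /image": True, "GET /image": True},
    "user-service": {"GET /imageList": True, "POST /signUp": False},
    "image-service": {"POST /signin": True, "POST /image": False},
    "session-service": {"GET /image": False},
    "session-log-service": {"GET /imageList": True, "POST /signin": True},
}


def regressions(crashed: str, observed: dict[str, bool]) -> list[str]:
    """List endpoints whose observed outcome differs from the recorded one."""
    diffs = []
    for endpoint, should_work in EXPECTED[crashed].items():
        if endpoint not in observed:
            diffs.append(f"{endpoint}: not tested")
        elif observed[endpoint] != should_work:
            want = "works" if should_work else "fails"
            got = "works" if observed[endpoint] else "fails"
            diffs.append(f"{endpoint}: expected {want}, got {got}")
    return diffs
```

An empty list means the run matched the recorded behavior. A non-empty list flags either a regression or an improvement, such as POST /image unexpectedly working while the image service is down.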