Results of the internal discussion pasted for full transparency:
@tolbrino:
We had a discussion with @jeandemeusy & @thewanderingeditor, where I explained everything related to the CT issues seen and debugged last week, and we outlined the path forward as follows:
- For every externally and internally observed CT KPI, a metric will be collected (Increase alerts generated by CT #641)
- For every graph data series and metric collected, a reasonable set of alerts will be established in both the production and staging environments to evaluate the functionality of the CT from the app's perspective (Increase alerts generated by CT #641)
- Only a tagged release will be deployable to production
- Only a CT release tested over multiple days on the staging environment will be merged to production
- KPI-metric-based tests will regularly be created and run against the staging environment, and a version that fails on staging will be prohibited from merging to production
- Knowledge gained from the collected metrics, including per-safe data, redeem status, relaying status,... will be monitored so that the community can be proactively informed about possible issues with peers observed by the CT
- A post-mortem including the specific reasons, solutions and future mitigations will be posted to the community for every service outage
- Dashboard and CT gitops deployment will be colocated in the same repository to simplify the workflow (@ausias-armesto) (CT deployments should be moved to products-ci#646)
- An automatic promotion pipeline will be set up to promote a successful release verified on staging to production (@ausias-armesto) to mitigate human error (e.g. staging was testing a different version than the one deployed...) (Automate version tagging and deployment in staging #647)
- @Jaguaras and @thewanderingeditor will sync on improving the procedure for announcements to the community
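
The release rules above (tagged releases only, a multi-day run on staging, passing KPI tests, and staging testing the same version that gets deployed) could be combined into a single promotion gate. The following is a minimal sketch under stated assumptions, not the actual pipeline: the `Candidate` type, the `promotion_blockers` function, the `vX.Y.Z` tag format and the two-day minimum are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List

# Assumed policy values: the discussion only says "multiple days" and "tagged".
MIN_STAGING_SOAK = timedelta(days=2)
TAG_PATTERN = re.compile(r"^v\d+\.\d+\.\d+$")


@dataclass
class Candidate:
    """Hypothetical description of a release proposed for production."""
    version: str                    # version proposed for production
    staging_version: str            # version actually running on staging
    staging_deployed_at: datetime   # when staging started running it
    kpi_results: Dict[str, bool] = field(default_factory=dict)  # test name -> passed


def promotion_blockers(c: Candidate, now: datetime) -> List[str]:
    """Return the reasons a candidate must not be promoted (empty list = promotable)."""
    blockers = []
    if not TAG_PATTERN.match(c.version):
        blockers.append("not a tagged release")
    if c.staging_version != c.version:
        blockers.append("staging ran a different version")
    if now - c.staging_deployed_at < MIN_STAGING_SOAK:
        blockers.append("staging soak too short")
    if not c.kpi_results:
        blockers.append("no KPI test results")
    failed = sorted(name for name, passed in c.kpi_results.items() if not passed)
    if failed:
        blockers.append("failed KPI tests: " + ", ".join(failed))
    return blockers
```

Returning the full list of blockers rather than a single boolean means a rejected promotion can state every reason at once, which may also be useful input for a post-mortem.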