Post Mortem Week #7 of 2025 #642

@Teebor-Choka

Results of the internal discussion pasted for full transparency:

@tolbrino:
We had a discussion with @jeandemeusy & @thewanderingeditor in which I explained everything related to the CT issues seen and debugged last week, and we outlined the path forward as follows:

  1. A metric will be collected for every externally and internally observed CT KPI (Increase alerts generated by CT #641)
  2. For every graph and metric collected, a reasonable set of alerts will be established in both the production and staging environments to evaluate the functionality of the CT from the app perspective (Increase alerts generated by CT #641)
  3. Only a tagged release will be deployable to production
  4. Only a CT release tested over multiple days on the staging environment will be merged to production
  5. KPI-metric-based tests will regularly be created and run on the staging environment, and a version that fails on staging will be blocked from merging to production
  6. Data gathered from the collected metrics, including per-safe data, redeem status, relaying status,... will be monitored so that the community can be proactively informed about possible issues with peers observed by the CT
  7. A post mortem including the specific reasons, solutions and future mitigations will be posted to the community for every service outage
  8. The dashboard and CT gitops deployment will be colocated in the same repository to simplify the workflow (@ausias-armesto) (CT deployments should be moved to products-ci #646)
  9. An automatic promotion pipeline will be set up to promote a successful release verified on staging to production (@ausias-armesto) to mitigate human error (e.g. staging was testing a different version than the one deployed...) (Automate version tagging and deployment in staging #647)
  10. @Jaguaras and @thewanderingeditor will sync on improving the procedure for announcements to the community
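
To make the release rules in points 3, 4, 5 and 9 more concrete, here is a rough sketch of what a promotion gate could check. It is only an illustration, not the actual pipeline: `ReleaseCandidate`, `promotion_blockers` and the two-day `MIN_STAGING_SOAK` threshold are hypothetical names and values (the discussion only says "multiple days"), and the real checks would live in the CI tooling.

```python
from dataclasses import dataclass
from datetime import timedelta

# Hypothetical threshold: the discussion only says "multiple days".
MIN_STAGING_SOAK = timedelta(days=2)


@dataclass
class ReleaseCandidate:
    """Hypothetical summary of a CT build that is up for promotion."""
    version: str                 # version proposed for production
    is_tagged: bool              # point 3: only tagged releases are deployable
    staging_version: str         # version that actually ran on staging
    staging_soak: timedelta      # how long that version ran on staging
    failed_kpi_tests: list[str]  # point 5: KPI tests that failed on staging


def promotion_blockers(rc: ReleaseCandidate) -> list[str]:
    """Return the reasons a release must not be promoted (empty list = OK)."""
    blockers = []
    if not rc.is_tagged:
        blockers.append("not a tagged release")
    # Point 9: guard against staging having tested a different version.
    if rc.staging_version != rc.version:
        blockers.append("staging ran a different version")
    # Point 4: require a multi-day run on staging.
    if rc.staging_soak < MIN_STAGING_SOAK:
        blockers.append("staging run too short")
    if rc.failed_kpi_tests:
        blockers.append("KPI tests failed on staging: " + ", ".join(rc.failed_kpi_tests))
    return blockers
```

A pipeline step could call `promotion_blockers(...)` and refuse to deploy whenever the returned list is non-empty, which also gives a readable reason to include in the failure message.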
