Enhance tools section of incident spec #289

@Red-GV

Description

Currently, the tools section looks like this:

tools:
  category: sre
  selected:
    - kubernetes-topology-monitor

However, based on the grouping here, it may be better to restructure it along these lines:

tools:
  finops:
    enabled: boolean
    tools: array [opt: opencost]
  k8s_autoscaling:
    enabled: boolean
  k8s_events:
    enabled: boolean
    tools: array [opt: clickhouse]
  logs:
    enabled: boolean
    tools: array [opt: clickhouse]
  metrics:
    tools: array [opt: prometheus]
  traces:
    enabled: boolean
    tools: array [opt: clickhouse, jaeger]

The metrics section would have no enabled flag, because metrics must always be on in order to trigger the Prometheus alerts. However, extending the alert process to other tools may relax this requirement or force a rework here. Maybe the flag is worth adding at that point?
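
To make the proposal concrete, here is a rough sketch of how a parsed tools section could be validated against the layout above. Everything in it is an assumption for illustration: the helper name `normalize_tools`, the choice to treat an omitted category as disabled, and the rule that metrics is always on and therefore rejects an `enabled` key. Only the category names and tool options come from the proposal itself.

```python
from typing import Any

# Tool options per category, copied from the proposed layout.
# An empty set means the category has no selectable tools.
ALLOWED_TOOLS: dict[str, set[str]] = {
    "finops": {"opencost"},
    "k8s_autoscaling": set(),
    "k8s_events": {"clickhouse"},
    "logs": {"clickhouse"},
    "metrics": {"prometheus"},
    "traces": {"clickhouse", "jaeger"},
}

# Assumption: metrics is the only category that cannot be switched off.
ALWAYS_ENABLED = {"metrics"}


def normalize_tools(section: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Validate a parsed tools section and fill in defaults (hypothetical helper)."""
    unknown = set(section) - set(ALLOWED_TOOLS)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")

    result: dict[str, dict[str, Any]] = {}
    for category, allowed in ALLOWED_TOOLS.items():
        entry = section.get(category) or {}
        if category in ALWAYS_ENABLED:
            # Always-on categories take no toggle at all.
            if "enabled" in entry:
                raise ValueError(f"{category} does not accept an 'enabled' flag")
            enabled = True
        else:
            # Assumption: a category that is omitted is treated as disabled.
            enabled = bool(entry.get("enabled", False))

        tools = list(entry.get("tools", []))
        invalid = [tool for tool in tools if tool not in allowed]
        if invalid:
            raise ValueError(f"{category}: unsupported tools {invalid}")

        result[category] = {"enabled": enabled, "tools": tools}
    return result
```

Calling `normalize_tools({"traces": {"enabled": True, "tools": ["jaeger"]}})` would return every category with explicit values, with metrics enabled and the remaining categories disabled. If the alert process later covers other tools, relaxing the rule should only require changing `ALWAYS_ENABLED`.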

I'm not sure where to put the options for the metrics server (maybe we just always deploy it) or OpenCost. Perhaps Chaos-Mesh will no longer be a selectable option, but will instead be toggled on based on whether or not a Chaos Mesh fault is being injected.
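
One possible way to derive the Chaos Mesh toggle from the faults is sketched below. The shape of a fault entry is not defined in this issue, so the `injector` key and the `chaos-mesh` value are purely hypothetical placeholders, as is the function name.

```python
from typing import Any, Iterable, Mapping


def chaos_mesh_required(faults: Iterable[Mapping[str, Any]]) -> bool:
    """Return True if any fault is injected through Chaos Mesh.

    Assumes each fault is a mapping with a hypothetical `injector` key;
    faults without that key are treated as not using Chaos Mesh.
    """
    return any(
        str(fault.get("injector", "")).lower() == "chaos-mesh"
        for fault in faults
    )
```

With this approach the deployment step would call `chaos_mesh_required(...)` on the incident's faults instead of reading a user-selected option, so Chaos Mesh is installed only when a fault actually needs it.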

Metadata

Labels: enhancement (New feature or request), help wanted (Extra attention is needed), sre (Related to SRE scenarios)
