:orphan:

.. _aopt-metric-reference:

.. include:: /private-preview/aopt/toc.rst
   :start-after: :orphan:

**********************************************************
Metric reference
**********************************************************

Application Optimization's workload analysis produces the following metrics. All metrics have at least the same dimensions as the workload metrics (for example, ``aws-region``) and use the same attribute names and values.
All metric names have a prefix of either ``sf`` or ``o11y``.

.. note::
   Memory is specified in GiB. CPU is specified in cores.

.. list-table::
   :widths: 40 5 55
   :width: 100%
   :header-rows: 1

   -

     - **Metric**
     - **Scope\***
     - **Description**
   -

     - ``sf.report.available``
     - W
     - Synthetic metric. The value is ``0`` for failed, ``1`` for success. This metric may have additional attributes that represent the report outcome as a whole, at minimum an ``aopt.profile_report.error_reason`` code.
   -

     - ``sf.report.window_days``
     - W
     - Number of days (possibly fractional) considered in the analysis. In general, this is the smaller of 14 and the number of days since the last resource configuration change for the workload. Used to determine the validity and confidence level of the report.
   -

     - ``sf.report.coverage_ratio``
     - W
     - Window coverage with metrics: the ratio of the number of actual metric values found to the number of timeslots in the window. This represents the worst case (in other words, the minimum coverage across the input time series). Used to determine the validity and confidence level of the report.
   -

     - ``sf.report.average_replicas``
     - W
     - Average number of replicas during the analysis window. Does not include pods that allocate resources without running, such as pods that are scheduled but not started.
   -

     - ``sf.report.pod.qos_class``
     - W
     - The pod's quality of service (QoS) class, as defined in the Kubernetes documentation, encoded as an integer.
   -

     - ``sf.report.footprint.cpu_cores``
     - W
     - Number of allocated CPU cores across all replicas (averaged based on ``average_replicas``). Does not account for usage above the request (bursting).
   -

     - ``sf.report.footprint.memory_gib``
     - W
     - Allocated memory in GiB across all replicas (averaged based on ``average_replicas``). Does not account for usage above the request (bursting).
   -

     - ``sf.report.efficiency_rate``
     - W
     - Resource efficiency rate, as a percentage. Weighted average of CPU and memory utilization, with CPU and memory weighted according to AWS on-demand cost. Capped at 100% and rounded to a whole percent.
   -

     - ``sf.report.starvation_risk``
     - W
     - Resource starvation risk: Minimal, Low, Medium, or High (encoded as ``0``, ``1``, ``2``, ``3`` respectively). The risk levels are:

       - Minimal: No starvation detected.
       - Low: Could benefit from more overhead.
       - Medium: Bursting, but not being limited.
       - High: CPU throttled and/or at resource limits.
   -

     - ``sf.recommendation.available``
     - W
     - Indicates whether a recommendation is available for at least one container. The value is ``0`` or ``1``.
   -

     - ``sf.recommendation.confidence_level``
     - W
     - Overall confidence level of the recommendation: Unknown, Low, Medium, or High (encoded as ``0``, ``1``, ``2``, ``3`` respectively). Aggregated from ``container.confidence_level`` by taking the lowest confidence value (or the confidence value of the main or largest container).
   -

     - ``sf.recommendation.container.available``
     - C
     - Indicates whether a recommendation is available for the container: ``0`` or ``1``. A recommendation that matches the baseline is considered available.
   -

     - ``sf.recommendation.container.confidence_level``
     - C
     - Recommendation confidence level: Unknown, Low, Medium, or High (encoded as ``0``, ``1``, ``2``, ``3`` respectively).
   -

     - ``sf.recommendation.container.cpu_request``
     - C
     - Per-container recommended CPU request, in cores.
   -

     - ``sf.recommendation.container.memory_request``
     - C
     - Per-container recommended memory request, in GiB.
   -

     - ``sf.recommendation.container.cpu_limit``
     - C
     - Per-container recommended CPU limit, in cores.
   -

     - ``sf.recommendation.container.memory_limit``
     - C
     - Per-container recommended memory limit, in GiB.
   -

     - ``sf.recommendation.footprint.cpu_cores``
     - W
     - Total CPU footprint of the recommendation, in cores.
   -

     - ``sf.recommendation.footprint.memory_gib``
     - W
     - Total memory footprint of the recommendation, in GiB.
   -

     - ``sf.recommendation.footprint_change.cpu_cores``
     - W
     - Change in the CPU request footprint, assuming the CPU request recommendations are applied for all containers. May be ``0``, missing, or ``NaN`` if requests are not defined.
   -

     - ``sf.recommendation.footprint_change.memory_gib``
     - W
     - Change in the memory request footprint, assuming the memory request recommendations are applied for all containers. May be ``0``, missing, or ``NaN`` if requests are not defined.
   -

     - ``sf.baseline.pod.cpu_request``
     - W
     - Pod-level sum of the baseline for the configuration being analyzed. The ``request`` for a container is considered defined if its ``limit`` is defined, even if the ``request`` is reported as missing or ``0``.
   -

     - ``sf.baseline.pod.memory_request``
     - W
     - Pod-level sum of the baseline for the configuration being analyzed. The ``request`` for a container is considered defined if its ``limit`` is defined, even if the ``request`` is reported as missing or ``0``.
   -

     - ``sf.baseline.pod.cpu_limit``
     - W
     - Pod-level sum of the baseline for the configuration being analyzed. This value is ``0`` or ``NaN`` if at least one ``limit`` is missing; in that case, the pod as a whole doesn't have a ``limit`` for this resource.
   -

     - ``sf.baseline.pod.memory_limit``
     - W
     - Pod-level sum of the baseline for the configuration being analyzed. This value is ``0`` or ``NaN`` if at least one ``limit`` is missing; in that case, the pod as a whole doesn't have a ``limit`` for this resource.
   -

     - ``sf.baseline.container.cpu_request``
     - C
     - Per-container baseline for the configuration being analyzed.
   -

     - ``sf.baseline.container.memory_request``
     - C
     - Per-container baseline for the configuration being analyzed.
   -

     - ``sf.baseline.container.cpu_limit``
     - C
     - Per-container baseline for the configuration being analyzed.
   -

     - ``sf.baseline.container.memory_limit``
     - C
     - Per-container baseline for the configuration being analyzed.

\*Scope is W for workload and C for container. See :ref:`Dimensions <aopt-derived-metrics_dimensions>` for the attributes that apply to each scope.
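
Several of the report metrics above are defined by simple formulas. The following Python sketch restates them for clarity; the cost weights and function names are illustrative assumptions, not the actual values or implementation the analysis uses:

```python
def window_days(days_since_last_config_change: float) -> float:
    """sf.report.window_days: the smaller of 14 and the number of days
    since the last resource configuration change."""
    return min(14.0, days_since_last_config_change)


def coverage_ratio(series_value_counts: list[int], timeslots_in_window: int) -> float:
    """sf.report.coverage_ratio: worst-case coverage, in other words the
    minimum coverage across the input time series."""
    return min(count / timeslots_in_window for count in series_value_counts)


def efficiency_rate(cpu_utilization: float, memory_utilization: float,
                    cpu_weight: float = 0.6, memory_weight: float = 0.4) -> int:
    """sf.report.efficiency_rate: cost-weighted average utilization as a
    whole percent, capped at 100. The weights here are made-up placeholders;
    the real weights follow AWS on-demand cost."""
    weighted = cpu_utilization * cpu_weight + memory_utilization * memory_weight
    return round(min(weighted * 100, 100))


print(window_days(30))  # capped at 14.0
```

For example, with the placeholder weights above, 50% CPU utilization and 80% memory utilization yield an efficiency rate of 62%.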


.. _aopt-derived-metrics_dimensions:

Dimensions
==========================================================

Workload-level attributes
----------------------------------------------------------

The following dimensions are applied to all metrics (both workload and container scope):
.. list-table::
   :widths: 40 60
   :width: 100%
   :header-rows: 1

   -

     - **Attribute name**
     - **Description**
   -

     - ``environment``
     - Splunk Observability Cloud-specific attribute.
   -

     - ``k8s.cluster.name``
     - Kubernetes cluster name.
   -

     - ``k8s.namespace.name``
     - Kubernetes namespace name.
   -

     - ``k8s.workload.name``
     - Workload name. This is a generic workload attribute.
   -

     - ``k8s.workload.kind``
     - Kind of workload: ``deployment``, ``statefulset``, or ``daemonset``. This is a generic workload attribute.
   -

     - ``k8s.workload.uid``
     - Workload UID. This is a generic workload attribute.
   -

     - ``k8s.deployment.name``
     - Present only when ``k8s.workload.kind`` is ``deployment``. Same as ``k8s.workload.name``.
   -

     - ``k8s.deployment.uid``
     - Present only when ``k8s.workload.kind`` is ``deployment``. Same as ``k8s.object_uid``.
   -

     - ``k8s.statefulset.name``
     - Present only when ``k8s.workload.kind`` is ``statefulset``. Same as ``k8s.workload.name``.
   -

     - ``k8s.statefulset.uid``
     - Present only when ``k8s.workload.kind`` is ``statefulset``. Same as ``k8s.object_uid``.
   -

     - ``k8s.daemonset.name``
     - Present only when ``k8s.workload.kind`` is ``daemonset``. Same as ``k8s.workload.name``.
   -

     - ``k8s.daemonset.uid``
     - Present only when ``k8s.workload.kind`` is ``daemonset``. Same as ``k8s.object_uid``.
   -

     - ``k8s.pod.qos``
     - Pod-level QoS class.
   -

     - ``aopt.profiler_report.success``
     - Whether the analysis was successful and a report is provided. Values: ``0`` or ``1``.
   -

     - ``aopt.instant_recommendation.present``
     - Whether there is a valid recommendation. Values: ``0`` or ``1``.
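
Several metrics above encode categorical levels as small integers, such as ``sf.report.starvation_risk`` and the confidence-level metrics. A reader-side decoding sketch, with mappings taken from the metric descriptions (the function name is illustrative, not part of any product API):

```python
# Integer-to-label mappings as documented in the metric reference above.
STARVATION_RISK = {0: "Minimal", 1: "Low", 2: "Medium", 3: "High"}
CONFIDENCE_LEVEL = {0: "Unknown", 1: "Low", 2: "Medium", 3: "High"}


def decode(value: float, mapping: dict[int, str]) -> str:
    """Map an integer-encoded metric value back to its documented label.

    Unrecognized values fall back to "Unknown".
    """
    return mapping.get(int(value), "Unknown")


print(decode(3, STARVATION_RISK))   # High
print(decode(2, CONFIDENCE_LEVEL))  # Medium
```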

Container-level attributes
----------------------------------------------------------

The following additional dimensions are applied to per-container metrics (in other words, any metric named ``*.container.*``):
.. list-table::
   :widths: 40 60
   :width: 100%
   :header-rows: 1

   -

     - **Attribute name**
     - **Description**
   -

     - ``k8s.container.name``
     - Kubernetes container name.
   -

     - ``k8s.container.pseudo_qos``
     - Container-level pseudo-QoS class.

.. note::
   This set of additional attributes matches the set of additional attributes that per-container ``k8s`` metrics (such as memory and CPU utilization) provide on top of workload-level metrics (such as replica count). It excludes metadata attributes that are specific to a pod instance (such as ``k8s.replica.set`` and ``k8s.pod.id``) or to a container instance (such as ``k8s.container.id``), because metrics are always aggregated across instances.