
Datadog HPA - wrong precision of metrics being returned. #7275

@RafPe


Output of the info page (if this is a bug)

(Paste the output of the info page here)

Describe what happened:

  • Created an external metric (DatadogMetric):

    ---
    apiVersion: datadoghq.com/v1alpha1
    kind: DatadogMetric
    metadata:
      name: service-example-requests
    spec:
      query: ceil(max:http.client.requests.count{service:service-example}.as_count().rollup(max,10))
    
  • Created an HPA using the external metric from Datadog:

    ---
    apiVersion: autoscaling/v2beta1
    kind: HorizontalPodAutoscaler
    metadata:
      name: service-example-test
    spec:
      minReplicas: 1
      maxReplicas: 5
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: service-example
      metrics:
      - type: External
        external:
          metricName: datadogmetric@public:service-example-requests
          targetAverageValue: 2
    
  • Confirmed that the metric returns only an integer value by calling the external metrics API endpoint:

    {
      "kind": "ExternalMetricValueList",
      "apiVersion": "external.metrics.k8s.io/v1beta1",
      "metadata": {
        "selfLink": "/apis/external.metrics.k8s.io/v1beta1/namespaces/xxxx/datadogmetric@xxxxxyyyyyyy-requests"
      },
      "items": [
        {
          "metricName": "datadogmetric@public:service-example-requests",
          "metricLabels": null,
          "timestamp": "2021-01-26T10:34:50Z",
          "value": "9"
        }
      ]
    }
    
  • Confirmed that the DatadogMetric resource shows a proper integer value:

    ❯ k get datadogmetric service-example-requests
    NAME                         ACTIVE   VALID   VALUE   REFERENCES                              UPDATE TIME
    service-example-requests   True     True    15     public/service-example-test   2m15s
    

  • The result is an incorrect value conversion, which triggers the HPA to scale because it cannot handle the non-integer units:

    ❯ k get hpa
    NAME                     REFERENCE                      TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
    service-example-test   Deployment/service-example   800m/2 (avg)   1         5         5          61m
    

Describe what you expected:
In this instance I expected to see an integer value (in other scenarios I would expect a format such as 0.8), not a value with a unit suffix.
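
For context, the `m` suffix in `800m` is Kubernetes quantity notation for milli-units, so `800m/2 (avg)` reads as an average of 0.8 per pod against a target of 2. The following is a minimal Python sketch of that arithmetic, not the HPA controller's actual code. The helper names are hypothetical, only bare numbers and the `m` suffix are handled, and the raw metric value of 4 across 5 replicas is an assumed input chosen to reproduce 800m (the issue does not state the raw value at that moment).

```python
from decimal import Decimal
import math


def parse_milli_quantity(text: str) -> Decimal:
    """Convert a quantity string such as '800m' or '2' to a plain number.

    Hypothetical helper: only bare numbers and the 'm' (milli) suffix
    are handled, which is all the output in this issue uses.
    """
    if text.endswith("m"):
        return Decimal(text[:-1]) / Decimal(1000)
    return Decimal(text)


def average_per_pod(total: Decimal, replicas: int) -> Decimal:
    """With targetAverageValue, the metric is divided by the replica count."""
    return total / Decimal(replicas)


def format_quantity(value: Decimal) -> str:
    """Approximate how a quantity is displayed: whole numbers are printed
    as-is, fractional values are printed in milli-units (rounded up)."""
    if value == value.to_integral_value():
        return str(int(value))
    return f"{math.ceil(value * 1000)}m"


if __name__ == "__main__":
    # Assumed inputs: a raw external metric value of 4 and 5 running replicas.
    avg = average_per_pod(Decimal(4), 5)
    print(format_quantity(avg))          # the per-pod average in milli-units
    print(parse_milli_quantity("800m"))  # the same value as a decimal
```

Read this way, the suffixed value is a display form of a fractional per-pod average rather than a different number, which may help when comparing the HPA output with the integer values shown by the DatadogMetric resource.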

Steps to reproduce the issue:

Additional environment details (Operating System, Cloud provider, etc):

  • Provider: AWS
  • Cluster Agent info:
    Getting the status from the agent.
    2021-01-26 10:47:16 UTC | CLUSTER | WARN | (pkg/util/log/log.go:480 in func1) | Agent configuration relax permissions constraint on the secret backend cmd, Group can read and exec
    ==============================
    Datadog Cluster Agent (v1.9.0)
    ==============================
    
     Status date: 2021-01-26 10:47:17.122152 UTC
     Agent start: 2021-01-25 14:34:29.460245 UTC
     Pid: 1
     Go Version: go1.14.7
     Build arch: amd64
     Agent flavor: cluster_agent
     Check Runners: 4
     Log Level: INFO
    

A similar issue is still open under #4086.
