"Failed to scrape node" errors after node termination or creation #1704

@jstefankowski

Description


What happened:
Metrics Server 0.7.1 logs "Failed to scrape node" errors for up to 2 minutes after a Karpenter node is terminated or created.

What you expected to happen:
Metrics Server should not log errors for an EKS node that is not ready to be scraped, i.e. a node that is already in the Shutting-down/Terminated state or has only just been created.
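The expected behavior above could look roughly like the following Python sketch. It is not metrics-server code; metrics-server itself is written in Go. `NodeView`, `scrape_failure_log_level` and the 2-minute `NEW_NODE_GRACE` are hypothetical names chosen for illustration. Downgrading failures for nodes that are being deleted, are not Ready, or were created recently is one possible policy, not an existing flag.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical grace period for newly created nodes (not a metrics-server flag).
NEW_NODE_GRACE = timedelta(minutes=2)


@dataclass
class NodeView:
    """Hypothetical subset of Node fields a scraper could consult."""
    name: str
    ready: bool              # Ready condition status is "True"
    created_at: datetime     # metadata.creationTimestamp
    deleting: bool = False   # metadata.deletionTimestamp is set


def scrape_failure_log_level(node: NodeView, now: datetime) -> str:
    """Pick a log level for a failed scrape of `node`.

    Failures on nodes that are being deleted, are not Ready, or were created
    less than NEW_NODE_GRACE ago are reported as "info"; all others as "error".
    """
    if node.deleting or not node.ready:
        return "info"
    if now - node.created_at < NEW_NODE_GRACE:
        return "info"
    return "error"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    fresh = NodeView("new-node", ready=False, created_at=now - timedelta(seconds=30))
    print(scrape_failure_log_level(fresh, now))  # info
```

Downgrading the message rather than dropping it keeps a trace of the failure. Error-level logs are then reserved for nodes that are expected to be scrapeable.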

Anything else we need to know?:
Example: node ip-10-108-180-80.us-east-2.compute.internal was shut down, but Metrics Server kept logging errors every 15 seconds (--metric-resolution=15s) for 2 minutes.

See example log:
{"ts":1755621037144.3723,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621052196.1824,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621067156.2139,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621082187.7395,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621097153.6978,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621112118.9421,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621127189.2578,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621142189.1562,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621157130.2832,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621172130.848,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get "
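To check the "every 15 seconds for 2 minutes" pattern, the JSON log lines can be summarized per node. Below is a minimal Python sketch. It assumes the log has been saved as one JSON object per line and that `ts` is Unix time in milliseconds, which is what the values above appear to be. `failure_windows` is a hypothetical helper name.

```python
import json
from collections import defaultdict
from statistics import median


def failure_windows(lines):
    """Summarize "Failed to scrape node" entries per node.

    Returns {node_name: (failures, span_seconds, median_gap_seconds)}, reading
    "ts" as Unix time in milliseconds. Lines that are not valid JSON (for
    example truncated ones) are skipped.
    """
    stamps = defaultdict(list)
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict):
            continue
        if not str(entry.get("msg", "")).startswith("Failed to scrape node"):
            continue
        stamps[entry["node"]["name"]].append(entry["ts"] / 1000.0)

    summary = {}
    for node, ts in stamps.items():
        ts.sort()
        gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
        summary[node] = (len(ts), ts[-1] - ts[0], median(gaps) if gaps else 0.0)
    return summary
```

Usage, with a hypothetical file name: `with open("metrics-server.log") as f: print(failure_windows(f))`. The median gap should track `--metric-resolution`, and the span shows how long a node kept failing after it went away.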

Environment:

  • Kubernetes distribution (GKE, EKS, Kubeadm, the hard way, etc.):
    EKS

  • Container Network Setup (flannel, calico, etc.):
    calico, coredns, ebs-csi-controller, r53-external-dns, karpenter

  • Kubernetes version (use kubectl version):
    1.31

  • Metrics Server manifest:

apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - nodes/stats
  - namespaces
  - configmaps
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
    app: ${metrics_server_label_app}
    owner: ${metrics_server_label_owner}
    department: ${metrics_server_label_department}
    team: ${metrics_server_label_team}
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
        app: ${metrics_server_label_app}
        owner: ${metrics_server_label_owner}
        department: ${metrics_server_label_department}
        team: ${metrics_server_label_team}
    spec:
      tolerations:
      - key: "apps"
        operator: "Equal"
        value: "corecomponents"
        effect: "NoSchedule"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node.kubernetes.io/lifecycle
                operator: In
                values:
                - ondemand_components
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --logging-format=json
        image: ${repo}:${image_tag}
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 4443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          initialDelaySeconds: 20
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
      nodeSelector:
        kubernetes.io/os: linux
      hostNetwork: true
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
      - emptyDir: {}
        name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100

  • Kubelet config:
    cat /etc/kubernetes/kubelet/config.json
    {
      "address": "0.0.0.0",
      "authentication": {
        "x509": {
          "clientCAFile": "/etc/kubernetes/pki/ca.crt"
        },
        "webhook": {
          "enabled": true,
          "cacheTTL": "2m0s"
        },
        "anonymous": {
          "enabled": false
        }
      },
      "authorization": {
        "mode": "Webhook",
        "webhook": {
          "cacheAuthorizedTTL": "5m0s",
          "cacheUnauthorizedTTL": "30s"
        }
      },
      "cgroupDriver": "systemd",
      "cgroupRoot": "/",
      "clusterDNS": [
        "172.20.0.10"
      ],
      "clusterDomain": "cluster.local",
      "containerRuntimeEndpoint": "unix:///run/containerd/containerd.sock",
      "evictionHard": {
        "memory.available": "100Mi",
        "nodefs.available": "10%",
        "nodefs.inodesFree": "5%"
      },
      "featureGates": {
        "RotateKubeletServerCertificate": true
      },
      "hairpinMode": "hairpin-veth",
      "kubeReserved": {
        "cpu": "90m",
        "ephemeral-storage": "1Gi",
        "memory": "893Mi"
      },
      "kubeReservedCgroup": "/runtime",
      "logging": {
        "verbosity": 2
      },
      "maxPods": 58,
      "protectKernelDefaults": true,
      "providerID": "aws:///us-east-2a/i-077c3432936e48858",
      "readOnlyPort": 0,
      "serializeImagePulls": false,
      "serverTLSBootstrap": true,
      "systemReservedCgroup": "/system",
      "tlsCipherSuites": [
        "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256",
        "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384",
        "TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305",
        "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
        "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
        "TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305",
        "TLS_RSA_WITH_AES_128_GCM_SHA256",
        "TLS_RSA_WITH_AES_256_GCM_SHA384"
      ],
      "kind": "KubeletConfiguration",
      "apiVersion": "kubelet.config.k8s.io/v1beta1"
    }

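The kubelet config sets `serverTLSBootstrap: true`. With that setting the kubelet requests its serving certificate through the `certificates.k8s.io` API instead of self-signing one, and those CSRs must be approved before the certificate is issued. One possible source of the `tls: internal error` entries for new nodes in the Metrics Server logs is CSRs that have not been approved yet. That is an assumption; this report does not confirm it. The following Python sketch pulls out the settings that affect authenticated scraping of the kubelet. `scrape_relevant_settings` is a hypothetical helper.

```python
import json
from typing import Any


def scrape_relevant_settings(config_text: str) -> dict[str, Any]:
    """Extract kubelet settings that affect how metrics-server can reach it.

    A value of None means the key is absent from the file, so the kubelet's
    built-in default applies; this sketch does not try to resolve defaults.
    """
    cfg = json.loads(config_text)
    authn = cfg.get("authentication", {})
    authz = cfg.get("authorization", {})
    return {
        "anonymous_auth": authn.get("anonymous", {}).get("enabled"),
        "webhook_authn": authn.get("webhook", {}).get("enabled"),
        "authz_mode": authz.get("mode"),
        "server_tls_bootstrap": cfg.get("serverTLSBootstrap"),
        "rotate_server_cert": cfg.get("featureGates", {}).get("RotateKubeletServerCertificate"),
    }
```

On a node, this could be run as `scrape_relevant_settings(open("/etc/kubernetes/kubelet/config.json").read())`.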
  • Metrics server logs:

{"ts":1755621037144.3723,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621042104.129,"caller":"scraper/scraper.go:149","msg":"Failed to scrape node","node":{"name":"ip-10-108-162-175.us-east-2.compute.internal"},"err":"Get \"https://10.108.162.175:10250/metrics/resource\": remote error: tls: internal error"}
{"ts":1755621042162.523,"caller":"scraper/scraper.go:149","msg":"Failed to scrape node","node":{"name":"ip-10-108-54-32.us-east-2.compute.internal"},"err":"Get \"https://10.108.54.32:10250/metrics/resource\": remote error: tls: internal error"}
{"ts":1755621052196.1824,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621067156.2139,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621082187.7395,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621097153.6978,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621112118.9421,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621127189.2578,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621142189.1562,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621157130.2832,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621172130.848,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-180-80.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.180.80:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621742166.749,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-161-50.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.161.50:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621757185.5354,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-161-50.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.161.50:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621762140.344,"caller":"scraper/scraper.go:149","msg":"Failed to scrape node","node":{"name":"ip-10-108-183-190.us-east-2.compute.internal"},"err":"Get \"https://10.108.183.190:10250/metrics/resource\": remote error: tls: internal error"}
{"ts":1755621772153.691,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-161-50.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.161.50:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621787177.745,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-161-50.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.161.50:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621802095.0845,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-161-50.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.161.50:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621817115.6096,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-161-50.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.161.50:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621832201.8843,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-161-50.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.161.50:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621882168.9373,"caller":"scraper/scraper.go:149","msg":"Failed to scrape node","node":{"name":"ip-10-108-178-221.us-east-2.compute.internal"},"err":"Get \"https://10.108.178.221:10250/metrics/resource\": remote error: tls: internal error"}
{"ts":1755621892168.5889,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-183-190.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.183.190:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621897133.5642,"caller":"scraper/scraper.go:149","msg":"Failed to scrape node","node":{"name":"ip-10-108-166-232.us-east-2.compute.internal"},"err":"Get \"https://10.108.166.232:10250/metrics/resource\": dial tcp 10.108.166.232:10250: connect: connection refused"}
{"ts":1755621907144.9084,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-183-190.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.183.190:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621922085.854,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-166-232.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.166.232:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621952125.9766,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-183-190.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.183.190:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621967104.593,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-183-190.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.183.190:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755621997106.5833,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-178-221.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.178.221:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755622012187.4453,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-178-221.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.178.221:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755622182161.2349,"caller":"scraper/scraper.go:149","msg":"Failed to scrape node","node":{"name":"ip-10-108-54-32.us-east-2.compute.internal"},"err":"Get \"https://10.108.54.32:10250/metrics/resource\": dial tcp 10.108.54.32:10250: connect: connection refused"}
{"ts":1755622207109.7744,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-54-32.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.54.32:10250/metrics/resource\": context deadline exceeded"}
{"ts":1755622222112.0771,"caller":"scraper/scraper.go:147","msg":"Failed to scrape node, timeout to access kubelet","node":{"name":"ip-10-108-54-32.us-east-2.compute.internal"},"timeout":"10s","err":"Get \"https://10.108.54.32:10250/metrics/resource\": context deadline exceeded"}
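The log above mixes three distinct `err` signatures: `context deadline exceeded`, `tls: internal error` and `connection refused`. The following Python sketch tallies them per node, assuming one JSON object per line. Which signature corresponds to which node lifecycle phase is an assumption to check against node events; the sketch only counts them. `SIGNATURES`, `classify` and `error_kinds_by_node` are hypothetical names.

```python
import json
from collections import Counter, defaultdict

# Substrings taken from the "err" values seen in the log above.
SIGNATURES = {
    "timeout": "context deadline exceeded",
    "tls": "tls: internal error",
    "refused": "connection refused",
}


def classify(err: str) -> str:
    """Map an error string to the first matching signature label."""
    for label, needle in SIGNATURES.items():
        if needle in err:
            return label
    return "other"


def error_kinds_by_node(lines):
    """Count scrape-failure signatures per node from JSON log lines."""
    kinds = defaultdict(Counter)
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(entry, dict) or "node" not in entry or "err" not in entry:
            continue
        kinds[entry["node"]["name"]][classify(entry["err"])] += 1
    return kinds
```

A node that shows only `tls` and `refused` entries before succeeding looks different from one that shows a run of `timeout` entries and then disappears. That difference may help separate the "newly created" and "terminated" cases described in this report.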

  • Status of Metrics API:
    kubectl describe apiservice v1beta1.metrics.k8s.io

Name:         v1beta1.metrics.k8s.io
Namespace:
Labels:       k8s-app=metrics-server
Annotations:
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2022-07-29T08:29:06Z
  Resource Version:    1171407047
  UID:                 1c1f0d23-9bbb-4c3e-bb99-21a7b34a0115
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:            metrics-server
    Namespace:       kube-system
    Port:            443
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2025-08-18T12:38:24Z
    Message:               all checks passed
    Reason:                Passed
    Status:                True
    Type:                  Available
Events:

/kind bug

Metadata

Assignees

No one assigned

    Labels

    kind/bug: Categorizes issue or PR as related to a bug.
    triage/accepted: Indicates an issue or PR is ready to be actively worked on.

    Type

    No type

    Projects

    Status

    Backlog (stale)

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
