Labels: kind/bug, lifecycle/stale
Description
What happened:
I installed NFD v0.17.3 using Helm. I want to get the NUMA node topology, so I enabled the topology updater during installation, but the NUMA details were not added to the labels. lscpu shows multiple NUMA nodes on this machine.
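For reference, the install was roughly along the lines of the sketch below, assuming the upstream node-feature-discovery Helm chart; the release name (nfd) and namespace (default) are inferred from the pod names in the log, and the exact flags may have differed:

helm repo add nfd https://kubernetes-sigs.github.io/node-feature-discovery/charts
helm repo update
helm install nfd nfd/node-feature-discovery --version 0.17.3 --namespace default --set topologyUpdater.enable=true

Here is the topology-updater log: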
sdp@fl4u42:~$ kubectl logs -f nfd-node-feature-discovery-topology-updater-g8wl7
I0430 12:06:36.208275 1 nfd-topology-updater.go:163] "Node Feature Discovery Topology Updater" version="v0.17.3" nodeName="fl4u42"
I0430 12:06:36.208337 1 component.go:34] [core]original dial target is: "/host-var/lib/kubelet-podresources/kubelet.sock"
I0430 12:06:36.208357 1 component.go:34] [core][Channel #1]Channel created
I0430 12:06:36.208371 1 component.go:34] [core][Channel #1]parsed dial target is: resolver.Target{URL:url.URL{Scheme:"passthrough", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"//host-var/lib/kubelet-podresources/kubelet.sock", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}
I0430 12:06:36.208375 1 component.go:34] [core][Channel #1]Channel authority set to "%2Fhost-var%2Flib%2Fkubelet-podresources%2Fkubelet.sock"
I0430 12:06:36.208511 1 component.go:34] [core][Channel #1]Resolver state updated: {
"Addresses": [
{
"Addr": "/host-var/lib/kubelet-podresources/kubelet.sock",
"ServerName": "",
"Attributes": null,
"BalancerAttributes": null,
"Metadata": null
}
],
"Endpoints": [
{
"Addresses": [
{
"Addr": "/host-var/lib/kubelet-podresources/kubelet.sock",
"ServerName": "",
"Attributes": null,
"BalancerAttributes": null,
"Metadata": null
}
],
"Attributes": null
}
],
"ServiceConfig": null,
"Attributes": null
} (resolver returned new addresses)
I0430 12:06:36.208535 1 component.go:34] [core][Channel #1]Channel switches to new LB policy "pick_first"
I0430 12:06:36.208562 1 component.go:34] [core][Channel #1 SubChannel #2]Subchannel created
I0430 12:06:36.208569 1 component.go:34] [core][Channel #1]Channel Connectivity change to CONNECTING
I0430 12:06:36.208577 1 component.go:34] [core][Channel #1]Channel exiting idle mode
2025/04/30 12:06:36 Connected to '"/host-var/lib/kubelet-podresources/kubelet.sock"'!
I0430 12:06:36.208679 1 component.go:34] [core][Channel #1 SubChannel #2]Subchannel Connectivity change to CONNECTING
I0430 12:06:36.208720 1 component.go:34] [core][Channel #1 SubChannel #2]Subchannel picks a new address "/host-var/lib/kubelet-podresources/kubelet.sock" to connect
I0430 12:06:36.208987 1 component.go:34] [core][Channel #1 SubChannel #2]Subchannel Connectivity change to READY
I0430 12:06:36.209010 1 component.go:34] [core][Channel #1]Channel Connectivity change to READY
I0430 12:06:36.209018 1 nfd-topology-updater.go:375] "configuration file parsed" path="/etc/kubernetes/node-feature-discovery/nfd-topology-updater.conf" config={"ExcludeList":null}
I0430 12:06:36.209061 1 podresourcesscanner.go:53] "watching all namespaces"
WARNING: failed to read int from file: open /host-sys/devices/system/node/node0/cpu0/online: no such file or directory
I0430 12:06:36.209247 1 metrics.go:44] "metrics server starting" port=":8081"
I0430 12:06:36.267613 1 component.go:34] [core][Server #4]Server created
I0430 12:06:36.267645 1 nfd-topology-updater.go:145] "gRPC health server serving" port=8082
I0430 12:06:36.267690 1 component.go:34] [core][Server #4 ListenSocket #5]ListenSocket created
I0430 12:07:36.217041 1 podresourcesscanner.go:137] "podFingerprint calculated" status=<
> processing node ""
> processing 15 pods
+ aibrix-system/aibrix-kuberay-operator-55f5ddcbf4-vqrwb
+ default/nfd-node-feature-discovery-worker-w5cvn
+ aibrix-system/aibrix-redis-master-7bff9b56f5-hs5k4
+ envoy-gateway-system/envoy-gateway-5bfc954ffc-k4tf7
+ kube-system/metrics-server-5985cbc9d7-vh9pb
+ aibrix-system/aibrix-controller-manager-6489d5b587-hj2bt
+ aibrix-system/aibrix-gateway-plugins-58bdc89d9c-q67pp
+ envoy-gateway-system/envoy-aibrix-system-aibrix-eg-903790dc-54766c9758-l68wh
+ kube-system/helm-install-traefik-crd-kz6kg
+ default/nfd-node-feature-discovery-topology-updater-g8wl7
+ kube-system/svclb-envoy-aibrix-system-aibrix-eg-903790dc-1f213b6c-fdvw4
+ aibrix-system/aibrix-gpu-optimizer-75df97858d-5zb5s
+ kube-system/helm-install-traefik-j89k5
+ aibrix-system/aibrix-metadata-service-66f45c85bc-k8pzx
+ kube-system/local-path-provisioner-5cf85fd84d-hgf67
= pfp0v0011be09f6ff65dbfe0
>
I0430 12:07:36.217093 1 podresourcesscanner.go:148] "scanning pod" podName="aibrix-kuberay-operator-55f5ddcbf4-vqrwb"
I0430 12:07:36.217115 1 podresourcesscanner.go:231] "pod doesn't have devices" podName="aibrix-kuberay-operator-55f5ddcbf4-vqrwb"
I0430 12:07:36.223315 1 podresourcesscanner.go:148] "scanning pod" podName="nfd-node-feature-discovery-worker-w5cvn"
I0430 12:07:36.223325 1 podresourcesscanner.go:231] "pod doesn't have devices" podName="nfd-node-feature-discovery-worker-w5cvn"
I0430 12:07:36.225915 1 podresourcesscanner.go:148] "scanning pod" podName="aibrix-redis-master-7bff9b56f5-hs5k4"
I0430 12:07:36.225935 1 podresourcesscanner.go:231] "pod doesn't have devices" podName="aibrix-redis-master-7bff9b56f5-hs5k4"
I0430 12:07:36.228169 1 podresourcesscanner.go:148] "scanning pod" podName="envoy-gateway-5bfc954ffc-k4tf7"
I0430 12:07:36.228195 1 podresourcesscanner.go:231] "pod doesn't have devices" podName="envoy-gateway-5bfc954ffc-k4tf7"
I0430 12:07:36.231774 1 podresourcesscanner.go:148] "scanning pod" podName="metrics-server-5985cbc9d7-vh9pb"
I0430 12:07:36.231788 1 podresourcesscanner.go:231] "pod doesn't have devices" podName="metrics-server-5985cbc9d7-vh9pb"
I0430 12:07:36.233367 1 podresourcesscanner.go:148] "scanning pod" podName="aibrix-controller-manager-6489d5b587-hj2bt"
I0430 12:07:36.233374 1 podresourcesscanner.go:231] "pod doesn't have devices" podName="aibrix-controller-manager-6489d5b587-hj2bt"
I0430 12:07:36.234769 1 podresourcesscanner.go:148] "scanning pod" podName="aibrix-gateway-plugins-58bdc89d9c-q67pp"
I0430 12:07:36.234779 1 podresourcesscanner.go:231] "pod doesn't have devices" podName="aibrix-gateway-plugins-58bdc89d9c-q67pp"
I0430 12:07:36.236354 1 podresourcesscanner.go:148] "scanning pod" podName="envoy-aibrix-system-aibrix-eg-903790dc-54766c9758-l68wh"
I0430 12:07:36.236361 1 podresourcesscanner.go:231] "pod doesn't have devices" podName="envoy-aibrix-system-aibrix-eg-903790dc-54766c9758-l68wh"
I0430 12:07:36.238011 1 podresourcesscanner.go:148] "scanning pod" podName="helm-install-traefik-crd-kz6kg"
I0430 12:07:36.238017 1 podresourcesscanner.go:231] "pod doesn't have devices" podName="helm-install-traefik-crd-kz6kg"
I0430 12:07:36.239514 1 podresourcesscanner.go:148] "scanning pod" podName="nfd-node-feature-discovery-topology-updater-g8wl7"
I0430 12:07:36.239521 1 podresourcesscanner.go:231] "pod doesn't have devices" podName="nfd-node-feature-discovery-topology-updater-g8wl7"
I0430 12:07:36.241754 1 podresourcesscanner.go:148] "scanning pod" podName="svclb-envoy-aibrix-system-aibrix-eg-903790dc-1f213b6c-fdvw4"
I0430 12:07:36.241760 1 podresourcesscanner.go:231] "pod doesn't have devices" podName="svclb-envoy-aibrix-system-aibrix-eg-903790dc-1f213b6c-fdvw4"
I0430 12:07:36.422134 1 podresourcesscanner.go:148] "scanning pod" podName="aibrix-gpu-optimizer-75df97858d-5zb5s"
I0430 12:07:36.422165 1 podresourcesscanner.go:231] "pod doesn't have devices" podName="aibrix-gpu-optimizer-75df97858d-5zb5s"
I0430 12:07:36.621889 1 podresourcesscanner.go:148] "scanning pod" podName="helm-install-traefik-j89k5"
I0430 12:07:36.621923 1 podresourcesscanner.go:231] "pod doesn't have devices" podName="helm-install-traefik-j89k5"
I0430 12:07:36.821266 1 podresourcesscanner.go:148] "scanning pod" podName="aibrix-metadata-service-66f45c85bc-k8pzx"
I0430 12:07:36.821294 1 podresourcesscanner.go:231] "pod doesn't have devices" podName="aibrix-metadata-service-66f45c85bc-k8pzx"
I0430 12:07:37.022025 1 podresourcesscanner.go:148] "scanning pod" podName="local-path-provisioner-5cf85fd84d-hgf67"
I0430 12:07:37.022057 1 podresourcesscanner.go:231] "pod doesn't have devices" podName="local-path-provisioner-5cf85fd84d-hgf67"
I0430 12:07:37.432143 1 metrics.go:51] "stopping metrics server" port=":8081"
I0430 12:07:37.432207 1 metrics.go:45] "metrics server stopped" exitCode="http: Server closed"
E0430 12:07:37.432223 1 main.go:66] "error while running" err="failed to create NodeResourceTopology: the server could not find the requested resource (post noderesourcetopologies.topology.node.k8s.io)"
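The last error indicates that the NodeResourceTopology CRD (noderesourcetopologies.topology.node.k8s.io) is not known to the API server, so the updater cannot create the CR that would carry the NUMA zone data. A check along these lines (commands illustrative, output not captured here) should show whether the CRD and any CRs exist:

kubectl get crd noderesourcetopologies.topology.node.k8s.io
kubectl get noderesourcetopologies.topology.node.k8s.io -o yaml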
lscpu snippet:
NUMA:
NUMA node(s): 8
NUMA node0 CPU(s): 0-13,112-125
NUMA node1 CPU(s): 14-27,126-139
NUMA node2 CPU(s): 28-41,140-153
NUMA node3 CPU(s): 42-55,154-167
NUMA node4 CPU(s): 56-69,168-181
NUMA node5 CPU(s): 70-83,182-195
NUMA node6 CPU(s): 84-97,196-209
NUMA node7 CPU(s): 98-111,210-223
Environment:
- Kubernetes version (use kubectl version): v1.31.3+k3s1
- Cloud provider or hardware configuration: On-prem hardware, Intel(R) Xeon(R) Platinum 8480+, 512 GB
- OS (e.g. cat /etc/os-release): Ubuntu 23.04
- Kernel (e.g. uname -a): 6.2.0-39-generic
- Install tools:
- Network plugin and version (if this is a network-related bug):
- Others: