This repository was archived by the owner on Dec 9, 2025. It is now read-only.

Net status#78

Merged
aojea merged 3 commits into google:main from aojea:net_status
May 19, 2025

Conversation

Collaborator

@aojea commented May 19, 2025

Added as beta in 1.33

kubectl get resourceclaims dummy-interface-static-ip  -o=jsonpath='{.status.devices[0].networkData}'
{"hardwareAddress":"8a:09:cd:f9:62:be","interfaceName":"dummy0","ips":["169.254.23.23/32","fe80::8809:cdff:fef9:62be/64"]}
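For anyone scripting against this field: the networkData value extracted by the jsonpath query is plain JSON, so it can be consumed directly. A minimal Python sketch, using the example output above:

```python
import json

# networkData payload as returned by the kubectl jsonpath query above
# (values copied verbatim from the example output).
raw = ('{"hardwareAddress":"8a:09:cd:f9:62:be","interfaceName":"dummy0",'
       '"ips":["169.254.23.23/32","fe80::8809:cdff:fef9:62be/64"]}')

data = json.loads(raw)
print(data["interfaceName"])    # dummy0
print(data["hardwareAddress"])  # 8a:09:cd:f9:62:be
for cidr in data["ips"]:
    # each entry is CIDR notation: address plus prefix length
    ip, prefix = cidr.rsplit("/", 1)
    print(ip, prefix)
```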

aojea added 2 commits May 18, 2025 15:44
Change-Id: I4da4c1c874a40f0ee314f704b363351483bb2ec5
Change-Id: I3d50b4881c72a088ac72e598e8fe65665ddfd85a
Change-Id: Ic8fe1378d5f001db9c7dbfc1c5b5ff11e4d1cfd2
@aojea merged commit 8d2fb51 into google:main May 19, 2025
6 checks passed
@guptaNswati

@aojea can you show a diff of what the network interface resourceclaim looked like before adding the status? I am looking into porting this to our nvidia-dra-drivers and am still exploring where it fits.

This is what a GPU resourceclaim looks like:

status:
  allocation:
    devices:
      results:
      - adminAccess: null
        device: gpu-1
        driver: gpu.nvidia.com
        pool: sc-xx-xxx
        request: gpu

From the PR https://github.com/google/dranet/pull/78/files#diff-e8a7e777d80a14b455bdbf7aae3f28ad8082ffa0a06579e11cc1af741b5f98f7R271 I see you are writing the status (Ready=True or not) to status.allocation.devices.results based on NetworkDeviceReady, which is effectively your way of doing a health check. We need to add health checks (XID or ECC errors) to the GPUs before and after allocation to make sure only healthy GPUs are in the AllocatableDevices list.
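The health-gating idea described above can be sketched as follows. This is a purely illustrative stand-in, assuming hypothetical device records and error counters; it is not the nvidia-dra-driver code or the Kubernetes API types, just the shape of "filter unhealthy devices out of the allocatable set":

```python
# Hypothetical sketch of gating devices on a health check before
# advertising them as allocatable, analogous to DRANET's
# NetworkDeviceReady condition. The dict shape and field names
# (xidErrors, eccErrors) are invented for illustration.

def is_healthy(device):
    # For GPUs this could inspect XID or ECC error counters;
    # here we just treat any nonzero counter as unhealthy.
    return not device.get("xidErrors") and not device.get("eccErrors")

def allocatable_devices(devices):
    # Only healthy devices make it into the allocatable list.
    return [d for d in devices if is_healthy(d)]

devices = [
    {"name": "gpu-0", "xidErrors": 0, "eccErrors": 0},
    {"name": "gpu-1", "xidErrors": 3, "eccErrors": 0},  # unhealthy
]
print([d["name"] for d in allocatable_devices(devices)])  # ['gpu-0']
```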

This seems related to KEP kubernetes/enhancements#4817, given the DRAResourceClaimDeviceStatus feature gate.

@gauravkghildiyal
Member

@guptaNswati I'm not sure if this answers your question, but here's roughly what this looks like today:

status:
  allocation:
    devices:
      results:
      - device: dummy0
        driver: dra.net
        pool: my-node1
        request: req-dummy
    nodeSelector:
      nodeSelectorTerms:
      - matchFields:
        - key: metadata.name
          operator: In
          values:
          - my-node1
  devices:
  - conditions:
    - lastTransitionTime: "2025-08-22T19:58:22Z"
      message: ""
      reason: NetworkDeviceReady
      status: "True"
      type: Ready
    - lastTransitionTime: "2025-08-22T19:58:22Z"
      message: ""
      reason: NetworkReady
      status: "True"
      type: NetworkReady
    device: dummy0
    driver: dra.net
    networkData:
      hardwareAddress: aa:bb:cc:dd:ee:ff
      interfaceName: dummy-renamed
      ips:
      - 10.0.0.1
    pool: my-node1
  reservedFor:
  - name: my-pod1
    resource: pods
    uid: 4f9062a1-8759-420a-bb74-699c9213256c

@guptaNswati

@gauravkghildiyal it's helpful, thank you.

@guptaNswati

This doc explains more; I wanted to know how network readiness is decided: https://dranet.dev/docs/concepts/interface-status/
