This repository was archived by the owner on Dec 9, 2025. It is now read-only.

Net status#78

Merged
aojea merged 3 commits into google:main from aojea:net_status
May 19, 2025

Conversation

Collaborator

@aojea commented May 19, 2025

Added as beta in 1.33

kubectl get resourceclaims dummy-interface-static-ip  -o=jsonpath='{.status.devices[0].networkData}'
{"hardwareAddress":"8a:09:cd:f9:62:be","interfaceName":"dummy0","ips":["169.254.23.23/32","fe80::8809:cdff:fef9:62be/64"]}
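For anyone scripting against this field: the networkData value extracted by the jsonpath query is plain JSON, so it can be consumed directly. A minimal Python sketch, using the example output above:

```python
import json

# networkData payload as returned by the kubectl jsonpath query above
# (values copied verbatim from the example output).
raw = ('{"hardwareAddress":"8a:09:cd:f9:62:be","interfaceName":"dummy0",'
       '"ips":["169.254.23.23/32","fe80::8809:cdff:fef9:62be/64"]}')

data = json.loads(raw)
print(data["interfaceName"])    # dummy0
print(data["hardwareAddress"])  # 8a:09:cd:f9:62:be
for cidr in data["ips"]:
    # each entry is CIDR notation: address plus prefix length
    ip, prefix = cidr.rsplit("/", 1)
    print(ip, prefix)
```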

aojea added 2 commits May 18, 2025 15:44
Change-Id: I4da4c1c874a40f0ee314f704b363351483bb2ec5
Change-Id: I3d50b4881c72a088ac72e598e8fe65665ddfd85a
Change-Id: Ic8fe1378d5f001db9c7dbfc1c5b5ff11e4d1cfd2
@aojea merged commit 8d2fb51 into google:main May 19, 2025
6 checks passed
@guptaNswati

@aojea can you show a diff of what the network interface resourceclaim looked like before adding the status? I am looking into porting this to our nvidia-dra-drivers and am still exploring where it fits.

This is what a GPU resourceclaim looks like:

status:
  allocation:
    devices:
      results:
      - adminAccess: null
        device: gpu-1
        driver: gpu.nvidia.com
        pool: sc-xx-xxx
        request: gpu

From the PR https://github.com/google/dranet/pull/78/files#diff-e8a7e777d80a14b455bdbf7aae3f28ad8082ffa0a06579e11cc1af741b5f98f7R271 I see you are writing the status (Ready=True or not) to status.allocation.devices.results based on NetworkDeviceReady, which is effectively your way of doing a health check. We need to add health checks (XID or ECC errors) to the GPUs before and after allocation to make sure only healthy GPUs are in the AllocatableDevices list.
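The health-gating idea described above can be sketched as follows. This is a purely illustrative stand-in, assuming hypothetical device records and error counters; it is not the nvidia-dra-driver code or the Kubernetes API types, just the shape of "filter unhealthy devices out of the allocatable set":

```python
# Hypothetical sketch of gating devices on a health check before
# advertising them as allocatable, analogous to DRANET's
# NetworkDeviceReady condition. The dict shape and field names
# (xidErrors, eccErrors) are invented for illustration.

def is_healthy(device):
    # For GPUs this could inspect XID or ECC error counters;
    # here we just treat any nonzero counter as unhealthy.
    return not device.get("xidErrors") and not device.get("eccErrors")

def allocatable_devices(devices):
    # Only healthy devices make it into the allocatable list.
    return [d for d in devices if is_healthy(d)]

devices = [
    {"name": "gpu-0", "xidErrors": 0, "eccErrors": 0},
    {"name": "gpu-1", "xidErrors": 3, "eccErrors": 0},  # unhealthy
]
print([d["name"] for d in allocatable_devices(devices)])  # ['gpu-0']
```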

This seems related to KEP kubernetes/enhancements#4817, given the DRAResourceClaimDeviceStatus feature gate.

@gauravkghildiyal
Member

@guptaNswati I'm not sure if this answers your question, but here's roughly what this looks like today:

status:
  allocation:
    devices:
      results:
      - device: dummy0
        driver: dra.net
        pool: my-node1
        request: req-dummy
    nodeSelector:
      nodeSelectorTerms:
      - matchFields:
        - key: metadata.name
          operator: In
          values:
          - my-node1
  devices:
  - conditions:
    - lastTransitionTime: "2025-08-22T19:58:22Z"
      message: ""
      reason: NetworkDeviceReady
      status: "True"
      type: Ready
    - lastTransitionTime: "2025-08-22T19:58:22Z"
      message: ""
      reason: NetworkReady
      status: "True"
      type: NetworkReady
    device: dummy0
    driver: dra.net
    networkData:
      hardwareAddress: aa:bb:cc:dd:ee:ff
      interfaceName: dummy-renamed
      ips:
      - 10.0.0.1
    pool: my-node1
  reservedFor:
  - name: my-pod1
    resource: pods
    uid: 4f9062a1-8759-420a-bb74-699c9213256c

@guptaNswati

@gauravkghildiyal it's helpful, thank you.

@guptaNswati

This doc explains more; I wanted to know how network readiness is decided: https://dranet.dev/docs/concepts/interface-status/
