Multiple GPUs are visible in a container despite setting limits in Kubernetes manifest

#### Summary

Multiple GPUs are visible inside a container even though the Pod manifest sets `nvidia.com/gpu: 1` under `resources.limits`.

#### What Should Happen Instead?

Each container that requests a GPU should be allocated one GPU exclusively, and its workload should run on that GPU only.

#### Reproduction Steps

Here's the setup in which I am getting this problem.

I am using microk8s 1.32.3 with 2 nodes in my cluster. The first is my local system, which is the master node; the other is a server with 2 GPUs (NVIDIA RTX 3080), which is the worker. I deploy each DeepStream instance as a Pod using the manifest shown below. For multiple deployments I change the names and the relevant labels to `app-deepstream-1`, `app-deepstream-2`, and so on.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-deepstream-1                                # modify
  labels:
    name: app-deepstream-1                              # modify
    family: app-deepstream
spec:
  restartPolicy: Always
  runtimeClassName: nvidia
  nodeSelector:
    nvidia.com/gpu.present: "true"
  containers:
    - name: app-ai
      image: 192.168.65.106:32000/nvcr.io/nvidia/deepstream
      securityContext:
        privileged: true
      imagePullPolicy: IfNotPresent
      tty: true
      resources:
        limits:
          nvidia.com/gpu: 1
      workingDir: /opt/app/ai-app-prod/
      command: ["bash", "run.sh"]
      volumeMounts:
        - name: app-volume
          mountPath: /opt/app/
  volumes:
    - name: app-volume
      persistentVolumeClaim:
        claimName: app-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: app-deepstream-svc-1                           # modify
  labels:
    name: app-deepstream-svc-1                         # modify
    family: app-deepstream
spec:
  type: NodePort
  selector:
    name: app-deepstream-1                             # modify
  ports:
    - port: 9000           # ClusterIP port
      targetPort: 9000     # Container port
      protocol: TCP
```

I have enabled the `gpu` and `registry` add-ons in microk8s (from the master node). The GPU node is correctly labelled, and I have checked that its MIG capability is marked as false (this will become important later).

I needed a custom auto-scaler that scales the number of Pods up or down based on the number of streams running in an instance. I wrote it in `Python 3.10` with the `kubernetes` package. The upscale and downscale scripts just modify the manifest template and then create or delete the Pod and its Service in the cluster. They are thin wrappers around `v1.create_namespaced_pod()` and `v1.delete_namespaced_pod()` from the Kubernetes library, each called inside a try-except block. I face no issues at all when I run these scripts.
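
To make the wrapper described above concrete, here is a minimal sketch of that pattern. It is not the actual script: the names `POD_TEMPLATE`, `render_pod_manifest`, `scale_up`, and `scale_down` and the `default` namespace are assumptions made for illustration, the Service is left out for brevity, and `v1` stands for a `kubernetes.client.CoreV1Api()` instance.

```python
import copy

# Pod template mirroring the manifest above, expressed as a plain dict.
POD_TEMPLATE = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "app-deepstream-1",
        "labels": {"name": "app-deepstream-1", "family": "app-deepstream"},
    },
    "spec": {
        "restartPolicy": "Always",
        "runtimeClassName": "nvidia",
        "nodeSelector": {"nvidia.com/gpu.present": "true"},
        "containers": [
            {
                "name": "app-ai",
                "image": "192.168.65.106:32000/nvcr.io/nvidia/deepstream",
                "securityContext": {"privileged": True},
                "resources": {"limits": {"nvidia.com/gpu": 1}},
            }
        ],
    },
}


def render_pod_manifest(index):
    """Return a copy of the template with the name and label set for this instance."""
    manifest = copy.deepcopy(POD_TEMPLATE)
    name = f"app-deepstream-{index}"
    manifest["metadata"]["name"] = name
    manifest["metadata"]["labels"]["name"] = name
    return manifest


def scale_up(v1, index, namespace="default"):
    """Create the Pod for the given instance; return True on success."""
    try:
        v1.create_namespaced_pod(namespace=namespace, body=render_pod_manifest(index))
    except Exception as exc:  # a real script would more likely catch ApiException
        print(f"upscale failed for instance {index}: {exc}")
        return False
    return True


def scale_down(v1, index, namespace="default"):
    """Delete the Pod for the given instance; return True on success."""
    try:
        v1.delete_namespaced_pod(name=f"app-deepstream-{index}", namespace=namespace)
    except Exception as exc:
        print(f"downscale failed for instance {index}: {exc}")
        return False
    return True
```

With the `kubernetes` package installed and a kubeconfig loaded through `kubernetes.config.load_kube_config()`, `scale_up(kubernetes.client.CoreV1Api(), 2)` would submit the Pod `app-deepstream-2`. Each rendered manifest still requests exactly one `nvidia.com/gpu`, so the scaler itself does not change the GPU limit.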

**Note**: The DeepStream app config (present in the mount) specifies the GPU index which is set to 0 with the expectation that a single GPU will be assigned and hence visible to the container.

**The Pods and their Services deploy without any problems, even when I deploy multiple Pods. Once multiple Pods were deployed and in the `Running` phase, I checked `nvidia-smi` on my GPU node and found that the DeepStream apps in both deployed Pods were running on the same GPU (0).**

The interesting thing is that when I check `microk8s kubectl describe node <node>`, I can see that 2 GPUs have been allocated, as shown below.
```
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                365m (2%)   0 (0%)
  memory             320Mi (0%)  5632Mi (4%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
  nvidia.com/gpu     2           2
```

Now when I go inside a container and check with `nvidia-smi`, I see that both GPUs are visible, as shown below.
```
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.230.02             Driver Version: 535.230.02   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3080        Off | 00000000:01:00.0 Off |                  N/A |
|  0%   34C    P8              16W / 340W |    564MiB / 10240MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 3080        Off | 00000000:03:00.0 Off |                  N/A |
|  0%   30C    P8              18W / 340W |     12MiB / 10240MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
```

I expected each container to get its own GPU and to run on that GPU only. My understanding was that GPUs are not shared by default unless MIG or time-slicing is explicitly enabled.

1. Why is this happening?
2. Are there any changes to microk8s or the manifest of the Pod that might resolve this issue?
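
To help narrow down question 1, the check below can be run inside each container to compare what the runtime was told to expose with what is actually visible. It is only a sketch under two assumptions: that the NVIDIA device plugin passes the assigned GPU through the `NVIDIA_VISIBLE_DEVICES` environment variable (its default strategy, as far as I know), and that `nvidia-smi -L` prints one `GPU <index>: <name> (UUID: GPU-...)` line per visible device. The function names are hypothetical.

```python
import os
import re
import subprocess

# One line per device in `nvidia-smi -L` output,
# for example "GPU 0: NVIDIA GeForce RTX 3080 (UUID: GPU-...)".
GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (GPU-[0-9a-fA-F-]+)\)\s*$")


def parse_gpu_list(text):
    """Parse `nvidia-smi -L` output into a list of dicts (index, name, uuid)."""
    gpus = []
    for line in text.splitlines():
        match = GPU_LINE.match(line.strip())
        if match:
            gpus.append(
                {
                    "index": int(match.group(1)),
                    "name": match.group(2),
                    "uuid": match.group(3),
                }
            )
    return gpus


def report():
    """Print the requested device list next to the GPUs actually visible."""
    print("NVIDIA_VISIBLE_DEVICES =", os.environ.get("NVIDIA_VISIBLE_DEVICES"))
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print("could not run nvidia-smi:", exc)
        return
    for gpu in parse_gpu_list(result.stdout):
        print(gpu["index"], gpu["name"], gpu["uuid"])
```

Calling `report()` in both Pods should show whether each container was assigned a different single GPU UUID while still seeing both devices. If so, the allocation itself works and the restriction is lost somewhere between the device plugin and the container.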

#### Introspection Report

Here's the tarball generated using the command.

[inspection-report-20250619_165030.tar.gz](https://github.com/user-attachments/files/20818184/inspection-report-20250619_165030.tar.gz)

#### Can you suggest a fix?


#### Are you interested in contributing with a fix?
