Description
Describe the bug
The gpu.intel.com/product label supports only one value, such as 'Flex_140' or 'Flex_170'. When both card types are installed in the same node, only 'Flex_140' is written as the label value (due to the rule order in https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/deployments/nfd/overlays/node-feature-rules/platform-labeling-rules.yaml). This matters when pods have differing hardware preferences: for example, a pod may perform better on Flex 170 than on Flex 140, so it would prefer to be scheduled on nodes that have a Flex 170. Other logic may handle the actual consumption of that resource.
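To illustrate the scheduling preference described above, here is a minimal sketch of a pod that prefers nodes labelled with Flex 170. The pod name, container name, and image are hypothetical placeholders; only the label key and value come from this issue. On a node with both card types, the label currently reads Flex_140, so this preference would not match even though a Flex 170 is present.

```yaml
# Hypothetical pod spec; name and image are placeholders, not from the project.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload-example
spec:
  affinity:
    nodeAffinity:
      # Soft preference: the pod can still land elsewhere if no node matches.
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: gpu.intel.com/product
                operator: In
                values: ["Flex_170"]
  containers:
    - name: workload
      image: registry.example.com/gpu-workload:latest
```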
To Reproduce
Steps to reproduce the behavior:
- Node has a Flex 140 card and a Flex 170 card installed
- Device plugins are installed per instructions
- Apply platform-labeling-rules.yaml
- Only gpu.intel.com/product=Flex_140 is in the node labels
Expected behavior
The gpu.intel.com/product label should follow the pattern of the other per-device labels, such as gpu.intel.com/device-id.0380-56c0.present=true or gpu.intel.com/device-id.0380-56c1.present=true. At present, pods can select on these device-ID labels, but the IDs do not map transparently to Intel product names. I suggest something like gpu.intel.com/product.Flex_140.present=true, which would indicate the presence of each commonly understood Intel GPU product name on the node.
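As a rough sketch of what the suggested labelling could look like, the NFD NodeFeatureRule below emits one presence label per product instead of a single shared label. This is not the project's actual rule file: the object name and rule names are hypothetical, and the mapping of device ID 56c0 to Flex 170 and 56c1 to Flex 140 is my assumption based on the device-id labels above.

```yaml
# Illustrative only; names are hypothetical and the device-ID-to-product
# mapping is an assumption, not taken from platform-labeling-rules.yaml.
apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeatureRule
metadata:
  name: intel-gpu-product-presence-example
spec:
  rules:
    - name: "intel.gpu.flex.170.present"
      labels:
        "gpu.intel.com/product.Flex_170.present": "true"
      matchFeatures:
        - feature: pci.device
          matchExpressions:
            vendor: {op: In, value: ["8086"]}
            class: {op: In, value: ["0380"]}
            device: {op: In, value: ["56c0"]}  # assumed: Flex 170
    - name: "intel.gpu.flex.140.present"
      labels:
        "gpu.intel.com/product.Flex_140.present": "true"
      matchFeatures:
        - feature: pci.device
          matchExpressions:
            vendor: {op: In, value: ["8086"]}
            class: {op: In, value: ["0380"]}
            device: {op: In, value: ["56c1"]}  # assumed: Flex 140
```

Because each rule writes its own label key, the two rules no longer overwrite each other on a mixed node, and a pod could express its preference with a match on gpu.intel.com/product.Flex_170.present having the value "true".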
System (please complete the following information):
- OpenShift 4.13.11
- Device plugins version: v0.28.0
- Hardware info: SPR with Flex 140 + Flex 170 in same system
Additional context
n/a