Commit 05880d1

New Sysman API for VF telemetry (#254)
Resolves: #248
Signed-off-by: Kumar, Sanil <[email protected]>
1 parent d00e2bb commit 05880d1

3 files changed: +361 -1 lines changed
Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
+<%
+import re
+from templates import helper as th
+%><%
+    OneApi=tags['$OneApi']
+    x=tags['$x']
+    X=x.upper()
+    s=tags['$s']
+    S=s.upper()
+%>
+:orphan:
+
+.. _ZES_experimental_virtual_function_management:
+
+========================================
+Virtual Function Management Extension
+========================================
+
+API
+----
+
+* Functions
+
+    * ${s}DeviceEnumActiveVFExp
+    * ${s}VFManagementGetVFPropertiesExp
+    * ${s}VFManagementGetVFMemoryUtilizationExp
+    * ${s}VFManagementGetVFEngineUtilizationExp
+    * ${s}VFManagementSetVFTelemetryModeExp
+    * ${s}VFManagementSetVFTelemetrySamplingIntervalExp
+
+* Enumerations
+
+    * ${s}_vf_management_exp_version_t
+    * ${s}_vf_info_mem_type_exp_flags_t
+    * ${s}_vf_info_util_exp_flags_t
+
+* Structures
+
+    * ${s}_vf_exp_properties_t
+    * ${s}_vf_util_mem_exp_t
+    * ${s}_vf_util_engine_exp_t
+
+Virtual Function Management
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+This feature adds the ability to retrieve telemetry from the PF domain for monitoring per-VF memory and engine utilization.
+This telemetry is used to determine whether a VM has oversubscribed GPU memory, or to observe engine busyness for a targeted workload.
+If a VF has no activity to report, the implementation shall reflect that in the ${s}_vf_util_engine_exp_t struct so that the percentage
+calculation results in a value of 0.
+
+The following pseudo-code demonstrates a sequence for obtaining the engine activity for all Virtual Functions from the Physical Function environment:
+
+.. parsed-literal::
+
+    // Gather count of VF handles
+    uint32_t numVf = 0;
+    ${s}DeviceEnumActiveVFExp(hDevice, &numVf, nullptr);
+
+    // Allocate memory for VF handles and call back in to gather handles
+    std::vector<${s}_vf_handle_t> vfs(numVf, nullptr);
+    ${s}DeviceEnumActiveVFExp(hDevice, &numVf, vfs.data());
+
+    // Gather VF properties
+    std::vector<${s}_vf_exp_properties_t> vfProps(numVf);
+    for (uint32_t i = 0; i < numVf; i++) {
+        ${s}VFManagementGetVFPropertiesExp(vfs[i], &vfProps[i]);
+    }
+
+    // Detect the info types a particular VF supports
+
+    // Using VF# 0 to demonstrate how to detect engine info type and query engine util info
+    ${s}_vf_handle_t activeVf = vfs[0];
+    uint32_t count = 1;
+    ${s}_vf_util_engine_exp_t engineUtil0 = {};
+    if (vfProps[0].flags & ${S}_VF_INFO_UTIL_EXP_FLAG_INFO_ENGINE) {
+        ${s}VFManagementGetVFEngineUtilizationExp(activeVf, &count, &engineUtil0);
+        // Wait between the two snapshots so the counters can advance
+        sleep(1);
+        ${s}_vf_util_engine_exp_t engineUtil1 = {};
+        ${s}VFManagementGetVFEngineUtilizationExp(activeVf, &count, &engineUtil1);
+        // Use formula to calculate engine utilization % based on the 2 snapshots above
+    }
+
+    // Demonstrate using setter to switch off Engine telemetry for VF0 and then check if Getter returns INVALID
+    ${s}VFManagementSetVFTelemetryModeExp(activeVf, ${S}_VF_INFO_UTIL_EXP_FLAG_INFO_ENGINE, false);
+    ${x}_result_t res = ${s}VFManagementGetVFEngineUtilizationExp(activeVf, &count, &engineUtil0);
+    if (res != ${X}_RESULT_SUCCESS) {
+        printf("Engine utilization successfully disabled for VF");
+    }
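
The pseudo-code above leaves the utilization formula unstated. The following is a minimal sketch of one plausible calculation, assuming utilization is the change in `activeCounterValue` divided by the change in `samplingCounterValue` between two snapshots. `EngineSnapshot`, `engineUtilizationPercent`, and `checkExample` are hypothetical names used only for illustration and are not part of the API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the two counter members of zes_vf_util_engine_exp_t.
struct EngineSnapshot {
    uint64_t activeCounterValue;    // active counter
    uint64_t samplingCounterValue;  // counter value when the active counter was sampled
};

// Assumed formula: percentage of the sampling window during which the engine was active.
// Returns 0 when either counter did not advance, so a VF with no activity reports 0%.
double engineUtilizationPercent(const EngineSnapshot& first, const EngineSnapshot& second) {
    if (second.samplingCounterValue <= first.samplingCounterValue ||
        second.activeCounterValue <= first.activeCounterValue) {
        return 0.0;
    }
    const double active =
        static_cast<double>(second.activeCounterValue - first.activeCounterValue);
    const double sampled =
        static_cast<double>(second.samplingCounterValue - first.samplingCounterValue);
    return 100.0 * active / sampled;
}

// Usage: 250 active ticks over a 500 tick sampling window gives 50%.
void checkExample() {
    assert(engineUtilizationPercent({100, 1000}, {350, 1500}) == 50.0);
}
```

Guarding against a zero sampling delta avoids a division by zero when both snapshots are taken within the same sampling interval.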

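The description above mentions detecting a VM that has oversubscribed GPU memory. As a rough sketch, usage could be derived from the `free` and `size` members of `${s}_vf_util_mem_exp_t`. The struct `MemSnapshot`, both function names, and the threshold policy are assumptions made for illustration; the extension itself does not define how oversubscription is judged.

```cpp
#include <cstdint>

// Hypothetical mirror of the free/size members of zes_vf_util_mem_exp_t.
struct MemSnapshot {
    uint64_t free;  // free memory size in bytes
    uint64_t size;  // total allocatable memory in bytes
};

// Percentage of allocatable memory currently in use.
// Returns 0 for inconsistent input (size of 0, or free larger than size).
double memoryUsedPercent(const MemSnapshot& m) {
    if (m.size == 0 || m.free > m.size) {
        return 0.0;
    }
    return 100.0 * static_cast<double>(m.size - m.free) / static_cast<double>(m.size);
}

// Assumed policy: treat a VF as near capacity once usage reaches a caller-chosen threshold.
bool isNearCapacity(const MemSnapshot& m, double thresholdPercent) {
    return memoryUsedPercent(m) >= thresholdPercent;
}
```

A monitoring tool would call this once per memory type reported for a VF, since system and device memory are reported separately.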
scripts/sysman/common.yml

Lines changed: 19 additions & 1 deletion
@@ -103,6 +103,12 @@ class: $sOverclock
 name: "$s_overclock_handle_t"
 version: "1.5"
 --- #--------------------------------------------------------------------------
+type: handle
+desc: "Handle for a Sysman virtual function management domain"
+class: $sVFManagement
+name: "$s_vf_handle_t"
+version: "1.9"
+--- #--------------------------------------------------------------------------
 type: enum
 desc: "Defines structure types"
 name: $s_structure_type_t
@@ -226,7 +232,19 @@ etors:
     - name: SUB_DEVICE_EXP_PROPERTIES
       value: "0x00020004"
       desc: $s_subdevice_exp_properties_t
-      version: "1.9"
+      version: "1.9"
+    - name: VF_EXP_PROPERTIES
+      value: "0x00020005"
+      desc: $s_vf_exp_properties_t
+      version: "1.9"
+    - name: VF_UTIL_MEM_EXP
+      value: "0x00020006"
+      desc: $s_vf_util_mem_exp_t
+      version: "1.9"
+    - name: VF_UTIL_ENGINE_EXP
+      value: "0x00020007"
+      desc: $s_vf_util_engine_exp_t
+      version: "1.9"
 --- #-------------------------------------------------------------------------
 type: struct
 desc: "Base for all properties types"
Lines changed: 254 additions & 0 deletions
@@ -0,0 +1,254 @@
+#
+# Copyright (C) 2024 Intel Corporation
+#
+# SPDX-License-Identifier: MIT
+#
+# See YaML.md for syntax definition
+#
+--- #--------------------------------------------------------------------------
+type: header
+desc: "Intel $OneApi Level-Zero Sysman Extension APIs for Virtual Function Management Properties"
+version: "1.9"
+--- #--------------------------------------------------------------------------
+type: macro
+desc: "Virtual Function Management Extension Name"
+version: "1.9"
+name: $S_VIRTUAL_FUNCTION_MANAGEMENT_EXP_NAME
+value: '"$XS_experimental_virtual_function_management"'
+--- #--------------------------------------------------------------------------
+type: enum
+desc: "Virtual Function Management Extension Version(s)"
+version: "1.9"
+name: $s_vf_management_exp_version_t
+etors:
+    - name: "1_0"
+      value: "$X_MAKE_VERSION( 1, 0 )"
+      desc: "version 1.0"
+--- #--------------------------------------------------------------------------
+type: enum
+desc: "Virtual function memory types"
+version: "1.9"
+class: $sVFManagement
+name: $s_vf_info_mem_type_exp_flags_t
+etors:
+    - name: MEM_TYPE_SYSTEM
+      desc: "System memory"
+    - name: MEM_TYPE_DEVICE
+      desc: "Device local memory"
+--- #--------------------------------------------------------------------------
+type: enum
+desc: "Virtual function utilization flag bit fields"
+version: "1.9"
+class: $sVFManagement
+name: $s_vf_info_util_exp_flags_t
+etors:
+    - name: INFO_NONE
+      desc: "No info associated with virtual function"
+    - name: INFO_MEM_CPU
+      desc: "System memory utilization associated with virtual function"
+    - name: INFO_MEM_GPU
+      desc: "Device memory utilization associated with virtual function"
+    - name: INFO_ENGINE
+      desc: 'Engine utilization associated with virtual function'
+--- #--------------------------------------------------------------------------
+type: struct
+desc: "Virtual function management properties"
+version: "1.9"
+class: $sVFManagement
+name: $s_vf_exp_properties_t
+base: $s_base_properties_t
+members:
+    - type: $s_pci_address_t
+      name: "address"
+      desc: "[out] Virtual function BDF address"
+    - type: $s_uuid_t
+      name: uuid
+      desc: "[out] universal unique identifier of the device"
+    - type: $s_vf_info_util_exp_flags_t
+      name: "flags"
+      desc: "[out] utilization flags available. May be 0 or a valid combination of $s_vf_info_util_exp_flag_t."
+--- #--------------------------------------------------------------------------
+type: struct
+desc: "Provides memory utilization values for a virtual function"
+version: "1.9"
+class: $sVFManagement
+name: $s_vf_util_mem_exp_t
+base: $s_base_state_t
+members:
+    - type: $s_vf_info_mem_type_exp_flags_t
+      name: "memTypeFlags"
+      desc: "[out] Memory type flags."
+    - type: uint64_t
+      name: "free"
+      desc: "[out] Free memory size in bytes."
+    - type: uint64_t
+      name: "size"
+      desc: "[out] Total allocatable memory in bytes."
+    - type: uint64_t
+      name: "timestamp"
+      desc: "[out] Wall clock time from VF when value was sampled."
+--- #--------------------------------------------------------------------------
+type: struct
+desc: "Provides engine utilization values for a virtual function"
+version: "1.9"
+class: $sVFManagement
+name: $s_vf_util_engine_exp_t
+base: $s_base_state_t
+members:
+    - type: $s_engine_group_t
+      name: "type"
+      desc: "[out] The engine group."
+    - type: uint64_t
+      name: "activeCounterValue"
+      desc: "[out] Represents active counter."
+    - type: uint64_t
+      name: "samplingCounterValue"
+      desc: "[out] Represents counter value when activeCounterValue was sampled."
+    - type: uint64_t
+      name: "timestamp"
+      desc: "[out] Wall clock time when the activeCounterValue was sampled."
+--- #--------------------------------------------------------------------------
+type: function
+desc: "Get handle of virtual function modules"
+version: "1.9"
+class: $sDevice
+name: EnumActiveVFExp
+details:
+    - "The application may call this function from simultaneous threads."
+    - "The implementation of this function should be lock-free."
+params:
+    - type: $s_device_handle_t
+      name: hDevice
+      desc: "[in] Sysman handle of the device."
+    - type: "uint32_t*"
+      name: pCount
+      desc: |
+            [in,out] pointer to the number of components of this type.
+            if count is zero, then the driver shall update the value with the total number of components of this type that are available.
+            if count is greater than the number of components of this type that are available, then the driver shall update the value with the correct number of components.
+    - type: "$s_vf_handle_t*"
+      name: phVFhandle
+      desc: |
+            [in,out][optional][range(0, *pCount)] array of handle of components of this type.
+            if count is less than the number of components of this type that are available, then the driver shall only retrieve that number of component handles.
+--- #--------------------------------------------------------------------------
+type: function
+desc: "Get virtual function management properties"
+version: "1.9"
+class: $sVFManagement
+name: GetVFPropertiesExp
+details:
+    - "The application may call this function from simultaneous threads."
+    - "The implementation of this function should be lock-free."
+params:
+    - type: $s_vf_handle_t
+      name: hVFhandle
+      desc: "[in] Sysman handle for the VF component."
+    - type: $s_vf_exp_properties_t*
+      name: pProperties
+      desc: "[in,out] Will contain VF properties."
+--- #--------------------------------------------------------------------------
+type: function
+desc: "Get memory activity stats for each available memory type associated with Virtual Function (VF)"
+version: "1.9"
+class: $sVFManagement
+name: GetVFMemoryUtilizationExp
+details:
+    - "The application may call this function from simultaneous threads."
+    - "The implementation of this function should be lock-free."
+params:
+    - type: $s_vf_handle_t
+      name: hVFhandle
+      desc: "[in] Sysman handle for the component."
+    - type: "uint32_t*"
+      name: pCount
+      desc: |
+            [in,out] Pointer to the number of VF memory stats descriptors.
+            - if count is zero, the driver shall update the value with the total number of memory stats available.
+            - if count is greater than the total number of memory stats available, the driver shall update the value with the correct number of memory stats available.
+            - The count returned is the sum of number of VF instances currently available and the PF instance.
+    - type: $s_vf_util_mem_exp_t*
+      name: pMemUtil
+      desc: |
+            [in,out][optional][range(0, *pCount)] array of memory group activity counters.
+            - if count is less than the total number of memory stats available, then driver shall only retrieve that number of stats.
+            - the implementation shall populate the vector with pCount-1 number of VF memory stats.
+--- #--------------------------------------------------------------------------
+type: function
+desc: "Get engine activity stats for each available engine group associated with Virtual Function (VF)"
+version: "1.9"
+class: $sVFManagement
+name: GetVFEngineUtilizationExp
+details:
+    - "The application may call this function from simultaneous threads."
+    - "The implementation of this function should be lock-free."
+params:
+    - type: $s_vf_handle_t
+      name: hVFhandle
+      desc: "[in] Sysman handle for the component."
+    - type: "uint32_t*"
+      name: pCount
+      desc: |
+            [in,out] Pointer to the number of VF engine stats descriptors.
+            - if count is zero, the driver shall update the value with the total number of engine stats available.
+            - if count is greater than the total number of engine stats available, the driver shall update the value with the correct number of engine stats available.
+            - The count returned is the sum of number of VF instances currently available and the PF instance.
+    - type: $s_vf_util_engine_exp_t*
+      name: pEngineUtil
+      desc: |
+            [in,out][optional][range(0, *pCount)] array of engine group activity counters.
+            - if count is less than the total number of engine stats available, then driver shall only retrieve that number of stats.
+            - the implementation shall populate the vector with pCount-1 number of VF engine stats.
+--- #--------------------------------------------------------------------------
+type: function
+desc: "Enable or disable utilization telemetry associated with Virtual Function (VF)"
+version: "1.9"
+class: $sVFManagement
+name: SetVFTelemetryModeExp
+details:
+    - "The application may call this function from simultaneous threads."
+    - "The implementation of this function should be lock-free."
+params:
+    - type: $s_vf_handle_t
+      name: hVFhandle
+      desc: "[in] Sysman handle for the component."
+    - type: $s_vf_info_util_exp_flags_t
+      name: "flags"
+      desc: "[in] utilization flags to enable or disable. May be 0 or a valid combination of $s_vf_info_util_exp_flag_t."
+    - type: $x_bool_t
+      name: "enable"
+      desc: "[in] Enable utilization telemetry."
+--- #--------------------------------------------------------------------------
+type: function
+desc: "Set sampling interval to monitor for a particular utilization telemetry associated with Virtual Function (VF)"
+version: "1.9"
+class: $sVFManagement
+name: SetVFTelemetrySamplingIntervalExp
+details:
+    - "The application may call this function from simultaneous threads."
+    - "The implementation of this function should be lock-free."
+params:
+    - type: $s_vf_handle_t
+      name: hVFhandle
+      desc: "[in] Sysman handle for the component."
+    - type: $s_vf_info_util_exp_flags_t
+      name: "flag"
+      desc: "[in] utilization flags to set sampling interval. May be 0 or a valid combination of $s_vf_info_util_exp_flag_t."
+    - type: uint64_t
+      name: "samplingInterval"
+      desc: "[in] Sampling interval value."
+--- #--------------------------------------------------------------------------
+
+type: class
+desc: "C++ wrapper for a Sysman virtual function management group"
+version: "1.9"
+name: $sVFManagement
+owner: $sDevice
+members:
+    - type: $s_vf_handle_t
+      name: handle
+      desc: "[in] handle of Sysman virtual function object"
+      init: nullptr
+    - type: $sDevice*
+      name: pDevice
+      desc: "[in] pointer to owner object"

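The `pCount` descriptions in the function definitions above follow a count-then-fetch convention: a count of zero asks the driver for the total, and a count larger than the total is corrected downward. The sketch below imitates those rules with a mock so the caller-side pattern can be seen in isolation. `mockEnumActiveVF`, `VfHandle`, `enumerateVfs`, and the three handle values are all hypothetical; this is not the real `zesDeviceEnumActiveVFExp`.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

using VfHandle = int;  // hypothetical stand-in for zes_vf_handle_t

// Mock driver exposing three active VFs. It only imitates the documented count rules.
void mockEnumActiveVF(uint32_t* pCount, VfHandle* phVFhandle) {
    static const std::vector<VfHandle> available = {10, 11, 12};
    const uint32_t total = static_cast<uint32_t>(available.size());
    const uint32_t requested = *pCount;
    if (requested == 0 || requested > total) {
        *pCount = total;  // report the total, or correct an oversized count
    }
    if (phVFhandle != nullptr && requested != 0) {
        // Retrieve only as many handles as the caller asked for, capped at the total.
        std::copy_n(available.begin(), std::min(requested, total), phVFhandle);
    }
}

// Caller side: query the count, size the buffer, then fetch the handles.
std::vector<VfHandle> enumerateVfs() {
    uint32_t count = 0;
    mockEnumActiveVF(&count, nullptr);
    std::vector<VfHandle> handles(count);
    mockEnumActiveVF(&count, handles.data());
    handles.resize(count);  // the count may have been lowered between the two calls
    return handles;
}
```

Resizing after the second call keeps the vector consistent if the number of active VFs shrinks between the two calls.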