
Commit e85b4f6

POC: Enable Proton for XPU; basis, not ready for user use (#2635)
Closes #1145

On LTS, Proton does not work: `zelTracerCreate` returns `2013265921` (`0x78000001`, `ZE_RESULT_ERROR_UNINITIALIZED`). This looks related to #1953.

**4/2/25 status update:** PTI still lacks the necessary interface, but work on it is in progress and it seems that little is left (thanks @jfedorov!). I believe that at least the main parts of the implementation can be reviewed and perhaps even merged. The dependencies deserve the most attention: they include SYCL, Level Zero (including an additional interface for invoking callbacks, which is used temporarily) and PTI. I added the PTI headers directly to the Triton sources; they are small and are used only to build Proton, so I believe they will not interfere with the main code. I tried to add the dependencies so that the build does not break when they are missing. Some paths are also hardcoded, because for now the main use case is just running the unit tests in CI. A lot of debug code that prints additional information to stdout is still present; I would like to keep it for now. I plan to clean it up as soon as PTI implements the necessary functionality (integrating it will surely expose many problems, since I cannot fully emulate PTI's behavior).

**12/12/24 status update:** The most important problem is the lack of an interface for registering callback functions that are called before and after kernel execution. Proton's CUPTI and Roctracer backends use this interface to register kernel calls (via `callbackData->correlationId`) in order to build a call tree and, when the profiler shuts down, to check that every record created for a launched kernel has been transferred from the backend profiler's storage into Proton's storage structures. The current workaround uses the `<level_zero/layers/zel_tracing_api.h>` interface, which provides functions for registering user callbacks for various events via a pair of `prologue_callbacks`/`epilogue_callbacks`.
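The correlation protocol described above can be modeled independently of any tracing library. The following is a minimal, self-contained sketch, not Proton's actual implementation: `CorrelationTracker`, its methods and `exampleSession` are hypothetical names, and the correlation ID is passed to the prologue/epilogue handlers explicitly, which is exactly the value the Level Zero callbacks cannot provide yet.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <optional>
#include <unordered_map>

// Hypothetical model of the correlation protocol: the prologue links a
// backend correlation ID to a Proton scope, the epilogue counts the launch
// as submitted, and the buffer-completion callback consumes the record.
class CorrelationTracker {
public:
  // Prologue (API enter): remember which Proton scope the launch belongs to.
  void onPrologue(std::uint64_t correlationId, std::uint64_t scopeId) {
    std::lock_guard<std::mutex> lock(mutex_);
    scopeOf_[correlationId] = scopeId;
  }

  // Epilogue (API exit): the kernel was submitted, so a record is expected.
  void onEpilogue(std::uint64_t /*correlationId*/) {
    std::lock_guard<std::mutex> lock(mutex_);
    ++submitted_;
  }

  // Buffer completion: a kernel record arrived from the tracing backend.
  // Returns the owning scope, or std::nullopt for an unknown correlation ID.
  std::optional<std::uint64_t> onRecordCompleted(std::uint64_t correlationId) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = scopeOf_.find(correlationId);
    if (it == scopeOf_.end()) return std::nullopt;
    std::uint64_t scopeId = it->second;
    scopeOf_.erase(it);
    ++completed_;
    return scopeId;
  }

  // Shutdown check: every submitted launch has produced its record.
  bool allRecordsFlushed() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return completed_ >= submitted_;
  }

private:
  mutable std::mutex mutex_;
  std::unordered_map<std::uint64_t, std::uint64_t> scopeOf_;
  std::uint64_t submitted_ = 0;
  std::uint64_t completed_ = 0;
};

// Usage: one traced launch, from the prologue to the completed record.
inline void exampleSession() {
  CorrelationTracker tracker;
  tracker.onPrologue(/*correlationId=*/7, /*scopeId=*/100);
  tracker.onEpilogue(7);
  assert(!tracker.allRecordsFlushed());  // record not delivered yet
  assert(tracker.onRecordCompleted(7) == std::optional<std::uint64_t>(100));
  assert(tracker.allRecordsFlushed());
}
```

The shutdown check only compares counters; a real profiler also has to synchronize the device before the result can be trusted, since records for still-running kernels arrive later.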
However, the problem of obtaining, inside the Level Zero callbacks, the identifier of the record that PTI will create for a registered kernel **has not been solved yet** (a manually selected identifier is currently used for tests).

How does this work in CUPTI? (simplified)

```c++
void CuptiProfiler::CuptiProfilerPimpl::callbackFn(void *userData,
                                                   CUpti_CallbackDomain domain,
                                                   CUpti_CallbackId cbId,
                                                   const void *cbData) {
  // ...
  // for runtime/driver API callbacks
  auto *callbackData = static_cast<const CUpti_CallbackData *>(cbData);
  if (callbackData->callbackSite == CUPTI_API_ENTER) {
    // scope registration
    auto scopeId = threadState.record();
    threadState.enterOp(scopeId);
    // ...
    // link the profiler's internal record ID to the external one that Proton uses
    profiler.correlation.correlate(callbackData->correlationId, numInstances);
  } else if (callbackData->callbackSite == CUPTI_API_EXIT) {
    // ...
    // scope exit
    threadState.exitOp();
    // the submitted record must be accounted for when flushing data
    profiler.correlation.submit(callbackData->correlationId);
  }
  // ...
}

void CuptiProfiler::CuptiProfilerPimpl::doStart() {
  // subscriber is a `CUpti_SubscriberHandle`
  cupti::subscribe<true>(&subscriber, callbackFn, nullptr); // does not exist in PTI
  cupti::activityEnable<true>(CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL); // exists in PTI
  cupti::activityRegisterCallbacks<true>(allocBuffer, completeBuffer); // exists in PTI
  // `setGraphCallbacks` does something like:
  //   CALLBACK_ENABLE(CUPTI_CBID_RESOURCE_GRAPHNODE_CREATED);
  //   CALLBACK_ENABLE(CUPTI_CBID_RESOURCE_GRAPHNODE_CLONED);
  //   ...
  setGraphCallbacks(subscriber, /*enable=*/true); // does not exist in PTI
  // `setRuntimeCallbacks` does something like:
  //   CALLBACK_ENABLE(CUPTI_RUNTIME_TRACE_CBID_cudaLaunch_v3020);
  //   CALLBACK_ENABLE(CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernel_v7000);
  //   ...
  setRuntimeCallbacks(subscriber, /*enable=*/true); // does not exist in PTI
  // `setDriverCallbacks` does something like:
  //   CALLBACK_ENABLE(CUPTI_DRIVER_TRACE_CBID_cuLaunch);
  //   CALLBACK_ENABLE(CUPTI_DRIVER_TRACE_CBID_cuLaunchGrid);
  //   ...
  setDriverCallbacks(subscriber, /*enable=*/true); // does not exist in PTI
}
```

**Needs to be done:**
* Decide how to obtain the PTI profiler record identifiers in the Level Zero callbacks, or find a way to work without them.
* ~Synchronize the device to make a correct data flush.~ (done via `sycl_queue.wait()`)
* Decide how to handle the concept of CUDA graph kernels.
* Obtain the device architecture.
* Enable unit tests.
  * [x] test_api.py
  * [x] test_lib.py
  * [x] test_profile.py (partially)
  * [x] test_viewer.py
* Enable tutorials.
* Final code cleanup.

https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/14202903723

---------

Signed-off-by: Anatoly Myachev <[email protected]>
1 parent 272e711 commit e85b4f6


29 files changed: +1411 -46 lines


.github/workflows/build-test-reusable.yml

Lines changed: 7 additions & 0 deletions
```diff
@@ -152,6 +152,13 @@ jobs:
             echo TRITON_TEST_CMD="bash -v -x scripts/test-triton.sh --warning-reports --skip-pytorch-install --reports-dir $GITHUB_WORKSPACE/reports ${{ inputs.ignore_errors && '--ignore-errors' || '' }} $skiplist"
           } | tee -a $GITHUB_ENV
 
+      - name: Run Proton tests
+        if: ${{ inputs.driver_version == 'rolling' }}
+        run: |
+          cd third_party/proton/test
+          pytest test_api.py test_lib.py test_profile.py test_viewer.py -s -v
+          cd ..
+
       - name: Run unit tests
         run: |
           ${{ env.TRITON_TEST_CMD }} --unit
```

.pre-commit-config.yaml

Lines changed: 2 additions & 1 deletion
```diff
@@ -135,5 +135,6 @@ exclude: |
     ^third_party/amd/backend/include/roctracer/|
     ^third_party/amd/backend/lib/|
     ^third_party/nvidia/backend/include/cuda.h|
-    ^third_party/f2reduce
+    ^third_party/f2reduce|
+    ^third_party/intel/backend/proton/include
   )
```

python/setup.py

Lines changed: 4 additions & 0 deletions
```diff
@@ -432,6 +432,10 @@ def get_proton_cmake_args(self):
         if roctracer_include_dir == "":
             roctracer_include_dir = os.path.join(get_base_dir(), "third_party", "amd", "backend", "include")
         cmake_args += ["-DROCTRACER_INCLUDE_DIR=" + roctracer_include_dir]
+        xpupti_include_dir = get_env_with_keys(["TRITON_XPUPTI_INCLUDE_PATH"])
+        if xpupti_include_dir == "":
+            xpupti_include_dir = os.path.join(get_base_dir(), "third_party", "intel", "backend", "proton", "include")
+        cmake_args += ["-DXPUPTI_INCLUDE_DIR=" + xpupti_include_dir]
         return cmake_args
 
     def build_extension(self, ext):
```
Lines changed: 52 additions & 0 deletions
```diff
@@ -0,0 +1,52 @@
+//==============================================================
+// Copyright (C) Intel Corporation
+//
+// SPDX-License-Identifier: MIT
+// =============================================================
+
+#ifndef INCLUDE_PTI_H_
+#define INCLUDE_PTI_H_
+
+#include "pti/pti_export.h"
+#include "pti/pti_version.h"
+
+/* clang-format off */
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+/**
+ * @brief Return/Error codes
+ */
+typedef enum {
+  PTI_SUCCESS = 0,                        //!< success
+  PTI_STATUS_END_OF_BUFFER = 1,           //!< end of buffer reached, e.g., in ptiViewGetNextRecord
+  PTI_ERROR_NOT_IMPLEMENTED = 2,          //!< functionality not implemented
+  PTI_ERROR_BAD_ARGUMENT = 3,             //!< error code for invalid arguments
+  PTI_ERROR_NO_CALLBACKS_SET = 4,         //!< error due to no callbacks set via ptiViewSetCallbacks
+  PTI_ERROR_EXTERNAL_ID_QUEUE_EMPTY = 5,  //!< empty external ID-queue while working with
+                                          //!< PTI_VIEW_EXTERNAL_CORRELATION
+  PTI_ERROR_BAD_TIMESTAMP = 6,            //!< error in timestamp conversion, might be related with the user
+                                          //!< provided TimestampCallback
+  PTI_ERROR_DRIVER = 50,                  //!< unknown driver error
+  PTI_ERROR_TRACING_NOT_INITIALIZED = 51, //!< installed driver requires tracing enabling with
+                                          //!< setting environment variable ZE_ENABLE_TRACING_LAYER
+                                          //!< to 1
+  PTI_ERROR_L0_LOCAL_PROFILING_NOT_SUPPORTED = 52, //!< no Local profiling support in the installed
+                                                   //!< driver
+
+  PTI_ERROR_INTERNAL = 200                //!< internal error
+} pti_result;
+
+/**
+ * @brief Helper function to return stringified enum members for pti_result
+ *
+ * @return const char*
+ */
+PTI_EXPORT const char* ptiResultTypeToString(pti_result result_value);
+
+#if defined(__cplusplus)
+}
+#endif
+
+#endif // INCLUDE_PTI_H_
```
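Because `PTI_STATUS_END_OF_BUFFER` is a status rather than a failure, code that drains a buffer with `ptiViewGetNextRecord` has to treat it differently from the `PTI_ERROR_*` codes. A minimal sketch, assuming a standalone copy of the `pti_result` values above so that it compiles without PTI; `PtiResult`, `ptiIsError`, `ptiCheck` and `drainRecords` are hypothetical names, not part of the PTI API.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Standalone mirror of the pti_result values from pti.h, so this sketch
// builds without PTI; real code would include the header and use pti_result.
enum PtiResult : int {
  kPtiSuccess = 0,
  kPtiStatusEndOfBuffer = 1,
  kPtiErrorNotImplemented = 2,
  kPtiErrorBadArgument = 3,
  kPtiErrorNoCallbacksSet = 4,
  kPtiErrorExternalIdQueueEmpty = 5,
  kPtiErrorBadTimestamp = 6,
  kPtiErrorDriver = 50,
  kPtiErrorTracingNotInitialized = 51,
  kPtiErrorL0LocalProfilingNotSupported = 52,
  kPtiErrorInternal = 200,
};

// Hypothetical helper: END_OF_BUFFER means "no more records", not a failure.
constexpr bool ptiIsError(PtiResult r) {
  return r != kPtiSuccess && r != kPtiStatusEndOfBuffer;
}

// Hypothetical helper: throws on a real error; otherwise returns true while
// records remain (success) and false once the buffer is exhausted.
inline bool ptiCheck(PtiResult r, const char *call) {
  assert(call != nullptr);
  if (!ptiIsError(r)) return r == kPtiSuccess;
  // Real code could use ptiResultTypeToString() for a readable name.
  std::string msg = std::string(call) + " failed with pti_result " +
                    std::to_string(static_cast<int>(r));
  if (r == kPtiErrorTracingNotInitialized)
    msg += " (the driver needs ZE_ENABLE_TRACING_LAYER=1)";
  throw std::runtime_error(msg);
}

// Drain loop in the style of iterating with ptiViewGetNextRecord; `next` is
// a stand-in callable that returns one status per call.
template <typename NextFn>
int drainRecords(NextFn next) {
  int records = 0;
  while (ptiCheck(next(), "ptiViewGetNextRecord")) ++records;
  return records;
}
```

Folding the `ZE_ENABLE_TRACING_LAYER` hint into the error message mirrors the comment on `PTI_ERROR_TRACING_NOT_INITIALIZED`, which is the code a misconfigured driver is most likely to produce.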
Lines changed: 42 additions & 0 deletions
```diff
@@ -0,0 +1,42 @@
+
+#ifndef PTI_EXPORT_H
+#define PTI_EXPORT_H
+
+#ifdef PTI_STATIC_DEFINE
+# define PTI_EXPORT
+# define PTI_NO_EXPORT
+#else
+# ifndef PTI_EXPORT
+# ifdef pti_EXPORTS
+/* We are building this library */
+# define PTI_EXPORT __attribute__((visibility("default")))
+# else
+/* We are using this library */
+# define PTI_EXPORT __attribute__((visibility("default")))
+# endif
+# endif
+
+# ifndef PTI_NO_EXPORT
+# define PTI_NO_EXPORT __attribute__((visibility("hidden")))
+# endif
+#endif
+
+#ifndef PTI_DEPRECATED
+# define PTI_DEPRECATED
+#endif
+
+#ifndef PTI_DEPRECATED_EXPORT
+# define PTI_DEPRECATED_EXPORT PTI_EXPORT PTI_DEPRECATED
+#endif
+
+#ifndef PTI_DEPRECATED_NO_EXPORT
+# define PTI_DEPRECATED_NO_EXPORT PTI_NO_EXPORT PTI_DEPRECATED
+#endif
+
+#if 0 /* DEFINE_NO_DEPRECATED */
+# ifndef PTI_NO_DEPRECATED
+# define PTI_NO_DEPRECATED
+# endif
+#endif
+
+#endif /* PTI_EXPORT_H */
```
Lines changed: 51 additions & 0 deletions
```diff
@@ -0,0 +1,51 @@
+//==============================================================
+// Copyright (C) Intel Corporation
+//
+// SPDX-License-Identifier: MIT
+// =============================================================
+#ifndef INCLUDE_PTI_VERSION_H_
+#define INCLUDE_PTI_VERSION_H_
+
+#include <stdint.h>
+
+#include "pti/pti_export.h"
+
+/* clang-format off */
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+#if !defined(PTI_VERSION)
+#define PTI_VERSION 0.10.0
+#endif
+
+#define PTI_VERSION_STRING "0.10.0"
+#define PTI_VERSION_MAJOR 0
+#define PTI_VERSION_MINOR 10
+#define PTI_VERSION_PATCH 0
+
+typedef struct pti_version {
+  uint32_t _major;
+  uint32_t _minor;
+  uint32_t _patch;
+} pti_version;
+
+/**
+ * @brief Returns the compiled version of PTI
+ *
+ * @return c-string with compiled version of PTI
+ */
+PTI_EXPORT const char* ptiVersionString();
+
+/**
+ * @brief Returns the compiled version of PTI
+ *
+ * @return pti_version struct with compiled version of PTI
+ */
+pti_version PTI_EXPORT ptiVersion();
+
+#if defined(__cplusplus)
+}
+#endif
+
+#endif // INCLUDE_PTI_VERSION_H_
```
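The version macros and the `pti_version` struct make it possible to reject an incompatible PTI at run time. A minimal sketch, assuming a standalone copy of the struct so it compiles without PTI; `PtiVersion`, `versionAtLeast`, `requirePti` and the 0.10.0 minimum are illustrative choices, not something this commit is known to enforce.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <tuple>

// Standalone mirror of pti_version from pti_version.h; real code would
// include the header and obtain the value from ptiVersion().
struct PtiVersion {
  std::uint32_t _major;
  std::uint32_t _minor;
  std::uint32_t _patch;
};

// Hypothetical helper: lexicographic (major, minor, patch) comparison.
constexpr bool versionAtLeast(PtiVersion have, PtiVersion need) {
  return std::tie(have._major, have._minor, have._patch) >=
         std::tie(need._major, need._minor, need._patch);
}

// Assumed minimum for this sketch: the version of the headers vendored here.
constexpr PtiVersion kMinimumPti{0, 10, 0};

// Hypothetical run-time guard, to be fed with the result of ptiVersion().
inline void requirePti(PtiVersion have) {
  if (!versionAtLeast(have, kMinimumPti)) {
    throw std::runtime_error(
        "PTI " + std::to_string(have._major) + "." +
        std::to_string(have._minor) + "." + std::to_string(have._patch) +
        " is older than the required 0.10.0");
  }
}
```

Comparing the run-time `ptiVersion()` result rather than only the `PTI_VERSION_*` macros also catches a mismatch between the vendored headers and the PTI library actually loaded.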
