
@tovinkere tovinkere commented Sep 2, 2025

Over time, the XPTI implementation within SYCL combined the debug and performance streams. This PR improves performance by 15-20% for toolchains by separating the debug data from the performance streams.

  • Limits the amount of debug information that is attached to the default "sycl" performance stream.
  • If additional metadata is needed, toolchains will have to subscribe to the "sycl.debug" stream.
  • Unified Runtime created a new trace event for each function call, which added significant overhead to each call when a collector was enabled. This is now minimized by using a single global event for all UR API calls.
  • The CUDA plugin implementation of XPTI events enabled both the performance and debug streams all the time, irrespective of whether a tool was subscribing to the data.
  • If tools want good performance, they need to subscribe to the "sycl" stream. If they want full information and performance is not as important, subscribing to the "sycl.debug" stream is the right choice.
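
For tool authors, the stream choice described above can be sketched roughly as follows. This is only an illustration of the decision, not code from this PR: `CollectorMode` and `shouldSubscribe` are hypothetical names, and the assumption is that a collector would run a check like this on the stream name it receives in its `xptiTraceInit()` entry point before registering any callbacks.

```cpp
#include <string_view>

// Hypothetical collector setting (not part of XPTI): Performance keeps
// overhead low, Debug asks for the full metadata.
enum class CollectorMode { Performance, Debug };

// Hypothetical helper (not part of XPTI): returns true if the collector
// should subscribe to the given stream.
// Assumption: a debug collector subscribes to "sycl.debug" in addition to
// "sycl", since the extra metadata is delivered on the original stream.
constexpr bool shouldSubscribe(std::string_view streamName,
                               CollectorMode mode) {
  if (streamName == "sycl")
    return true; // default performance stream, always of interest
  if (streamName == "sycl.debug")
    return mode == CollectorMode::Debug; // opt in only when metadata matters
  return false; // every other stream is ignored in this sketch
}
```

A performance-focused tool would then skip "sycl.debug" entirely, while a debugging tool would accept both streams and pay the extra metadata cost.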

 + Improves the performance of collectors and downstream
   collectors by limiting the amount of metadata that is
   attached to SYCL events
 + To get additional metadata, toolchains will have to
   also subscribe to the stream "sycl.debug" and the
   additional metadata will be sent in the original stream
 + Replaced all xptiMakeEvent() calls with the new
   xptiCreateTracepoint() function

Signed-off-by: Vasanth Tovinkere <[email protected]>

againull commented Sep 3, 2025

These failures are unrelated:

/__w/llvm/llvm/build/test/conformance/device/device-test --gtest_filter=urDeviceGetGlobalTimestampTest.SuccessSynchronizedTime/UR_BACKEND_HIP__AMD_HIP_BACKEND__AMD_Radeon_RX_6800_XT_ID0ID_5d5f47c5a7185748____
--
/__w/llvm/llvm/unified-runtime/test/conformance/device/urDeviceGetGlobalTimestamps.cpp:93: Failure
Expected: (observedDiff) <= (allowedDiff), actual: 254563 vs 81685

/__w/llvm/llvm/unified-runtime/test/conformance/device/urDeviceGetGlobalTimestamps.cpp:93
Expected: (observedDiff) <= (allowedDiff), actual: 254563 vs 81685

#18763

CMake Error: install(EXPORT "unified-runtime-targets" ...) includes target "xptifw" which requires target "emhash" that is not in any export set.
#19944


@bratpiorka bratpiorka left a comment


CUDA LGTM


againull commented Sep 9, 2025

Jenkins failure is unrelated, see #19929 (comment)


- To support performance and debug streams, subscribing to the stream **"sycl.debug"**
allows the default streams to contain additional metadata when keeping overheads
to a minimum is not important

Suggested change
to a minimum is not important
to a minimum is not important.


@pbalcer pbalcer left a comment


UR lgtm.


@mmichel11 mmichel11 left a comment


Graph changes LGTM


againull commented Sep 16, 2025

@sergey-semenov Could you please review this PR on behalf of @intel/llvm-reviewers-runtime? @maarquitos14 and @KseniyaTikhomirova are both on vacation at the moment.

@againull againull merged commit 1ebd224 into intel:sycl Sep 22, 2025
74 of 77 checks passed

6 participants