Skip to content

Conversation

@PietroGhg
Copy link
Contributor

@PietroGhg PietroGhg commented Oct 1, 2024

Implements ur_event_handle_t_, allowing for recording time stamps for events and asynchronous kernel execution.
Fixes bugs in handling kernel arguments that have been exposed by the asynchronous execution.

intel/llvm PR: intel/llvm#15564

@PietroGhg PietroGhg requested a review from a team as a code owner October 1, 2024 10:16
@github-actions github-actions bot added the native-cpu Native CPU adapter specific issues label Oct 1, 2024
@PietroGhg PietroGhg force-pushed the pietro/events_initial branch 3 times, most recently from 454b455 to 53e2bc8 Compare October 1, 2024 11:55
@PietroGhg PietroGhg force-pushed the pietro/events_initial branch from 53e2bc8 to ec5e8ea Compare October 2, 2024 11:35
@PietroGhg PietroGhg requested a review from a team as a code owner October 2, 2024 11:35
@github-actions github-actions bot added the conformance Conformance test suite issues. label Oct 2, 2024
Copy link
Contributor

@kbenzie kbenzie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CTS changes LGTM

Copy link
Contributor

@uwedolinsky uwedolinsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some potential performance issues (see comments) which can probably be addressed in a subsequent PR. Otherwise LGTM

using args_index_t = std::vector<void *>;
args_index_t Indices;
std::vector<size_t> ParamSizes;
std::vector<bool> OwnsMem;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably put these into one vector for performance

new ur_event_handle_t_(hQueue, UR_COMMAND_MEM_BUFFER_MAP);
event->tick_start();
*ppRetMap = hBuffer->_mem + offset;
event->tick_end();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to just time fast pointer arithmetic (because it's just enqueue) which is probably less than the overhead of std::lock_guard in tick_end. Just creating the event like in urEnqueueMemUnmap below is probably fine.

@PietroGhg PietroGhg added the ready to merge Added to PR's which are ready to merge label Oct 3, 2024
@aarongreig aarongreig merged commit eddfd8e into oneapi-src:main Oct 24, 2024
75 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

conformance Conformance test suite issues. native-cpu Native CPU adapter specific issues ready to merge Added to PR's which are ready to merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants