@wenju-he
Contributor

On Intel CPU devices, the number of subdevices was equal to the CPU count, which can be large on a server. In the `shared` subtest, all subdevices belong to the same context and the device program is built for every subdevice, so device code compilation may take a long time. This PR reduces the test time from 10s to 0.4s on a 160-core ICX. When the device sanitizer is enabled, the test time is reduced from ~10min to 14s.
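To illustrate why the cost grows with the core count, here is a minimal sketch in plain C++ (no SYCL dependency). It assumes the device is split into equal parts and that one program build happens per subdevice in the shared context, as described above. The function names and the split sizes used in the note below are hypothetical and are not taken from the test or from this PR's diff.

```cpp
#include <cstddef>
#include <stdexcept>

// Number of subdevices obtained when a device with `compute_units` units is
// split into equal parts of `units_per_subdevice` units each. Units that do
// not fill a whole subdevice are left unassigned.
std::size_t subdevice_count(std::size_t compute_units,
                            std::size_t units_per_subdevice) {
    if (units_per_subdevice == 0) {
        throw std::invalid_argument("units_per_subdevice must be non-zero");
    }
    return compute_units / units_per_subdevice;
}

// Assumed cost model (hypothetical): the device program is built once for
// every subdevice that belongs to the shared context.
std::size_t program_builds(std::size_t compute_units,
                           std::size_t units_per_subdevice) {
    return subdevice_count(compute_units, units_per_subdevice);
}
```

Under this model, a 160-unit device split into one unit per subdevice needs 160 builds, while splitting the same device into parts of 80 units needs only 2. The second split size is only an example of a coarser partition, not necessarily the change made here.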

@wenju-he wenju-he requested a review from a team as a code owner December 12, 2024 10:28
Contributor

@uditagarwal97 uditagarwal97 left a comment

Looks reasonable to me.

@wenju-he
Contributor Author

@intel/llvm-gatekeepers please merge, thanks

@martygrant martygrant merged commit f64c81a into intel:sycl Dec 13, 2024
15 checks passed
@wenju-he wenju-he deleted the subdevice_pi.cpp-time branch December 13, 2024 11:43
3 participants