aelovikov-intel
Contributor

This regressed due to #19924, but apparently we didn't have proper tests in place. I'm not sure what exactly is causing this, but having each compilation create its own `ToolchainFS` instead of all of them sharing the same

`llvm::IntrusiveRefCntPtr<llvm::vfs::InMemoryFileSystem> SYCLToolchain::ToolchainFS`

somehow results in the test (added in this PR) passing consistently.
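
As a rough illustration of the "one file system per compilation" idea, here is a minimal sketch. It does not use the real LLVM classes: `FakeInMemoryFS`, `createToolchainFS`, `compile`, and `compileConcurrently` are hypothetical names invented for this example, and the map-based type only stands in for `llvm::vfs::InMemoryFileSystem`. The assumption is that the failures come from concurrent compilations mutating one shared object, which the description above does not confirm.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for llvm::vfs::InMemoryFileSystem: path -> contents.
struct FakeInMemoryFS {
  std::map<std::string, std::string> Files;
  void addFile(const std::string &Path, const std::string &Contents) {
    Files[Path] = Contents;
  }
};

// Hypothetical helper: builds a fresh FS holding the toolchain headers.
// Repeating this work on every compilation is what would cost time.
std::shared_ptr<FakeInMemoryFS> createToolchainFS() {
  auto FS = std::make_shared<FakeInMemoryFS>();
  FS->addFile("sycl/sycl.hpp", "// toolchain header");
  return FS;
}

// Each compilation owns its FS, so adding per-compilation files
// never mutates state that another thread can see.
std::size_t compile(const std::string &SourcePath) {
  auto FS = createToolchainFS();
  FS->addFile(SourcePath, "// user source");
  return FS->Files.size();
}

// Runs N compilations concurrently and returns the file count each one saw.
std::vector<std::size_t> compileConcurrently(int N) {
  std::vector<std::size_t> Results(N, 0);
  std::vector<std::thread> Threads;
  for (int I = 0; I < N; ++I)
    Threads.emplace_back([&Results, I] {
      // Each thread writes only to its own slot of Results.
      Results[I] = compile("rtc_" + std::to_string(I) + ".cpp");
    });
  for (auto &T : Threads)
    T.join();
  return Results;
}
```

In this sketch every compilation sees exactly its own two files regardless of how many run in parallel; the price is that the toolchain headers are re-added on every call.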

@aelovikov-intel aelovikov-intel requested review from a team and cperkinsintel as code owners October 14, 2025 20:48
@aelovikov-intel aelovikov-intel added the run-perf-tests Run performance tests in pre-commit (normally part of post-commit only) label Oct 14, 2025
@aelovikov-intel
Contributor Author

aelovikov-intel commented Oct 14, 2025

Unfortunately, this hits compilation times pretty hard.

Before (locally):

| Extra Headers | Without PCH | With auto-PCH |
|---|---|---|
| (none) | 202ms 142ms 141ms 141ms 140ms | 242ms 68ms 68ms 68ms 68ms |
| sycl/half_type.hpp | 171ms 171ms 171ms 171ms 177ms | 283ms 76ms 75ms 76ms 76ms |
| sycl/ext/oneapi/bfloat16.hpp | 179ms 179ms 179ms 179ms 179ms | 292ms 78ms 82ms 78ms 77ms |
| sycl/marray.hpp | 147ms 146ms 146ms 146ms 146ms | 249ms 70ms 70ms 70ms 70ms |
| sycl/vector.hpp | 304ms 298ms 298ms 304ms 298ms | 514ms 130ms 130ms 131ms 130ms |
| sycl/multi_ptr.hpp | 290ms 289ms 289ms 289ms 289ms | 474ms 136ms 136ms 136ms 136ms |
| sycl/builtins.hpp | 562ms 545ms 544ms 543ms 544ms | 933ms 233ms 232ms 231ms 232ms |

After (same system):

| Extra Headers | Without PCH | With auto-PCH |
|---|---|---|
| (none) | 352ms 281ms 266ms 266ms 248ms | 376ms 196ms 178ms 171ms 188ms |
| sycl/half_type.hpp | 288ms 279ms 282ms 281ms 298ms | 397ms 201ms 201ms 201ms 182ms |
| sycl/ext/oneapi/bfloat16.hpp | 290ms 292ms 291ms 306ms 306ms | 418ms 204ms 203ms 194ms 186ms |
| sycl/marray.hpp | 257ms 252ms 254ms 271ms 258ms | 363ms 189ms 195ms 186ms 173ms |
| sycl/vector.hpp | 439ms 415ms 445ms 415ms 413ms | 663ms 243ms 242ms 246ms 266ms |
| sycl/multi_ptr.hpp | 427ms 425ms 407ms 407ms 406ms | 597ms 248ms 251ms 251ms 258ms |
| sycl/builtins.hpp | 694ms 675ms 684ms 695ms 686ms | 1066ms 387ms 385ms 357ms 387ms |
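
To put a number on the slowdown, one option is to compare the median of the five runs before and after the change, since the first run in each group is a visible outlier. The sketch below does that. Using the median is an assumption of this example, not a method the measurements above prescribe, and `medianMs` and `slowdown` are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Median of an odd-sized sample of timings in milliseconds.
int medianMs(std::vector<int> Samples) {
  assert(!Samples.empty() && Samples.size() % 2 == 1);
  std::sort(Samples.begin(), Samples.end());
  return Samples[Samples.size() / 2];
}

// Ratio of the median "after" time to the median "before" time.
double slowdown(const std::vector<int> &Before, const std::vector<int> &After) {
  return static_cast<double>(medianMs(After)) / medianMs(Before);
}
```

For the row with no extra headers, the medians go from 141ms to 266ms without PCH (roughly 1.9x) and from 68ms to 188ms with auto-PCH (roughly 2.8x). For `sycl/builtins.hpp` they go from 544ms to 686ms (roughly 1.3x) and from 232ms to 387ms (roughly 1.7x).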

Despite that, I think we should still proceed with this PR because stability is more important. I'll just have to follow up on the performance later.

Contributor

@cperkinsintel cperkinsintel left a comment

Are we actually tripping on this? Maybe we should slow-walk merging this until it's better understood? The performance hit is sobering.

@aelovikov-intel
Contributor Author

> Are we actually tripping on this?

On the test I'm adding here, and on other tests I'm adding to verify the future `--persistent-auto-pch` option.

@aelovikov-intel aelovikov-intel merged commit fc2ce23 into intel:sycl Oct 15, 2025
68 of 73 checks passed
@aelovikov-intel aelovikov-intel deleted the rtc-race branch October 15, 2025 20:03
aelovikov-intel added a commit to aelovikov-intel/llvm that referenced this pull request Oct 16, 2025
intel#19924 essentially made `SYCLToolchain::ToolchainFS` `static`,
but that caused data races that were later fixed by
intel#20360, which changed each use of it to
re-create this in-memory FS (essentially, "removing" `static`),
incurring significant performance costs.

This PR addresses the issue by "adding" `thread_local` instead of
"removing" `static`, which gives us both no crashes due to data
races and minimal overhead.

No tests added, as the one from intel#20360
already verifies this.
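
The `thread_local` approach can be sketched as follows. This is only an illustration under stated assumptions, not the code from the commit: `FakeInMemoryFS`, `buildToolchainFS`, `getToolchainFS`, `countBuilds`, and the `NumFSBuilt` counter are hypothetical names, and the map-based type stands in for `llvm::vfs::InMemoryFileSystem`. A block-scope `thread_local` variable has one instance per thread, initialized the first time that thread reaches the declaration, so the expensive setup runs once per thread rather than once per compilation.

```cpp
#include <atomic>
#include <cassert>
#include <map>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for llvm::vfs::InMemoryFileSystem: path -> contents.
struct FakeInMemoryFS {
  std::map<std::string, std::string> Files;
};

// Counts how many times the (assumed expensive) FS setup has run.
std::atomic<int> NumFSBuilt{0};

// Hypothetical helper: populates a new FS with the toolchain headers.
FakeInMemoryFS buildToolchainFS() {
  ++NumFSBuilt;
  FakeInMemoryFS FS;
  FS.Files["sycl/sycl.hpp"] = "// toolchain header";
  return FS;
}

// One FS instance per thread, built lazily on that thread's first call.
FakeInMemoryFS &getToolchainFS() {
  thread_local FakeInMemoryFS FS = buildToolchainFS();
  return FS;
}

// Starts NumThreads threads, each doing several "compilations" against its
// own thread-local FS, and returns how many FS instances were built.
int countBuilds(int NumThreads, int CompilationsPerThread) {
  NumFSBuilt = 0;
  std::vector<std::thread> Threads;
  for (int T = 0; T < NumThreads; ++T)
    Threads.emplace_back([T, CompilationsPerThread] {
      for (int I = 0; I < CompilationsPerThread; ++I) {
        // No locking needed: no other thread can reach this instance.
        FakeInMemoryFS &FS = getToolchainFS();
        FS.Files["rtc_" + std::to_string(T) + "_" + std::to_string(I) +
                 ".cpp"] = "// user source";
      }
    });
  for (auto &Th : Threads)
    Th.join();
  return NumFSBuilt.load();
}
```

With four threads running ten compilations each, the setup in this sketch runs four times instead of forty. One trade-off of the design is that files added by one compilation stay in that thread's instance for later compilations on the same thread.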