Skip to content

Conversation

@ayylol
Copy link
Contributor

@ayylol ayylol commented Oct 22, 2024

Because the aspects for require lines are handled device per device, tests that require multiple devices are never ran.

For example the condition REQUIRES: cpu, gpu is never met because no device is both a cpu and gpu, instead we want to use REQUIRES: any-device-is-cpu, any-device-is-gpu in this case.

@ayylol ayylol marked this pull request as ready for review October 22, 2024 21:23
@ayylol ayylol requested review from a team as code owners October 22, 2024 21:23
@ayylol ayylol requested a review from maarquitos14 October 22, 2024 21:23
Copy link
Contributor

@maarquitos14 maarquitos14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

//===----------------------------------------------------------------------===//

// REQUIRES: opencl-aot, ocloc, cpu, gpu, accelerator
// REQUIRES: opencl-aot, ocloc, any-device-is-cpu, any-device-is-gpu, any-device-is-accelerator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand how it is a correct fix. If , is treated as && in REQUIRES directive, then we previously had: cpu && gpu && acc which is never true. Now we have any-device-is-cpu && any-device-is-gpu && any-device-is-accelerator.

If the expression is computed per single device, then even with the latter there will only be true for just one of any-device-is, meaning the overall expression evaluates to false anyways.

Should we just drop those device requirements from the test and instead use UNSUPPORTED to filter out environments where the test can't be run (like CUDA)?

Considering that there are no device types other than CPU, GPU and ACC, do we ever need to require them all? My understanding is that it is safe to assume that check-sycl-e2e is always launched with an environment with at least one device. And if it is not the case yet, we should maybe introduce a check like this to lit.cfg.py

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Old features are per-device, new features are per-llvm-lit invocation. They were created to target this scenario exactly.

@@ -1,4 +1,4 @@
// REQUIRES: linux, (gpu && level_zero), cpu
// REQUIRES: linux, (gpu && level_zero), any-device-is-cpu
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// REQUIRES: linux, (gpu && level_zero), any-device-is-cpu
// REQUIRES: linux
// UNSUPPORTED: (opencl && gpu), acc

I general I have a very same question about how switching to any-device-is-cpu helps here

//===---------------------------------------------------------------------===//

// REQUIRES: cpu, gpu, accelerator
// REQUIRES: any-device-is-cpu, any-device-is-gpu, any-device-is-accelerator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, at this point I've probably realized how it works. any-device-is-* is computed not for a current device that we are running on, but for the whole environment we are in.

Therefore, in an environment where we have all three types of devices, all three any-device -is* will be evaluated to true.

Even though I appreciate the change, I don't think that it should be accepted without update to the documentation. Otherwise, someone could introduce a new incorrect test literally tomorrow.

Moreover, I would maybe think about alternative solutions here. Like, my first though is: why gpu doesn't just work as any-device-is-gpu? And the answer here is that sometimes we want a test to only be run on GPU and not just because GPU is present on a system.

Would it be better to rename any-device-is-cpu to something like cpu-device-is-present? To clarify that we don't require a test to be launched on CPU, but we require a presence of a device on the system. Or is it only me who got confused here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just use C expression which is intuitive to everyone?

linux && cpu   # linux and cpu env
linux && gpu  # linux gpu env
linux && (gpu || cpu) # linux, cpu env or gpu env
linux && (cpu && gpu)  # linux, cpu & gpu must exist at the same time
linux && !acc  # linux, cpu env or gpu env or cpu&gpu env

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to get rid of "," expression

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though I appreciate the change, I don't think that it should be accepted without update to the documentation. Otherwise, someone could introduce a new incorrect test literally tomorrow.

100% agree with the documentation change given how easy it is to make this error. Not to mention that it doesn't just affect REQUIRES, but also any lit line that doesn't have a %{run}. Any statement like %if <device/backend> never evaluates to true unless there is a %{run} in the line. For example:
https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/Basic/interop/interop_all_backends.cpp#L2.

Why not just use C expression which is intuitive to everyone?

linux && cpu   # linux and cpu env
linux && gpu  # linux gpu env
linux && (gpu || cpu) # linux, cpu env or gpu env
linux && (cpu && gpu)  # linux, cpu & gpu must exist at the same time
linux && !acc  # linux, cpu env or gpu env or cpu&gpu env

This would still have the issue that this PR is trying to fix, how we currently handle requires linux && (cpu && gpu) would never be met. However if we got the conditions to work as described, i feel like this would be quite intuitive.

Moreover, I would maybe think about alternative solutions here. Like, my first though is: why gpu doesn't just work as any-device-is-gpu? And the answer here is that sometimes we want a test to only be run on GPU and not just because GPU is present on a system.

Honestly, I like this solution, we could structure the tests slightly differently in the two cases you mention. If the intention is to run because a GPU is present in the system we would likely also add another device to the requires, and include logic (either through device selectors, or env variables) to make sure we are running on the device we intend. In the case we want to specifically only run on a GPU we would use %{run} which would take care of that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would still have the issue that this PR is trying to fix, how we currently handle requires linux && (cpu && gpu) would never be met.

I see. Some of my tests require both cpu and gpu device available, but it's fine to ignore these tests if we don't have this environment currently.

Copy link
Contributor

@AlexeySachkov AlexeySachkov Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Therefore, as a relatively cheap trade-off, we could consider renaming any-device-is-cpu to cpu-is-present, for example.

Should we also consider renaming of cpu to runs-on-cpu? Then everything will be self-documenting

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favor of cpu-is-present, but I don't like the runs-on-cpu. IMO, it's even more ambiguous than before.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the true intent of "REQUIRES: cpu,gpu,accelerator" is tested on any of "cpu, gpu, acc", why not just rewrite it to "REQUIRES: cpu || gpu || accelerator"?

Copy link
Contributor

@AllanZyne AllanZyne Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AllanZyne, the problem is confusion between "it required to be run on GPU" vs "it requires a GPU to be present on the system, but can be launched on other device".

Do we really need "it requires a GPU to be present on the system, but can be launched on other device" this case?
If so, I think it's more intuitive to add a new label, such as

REQUIRE: cpu
MACHINE: cpu && gpu  // or, "cpu, gpu"

It means this test needs to run on a machine which has both cpu and gpu, but this test will only run on cpu (I don't understand this need, it seems "REQUIRE: cpu && gpu" is more reasonable).

When you write REQUIRES: cpu, gpu you probably mean that the test requires both to be present on the system, but can be launched on either of them.

REQUIRE: cpu || gpu
MACHINE: cpu && gpu  // or, "cpu, gpu"

I also not quite understand this requirement, though. I think "REQUIRE: cpu || gpu" is more flexible.

And by default,

REQUIRE: cpu || gpu

It means this test can be tested on cpu or gpu, it doesn't require the machine have both cpu and gpu. If the machine only have cpu, it's fine to just test cpu. If the machine have cpu and gpu, it will test on cpu and gpu separately (run twice).

REQUIRE: cpu && gpu

It means, it needs a machine have both cpu and gpu, and this test needs these devices both available (run once).

Copy link
Contributor

@AllanZyne AllanZyne Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean "REQUIRE" is used to control "ONEAPI_DEVICE_SELECTOR",

REQUIRE: cpu || gpu

export ONEAPI_DEVICE_SELECTOR=opencl:cpu
export ONEAPI_DEVICE_SELECTOR=level_zero:gpu

REQUIRE: cpu && gpu // or, "cpu, gpu"

export ONEAPI_DEVICE_SELECTOR=level_zero:gpu,opencl:cpu

// available.
//
// REQUIRES: cpu,gpu,accelerator
// REQUIRES: any-device-is-cpu, any-device-is-gpu, any-device-is-accelerator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be an E2E test at all, it should be moved to unit-tests where we can emulate all environments and check that we query the right device, or just directly check that we parse the env variable correctly, similar to what I did in #15774

@@ -1,4 +1,4 @@
// REQUIRES: cpu, gpu, accelerator, opencl
// REQUIRES: any-device-is-cpu, any-device-is-gpu, any-device-is-accelerator, any-device-is-opencl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, this should not be an E2E test. This comment can be applied to some other tests as well, but I will repeat it on those which are the first candidates for the rewrite.


// FIXME: This set is never satisfied all at once in our infrastructure.
// REQUIRES: opencl, level-zero, cpu, gpu, opencl-aot, ocloc
// REQUIRES: any-device-is-opencl, any-device-is-level_zero, opencl-aot, ocloc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm mostly sure that this can be turned into a unit-test as well

@@ -1,7 +1,7 @@
// REQUIRES: cuda, opencl, gpu, cpu
// REQUIRES: any-device-is-cuda, any-device-is-opencl, any-device-is-gpu, any-device-is-cpu
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need more unit-tests! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants