Skip to content
Closed
Show file tree
Hide file tree
Changes from 14 commits
Commits
Show all changes
26 commits
Select commit Hold shift + click to select a range
6b70797
Add method to set build/run mode
ayylol Oct 15, 2024
08297d6
Add logic to split build/running of e2e tests
ayylol Oct 15, 2024
b679fbf
Remove added comment
ayylol Oct 15, 2024
7f7d56b
Change layout of if statement checking exclusive build mode
ayylol Oct 15, 2024
37df022
Add github workflow to only build tests
ayylol Oct 16, 2024
c8b0eb6
Change test run parameters
ayylol Oct 16, 2024
53592cf
Remove split test yaml file
ayylol Oct 17, 2024
c8588fa
Add triple selection for spir64
ayylol Oct 18, 2024
a61cb32
set hip_amd and hip_nvidia to always false for spir64
ayylol Oct 18, 2024
2eac94e
Set native_cpu to always false for spir64 triple
ayylol Oct 18, 2024
bdfd111
Merge branch 'sycl' into sycl-devops-pr/ayylol/e2e-split
ayylol Oct 23, 2024
861f17a
Add Linux build runner option to manual e2e workflow
ayylol Oct 23, 2024
e8ece44
Add test build only job to nightly
ayylol Oct 24, 2024
92522fd
Merge branch 'sycl' into sycl-devops-pr/ayylol/e2e-split
ayylol Oct 24, 2024
bd1fd50
remove commented xfail code
ayylol Oct 25, 2024
07c3c40
Add comment on exceptions list and fix formatting
ayylol Oct 25, 2024
ee764e2
Fix formatting
ayylol Oct 28, 2024
b902185
Merge branch 'sycl' into sycl-devops-pr/ayylol/e2e-split
ayylol Oct 28, 2024
563c765
rename queried_features
ayylol Oct 28, 2024
5f267ec
Format change
ayylol Oct 28, 2024
201712b
Take the union of the unsupported and requires features list
ayylol Oct 28, 2024
e19c435
Remove line filtering on `REQUIRES: run-mode` tests
ayylol Oct 29, 2024
22b7583
Split exceptions dictionary in two
ayylol Oct 29, 2024
5a8d4e4
Remove triple selection logic
ayylol Oct 30, 2024
96fe4e3
Merge branch 'sycl' into sycl-devops-pr/ayylol/e2e-split
ayylol Oct 30, 2024
146afa2
Removed nightly build job
ayylol Oct 30, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .github/workflows/sycl-linux-run-tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -90,6 +90,7 @@ on:
- '["amdgpu"]'
- '["Linux", "arc"]'
- '["cts-cpu"]'
- '["Linux", "build"]'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should go to a separate PR so that we don't run needless CI jobs when this PR is updated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this only affects the options available for the manual e2e run. I dont think i see it affecting the pre-commit checks being ran.
That being said do you know if there is a way to not run all the checks its doing due to being a sycl-devops-pr, and make it run the checks for a normal pr?

image:
description: |
Use option ending with ":build" for AMDGPU, ":latest" for the rest.
Expand Down
8 changes: 8 additions & 0 deletions .github/workflows/sycl-nightly.yml
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,14 @@ jobs:
target_devices: ext_oneapi_hip:gpu
tests_selector: e2e

- name: Intel Build Tests Only
runner: '["Linux", "build"]'
image: ghcr.io/intel/llvm/ubuntu2204_intel_drivers:alldeps
image_options: -u 1001 --privileged --cap-add SYS_ADMIN
target_devices: opencl:cpu
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does target_devices matter here or are we just giving it some dummy value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is mostly a dummy value. However I think if there is none some AOT tests fail building (this is an area I have to do a lot more digging in tbh)

tests_selector: e2e
extra_lit_opts: --param split-mode=build

- name: Intel L0 GPU
runner: '["Linux", "gen12"]'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the other e2e runs aren't using the tests build from the previous step yet right? if it does i dont see how that's happening :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this pr does not include any of the running side of the split mode. Basically what we would have to do is make this job act similar to how the job for building the compiler works by spitting out an artifact which would include the built tests, and then on the running side we would take that artifact and run the tests with --param split-mode=run. That requires a bit more changes with the github ci, which im not too familiar so I wanted to save it for future prs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, probably udit/myself/andrei can help with that if you are stuck

image: ghcr.io/intel/llvm/ubuntu2204_intel_drivers:latest
Expand Down
80 changes: 69 additions & 11 deletions sycl/test-e2e/format.py
Original file line number Diff line number Diff line change
Expand Up @@ -98,6 +98,47 @@ def getMatchedFromList(self, features, alist):
except ValueError as e:
raise ValueError("Error in UNSUPPORTED list:\n%s" % str(e))

def make_default_features_list(self, expr, triple, add_default=True):
### EXCEPTIONS LIST
# TODO: Define elsewhere?
exceptions = {}
exceptions["spir64"]={
"cuda":False, "hip":False, "hip_amd":False, "hip_nvidia":False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not totally clear to me what this represents, maybe we could rename the var or add comments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The way tests are marked as unsupported is altered. Instead of testing what devices on the system are supported by the test, we test what triples the test can be built for. To do this we assume that the triple supports all features in the REQUIRES: line, unless we know for certain it does not (i.e., for spir64 we know we are not supporting the hip backend). We also do this for UNSUPPORTED: and XFAIL:, but instead we assume the triple has no feature unless we know it always supports a given feature (i.e., nvptx64-nvidia-cuda will always use sg-32).

Basically this list is meant to represent the features for a triple (or the environment) that we always know the status of. So if we have a triple and can never have a given feature, we would add it to this dictionary as "false". That's what you see with "spir64" and cuda for example. We may have features that are always true for a triple, those would go in the dictionary as "True", like having "sg-32" for "nvptx64-nvidia-cuda". In the last case we may have features that may be true or false for a given triple, in this case we would not add it to the dictionary. In this case think for example "spir64" and the feature "level_zero", since we could be using the "OpenCL" backend instead.

This list just allows us to make the REQUIRES:, UNSUPPORTED: and XFAIL: statements be as permissive as possible while still taking into account combinations of features and triples that we know always/never happen.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the explanation, that totally makes sense. maybe we could try to rework the code/add comments so that's more spelled out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a comment about this list in the most recent commit

"native_cpu":False,
}
exceptions["system"]={
"linux":True, "windows":False, "system-windows":False,
"run-mode":False, "TEMPORARY_DISABLED":False,
}
queried_features = []
for f in expr:
queried_features = queried_features + re.findall("[-+=._a-zA-Z0-9]+", f)

features = []
for f in queried_features:
if (exceptions[triple].get(
f,exceptions["system"].get(f,add_default))):
features.append(f)
return features

def select_triples_for_test(self, test):
# Check Triples
triples = set()
possible_triples = ["spir64"]
for triple in possible_triples:
unsupported=self.make_default_features_list(test.unsupported,triple,False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the False are here, can we add a var to describe it and pass that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this parameter for the function represents if by default we should add a feature to the list of features we have. We want to be as permissive as possible when building so any feature we see in an unsupported line is by default not added to the list of features, the opposite would be true for the features in the requires lines.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gonna be honest i dont totally understand this explanation :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we have a list of features for the environment (including things like the OS) and a list of features for each device (including things such as their aspects, and what type of device it is). To check if a device is supported for a given test we join these two lists and check the unsupported and requires statements. This won't work when running the split build mode since we do not have all the devices we are building for, and therefore we do not have a list of the aspects that the devices have.

The make_default_features_list function exists to get around this, its purpose is to create a list of features that can pass the unsupported and requires statements (small caveat: so long as the statement does not include negations). To pass the unsupported statement we want none of the features in the statement to be in our features list, unless they exist in the exceptions list as always present. The opposite is true for the requires statement, we want all the features present in the line in our features list, unless they exist in the exceptions list as never present.

The parameter add_default in make_default_features_list is what captures the difference between the behavior we want in unsupported and requires lines. For unsupported lines we pass in False so that features in the line are not added by default, and for the requires line we pass in True so features in the line are added by default.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, thanks. if you could add a comment explaining this that would be really helpful

required=self.make_default_features_list(test.requires,triple)
xfails=self.make_default_features_list(test.xfails,triple,False)
if test.getMissingRequiredFeaturesFromList(required):
continue
if self.getMatchedFromList(unsupported, test.unsupported):
continue
#if "*" in test.xfails or self.getMatchedFromList(xfails, test.xfails):
# continue
triples.add(triple)

return triples

def select_devices_for_test(self, test):
devices = []
for d in test.config.sycl_devices:
Expand Down Expand Up @@ -154,18 +195,26 @@ def execute(self, test, litConfig):
if isinstance(script, lit.Test.Result):
return script

devices_for_test = self.select_devices_for_test(test)
if not devices_for_test:
return lit.Test.Result(
lit.Test.UNSUPPORTED, "No supported devices to run the test on"
)

substitutions = lit.TestRunner.getDefaultSubstitutions(test, tmpDir, tmpBase)
devices_for_test = []
triples = set()
for sycl_device in devices_for_test:
(backend, _) = sycl_device.split(":")
triples.add(get_triple(test, backend))
if "run-mode" not in test.config.available_features:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe swap the logic so the first check isnt a not? could we check for build instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue with doing this is that the non-split mode sets both run-mode and build-mode features. what i am doing here is specifically checking for the split build only mode.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok thx
build and run mode at the same time just does exactly what we do today without these changes right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok thx build and run mode at the same time just does exactly what we do today without these changes right?

Correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked through the PR yet, but from previous discussions, I'd expect that we might build more tests/for more triples that are otherwise skipped in the old mode due to XFAIL/UNSUPPORTED.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean for tests that compile fine but fail/unsupported at runtime, we always compile those but don't run the test? yeah that sounds cool

triples = self.select_triples_for_test(test)
if not triples:
return lit.Test.Result(
lit.Test.UNSUPPORTED, "No supported backend to build for"
)
else:
devices_for_test = self.select_devices_for_test(test)
if not devices_for_test:
return lit.Test.Result(
lit.Test.UNSUPPORTED, "No supported devices to run the test on"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have you tried building on one system and running on a different system yet? have you have any issues with different paths to python or something like that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, i ran some experiments to run the split side on a separate machine after running the build mode on another. I did not run into issues relating to python path, mostly issues related to the prs i've opened, and AOT tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok cool, i expected this part might have insane nonsense to work through but im glad it seems not too bad

)

for sycl_device in devices_for_test:
(backend, _) = sycl_device.split(":")
triples.add(get_triple(test, backend))

substitutions = lit.TestRunner.getDefaultSubstitutions(test, tmpDir, tmpBase)
substitutions.append(("%{sycl_triple}", format(",".join(triples))))
# -fsycl-targets is needed for CUDA/HIP, so just use it be default so
# -that new tests by default would runnable there (unless they have
Expand Down Expand Up @@ -223,6 +272,14 @@ def get_extra_env(sycl_devices):
new_script.append(directive)
continue

# Filter commands based on split-mode
is_run_line = any(i in directive.command for i in
["%{run}","%{run-unfiltered-devices}","%if run-mode"])

if ((is_run_line and "run-mode" not in test.config.available_features) or
(not is_run_line and "build-mode" not in test.config.available_features)):
directive.command=""

if "%{run}" not in directive.command:
new_script.append(directive)
continue
Expand Down Expand Up @@ -278,7 +335,8 @@ def get_extra_env(sycl_devices):
test, litConfig, useExternalSh, script, tmpBase
)

if len(devices_for_test) > 1:
if (len(devices_for_test) > 1 or
"run-mode" not in test.config.available_features):
return result

# Single device - might be an XFAIL.
Expand Down
10 changes: 10 additions & 0 deletions sycl/test-e2e/lit.cfg.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,16 @@

# Configuration file for the 'lit' test runner.

# split-mode: Set if tests should run normally or with split build/run
match lit_config.params.get("split-mode","both"):
case "run":
config.available_features.add("run-mode")
case "build":
config.available_features.add("build-mode")
case _:
config.available_features.add("run-mode")
config.available_features.add("build-mode")

# name: The name of this test suite.
config.name = "SYCL"

Expand Down
Loading