Introduce 'subsuites' functionality to wptrunner (#138)
This is intended to allow running multiple groups of tests, corresponding to different browser configurations, in the same wptrunner process.
rfcs/subsuites.md

# RFC 138: support subsuites in wptrunner

## Summary

Support running tests in multiple different browser configurations in a single wptrunner invocation.

## Details

It is common for browser vendors to want to run web platform tests with particular configurations of the browser. For example, when developing a major new feature, or refactoring existing code, they may want to run with a non-default configuration to enable the new code paths. Often, for cost and efficiency reasons, these test runs will initially be limited to just the tests most likely to be affected by the changes; e.g. a change to session history is highly likely to affect tests involving navigation, but highly unlikely to affect CSS layout.

Typically vendors wanting to enable multiple configurations have had to define those configurations externally, and invoke wptrunner multiple times, once per configuration. This has some advantages; in particular it keeps the vendor-specific issue of browser configurations cleanly separated from the actual business of running tests. However there are also a couple of disadvantages:

* When configurations are handled externally, they are either written into a CI configuration, which is then difficult to run locally, or the vendor is responsible for writing a substantial frontend on top of wptrunner which handles invoking it for each of the different configurations.

* Even with a vendor frontend it's difficult to schedule the tests optimally across multiple cores. This is because you essentially have two choices on how to handle the different configurations. Option one is to run each configuration in serial, with each wptrunner process using all available cores. This can be wasteful when you have many configurations, some of which only run a small number of tests, because those configurations only use a small fraction of the available compute, but can't be scheduled in parallel. Option two is to invoke many wptrunner processes in parallel, each responsible for running a small fraction of the total tests in each configuration. However wptrunner does a lot of global setup (reading the test manifest, starting wptserve), and repeating this work per process would be very inefficient.

So in order to meet this use case there are two obvious options:

1. Refactor wptrunner to allow it to be driven by an external test scheduler without having to redo all the global setup for each group of tests.

1. Push the notion of multiple configurations down into wptrunner itself, and update the existing test scheduling logic to handle multiple browser configurations.

In practice people are currently using wptrunner as a monolith, with test selection and browser setup provided by command line options and configuration files, so although the architecture would probably allow adopting a more service-based model, it's more in keeping with existing usage patterns to implement support for multiple configurations directly in wptrunner.

### Implementation Strategy

From an implementation point of view, the main constraint is around how we associate test results with a specific configuration.

Our test data flow is built around mozlog. This groups tests into "suites". Each suite is started with a `suite_start` event and ends with a `suite_end` event. The `suite_start` event contains information, particularly in the `run_info` key, about the browser being tested. That `run_info` data is used in wptrunner metadata files to determine which metadata applies to a specific test run (e.g. to allow different expected results on different platforms). Specific test results are associated with a suite implicitly by sequence, i.e. any `test_start` log event coming after a specific `suite_start` is grouped within that suite. This implicitly means that only a single suite can be running in the context of a specific logger at a specific time. That is incompatible with the requirement to run multiple configurations in parallel.

There are three obvious approaches we could use to address this:

1. Use multiple loggers, one per configuration. In this case each configuration would produce a separate stream of log events that could be written to a separate file. However mozlog is really designed around the assumption of a single top-level logger per process, which is reflected in the way output is configured with command line options. Since we'd presumably still want a single top-level output to stdout, we'd still need to solve the problem of combining multiple log streams into one top-level stream, which would require a mechanism to relax the one-suite-at-a-time constraint. We would also need to work out how to pass specific configuration (e.g. output filenames) down to specific loggers.

1. Put configuration information inside the test id, so that from the point of view of consumers, the same test run in different configurations looks like different tests. This is how the Chromium CI system handles this problem with the concept of "virtual" testsuites. It has the advantage that it requires minimal changes to data structures; tests get an id like `/<prefix>/<configuration name>/<actual test id>`, so from the point of view of consumers these are different tests. However this is also a disadvantage; every consumer that wants to process the results (e.g. to compare results between configurations) has to know about the details of how to parse the test id to extract the "real" test and configuration name. This goes directly against the mozlog design principle of providing data in a structured form to minimise the amount of error-prone parsing required in consumers. It also makes these configurations work in a way that's markedly different to other kinds of configuration data (e.g. platform name, whether we're running a debug build, etc.); that data is represented in the `run_info` rather than as part of the test id.

1. Add the concept of multiple "subsuites" directly in the mozlog data model and explicitly link other log actions which depend on the configuration to the subsuite. This has the disadvantage that it requires updates to consumers which care about subsuite membership, but the advantage that over the longer term there's a clearly defined data model which doesn't require additional parsing to understand test identity. It also allows us to treat subsuite configuration like other kinds of configuration data, by associating each subsuite with particular `run_info` data.

The proposal is to adopt the third approach, and update mozlog to allow it to represent the concept of subsuites, and wptrunner to allow scheduling tests from multiple subsuites in parallel.

### Mozlog Changes

For mozlog, we propose allowing each top level test suite (represented by a `suite_start` / `suite_end` event pair) to also have associated subsuites. Each subsuite will define updates to the top-level suite `run_info` data for the configuration in that subsuite. Log actions which are associated with a specific test or set of tests will get a `subsuite` property that explicitly names the subsuite they're associated with.

In detail this means one extra log event:

```py
add_subsuite(name: str, run_info: Mapping[str, Any])
```

`name` is required and must be unique within a given suite. `run_info` is optional and represents an update to the `run_info` of the suite. In addition the logger adds a `subsuite` key to the `run_info`, set to the name of the subsuite (TODO: should this be implicit, or should consumers get freedom in how to handle this?).

Plus `subsuite` keys added to the following log events:

* `test_start`
* `test_end`
* `test_status`
* `assertion_count`
* `lsan_leak`
* `lsan_summary`
* `mozleak_object`
* `mozleak_total`

In all cases this key is optional, and when missing it's assumed that the event is associated with the default (unnamed) subsuite. Mozlog will enforce the property that the subsuite name corresponds to a subsuite that has previously been added with `add_subsuite`.
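
As a rough sketch of how a producer might use the proposed API (the `StructuredLogger` methods other than `add_subsuite` already exist in mozlog; the `subsuite` keyword argument, the subsuite name, and the test id below are illustrative):

```py
from mozlog.structuredlog import StructuredLogger

logger = StructuredLogger("wpt-example")

# One suite, as today; run_info describes the default configuration.
logger.suite_start(tests={}, run_info={"os": "linux", "debug": False})

# Proposed: declare a subsuite whose run_info updates the suite-level data.
logger.add_subsuite("experimental-prefs", run_info={"experimental_prefs": True})

# Events without a subsuite key belong to the default (unnamed) subsuite.
logger.test_start("/navigation/basic.html")
logger.test_end("/navigation/basic.html", status="OK")

# Proposed: the same test, run under the subsuite's configuration.
logger.test_start("/navigation/basic.html", subsuite="experimental-prefs")
logger.test_end("/navigation/basic.html", status="OK", subsuite="experimental-prefs")

logger.suite_end()
```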

### wptrunner changes

wptrunner already has the concept of test groups. These are sets of tests which run in a single browser process, between which the browser is restarted. Parallel test running is enabled by scheduling multiple test groups, each of which is associated with a single browser instance.

It also has the concept of multiple test types (e.g. testharness, reftest), each of which has a different implementation. Each test group is associated with a single type of test (i.e. a single group can contain either testharness tests or reftests, but not both), i.e. we can form a mapping from a test type to a list of test groups. Groups of different test types can be run in parallel.

For subsuites, the different browser configurations mean that it's generally desirable not to reuse browser instances between different subsuites, either by necessity (e.g. because the configuration of the browser for the subsuite has to happen at browser startup) or for isolation (so that we don't accidentally get shared browser state between subsuites affecting the results of later-run tests). Therefore we can assume that each test group should be associated with exactly one subsuite. This leaves a small amount of parallelism on the table in cases where it would be possible to run multiple subsuites in the same group, but the resulting model is relatively easy to understand and satisfies the main use cases where changing configuration at runtime is not possible.

Therefore the implementation will work as follows (a small illustrative sketch follows this list):

* Test groups are associated with a pair `(subsuite, test_type)` rather than a test type alone.

* Each `(subsuite, test_type)` combination has a unique `TestImplementation`, which corresponds to a unique instance of the product-specific `Browser` and `Executor` classes.

* Each product implementation is responsible for configuring the browser and executor classes as required for the subsuite. To facilitate this the subsuite details are passed to the product specific functions for getting the browser and executor kwargs.

* Any global state (e.g. reftest screenshot cache) that depends on the test id is instead keyed on `(subsuite, test.id)`.
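
A minimal sketch of the data structures this implies (the class and helper names are illustrative, not the actual wptrunner internals):

```py
from typing import Any, Dict, List, Tuple

class TestImplementation:
    """Illustrative stand-in for one Browser/Executor configuration."""
    def __init__(self, subsuite: str, test_type: str, config: Dict[str, Any]):
        self.subsuite = subsuite
        self.test_type = test_type
        self.config = config

# One implementation per (subsuite, test_type) pair, rather than per test type.
implementations: Dict[Tuple[str, str], TestImplementation] = {}
# Test groups are keyed the same way, so a group never mixes subsuites.
test_groups: Dict[Tuple[str, str], List[List[str]]] = {}

def implementation_for(subsuite: str, test_type: str,
                       subsuite_config: Dict[str, Any]) -> TestImplementation:
    """Get or create the single implementation for a (subsuite, test_type) pair."""
    key = (subsuite, test_type)
    if key not in implementations:
        # In wptrunner the product-specific browser/executor kwargs would be
        # derived from the subsuite configuration at this point.
        implementations[key] = TestImplementation(subsuite, test_type, subsuite_config)
    return implementations[key]
```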

#### Subsuite definition

The definition of a subsuite contains the following properties:

* `name` (required) - The name of the subsuite.

* `config` (required) - Product-specific configuration options for the subsuite. This is used to e.g. set the command line arguments used when launching the browser, or the prefs set in the browser profile.

* `run_info` (optional) - Any updates to the `run_info` associated with the subsuite, other than its name (the `run_info["subsuite"] = name` key is set implicitly).

* `include` (optional) - A list of test id prefixes to include when running the subsuite.

* `tags` (optional) - A list of test tags (as defined in the metadata files) to include when running the subsuite.

* `expires` (optional) - A date after which the subsuite is no longer run (required for compatibility with Chromium's existing infrastructure).

In the first instance the subsuite definitions will be provided in a JSON file of the form

```json
{
  subsuite_name: {properties}
}
```

This will be passed to wptrunner using a new `--subsuite-file` command line argument.
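
For illustration, a subsuite file might look something like the following (the subsuite name, the keys inside `config`, and the include prefixes are invented for the example; the exact contents of `config` are product-specific):

```json
{
  "experimental-prefs": {
    "config": {"prefs": {"dom.example_feature.enabled": true}},
    "run_info": {"example_feature": true},
    "include": ["/navigation/", "/html/browsers/history/"]
  }
}
```

Here the `experimental-prefs` subsuite would run only tests whose ids start with the listed prefixes, with the given `run_info` update applied when resolving metadata for those tests.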

#### Test selection

The test selection mechanism in wptrunner is complex, and tries to meet many use cases. Adding subsuites into the mix inevitably increases that complexity.

Typically in wptrunner test loading happens in multiple stages:

1. One or more `MANIFEST.json` files are loaded, providing a complete list of all the tests in the suite (multiple `MANIFEST.json` files are used to allow vendors to add vendor-specific tests to the suite).

1. The entries of each manifest are iterated. These are filtered according to a test id filter built from the `--include`, `--exclude`, `--include-file` and `--include-manifest` command line arguments.

1. For each included test, the corresponding metadata is parsed.

1. Further filters based on the metadata values are applied. The `--tag` and `--skip-implementation-status` command line arguments are applied at this stage, since they depend on metadata values.

1. The tests are split into test groups, according to the configuration.

The `--test-groups` command line argument changes the behaviour here slightly; it directly provides a list of tests that will run, and names groups for the tests in the format `{group_name: [test_ids]}`. This can be used in combination with the `--include` command line argument to subset the tests in the file, but trying to specify more tests than are in the file will cause an error.

The current metadata processing means that we need to read the metadata files once per subsuite (since we resolve the metadata properties when reading the file, and the metadata can be different for different subsuites).

The simplest solution here is to read the `MANIFEST.json` files once, apply the top-level filters that apply to all subsuites (e.g. those from `--include`), and then run the remaining steps for each subsuite, adding the subsuite include filters after parsing the metadata. This isn't the optimal solution, since we can end up reading metadata multiple times even though it's not used. However further optimisations are possible (e.g. separating out the parse and evaluation phases of reading metadata, as we do with the update tool, or making the top-level filters per-subsuite to avoid reading metadata files for tests that won't be included in the subsuite). These optimisations should be possible without backwards-incompatible changes.

Since the `--test-groups` file directly provides a list of tests that run, rather than a filter, the format of that file is modified in a backwards compatible way so that instead of the keys being just the group name, they are of the form `<subsuite_name>:<group_name>`, with the part up to and including the colon omitted for the default subsuite.
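
For example (the group names and test ids here are invented), a test groups file covering the default subsuite and a subsuite named `experimental-prefs` might look like:

```json
{
  "layout-group-1": ["/css/css-grid/grid-definition/grid-layout-001.html"],
  "experimental-prefs:navigation-group-1": ["/navigation/basic.html"]
}
```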
306+
307+
Running individual subsuites will be possible with the `--subsuite`
308+
command line argument. This can be provided multiple times to run
309+
multiple subsuites.

#### wptreport.json

The wptreport.json format needs to be updated to allow specifying which subsuite each test belongs to. To do this, two principal changes are proposed (a sketch of the resulting shape follows the list):

1. A new top-level key `subsuites` which contains a mapping from subsuite name to the `run_info` data specific to that subsuite.

1. Entries in the `results`, `asserts`, `lsan_leaks`, and `mozleak` top-level fields get a `subsuite` property which is the name of a subsuite.
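
A trimmed illustration of what a report with these changes might contain (the test id, subsuite name, and `run_info` values are invented, and unrelated existing fields are omitted):

```json
{
  "run_info": {"os": "linux", "debug": false},
  "subsuites": {
    "experimental-prefs": {"example_feature": true, "subsuite": "experimental-prefs"}
  },
  "results": [
    {"test": "/navigation/basic.html", "status": "OK", "subtests": []},
    {"test": "/navigation/basic.html", "status": "OK", "subtests": [],
     "subsuite": "experimental-prefs"}
  ]
}
```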
323+
324+
#### Metadata update
325+
326+
The automatic metadata update works by collecting test results from
327+
multiple files, grouped by their `run_info` data. This can naturally
328+
be extended to support subsuites, just by extending the processing to
329+
create a `run_info` entry per subsuite rather than assuming the
330+
`run_info` is the same for all tests in a file.
331+
332+
The main difficulty here is that the metadata expression language
333+
doesn't have a concept of `null`. So we need to represent the default
334+
subsuite name as the empty string in `run_info`, or add a `null`
335+
concept to allow specifying metadata specific to the default subsuite.
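
Under the empty-string option, a per-subsuite expectation in a metadata file might then look something like the following (illustrative only; the test name and statuses are invented, and the condition simply reuses the existing metadata expression syntax with the new `subsuite` run_info property):

```
[example.html]
  expected:
    if subsuite == "experimental-prefs": FAIL
    PASS
```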
336+
337+
## Risks
338+
339+
The main trade-offs of the proposed approach have been discussed in the
340+
Details section.
341+
342+
The main risk is that by making changes to the underlying data model
343+
we need to update tooling that consumes the generated files. However,
344+
because the changes are purely additive, and the use-cases for
345+
subsuites are limited (e.g. we don't currently have any plans to use
346+
these on the GitHub infrastructure), it should be possible to upgrade
347+
consumers as required. In particular no changes proposed here should
348+
affect wpt.fyi in the foreseeable future.
