Description
The way we handle certain TODO-ed tests in our test suite is IMO sub-optimal.
Consider t/run/todo.t. If I build an unthreaded perl on Linux (e.g., on Ubuntu 24.04 LTS) and run that program through the harness, I get:
$ sh ./Configure -des -Dusedevel && make test_prep
...
$ cd t; ./perl harness -v run/todo.t; cd -
ok 1 - GH \#16894
not ok 2 - "abcde5678" =~ / b (*pla:.*(*plb:(*plb:(.{4}))? (.... # TODO GH 16250
ok 3 - No assertion failure # TODO GH 16876
ok 4 - No assertion failure # TODO GH 16952
not ok 5 - No assertion failure # TODO GH 16971
ok 6 - No assertion failure
ok
All tests successful.
Test Summary Report
-------------------
run/todo.t (Wstat: 0 Tests: 6 Failed: 0)
TODO passed: 3-4
Files=1, Tests=6, 0 wallclock secs ( 0.00 usr 0.00 sys + 0.02 cusr 0.01 csys = 0.03 CPU)
Result: PASS
4 out of 6 unit tests were marked TODO -- but 2 of those tests were reported as TODO passed. I looked at that result and thought, "We should un-TODO those two tests." (Indeed, test 6 above had once been a TODO-ed test.) That led me to spend several hours preparing #22862. While preparing that pull request, I successfully tested my branch on an unthreaded build on Linux and a threaded build on FreeBSD. However, once I created the pull request, it failed many of its test runs in our GH CI setup. (See https://github.com/Perl/perl5/pull/22862/checks.) Fortunately, our long-lived smoke-testing setup (http://perl.develop-help.com/?b=smoke-me%2Fjkeenan%2Freposition-todo-pass-tests-20241215) gave me valuable feedback, which, after several more hours of work, led me back to t/run/todo.t -- only this time run on a -DDEBUGGING build.
$ sh ./Configure -des -Dusedevel -DDEBUGGING && make test_prep
...
$ cd t; ./perl harness -v run/todo.t; cd -
ok 1 - GH \#16894
not ok 2 - "abcde5678" =~ / b (*pla:.*(*plb:(*plb:(.{4}))? (.... # TODO GH 16250
not ok 3 - No assertion failure # TODO GH 16876
not ok 4 - No assertion failure # TODO GH 16952
not ok 5 - No assertion failure # TODO GH 16971
ok 6 - No assertion failure
ok
All tests successful.
Files=1, Tests=6, 2 wallclock secs ( 0.00 usr 0.00 sys + 0.02 cusr 0.01 csys = 0.03 CPU)
Result: PASS
4 out of 6 unit tests remain TODO-ed, but on this build each of them actually FAILs. The file as a whole still PASSes because the harness does not count a failing TODO-ed test as a failure. No tests are reported as TODO passed.
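The difference between the two runs can be reproduced away from a perl build by feeding equivalent TAP to the core TAP::Parser module. This is only a sketch: `summarize_tap` is a hypothetical helper, not part of the test suite, and the two TAP streams are abbreviated versions of the output above, with shortened descriptions and a `1..6` plan line added as an assumption.

```perl
use strict;
use warnings;
use TAP::Parser;

# Hypothetical helper (not part of the Perl test suite): parse a TAP string
# and report how its TODO-ed tests are classified.
sub summarize_tap {
    my ($tap) = @_;
    my $parser = TAP::Parser->new( { tap => $tap } );
    $parser->run;    # consume every result so the counters are populated
    return {
        todo        => [ $parser->todo ],           # all tests marked TODO
        todo_passed => [ $parser->todo_passed ],    # TODO tests that passed
        failed      => [ $parser->failed ],         # failures not excused by TODO
    };
}

# Abbreviated from the non-debugging run (plan line is an assumption).
my $non_debugging = <<'TAP';
1..6
ok 1 - first test
not ok 2 - regex test # TODO GH 16250
ok 3 - No assertion failure # TODO GH 16876
ok 4 - No assertion failure # TODO GH 16952
not ok 5 - No assertion failure # TODO GH 16971
ok 6 - No assertion failure
TAP

# Abbreviated from the -DDEBUGGING run: tests 3 and 4 now fail.
( my $debugging = $non_debugging ) =~ s/^ok ([34]) /not ok $1 /mg;

for my $run ( [ 'non-debugging', $non_debugging ], [ 'debugging', $debugging ] ) {
    my ( $label, $tap ) = @$run;
    my $summary = summarize_tap($tap);
    printf "%-14s TODO: %s | TODO passed: %s | failed: %s\n",
        $label,
        join( ',', @{ $summary->{todo} } )        || 'none',
        join( ',', @{ $summary->{todo_passed} } ) || 'none',
        join( ',', @{ $summary->{failed} } )      || 'none';
}
```

In both streams the same four tests are marked TODO and none counts as failed. Only the "TODO passed" list changes between builds, and that list is the only signal the harness summary gives.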
Now, you have to be fairly familiar with our test suite to recognize that if a test has "No assertion failure" in its description (label), that means its PASS/FAIL status on -DDEBUGGING builds is ... (how shall we put it?) ... unresolved. Such a unit test cannot really be un-TODO-ed until its code passes on both non-debugging and debugging builds. But if someone sees tests reported as TODO passed, they are likely to expend considerable effort (as I did this weekend) un-TODO-ing them prematurely. Many TODO-ed tests are, of course, classified as such because they fail on both non-debugging and debugging builds. But how should we indicate that a particular unit test may not be ready to be un-TODO-ed even if, on some builds, it is reported as TODO passed?