Suboptimal handling of TODO-ed tests

The way we handle certain `TODO`-ed tests in our test suite is, IMO, suboptimal.

Consider `t/run/todo.t`.  If I build an unthreaded perl on Linux (*e.g.,* on Ubuntu 24.04 LTS) and run that program through the harness, I get:
```
$ sh ./Configure -des -Dusedevel && make test_prep
...
$ cd t; ./perl harness -v run/todo.t; cd -

ok 1 - GH \#16894
not ok 2 - "abcde5678" =~ / b (*pla:.*(*plb:(*plb:(.{4}))? (.... # TODO GH 16250
ok 3 - No assertion failure # TODO GH 16876
ok 4 - No assertion failure # TODO GH 16952
not ok 5 - No assertion failure # TODO GH 16971
ok 6 - No assertion failure
ok
All tests successful.

Test Summary Report
-------------------
run/todo.t (Wstat: 0 Tests: 6 Failed: 0)
  TODO passed:   3-4
Files=1, Tests=6,  0 wallclock secs ( 0.00 usr  0.00 sys +  0.02 cusr  0.01 csys =  0.03 CPU)
Result: PASS
```
4 out of 6 unit tests were marked `TODO` -- but 2 of those tests were reported as `TODO passed`.  I looked at that result and thought, "We should un-`TODO` those two tests."  (Indeed, test 6 above had once been a `TODO`-ed test.)  That led me to spend several hours preparing https://github.com/Perl/perl5/pull/22862.  While preparing that p.r., I successfully tested my branch on an unthreaded build on Linux and on a threaded build on FreeBSD.  However, once I created the p.r., it failed many of its test runs in our GH CI setup.  (See https://github.com/Perl/perl5/pull/22862/checks.)  Fortunately, our long-lived smoke-testing setup (http://perl.develop-help.com/?b=smoke-me%2Fjkeenan%2Freposition-todo-pass-tests-20241215) gave me valuable feedback which, after several more hours of work, led me back to `t/run/todo.t` -- only this time run on a `-DDEBUGGING` build.

```
$ sh ./Configure -des -Dusedevel -DDEBUGGING && make test_prep
...
$ cd t; ./perl harness -v run/todo.t; cd -

ok 1 - GH \#16894
not ok 2 - "abcde5678" =~ / b (*pla:.*(*plb:(*plb:(.{4}))? (.... # TODO GH 16250
not ok 3 - No assertion failure # TODO GH 16876
not ok 4 - No assertion failure # TODO GH 16952
not ok 5 - No assertion failure # TODO GH 16971
ok 6 - No assertion failure
ok
All tests successful.
Files=1, Tests=6,  2 wallclock secs ( 0.00 usr  0.00 sys +  0.02 cusr  0.01 csys =  0.03 CPU)
Result: PASS
```
4 out of 6 unit tests remain `TODO`-ed, but each of them actually FAILs.  The file as a whole PASSes because the failing tests have been `TODO`-ed.  No tests are reported as `TODO passed`.
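To make the distinction concrete: in TAP, a `not ok` line carrying a `# TODO` directive is an expected failure and does not fail the file, while an `ok` line carrying `# TODO` is what the harness summarizes as `TODO passed`.  Below is a minimal sketch (the function name `classify_tap` is hypothetical, not part of our harness) that labels lines of `harness -v` output that way.  It deliberately ignores plans, `SKIP` directives and diagnostics, and only matches an upper-case `# TODO`.

```shell
# classify_tap (hypothetical helper): read TAP lines on stdin and label
# each test line.  Only "ok"/"not ok" lines are examined; anything else
# (plans, diagnostics, the harness's bare "ok" summary) is ignored.
classify_tap() {
    while IFS= read -r line; do
        case "$line" in
            # TODO-ed test that fails: an expected failure, file still PASSes.
            "not ok "*"# TODO"*) printf 'TODO failed: %s\n' "$line" ;;
            # TODO-ed test that passes: reported by the harness as "TODO passed".
            "ok "*"# TODO"*)     printf 'TODO passed: %s\n' "$line" ;;
            # Ordinary failure: this would make the file FAIL.
            "not ok "*)          printf 'FAILED: %s\n' "$line" ;;
            # Ordinary pass.
            "ok "*)              printf 'passed: %s\n' "$line" ;;
        esac
    done
}

# Example: two lines taken from the non-DEBUGGING run above.
printf '%s\n' \
    'ok 3 - No assertion failure # TODO GH 16876' \
    'not ok 5 - No assertion failure # TODO GH 16971' | classify_tap
```

Fed the non-debugging output above, such a filter flags tests 3 and 4 as `TODO passed`; fed the `-DDEBUGGING` output, it flags none.  That discrepancy is why `TODO passed` on a single build is not, by itself, a reliable signal that a test is ready to be un-`TODO`-ed.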

Now, you have to be fairly familiar with our test suite to recognize that if a test has *No assertion failure* in its description (label), its PASS/FAIL status on `-DDEBUGGING` builds (where the C-level assertions are actually compiled in) is ... (how shall we put it?) ... unresolved.  Such a unit test cannot really be un-`TODO`-ed until its code passes on both non-debugging and debugging builds.  But if someone sees tests reported as `TODO passed`, they are likely to expend considerable effort (as I did this weekend) un-`TODO`-ing them prematurely.  Many `TODO`-ed tests are, of course, classified as such because they fail on both non-debugging and debugging builds.  But how should we indicate that a particular unit test may not be ready to be un-`TODO`-ed even if, on some builds, it is reported as `TODO passed`?

Suboptimal handling of TODO-ed tests #22863

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Suboptimal handling of TODO-ed tests #22863

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions