
Suboptimal handling of TODO-ed tests #22863

@jkeenan

The way we handle certain TODO-ed tests in our test suite is IMO sub-optimal.

Consider t/run/todo.t. If I build an unthreaded perl on Linux (e.g., on Ubuntu 24.04 LTS) and run that program through the harness, I get:

$ sh ./Configure -des -Dusedevel && make test_prep
...
$ cd t; ./perl harness -v run/todo.t; cd -

ok 1 - GH \#16894
not ok 2 - "abcde5678" =~ / b (*pla:.*(*plb:(*plb:(.{4}))? (.... # TODO GH 16250
ok 3 - No assertion failure # TODO GH 16876
ok 4 - No assertion failure # TODO GH 16952
not ok 5 - No assertion failure # TODO GH 16971
ok 6 - No assertion failure
ok
All tests successful.

Test Summary Report
-------------------
run/todo.t (Wstat: 0 Tests: 6 Failed: 0)
  TODO passed:   3-4
Files=1, Tests=6,  0 wallclock secs ( 0.00 usr  0.00 sys +  0.02 cusr  0.01 csys =  0.03 CPU)
Result: PASS

4 out of 6 unit tests were marked TODO, but 2 of those tests (3 and 4) were reported as TODO passed. I looked at that result and thought, "We should un-TODO those two tests." (Indeed, test 6 above had once been a TODO-ed test.)

That led me to spend several hours preparing #22862. While preparing that p.r., I successfully tested my branch on an unthreaded build on Linux and a threaded build on FreeBSD. However, once I created the p.r., it failed many of its test runs in our GH CI setup. (See https://github.com/Perl/perl5/pull/22862/checks.) Fortunately, our long-lived smoke-testing setup (http://perl.develop-help.com/?b=smoke-me%2Fjkeenan%2Freposition-todo-pass-tests-20241215) gave me valuable feedback. After several more hours of work, that feedback led me back to t/run/todo.t, only this time run on a -DDEBUGGING build:

$ sh ./Configure -des -Dusedevel -DDEBUGGING && make test_prep
...
$ cd t; ./perl harness -v run/todo.t; cd -

ok 1 - GH \#16894
not ok 2 - "abcde5678" =~ / b (*pla:.*(*plb:(*plb:(.{4}))? (.... # TODO GH 16250
not ok 3 - No assertion failure # TODO GH 16876
not ok 4 - No assertion failure # TODO GH 16952
not ok 5 - No assertion failure # TODO GH 16971
ok 6 - No assertion failure
ok
All tests successful.
Files=1, Tests=6,  2 wallclock secs ( 0.00 usr  0.00 sys +  0.02 cusr  0.01 csys =  0.03 CPU)
Result: PASS

4 out of 6 unit tests remain TODO-ed, but each of them actually FAILs. The file as a whole PASSes because the failing tests have been TODO-ed. No tests are reported as TODO passed.
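To make the harness's bookkeeping concrete, here is a minimal Python sketch. It is not part of Test::Harness or the perl test suite, and classify_todo is a hypothetical helper. The sketch classifies TAP lines that carry a TODO directive. Under TAP, a "not ok" TODO test does not count as a failure, while an "ok" TODO test is what the harness summarizes as "TODO passed". Feeding it the two runs above and intersecting the results picks out the tests whose TODO status depends on the build.

```python
import re

# A TAP test line: optional "not ", then "ok", then the test number.
TEST_LINE = re.compile(r"^(not )?ok\s+(\d+)")
# A TODO directive: an unescaped "#" followed by TODO (TAP writes a literal "#" as "\#").
TODO_DIRECTIVE = re.compile(r"(?<!\\)#\s*TODO\b", re.IGNORECASE)


def classify_todo(tap_output):
    """Hypothetical helper: return (todo_passed, todo_failed) test numbers."""
    todo_passed, todo_failed = [], []
    for line in tap_output.splitlines():
        match = TEST_LINE.match(line)
        if not match or not TODO_DIRECTIVE.search(line):
            continue  # not a test line, or not TODO-ed
        number = int(match.group(2))
        if match.group(1):
            todo_failed.append(number)  # failing TODO test: file still PASSes
        else:
            todo_passed.append(number)  # summarized as "TODO passed"
    return todo_passed, todo_failed


# TAP lines copied from the two runs of t/run/todo.t shown above.
non_debugging = r"""ok 1 - GH \#16894
not ok 2 - "abcde5678" =~ / b (*pla:.*(*plb:(*plb:(.{4}))? (.... # TODO GH 16250
ok 3 - No assertion failure # TODO GH 16876
ok 4 - No assertion failure # TODO GH 16952
not ok 5 - No assertion failure # TODO GH 16971
ok 6 - No assertion failure
"""
debugging = r"""ok 1 - GH \#16894
not ok 2 - "abcde5678" =~ / b (*pla:.*(*plb:(*plb:(.{4}))? (.... # TODO GH 16250
not ok 3 - No assertion failure # TODO GH 16876
not ok 4 - No assertion failure # TODO GH 16952
not ok 5 - No assertion failure # TODO GH 16971
ok 6 - No assertion failure
"""

nd_passed, _ = classify_todo(non_debugging)
_, d_failed = classify_todo(debugging)
# Tests that are "TODO passed" on one build but still fail on the other.
build_dependent = sorted(set(nd_passed) & set(d_failed))
print(build_dependent)  # [3, 4]
```

Tests that show up as TODO passed on one build but as failing TODO tests on another (here 3 and 4) are the ones that are not yet safe to un-TODO.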

Now, you have to be fairly familiar with our test suite to recognize that if a test has "No assertion failure" in its description (label), its PASS/FAIL status on -DDEBUGGING builds is ... (how shall we put it?) ... unresolved. Perl's internal assertions are compiled in only on -DDEBUGGING builds, so on a non-debugging build such a test can pass simply because the assertion it guards against is never checked. Such a unit test cannot really be un-TODO-ed until its code passes on both non-debugging and debugging builds. But if someone sees tests reported as TODO passed, they are likely to expend considerable effort (as I did this weekend) un-TODO-ing them prematurely. Many TODO-ed tests are, of course, classified as such because they fail on both non-debugging and debugging builds. But how should we indicate that a particular unit test may not be ready to be un-TODO-ed even if it is reported as TODO passed on some builds?

Labels: Closable? (We might be able to close this ticket, but we need to check with the reporter)