Semi-related to and inspired by #331, but ultimately a separate issue.
This is something I have encountered while reviewing for JOSS a few times: JOSS requires that a project have docs and tests, but it makes few requirements about what those docs and tests should cover. So a submission could technically have one test that's just `test_nothing = lambda: print("it works")`, and that would technically be in-bounds at JOSS (up to reviewer discretion).
Our current language in the checklist is:
- Tests cover essential functions of the package and a reasonable range of inputs and conditions.
which I think is pretty good. I don't think we should set some minimum coverage percentage or anything, but it might be a good idea to ask authors to include a coverage report with their submission, for a few reasons:
- Many packages' tests don't report coverage automatically, which is fine! But that makes it hard for a reviewer to get a sense of whether the essential functions of a package are covered. They may be able to see whether there are major integration tests covering the functionality described in the readme, but they probably won't read every unit test and compare it with the source. A coverage report is an (imperfect) shorthand for places where the reviewer might want to pay attention, e.g. if there is low coverage in some critical module.
- One thing that LLMs love to do when in vibe-code mode is to leave a bunch of dead code around. Coverage isn't a great diagnostic for this by any means (low sensitivity and specificity), but again, as a signal for where the reviewer might want to take a look, areas that are uncovered by tests might warrant further attention, including asking "hey, what's with all this dead code, did an AI write this or what?"
This might be hard to automate as a CI action or bot command, since some packages have pretty nonstandard test systems/requirements, so we could do a few things there. First, just have docs describing how to do it: if someone is using pytest, it's pretty straightforward to do a one-off run of pytest-cov (see the sketch below). If someone has a test setup that doesn't easily allow them to use coverage for whatever reason, then they can just say that.
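As a hedged illustration for those docs, here is a minimal sketch of that one-off run, assuming pytest and pytest-cov are installed and a hypothetical `src/mypackage` layout; from the shell this is simply `pytest --cov=src/mypackage --cov-report=term-missing`:

```python
# Minimal one-off coverage run; assumes pytest and pytest-cov are installed and a
# hypothetical src/mypackage layout. Equivalent to running
# `pytest --cov=src/mypackage --cov-report=term-missing` from the shell.
import sys

import pytest

# pytest.main() accepts the same arguments as the CLI, so the pytest-cov flags
# work unchanged; "term-missing" also lists the uncovered line numbers.
exit_code = pytest.main([
    "--cov=src/mypackage",
    "--cov-report=term-missing",
])
sys.exit(exit_code)
```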
We might also have some "baseline" action for packages that use pytest "normally", where someone could make a comment beneath their submission like:
```
@pyOsBot run coverage
setup: |
  python -m pip install pdm
  pdm install --with tests
paths:
  src: src/mypackage
  tests: somewhere/tests
```
which runs an action that sets up their package, runs pytest with pytest-cov, and posts the results in a reply (a rough sketch of what such an action could do follows the example below). We can keep the reply unobtrusive by putting the report in a `<details>` tag, like:
<details>
<summary>Expand/collapse coverage</summary>

```
---------- coverage: platform darwin, python 3.13.1-final-0 ----------
Name                                    Stmts   Miss  Cover   Missing
----------------------------------------------------------------------
src/numpydantic/__init__.py                 3      0   100%
src/numpydantic/dtype.py                   68      1    99%   26
src/numpydantic/exceptions.py               7      0   100%
src/numpydantic/interface/__init__.py       7      0   100%
src/numpydantic/interface/dask.py          56      2    96%   92-94
src/numpydantic/interface/hdf5.py         181      1    99%   61
src/numpydantic/interface/interface.py    209      3    99%   269, 479, 503
src/numpydantic/interface/numpy.py         51      3    94%   63, 87-89
src/numpydantic/interface/video.py        133      5    96%   34, 160, 271-274
src/numpydantic/interface/xarray.py         0      0   100%
src/numpydantic/interface/zarr.py         104      1    99%   160
src/numpydantic/maps.py                    10      0   100%
src/numpydantic/ndarray.py                 92      2    98%   170-171
src/numpydantic/schema.py                  95      0   100%
src/numpydantic/serialization.py           76      6    92%   83-84, 98-99, 119, 148
src/numpydantic/testing/__init__.py         2      0   100%
src/numpydantic/testing/cases.py           41      2    95%   26-28
src/numpydantic/testing/helpers.py        126      2    98%   28, 189
src/numpydantic/testing/interfaces.py     138      2    99%   77, 112
src/numpydantic/types.py                   12      0   100%
src/numpydantic/validation/__init__.py      3      0   100%
src/numpydantic/validation/dtype.py        25      2    92%   17, 62
src/numpydantic/validation/shape.py        85      4    95%   96, 211-213
----------------------------------------------------------------------
TOTAL                                    1524     36    98%
```

</details>
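On the bot side, here is a rough, hedged sketch of what that action could do with the comment: parse the config block, run the author's setup commands, run pytest with pytest-cov, and fold the report into a collapsed block for the reply. The `run_coverage` helper and config keys are hypothetical; nothing here is an existing pyOpenSci bot API.

```python
# Hypothetical sketch of the "run coverage" action: parse the comment's config,
# run the author's setup commands, run pytest with pytest-cov, and fold the
# report into a <details> block for the reply comment.
import subprocess

import yaml  # PyYAML, assumed to be available in the action environment


def run_coverage(config_text: str) -> str:
    """Return a reply body containing a collapsed coverage report."""
    # config_text is the YAML beneath the "run coverage" command line.
    config = yaml.safe_load(config_text)

    # Run each setup line the author provided (e.g. installing pdm and test deps).
    for line in config["setup"].strip().splitlines():
        subprocess.run(line, shell=True, check=True)

    # Run pytest with pytest-cov against the declared source and test paths.
    result = subprocess.run(
        [
            "python", "-m", "pytest",
            f"--cov={config['paths']['src']}",
            "--cov-report=term-missing",
            config["paths"]["tests"],
        ],
        capture_output=True,
        text=True,
    )

    # Keep the reply unobtrusive: collapse the (long) coverage table.
    return (
        "<details>\n"
        "<summary>Expand/collapse coverage</summary>\n\n"
        "<pre>\n" + result.stdout + "\n</pre>\n\n"
        "</details>"
    )
```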
At the risk of losing focus, and along the same lines, we might also consider some static analysis reports that should be easier to automate: e.g. using pmd (its copy-paste detector) to detect duplicated lines, or using modulefinder to find package modules that are never imported. Giving some basic code health reports seems like it would be helpful for both authors and reviewers.
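As one sketch of the latter, the stdlib's modulefinder can flag modules that are never imported when a script exercising the public API runs. The package name, source directory, and entry script below are hypothetical, and modulefinder only follows static imports from a single entry point, so the output is a pointer for reviewers rather than proof of dead code:

```python
# Hedged sketch of a "never imported" report using the stdlib's modulefinder.
# PACKAGE, SRC_DIR, and ENTRY_SCRIPT are hypothetical placeholders.
import sys
from modulefinder import ModuleFinder
from pathlib import Path

PACKAGE = "mypackage"                # hypothetical package name
SRC_DIR = Path("src") / PACKAGE      # hypothetical src layout
ENTRY_SCRIPT = "examples/demo.py"    # hypothetical script exercising the public API

# Every module the package ships, derived from its .py files.
shipped = set()
for py in SRC_DIR.rglob("*.py"):
    dotted = ".".join((PACKAGE, *py.relative_to(SRC_DIR).with_suffix("").parts))
    shipped.add(dotted.removesuffix(".__init__"))

# Every module that gets imported (statically) when the entry script runs.
finder = ModuleFinder(path=[str(SRC_DIR.parent), *sys.path])
finder.run_script(ENTRY_SCRIPT)
imported = set(finder.modules)

print("Package modules never imported from the entry script:")
for name in sorted(shipped - imported):
    print(f"  {name}")
```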