Commit 3687553

DOC: page on adding / using test data
Document guidelines for adding test data to main repository and as a submodule.
1 parent 1a3dcdd commit 3687553

3 files changed: +144 -1 lines changed

doc/source/devel/add_test_data.rst

Lines changed: 137 additions & 0 deletions
@@ -0,0 +1,137 @@
################
Adding test data
################

#. We really, really like test images, but
#. We are rather conservative about the size of our code repository.

So, we have two different ways of adding test data.

#. Small, open licensed files can go in the ``nibabel/tests/data`` directory
   (see below);
#. Larger files or files with extra licensing terms can go in their own git
   repositories and be added as submodules to the ``nibabel-data`` directory.

***********
Small files
***********

Small files are around 50K or less when compressed. By "compressed" we mean
compressed with zlib, which is what git uses when storing the file in the
repository. You can check the compressed size directly with a Python script
like::

    import sys
    import zlib

    # Print the zlib-compressed size, in KiB, of each file named on the
    # command line.
    for fname in sys.argv[1:]:
        with open(fname, 'rb') as fobj:
            contents = fobj.read()
        compressed = zlib.compress(contents)
        print(fname, len(compressed) / 1024.)

One way of making files smaller when compressed is to set uninteresting values
to zero or some other constant so that the compression algorithm can be more
effective.

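The effect of zeroing uninteresting values can be checked with a small
experiment. The following is a minimal sketch and not part of nibabel: it uses
pseudo-random bytes as a stand-in for image data, and the "interesting" region
(``start`` to ``stop``) is an arbitrary assumption chosen for illustration.

```python
import random
import zlib


def compressed_kib(data):
    """Return the zlib-compressed size of ``data`` in KiB."""
    return len(zlib.compress(data)) / 1024.


# Stand-in for image data: 64 KiB of pseudo-random (incompressible) bytes
rng = random.Random(0)
noisy = bytes(rng.randrange(256) for _ in range(64 * 1024))

# Keep a hypothetical "interesting" region and set everything else to zero
start, stop = 16 * 1024, 32 * 1024
zeros = bytes(len(noisy))
masked = zeros[:start] + noisy[start:stop] + zeros[stop:]

print('original:', compressed_kib(noisy))
print('masked:  ', compressed_kib(masked))
```

The masked copy has the same length as the original, but it should compress
to roughly the size of the region that was kept.
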
Please don't compress the file yourself before committing to a git repo unless
there's a really good reason; git will do this for you when adding to the
repository, and it's a shame to make git compress a compressed file.

************************
Files with open licenses
************************

We very much prefer files with completely open licenses such as the `PDDL
1.0`_ or the CC0_ license.

The files in ``nibabel/tests/data`` are distributed with the nibabel source
code, which can easily be installed without the user having an opportunity to
review the full license. We don't think this is compatible with extra license
terms like agreeing to cite the people who provided the data or agreeing not
to try to work out the identity of the person who has been scanned, because it
would be too easy to miss these requirements when using nibabel. It is fine
to use files with these kinds of licenses, but they should go in their own
repository to be used as a submodule, so they do not need to be distributed
with nibabel.

*****************************************
Adding the file to ``nibabel/tests/data``
*****************************************

If the file is less than about 50K compressed, and the license is open, then
you might want to commit the file under ``nibabel/tests/data``.

Put the license for any new files in the COPYING file at the top level of the
nibabel repo. You'll see some examples in that file already.

*****************************************
Adding as a submodule to ``nibabel-data``
*****************************************

Make a new git repository with the data.

There are example repos at

* https://github.com/yarikoptic/nitest-balls1
* https://github.com/matthew-brett/nitest-minc2

Although both examples are on GitHub, Bitbucket_ is good for repos like this
because it does not enforce repository size limits.

Don't forget to include a LICENSE and README file in the repo.

When all is done, and the repository is safely on the internet and accessible,
add the repo as a submodule to the ``nibabel-data`` directory, with something
like this::

    git submodule add https://bitbucket.org/nipy/rosetta-samples.git nibabel-data/rosetta-samples

You should now have a checked out copy of the ``rosetta-samples`` repository
in the ``nibabel-data/rosetta-samples`` directory. Commit this submodule add
to nibabel.

If you are writing tests using files from this repository, you should use the
``needs_nibabel_data`` decorator to skip the tests if the data has not been
checked out into the submodules. See ``nibabel/tests/test_parrec_data.py``
for an example. For our example repository above it might look something
like::

    from os.path import join as pjoin

    from .nibabel_data import get_nibabel_data, needs_nibabel_data

    ROSETTA_DATA = pjoin(get_nibabel_data(), 'rosetta-samples')

    @needs_nibabel_data('rosetta-samples')
    def test_something():
        # Some test using the data
        pass

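To make the idea of a data-dependent skip concrete, here is a minimal sketch
of how such a decorator could be built on ``unittest.skipUnless``. It is an
illustration only and is not nibabel's ``needs_nibabel_data``; the names
``data_available`` and ``needs_data`` are hypothetical. It assumes that a
submodule that has not been checked out appears as a missing or empty
directory.

```python
import os
import tempfile
import unittest


def data_available(root, subdir):
    """Return True if ``root/subdir`` is a directory with something in it."""
    path = os.path.join(root, subdir)
    return os.path.isdir(path) and len(os.listdir(path)) > 0


def needs_data(root, subdir):
    """Hypothetical skip decorator; NOT nibabel's ``needs_nibabel_data``."""
    path = os.path.join(root, subdir)
    return unittest.skipUnless(data_available(root, subdir),
                               'Need data in {}'.format(path))


# Demonstrate with a temporary directory standing in for ``nibabel-data``
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, 'rosetta-samples'))
    empty_ok = data_available(root, 'rosetta-samples')   # nothing checked out

    with open(os.path.join(root, 'rosetta-samples', 'README'), 'w') as fobj:
        fobj.write('example data\n')
    filled_ok = data_available(root, 'rosetta-samples')  # data now present

    @needs_data(root, 'missing-repo')
    def test_missing():
        raise AssertionError('should have been skipped')

    try:
        test_missing()
        was_skipped = False
    except unittest.SkipTest:
        was_skipped = True

print(empty_ok, filled_ok, was_skipped)
```

Checking for a non-empty directory, rather than mere existence, is the design
choice that matters here: the test is skipped both when the directory is
absent and when it is present but unpopulated.
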
Using submodules for tests
==========================

Tests run via `nibabel on travis`_ start with an automatic checkout of all
submodules in the project, so all test data submodules get checked out by
default.

If you are running the tests locally, you may well want to do::

    git submodule update --init

from the root nibabel directory. This will check out all the test data
repositories.

How much data should go in a single submodule?
==============================================

The limiting factor is how long it takes travis-ci_ to check out the data for
the tests. Up to a hundred megabytes in one repository should be OK. The joy
of submodules is we can always drop a submodule, split the repository into two
and add only one back, so you aren't committing us to anything awful if you
accidentally put some very large files into your own data repository.

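A quick way to see where a data repository stands against the rough
hundred-megabyte guideline is to total the size of its working tree. The
sketch below is an assumption-laden illustration, not a nibabel tool: the
names ``tree_size_mb`` and ``LIMIT_MB`` are hypothetical, and the working-tree
size is only a rough proxy for what a checkout actually transfers, because
git history adds to it.

```python
import os
import tempfile

LIMIT_MB = 100  # hypothetical name for the rough guideline in the text


def tree_size_mb(root):
    """Total size in MiB of regular files under ``root``, skipping ``.git``."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune git's own storage in place so os.walk does not descend into it
        dirnames[:] = [d for d in dirnames if d != '.git']
        for fname in filenames:
            fpath = os.path.join(dirpath, fname)
            if os.path.isfile(fpath) and not os.path.islink(fpath):
                total += os.path.getsize(fpath)
    return total / (1024. * 1024.)


# Demonstrate on a throwaway directory holding one 2 MiB file
with tempfile.TemporaryDirectory() as repo:
    os.mkdir(os.path.join(repo, '.git'))
    with open(os.path.join(repo, '.git', 'objects'), 'wb') as fobj:
        fobj.write(b'\0' * 1024)  # ignored: inside .git
    with open(os.path.join(repo, 'scan.img'), 'wb') as fobj:
        fobj.write(b'\0' * 2 * 1024 * 1024)
    size_mb = tree_size_mb(repo)

print('%.1f MB, within limit: %s' % (size_mb, size_mb <= LIMIT_MB))
```

To check a real repository, call ``tree_size_mb`` with the path to your own
checkout in place of the temporary directory.
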
If in doubt
===========

If you are not sure, try us with a pull request to `nibabel github`_, or on the
`nipy mailing list`_; we will try to help.

.. include:: ../links_names.txt

doc/source/devel/index.rst

Lines changed: 1 addition & 1 deletion
@@ -10,6 +10,6 @@ Developer documentation page
  :maxdepth: 2

  devguide
- image_design
+ add_test_data
  devdiscuss
  make_release

doc/source/links_names.txt

Lines changed: 6 additions & 0 deletions
@@ -30,6 +30,8 @@
  .. _nipy development guidelines: http://nipy.org/devel
  .. _`nipy github`: http://github.com/nipy/nipy
  .. _nipy buildbot: http://nipy.bic.berkeley.edu
+ .. _nibabel on travis: http://travis-ci.org/nipy/nibabel
+ .. _travis-ci: http://travis-ci.org

  .. Documentation tools
  .. _graphviz: http://www.graphviz.org/
@@ -44,6 +46,9 @@
  .. _BSD: http://www.opensource.org/licenses/bsd-license.php
  .. _LGPL: http://www.gnu.org/copyleft/lesser.html
  .. _MIT License: http://www.opensource.org/licenses/mit-license.php
+ .. _PDDL 1.0: http://opendatacommons.org/licenses/pddl/1.0/
+ .. _CC0: http://opendefinition.org/licenses/cc-zero
+

  .. Installation
  .. _pypi: http://pypi.python.org/pypi
@@ -76,6 +81,7 @@
  .. _doctest-mode: http://www.cis.upenn.edu/~edloper/projects/doctestmode/
  .. _nose: http://somethingaboutorange.com/mrl/projects/nose
  .. _`python coverage tester`: http://nedbatchelder.com/code/modules/coverage.html
+ .. _bitbucket: http://bitbucket.org

  .. Other python projects
  .. _numpy: http://www.scipy.org/NumPy
