Design of data packages for the nibabel and the nipy suite¶
See Principles of data package for a more general discussion of design issues.
When developing or using nipy, many data files can be useful. We divide the data files nipy uses into at least three categories:
test data - data files required for routine code testing
template data - data files required for algorithms to function, such as templates or atlases
example data - data files for running examples, or optional tests
Files used for routine testing are typically very small data files. They are shipped with the software, and live in the code repository. For example, in the case of nipy itself, there are some test files that live in the module path nipy.testing.data. Nibabel ships data files in nibabel.tests.data. See Adding test data for discussion.
Template data and example data are examples of data packages. What follows is a discussion of the design and use of data packages.
Use cases for data packages¶
Using the data package¶
The programmer can use the data like this:
from nibabel.data import make_datasource
templates = make_datasource(dict(relpath='nipy/templates'))
fname = templates.get_filename('ICBM152', '2mm', 'T1.nii.gz')
where fname will be the absolute path to the template image ICBM152/2mm/T1.nii.gz.
The programmer can insist on a particular version of a datasource:
>>> if templates.version < '0.4':
... raise ValueError('Need datasource version at least 0.4')
Traceback (most recent call last):
...
ValueError: Need datasource version at least 0.4
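If the version is held as a string, a direct comparison is lexicographic, so '0.10' sorts before '0.4'. The following is a minimal sketch of a numeric comparison, assuming versions are dot-separated integers. The helpers version_tuple and require_version are hypothetical names used for illustration; they are not part of nibabel.

```python
def version_tuple(version):
    """Convert a dotted version string such as '0.4' to a tuple of ints."""
    return tuple(int(part) for part in version.split('.'))


def require_version(found, minimum):
    """Raise ValueError if `found` is older than `minimum`."""
    if version_tuple(found) < version_tuple(minimum):
        raise ValueError('Need datasource version at least %s' % minimum)


# String comparison orders '0.10' before '0.4'; tuple comparison does not.
assert '0.10' < '0.4'
assert version_tuple('0.10') > version_tuple('0.4')
require_version('0.10', '0.4')  # passes silently
```

With a real datasource this might be called as require_version(templates.version, '0.4'), assuming the version attribute is such a string.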
If the repository cannot find the data, then:
>>> make_datasource(dict(relpath='nipy/implausible'))
Traceback (most recent call last):
...
nibabel.data.DataError: ...
where DataError gives a helpful warning about why the data was not found, and how it should be installed.
Warnings during installation¶
The example data and template data may be important, and so we want to warn the user if NIPY cannot find either of the two sets of data when installing the package. Thus:
python setup.py install
will import nipy after installation to check whether these raise an error:
>>> from nibabel.data import make_datasource
>>> templates = make_datasource(dict(relpath='nipy/templates'))
>>> example_data = make_datasource(dict(relpath='nipy/data'))
and warn the user accordingly, with some basic instructions for how to install the data.
Finding the data¶
The routine make_datasource will look for data packages that have been installed. For the following call:
>>> templates = make_datasource(dict(relpath='nipy/templates'))
the code will:
1. Get a list of paths where data is known to be stored with nibabel.data.get_data_path()
2. For each of these paths, search for directory nipy/templates. If found, and of the correct format (see below), return a datasource, otherwise raise an Exception.
The paths collected by nibabel.data.get_data_path() are constructed from ‘:’ (Unix) or ‘;’ (Windows) separated strings. The sources of the strings (in the order in which they will be used in the search above) are:
1. The value of the NIPY_DATA_PATH environment variable, if set
2. A section = DATA, parameter = path entry in a config.ini file in nipy_dir, where nipy_dir is $HOME/.nipy or equivalent.
3. Section = DATA, parameter = path entries in configuration .ini files, where the .ini files are found by glob.glob(os.path.join(etc_dir, '*.ini')) and etc_dir is /etc/nipy on Unix, and some suitable equivalent on Windows.
4. The result of os.path.join(sys.prefix, 'share', 'nipy')
5. If sys.prefix is /usr, we add /usr/local/share/nipy. We need this because Python >= 2.6 in Debian / Ubuntu does default installs to /usr/local.
6. The result of get_nipy_user_dir()
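The search order can be illustrated with a short sketch. This is not nibabel's implementation: collect_data_paths and paths_from_ini are hypothetical helpers, and the user and system configuration directories are passed in as arguments rather than discovered.

```python
import configparser
import glob
import os
import sys


def paths_from_ini(ini_file):
    """Return the entries of the [DATA] path setting in one .ini file."""
    parser = configparser.ConfigParser()
    parser.read(ini_file)  # a missing file is silently ignored
    value = parser.get('DATA', 'path', fallback='')
    return [p for p in value.split(os.pathsep) if p]


def collect_data_paths(user_dir, etc_dir, environ=os.environ, prefix=sys.prefix):
    """Assemble candidate data directories in the documented search order."""
    paths = []
    # 1. Environment variable, split on ':' (Unix) or ';' (Windows).
    env_value = environ.get('NIPY_DATA_PATH', '')
    paths += [p for p in env_value.split(os.pathsep) if p]
    # 2. config.ini in the user's nipy directory.
    paths += paths_from_ini(os.path.join(user_dir, 'config.ini'))
    # 3. Every .ini file in the system configuration directory.
    for ini_file in sorted(glob.glob(os.path.join(etc_dir, '*.ini'))):
        paths += paths_from_ini(ini_file)
    # 4. The share directory under the Python prefix.
    paths.append(os.path.join(prefix, 'share', 'nipy'))
    # 5. Debian / Ubuntu install to /usr/local by default.
    if prefix == '/usr':
        paths.append('/usr/local/share/nipy')
    # 6. The user directory itself.
    paths.append(user_dir)
    return paths
```

Passing environ and prefix as parameters keeps the sketch easy to exercise without touching the real environment.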
Requirements for a data package¶
To be a valid NIPY project data package, you need to satisfy:
The installer installs the data in some place that can be found using the method defined in Finding the data.
We recommend that:
By default, you install data in a standard location such as <prefix>/share/nipy, where <prefix> is the standard Python prefix obtained by:
>>> import sys; print(sys.prefix)
Remember that there is a distinction between the NIPY project - the umbrella of neuroimaging in python - and the NIPY package - the main code package in the NIPY project. Thus, if you want to install data under the NIPY package umbrella, your data might go to /usr/share/nipy/nipy/packagename (on Unix). Note nipy twice - once for the project, once for the package. If you want to install data under - say - the pbrain package umbrella, that would go in /usr/share/nipy/pbrain/packagename.
Data package format¶
The following tree is an example of the kind of pattern we would expect in a data directory, where the nipy-data and nipy-templates packages have been installed:
<ROOT>
`-- nipy
    |-- data
    |   |-- config.ini
    |   `-- placeholder.txt
    `-- templates
        |-- ICBM152
        |   `-- 2mm
        |       `-- T1.nii.gz
        |-- colin27
        |   `-- 2mm
        |       `-- T1.nii.gz
        `-- config.ini
The <ROOT> directory is the directory that will appear somewhere in the list from nibabel.data.get_data_path(). The nipy subdirectory signifies data for the nipy package (as opposed to other NIPY-related packages such as pbrain). The data subdirectory of nipy contains files from the nipy-data package. In the nipy/data or nipy/templates directories, there is a config.ini file that has at least an entry like this:
[DEFAULT]
version = 0.2
giving the version of the data package.
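As an illustration of how such a file can be read, here is a minimal sketch using the standard library configparser module. The function read_package_version is a hypothetical helper, not a nibabel API.

```python
import configparser


def read_package_version(config_text):
    """Return the version string from the text of a data package config.ini."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Values in the [DEFAULT] section are exposed through parser.defaults().
    return parser.defaults()['version']


example = "[DEFAULT]\nversion = 0.2\n"
print(read_package_version(example))  # prints 0.2
```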
Installing the data¶
We use Python distutils to install data packages, and the data_files mechanism to install the data. On Unix, with the following command:
python setup.py install --prefix=/my/prefix
data will go to:
/my/prefix/share/nipy
For the example above this will result in these subdirectories:
/my/prefix/share/nipy/nipy/data
/my/prefix/share/nipy/nipy/templates
because nipy is both the project, and the package to which the data relates.
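The data_files argument takes a list of (target directory, list of source files) pairs, with relative target directories interpreted under the installation prefix. A minimal sketch of building that list from a directory tree follows. The helper build_data_files is hypothetical; the actual data packages may construct the list differently.

```python
import os


def build_data_files(source_dir, install_root):
    """Return (target_dir, [files]) pairs for every directory under source_dir."""
    data_files = []
    for dirpath, dirnames, filenames in os.walk(source_dir):
        dirnames.sort()  # walk subdirectories in a stable order
        if not filenames:
            continue  # directories holding only subdirectories need no entry
        rel_dir = os.path.relpath(dirpath, source_dir)
        target = os.path.normpath(os.path.join(install_root, rel_dir))
        sources = [os.path.join(dirpath, name) for name in sorted(filenames)]
        data_files.append((target, sources))
    return data_files


# In a setup.py the result would be passed as the data_files argument, e.g.
# data_files=build_data_files(
#     'templates', os.path.join('share', 'nipy', 'nipy', 'templates'))
```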
If you install to a particular location, you will need to add that location to the output of nibabel.data.get_data_path() using one of the mechanisms above, for example, in your system configuration:
export NIPY_DATA_PATH=/my/prefix/share/nipy
Packaging for distributions¶
For a particular data package - say nipy-templates - distributions will want to:
1. Install the data in a set location. The default from python setup.py install for the data packages will be /usr/share/nipy on Unix.
2. Point a system installation of NIPY to these data.
For the latter, the most obvious route is to copy an .ini file named for the data package into the NIPY etc_dir. In this case, on Unix, we will want a file called /etc/nipy/nipy_templates.ini with contents:
[DATA]
path = /usr/share/nipy
Current implementation¶
This section describes how we (the nipy community) implement data packages at the moment.
The data in the data packages will not usually be under source control. This is because images don’t compress very well, and any change in the data will result in a large extra storage cost in the repository. If you’re pretty clear that the data files aren’t going to change, then a repository could work OK.
The data packages will be available at a central release location. For now this will be: http://nipy.org/data-packages/ .
A package, such as nipy-templates-0.2.tar.gz, will have the following sort of structure:
<ROOT>
|-- setup.py
|-- README.txt
|-- MANIFEST.in
`-- templates
    |-- ICBM152
    |   |-- 1mm
    |   |   `-- T1_brain.nii.gz
    |   `-- 2mm
    |       `-- T1.nii.gz
    |-- colin27
    |   `-- 2mm
    |       `-- T1.nii.gz
    `-- config.ini
There should be only one nipy/packagename directory delivered by a particular package. For example, this package installs nipy/templates, but does not contain nipy/data.
Making a new package tarball is simply:
1. Downloading and unpacking e.g. nipy-templates-0.1.tar.gz to form the directory structure above;
2. Making any changes to the directory;
3. Running setup.py sdist to recreate the package.
The process of making a release should be:
1. Increment the major or minor version number in the config.ini file;
2. Make a package tarball as above;
3. Upload to distribution site.
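The version increment in the release process could be scripted. The following is a minimal sketch that assumes a two-part major.minor version stored in the [DEFAULT] section; bump_minor_version is a hypothetical helper and is not part of the data-packaging machinery.

```python
import configparser
import io


def bump_minor_version(config_text):
    """Return config.ini text with the minor version number incremented."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Assumes a version of the form 'major.minor', e.g. '0.2'.
    major, minor = parser.defaults()['version'].split('.')[:2]
    parser['DEFAULT']['version'] = '%s.%d' % (major, int(minor) + 1)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


new_text = bump_minor_version("[DEFAULT]\nversion = 0.2\n")
```

The returned text would then be written back to config.ini before running setup.py sdist.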
There is an example nipy data package nipy-examplepkg in the examples directory of the NIPY repository.
The machinery for creating and maintaining data packages is available at https://github.com/nipy/data-packaging.
See the README.txt file there for more information.