Design of data packages for the nibabel and the nipy suite

See Principles of data package for a more general discussion of design issues.

When developing or using nipy, many data files can be useful. We divide the data files nipy uses into at least 3 categories:

  1. test data - data files required for routine code testing

  2. template data - data files required for algorithms to function, such as templates or atlases

  3. example data - data files for running examples, or optional tests

Files used for routine testing are typically very small data files. They are shipped with the software, and live in the code repository. For example, in the case of nipy itself, there are some test files that live in the module path nipy.testing.data. Nibabel ships data files in nibabel.tests.data. See Adding test data for discussion.

Template data and example data are examples of data packages. What follows is a discussion of the design and use of data packages.

Use cases for data packages

Using the data package

The programmer can use the data like this:

from nibabel.data import make_datasource

templates = make_datasource(dict(relpath='nipy/templates'))
fname = templates.get_filename('ICBM152', '2mm', 'T1.nii.gz')

where fname will be the absolute path to the template image ICBM152/2mm/T1.nii.gz.

The programmer can insist on a particular version of a datasource:

>>> if templates.version < '0.4':
...     raise ValueError('Need datasource version at least 0.4')
Traceback (most recent call last):
...
ValueError: Need datasource version at least 0.4
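
Note that comparing version strings with < is a lexicographic comparison, so '0.10' sorts before '0.4'. The following is a minimal sketch of a numeric comparison, assuming version strings consist only of dot-separated integers. The helper name version_tuple is hypothetical and is not part of nibabel.

```python
def version_tuple(version):
    """Convert a dotted version string such as '0.10' to a tuple of ints."""
    return tuple(int(part) for part in version.split('.'))


# Lexicographic string comparison gets multi-digit components wrong
assert '0.10' < '0.4'
# Comparing tuples of integers gives the intended ordering
assert version_tuple('0.10') > version_tuple('0.4')
```

With such a helper, the check above could compare version_tuple(templates.version) against (0, 4) instead of comparing strings.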

If the repository cannot find the data, then:

>>> make_datasource(dict(relpath='nipy/implausible'))
Traceback (most recent call last):
 ...
nibabel.data.DataError: ...

where DataError gives a helpful warning about why the data was not found, and how it should be installed.

Warnings during installation

The example data and template data may be important, and so we want to warn the user if NIPY cannot find either of the two sets of data when installing the package. Thus:

python setup.py install

will import nipy after installation to check whether these raise an error:

>>> from nibabel.data import make_datasource
>>> templates = make_datasource(dict(relpath='nipy/templates'))
>>> example_data = make_datasource(dict(relpath='nipy/data'))

and warn the user accordingly, with some basic instructions for how to install the data.

Finding the data

The routine make_datasource will look for data packages that have been installed. For the following call:

>>> templates = make_datasource(dict(relpath='nipy/templates'))

the code will:

  1. Get a list of paths where data is known to be stored with nibabel.data.get_data_path()

  2. For each of these paths, search for the directory nipy/templates. If it is found and is in the correct format (see below), return a datasource. If no path contains a valid directory, raise an exception.
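
The search in the steps above can be sketched as follows. This is an illustration only, not the nibabel implementation: the function name find_data_dir is hypothetical, it checks only that the directory exists (not that it is in the correct format), and it raises the built-in LookupError rather than nibabel's own error type.

```python
import os
import tempfile


def find_data_dir(relpath, data_paths):
    """Return the first existing directory `relpath` under `data_paths`.

    Hypothetical helper: it checks only that the directory exists, not
    that it is a correctly formatted data package.
    """
    # Accept forward-slash relative paths on every platform
    parts = relpath.split('/')
    for root in data_paths:
        candidate = os.path.join(root, *parts)
        if os.path.isdir(candidate):
            return candidate
    raise LookupError(
        'Could not find %r in any of: %s' % (relpath, ', '.join(data_paths)))


# Usage: build a throwaway data root and look up a package inside it
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'nipy', 'templates'))
    found = find_data_dir('nipy/templates', ['/no/such/root', root])
    print(found)
```

The order of data_paths matters: the first root that contains the directory wins, which is why the order of the sources of these paths is significant.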

The paths collected by nibabel.data.get_data_path() are constructed from ‘:’ (Unix) or ‘;’ (Windows) separated strings. The sources of the strings (in the order in which they will be used in the search above) are:

  1. The value of the NIPY_DATA_PATH environment variable, if set

  2. A section = DATA, parameter = path entry in a config.ini file in nipy_dir where nipy_dir is $HOME/.nipy or equivalent.

  3. Section = DATA, parameter = path entries in configuration .ini files, where the .ini files are found by glob.glob(os.path.join(etc_dir, '*.ini')) and etc_dir is /etc/nipy on Unix, and some suitable equivalent on Windows.

  4. The result of os.path.join(sys.prefix, 'share', 'nipy')

  5. If sys.prefix is /usr, we add /usr/local/share/nipy. We need this because Python >= 2.6 in Debian / Ubuntu installs to /usr/local by default.

  6. The result of get_nipy_user_dir()
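
The first, fourth and fifth sources in the list above can be sketched as follows. This is a simplified illustration under stated assumptions, not the nibabel code: the function name sketch_data_path and its environ argument are hypothetical, and the .ini file and user directory sources are left out.

```python
import os
import sys


def sketch_data_path(environ=None):
    """Collect candidate data directories in search order (simplified).

    Hypothetical sketch: covers only the environment variable and the
    sys.prefix entries; the .ini and user directory sources are omitted.
    """
    environ = os.environ if environ is None else environ
    paths = []
    env_value = environ.get('NIPY_DATA_PATH')
    if env_value:
        # os.pathsep is ':' on Unix and ';' on Windows
        paths.extend(env_value.split(os.pathsep))
    paths.append(os.path.join(sys.prefix, 'share', 'nipy'))
    if sys.prefix == '/usr':
        paths.append('/usr/local/share/nipy')
    return paths


# Usage: pass an explicit mapping instead of the real environment
example_env = {'NIPY_DATA_PATH': os.pathsep.join(['/data/a', '/data/b'])}
print(sketch_data_path(example_env))
```

Entries from the environment variable come first, so a user can override a system-wide data installation without changing any configuration file.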

Requirements for a data package

To be a valid NIPY project data package, you need to satisfy:

  1. The installer installs the data in some place that can be found using the method defined in Finding the data.

We recommend that:

  1. By default, you install data in a standard location such as <prefix>/share/nipy where <prefix> is the standard Python prefix obtained by >>> import sys; print(sys.prefix)

Remember that there is a distinction between the NIPY project - the umbrella of neuroimaging in python - and the NIPY package - the main code package in the NIPY project. Thus, if you want to install data under the NIPY package umbrella, your data might go to /usr/share/nipy/nipy/packagename (on Unix). Note nipy twice - once for the project, once for the package. If you want to install data under - say - the pbrain package umbrella, that would go in /usr/share/nipy/pbrain/packagename.

Data package format

The following tree is an example of the kind of pattern we would expect in a data directory, where the nipy-data and nipy-templates packages have been installed:

<ROOT>
`-- nipy
    |-- data
    |   |-- config.ini
    |   `-- placeholder.txt
    `-- templates
        |-- ICBM152
        |   `-- 2mm
        |       `-- T1.nii.gz
        |-- colin27
        |   `-- 2mm
        |       `-- T1.nii.gz
        `-- config.ini

The <ROOT> directory is the directory that will appear somewhere in the list from nibabel.data.get_data_path(). The nipy subdirectory signifies data for the nipy package (as opposed to other NIPY-related packages such as pbrain). The data subdirectory of nipy contains files from the nipy-data package, and the templates subdirectory contains files from the nipy-templates package. Each of the nipy/data and nipy/templates directories contains a config.ini file that has at least an entry like this:

[DEFAULT]
version = 0.2

giving the version of the data package.
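
A file in this format can be read with the standard library configparser module. The following is a minimal sketch that parses the example entry from a string; it is not necessarily how nibabel reads the file.

```python
import configparser

# Example file contents, matching the entry shown above
CONFIG_TEXT = """
[DEFAULT]
version = 0.2
"""

config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)
# Options in the [DEFAULT] section are available through defaults()
version = config.defaults()['version']
print(version)
```

The value is returned as a string, so any comparison between versions needs to parse it first.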

Installing the data

We use python distutils to install data packages, and the data_files mechanism to install the data. On Unix, with the following command:

python setup.py install --prefix=/my/prefix

data will go to:

/my/prefix/share/nipy

For the example above this will result in these subdirectories:

/my/prefix/share/nipy/nipy/data
/my/prefix/share/nipy/nipy/templates

because nipy is both the project, and the package to which the data relates.
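
The data_files argument to setup() takes a list of (directory, files) pairs, where a relative directory is interpreted relative to the installation prefix. The following sketch builds such a list by walking a source directory. The helper name collect_data_files and the throwaway tree are hypothetical, and this is not the setup.py shipped with the nipy data packages.

```python
import os
import tempfile


def collect_data_files(src_dir, install_root):
    """Return (target_dir, [files]) pairs in the shape data_files expects.

    Hypothetical helper: `src_dir` is the directory in the source tree and
    `install_root` is the install directory relative to the prefix.
    """
    data_files = []
    for dirpath, dirnames, filenames in os.walk(src_dir):
        dirnames.sort()  # make the walk order deterministic
        if not filenames:
            continue
        rel = os.path.relpath(dirpath, src_dir)
        target = install_root if rel == '.' else os.path.join(install_root, rel)
        sources = [os.path.join(dirpath, name) for name in sorted(filenames)]
        data_files.append((target, sources))
    return data_files


# Usage with a throwaway tree standing in for the templates directory
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'templates')
    os.makedirs(os.path.join(src, 'ICBM152', '2mm'))
    for parts in (('config.ini',), ('ICBM152', '2mm', 'T1.nii.gz')):
        open(os.path.join(src, *parts), 'w').close()
    install_root = os.path.join('share', 'nipy', 'nipy', 'templates')
    pairs = collect_data_files(src, install_root)
    targets = [target for target, _ in pairs]
    print(targets)
```

Because the target directories are relative, the --prefix option decides where the share/nipy tree ends up.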

If you install to a particular location, you will need to add that location to the output of nibabel.data.get_data_path() using one of the mechanisms above, for example, in your system configuration:

export NIPY_DATA_PATH=/my/prefix/share/nipy

Packaging for distributions

For a particular data package - say nipy-templates - distributions will want to:

  1. Install the data in a set location. The default from python setup.py install for the data packages will be /usr/share/nipy on Unix.

  2. Point a system installation of NIPY to these data.

For the latter, the most obvious route is to copy an .ini file named for the data package into the NIPY etc_dir. In this case, on Unix, we will want a file called /etc/nipy/nipy_templates.ini with contents:

[DATA]
path = /usr/share/nipy

Current implementation

This section describes how we (the nipy community) implement data packages at the moment.

The data in the data packages will not usually be under source control. This is because images don’t compress very well, and any change in the data will result in a large extra storage cost in the repository. If you’re pretty clear that the data files aren’t going to change, then a repository could work OK.

The data packages will be available at a central release location. For now this will be: http://nipy.org/data-packages/ .

A package, such as nipy-templates-0.2.tar.gz will have the following sort of structure:

<ROOT>
  |-- setup.py
  |-- README.txt
  |-- MANIFEST.in
  `-- templates
      |-- ICBM152
      |   |-- 1mm
      |   |   `-- T1_brain.nii.gz
      |   `-- 2mm
      |       `-- T1.nii.gz
      |-- colin27
      |   `-- 2mm
      |       `-- T1.nii.gz
      `-- config.ini

There should be only one nipy/packagename directory delivered by a particular package. For example, this package installs nipy/templates, but does not contain nipy/data.

Making a new package tarball is simply:

  1. Downloading and unpacking e.g. nipy-templates-0.1.tar.gz to form the directory structure above;

  2. Making any changes to the directory;

  3. Running setup.py sdist to recreate the package.

The process of making a release should be:

  1. Increment the major or minor version number in the config.ini file;

  2. Make a package tarball as above;

  3. Upload to distribution site.

There is an example nipy data package nipy-examplepkg in the examples directory of the NIPY repository.

The machinery for creating and maintaining data packages is available at https://github.com/nipy/data-packaging.

See the README.txt file there for more information.