fileslice

Utilities for getting array slices out of file-like objects

calc_slicedefs(sliceobj, in_shape, itemsize, ...)

Return parameters for slicing array with sliceobj given memory layout

canonical_slicers(sliceobj, shape[, check_inds])

Return canonical version of sliceobj for an array whose shape is given by shape

fileslice(fileobj, sliceobj, shape, dtype[, ...])

Slice array in fileobj using sliceobj slicer and array definitions

fill_slicer(slicer, in_len)

Return slice object with Nones filled out to match in_len

is_fancy(sliceobj)

Returns True if sliceobj is attempting fancy indexing

optimize_read_slicers(sliceobj, in_shape, ...)

Calculates slices to read from disk, and apply after reading

optimize_slicer(slicer, dim_len, all_full, ...)

Return maybe modified slice and post-slice slicing for slicer

predict_shape(sliceobj, in_shape)

Predict the shape of the array obtained by slicing an array of shape in_shape with sliceobj

read_segments(fileobj, segments, n_bytes[, lock])

Read n_bytes bytes of data implied by segments from fileobj

slice2len(slicer, in_len)

Output length after slicing original length in_len with slicer

slice2outax(ndim, sliceobj)

Return the output axis matching each axis of an input array with ndim dimensions, sliced with sliceobj

slicers2segments(read_slicers, in_shape, ...)

Get segments from read_slicers given in_shape and memory steps

strided_scalar(shape[, scalar])

Return an array of shape shape in which all entries point to the value scalar

threshold_heuristic(slicer, dim_len, stride)

Whether to force full axis read or contiguous read of stepped slice

calc_slicedefs

nibabel.fileslice.calc_slicedefs(sliceobj, in_shape, itemsize, offset, order, heuristic=<function threshold_heuristic>)

Return parameters for slicing array with sliceobj given memory layout

Calculate the best combination of skips / (read + discard) to use for reading the data from disk / memory, then generate corresponding segments, the disk offsets and read lengths to read the memory. If we have chosen some (read + discard) optimization, then we need to discard the surplus values from the read array using post_slicers, a slicing tuple that takes the array as read from a file-like object, and returns the array we want.

Parameters:
sliceobj : object

something that can be used to slice an array as in arr[sliceobj]

in_shape : sequence

shape of underlying array to be sliced

itemsize : int

element size in array (in bytes)

offset : int

offset of array data in underlying file or memory buffer

order : {‘C’, ‘F’}

memory layout of underlying array

heuristic : callable, optional

function taking slice object, dim_len, stride length as arguments, returning one of ‘full’, ‘contiguous’, None. See optimize_slicer() and threshold_heuristic()

Returns:
segments : list

list of 2-element lists, where each list is (offset, length), giving absolute memory offset in bytes and number of bytes to read

read_shape : tuple

shape with which to interpret memory as read from segments. Interpreting the memory read from segments with this shape, and a dtype, gives an intermediate array - call this R

post_slicers : tuple

Any new slicing to be applied to the array R after reading via segments and reshaping via read_shape. Slices are in terms of read_shape. If empty, no new slicing to apply
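The way the three return values fit together can be sketched as follows. The values of segments, read_shape and post_slicers below are hand-picked for illustration (they are not the output of calc_slicedefs), and the in-memory file, the array and the 3-byte header are assumptions of the example.

```python
import io

import numpy as np

# A C-ordered (4, 5) uint8 array stored after a 3-byte header (assumed layout).
arr = np.arange(20, dtype=np.uint8).reshape((4, 5))
offset = 3
fobj = io.BytesIO(b"\x00" * offset + arr.tobytes())

# Goal: arr[1:3, ::2]. Hand-picked strategy: read rows 1 and 2 whole
# (read + discard along the fast axis), then drop the unwanted columns.
segments = [[offset + 1 * 5, 2 * 5]]  # one (offset, length) pair, in bytes
read_shape = (2, 5)  # shape of the intermediate array R
post_slicers = (slice(None), slice(None, None, 2))

# Read the bytes named by the segments.
buf = bytearray()
for seg_offset, length in segments:
    fobj.seek(seg_offset)
    buf += fobj.read(length)

# Interpret the bytes with read_shape and a dtype to get R, then trim it.
R = np.frombuffer(bytes(buf), dtype=np.uint8).reshape(read_shape, order="C")
result = R[post_slicers]
```

Here result holds the same values as arr[1:3, ::2], at the cost of reading ten bytes where six were needed.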

canonical_slicers

nibabel.fileslice.canonical_slicers(sliceobj, shape, check_inds=True)

Return canonical version of sliceobj for an array whose shape is given by shape

sliceobj is a slicer for an array A implied by shape.

  • Expand sliceobj with slice(None) to add any missing (implied) axes in sliceobj

  • Find any slicers in sliceobj that do a full axis slice and replace by slice(None)

  • Replace any floating point values for slicing with integers

  • Replace negative integer slice values with equivalent positive integers.

Does not handle fancy indexing (indexing with arrays or array-like indices)

Parameters:
sliceobj : object

something that can be used to slice an array as in arr[sliceobj]

shape : sequence

shape of array that will be indexed by sliceobj

check_inds : {True, False}, optional

Whether to check if integer indices are out of bounds

Returns:
can_slicers : tuple

version of sliceobj for which Ellipses have been expanded, missing (implied) dimensions have been appended, and slice objects equivalent to slice(None) have been replaced by slice(None), integer axes have been checked, and negative indices set to positive equivalent
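The canonicalization steps listed above can be sketched in plain Python. This is a simplified illustration, not nibabel's implementation: canonical_sketch is a hypothetical name, it always checks bounds, and its full-slice test (via slice.indices) may be more permissive than the library's.

```python
def canonical_sketch(sliceobj, shape):
    """Simplified canonical form of sliceobj for an array of the given shape."""
    if not isinstance(sliceobj, tuple):
        sliceobj = (sliceobj,)
    # Expand an Ellipsis into as many full slices as it stands for.
    n_real = sum(1 for s in sliceobj if s is not None and s is not Ellipsis)
    expanded = []
    for s in sliceobj:
        if s is Ellipsis:
            expanded.extend([slice(None)] * (len(shape) - n_real))
        else:
            expanded.append(s)
    # Append full slices for missing (implied) trailing axes.
    n_real = sum(1 for s in expanded if s is not None)
    expanded.extend([slice(None)] * (len(shape) - n_real))

    out = []
    dim = 0
    for s in expanded:
        if s is None:  # newaxis consumes no input axis
            out.append(None)
            continue
        dim_len = shape[dim]
        dim += 1
        if isinstance(s, slice):
            # A slice covering the whole axis becomes slice(None).
            if s.indices(dim_len) == (0, dim_len, 1):
                s = slice(None)
            out.append(s)
        else:
            i = int(s)  # floats such as 1.0 become integers
            if i < 0:
                i += dim_len  # negative index to positive equivalent
            if not 0 <= i < dim_len:
                raise IndexError("index out of bounds")
            out.append(i)
    return tuple(out)


print(canonical_sketch((Ellipsis, -1), (2, 3, 4)))
```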

fileslice

nibabel.fileslice.fileslice(fileobj, sliceobj, shape, dtype, offset=0, order='C', heuristic=<function threshold_heuristic>, lock=None)

Slice array in fileobj using sliceobj slicer and array definitions

fileobj contains the contiguous binary data for an array A, whose shape, dtype and memory layout are given by shape, dtype and order, with the binary data starting at file offset offset.

Our job is to return the sliced array A[sliceobj] in the most efficient way in terms of memory and time.

Sometimes it will be quicker to read memory that we will later throw away, to save time we might lose doing short seeks on fileobj. Call these alternatives: (read + discard); and skip. This routine guesses when to (read+discard) or skip using the callable heuristic, with a default using a hard threshold for the memory gap large enough to prefer a skip.
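The two alternatives can be illustrated with a small standard-library sketch. The 20-byte in-memory file and the stepped slice are assumptions of the example; the point is only that both strategies yield the same bytes with different numbers of seeks and different amounts of data read.

```python
import io

data = bytes(range(20))
fobj = io.BytesIO(data)
wanted = slice(2, 20, 4)  # elements 2, 6, 10, 14, 18
positions = range(*wanted.indices(len(data)))

# Skip: seek to every wanted element and read only that element.
skipped = bytearray()
n_seeks_skip = 0
for pos in positions:
    fobj.seek(pos)
    n_seeks_skip += 1
    skipped += fobj.read(1)

# Read + discard: one seek, one contiguous read, then drop the surplus.
first, last = positions[0], positions[-1]
fobj.seek(first)
block = fobj.read(last - first + 1)  # 17 bytes, 12 of them unwanted
kept = block[:: wanted.step]

print(n_seeks_skip, len(block), bytes(skipped) == kept)
```

With a small gap between wanted elements, the single larger read is usually the better trade; with a large gap, the skips win. The heuristic callable decides where that boundary lies.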

Parameters:
fileobj : file-like object

file-like object, opened for reading in binary mode. Implements read and seek.

sliceobj : object

something that can be used to slice an array as in arr[sliceobj].

shape : sequence

shape of full array inside fileobj.

dtype : dtype specifier

dtype of array inside fileobj, or input to numpy.dtype to specify array dtype.

offset : int, optional

offset of array data within fileobj

order : {‘C’, ‘F’}, optional

memory layout of array in fileobj.

heuristic : callable, optional

function taking slice object, axis length, stride length as arguments, returning one of ‘full’, ‘contiguous’, None. See optimize_slicer() and see threshold_heuristic() for an example.

lock : {None, threading.Lock, lock-like}, optional

If provided, used to ensure that paired calls to seek and read cannot be interrupted by another thread accessing the same fileobj. Each thread which accesses the same file via read_segments must share a lock in order to ensure that the file access is thread-safe. A lock does not need to be provided for single-threaded access. The default value (None) results in a lock-like object (a _NullLock) which does not do anything.

Returns:
sliced_arr : array

Array in fileobj as sliced with sliceobj

fill_slicer

nibabel.fileslice.fill_slicer(slicer, in_len)

Return slice object with Nones filled out to match in_len

Also fixes too large stop / start values according to slice() slicing rules.

The returned slicer can have a None as slicer.stop if slicer.step is negative and the input slicer.stop is None. This is because we can’t represent that stop as an integer: a stop of -1 would refer to the last element, not to the position before the first element.

Parameters:
slicer : slice object
in_len : int

length of axis on which slicer will be applied

Returns:
can_slicer : slice object

slice with start, stop, step set to explicit values, with the exception of stop for negative step, which is None for the case of slicing down through the first element
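The special case for negative steps can be seen with the built-in slice.indices method. The helper below is a hypothetical sketch of the idea, not nibabel's code.

```python
def fill_slicer_sketch(slicer, in_len):
    """Fill in None values of slicer for an axis of length in_len."""
    start, stop, step = slicer.indices(in_len)
    # For a negative step, slice.indices reports "run past the first
    # element" as stop == -1. Used as a real slice stop, -1 would mean
    # "the last element", so that case has to stay None.
    if step < 0 and stop < 0:
        stop = None
    return slice(start, stop, step)


seq = list(range(5))
filled = fill_slicer_sketch(slice(None, None, -1), 5)
print(filled, seq[filled])
print(seq[slice(4, -1, -1)])  # empty: -1 is read as "last element"
```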

is_fancy

nibabel.fileslice.is_fancy(sliceobj)

Returns True if sliceobj is attempting fancy indexing

Parameters:
sliceobj : object

something that can be used to slice an array as in arr[sliceobj]

Returns:
tf : bool

True if sliceobj represents fancy indexing, False for basic indexing
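As a rough illustration of the distinction, basic indexing uses only integers, slices, None (newaxis) and Ellipsis, while anything array-like is fancy. The check below is a simplified sketch with a hypothetical name; the library's own test may accept more index types (for example, values convertible to int).

```python
from numbers import Integral


def is_fancy_sketch(sliceobj):
    """Return True if any element of sliceobj is not a basic index."""
    if not isinstance(sliceobj, tuple):
        sliceobj = (sliceobj,)
    for s in sliceobj:
        if s is None or s is Ellipsis or isinstance(s, (slice, Integral)):
            continue  # basic index element
        return True  # lists, arrays and other sequences count as fancy
    return False


print(is_fancy_sketch((slice(None), 1, None, Ellipsis)))
print(is_fancy_sketch(([0, 2], slice(None))))
```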

optimize_read_slicers

nibabel.fileslice.optimize_read_slicers(sliceobj, in_shape, itemsize, heuristic)

Calculates slices to read from disk, and apply after reading

Parameters:
sliceobj : object

something that can be used to slice an array as in arr[sliceobj]. Can be assumed to be canonical in the sense of canonical_slicers

in_shape : sequence

shape of underlying array to be sliced. Array for in_shape assumed to be already in ‘F’ order. Reorder shape / sliceobj for slicing a ‘C’ array before passing to this function.

itemsize : int

element size in array (bytes)

heuristic : callable

function taking slice object, axis length, and stride length as arguments, returning one of ‘full’, ‘contiguous’, None. See optimize_slicer(); see threshold_heuristic() for an example.

Returns:
read_slicers : tuple

sliceobj maybe rephrased to fill out dimensions that are better read from disk and later trimmed to their original size with post_slicers. read_slicers implies a block of memory to be read from disk. The actual disk positions come from slicers2segments run over read_slicers. Includes any newaxis dimensions in sliceobj

post_slicers : tuple

Any new slicing to be applied to the read array after reading. The post_slicers discard any memory that we read to save time, but that we don’t need for the slice. Includes any newaxis dimension added by sliceobj

optimize_slicer

nibabel.fileslice.optimize_slicer(slicer, dim_len, all_full, is_slowest, stride, heuristic=<function threshold_heuristic>)

Return maybe modified slice and post-slice slicing for slicer

Parameters:
slicer : slice object or int
dim_len : int

length of axis along which to slice

all_full : bool

Whether dimensions up until now have been full (all elements)

is_slowest : bool

Whether this dimension is the slowest changing in memory / on disk

stride : int

size of one step along this axis

heuristic : callable, optional

function taking slice object, dim_len, stride length as arguments, returning one of ‘full’, ‘contiguous’, None. See threshold_heuristic() for an example.

Returns:
to_read : slice object or int

maybe modified slice based on slicer expressing what data should be read from an underlying file or buffer. to_read must always have positive step (because we don’t want to go backwards in the buffer / file)

post_slice : slice object

slice to be applied after array has been read. Applies any transformations in slicer that have not been applied in to_read. If the axis will be dropped by the to_read slicing, so that no further slicing would make sense, return the string ‘dropped’

Notes

This is the heart of the algorithm for making segments from slice objects.

A contiguous slice is a slice with slice.step in (1, -1)

A full slice is a contiguous slice returning all elements.

The main question we have to ask is whether we should transform to_read, post_slice to prefer a full read and partial slice. We only do this in the case of all_full==True. In this case we might benefit from reading a contiguous chunk of data even if the slice is not contiguous, or reading all the data even if the slice is not full. Apply the callable heuristic to decide whether to do this, and adapt to_read and post_slice accordingly.

Otherwise (apart from constraint to be positive) return to_read unaltered and post_slice as slice(None)
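The relationship between slicer, to_read and post_slice can be checked on a plain list. The sketch below builds three candidate (to_read, post_slice) pairs by hand for one negative-step slice; positive_slice is a hypothetical helper, and the pairs are illustrative rather than the exact values optimize_slicer returns.

```python
def positive_slice(slicer, dim_len):
    """Return a positive-step slice selecting the same elements as slicer."""
    start, stop, step = slicer.indices(dim_len)
    if step > 0:
        return slice(start, stop, step)
    n = len(range(start, stop, step))
    if n == 0:
        return slice(0, 0, 1)
    last = start + (n - 1) * step  # smallest index that is selected
    return slice(last, start + 1, -step)


data = list(range(10))
slicer = slice(8, 1, -2)
target = data[slicer]  # [8, 6, 4, 2]

minimal = positive_slice(slicer, len(data))  # slice(2, 9, 2)
candidates = {
    # read only what is needed, going forwards, then reverse
    "minimal": (minimal, slice(None, None, -1)),
    # read everything between first and last wanted element, then step
    "contiguous": (slice(minimal.start, minimal.stop, 1), slice(None, None, -2)),
    # read the whole axis, then apply the original slicer
    "full": (slice(None), slicer),
}
results = {
    name: data[to_read][post_slice]
    for name, (to_read, post_slice) in candidates.items()
}
print(results)
```

Every to_read here has a positive step, so the read always moves forwards through the buffer; the differences are in how much surplus data post_slice has to discard.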

predict_shape

nibabel.fileslice.predict_shape(sliceobj, in_shape)

Predict the shape of the array obtained by slicing an array of shape in_shape with sliceobj

Parameters:
sliceobj : object

something that can be used to slice an array as in arr[sliceobj]

in_shape : sequence

shape of array that could be sliced by sliceobj

Returns:
out_shape : tuple

predicted shape arising from slicing array shape in_shape with sliceobj
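For basic indexing, the output shape can be worked out without touching any data. The sketch below is a simplified, hypothetical version that handles integers, slices and None but not Ellipsis.

```python
def predict_shape_sketch(sliceobj, in_shape):
    """Shape of arr[sliceobj] for an array of shape in_shape (no Ellipsis)."""
    if not isinstance(sliceobj, tuple):
        sliceobj = (sliceobj,)
    out_shape = []
    dim = 0
    for s in sliceobj:
        if s is None:  # newaxis adds a length-1 axis
            out_shape.append(1)
            continue
        dim_len = in_shape[dim]
        dim += 1
        if isinstance(s, slice):
            out_shape.append(len(range(*s.indices(dim_len))))
        # an integer index drops the axis, so nothing is appended
    out_shape.extend(in_shape[dim:])  # implied trailing full slices
    return tuple(out_shape)


print(predict_shape_sketch((slice(None), 1, slice(0, 4, 2)), (2, 3, 4)))
```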

read_segments

nibabel.fileslice.read_segments(fileobj, segments, n_bytes, lock=None)

Read n_bytes bytes of data implied by segments from fileobj

Parameters:
fileobj : file-like object

Implements seek and read

segments : sequence

list of 2-element sequences, where each sequence is (offset, length), giving absolute file offset in bytes and number of bytes to read

n_bytes : int

total number of bytes that will be read

lock : {None, threading.Lock, lock-like}, optional

If provided, used to ensure that paired calls to seek and read cannot be interrupted by another thread accessing the same fileobj. Each thread which accesses the same file via read_segments must share a lock in order to ensure that the file access is thread-safe. A lock does not need to be provided for single-threaded access. The default value (None) results in a lock-like object (a _NullLock) which does not do anything.

Returns:
buffer : buffer object

object implementing buffer protocol, such as byte string or ndarray or mmap or ctypes c_char_array
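The seek-and-read pairing that the lock protects can be sketched with the standard library. This is an illustration under stated assumptions, not nibabel's code: read_segments_sketch is a hypothetical name, and contextlib.nullcontext stands in for the do-nothing lock used when lock is None.

```python
import contextlib
import io
import threading


def read_segments_sketch(fileobj, segments, n_bytes, lock=None):
    """Read the (offset, length) segments from fileobj into one buffer."""
    if lock is None:
        lock = contextlib.nullcontext()  # no-op stand-in for a lock
    out = bytearray(n_bytes)
    pos = 0
    for offset, length in segments:
        # seek and read must not be separated by another thread's seek
        with lock:
            fileobj.seek(offset)
            chunk = fileobj.read(length)
        if len(chunk) != length:
            raise ValueError("file too short for requested segment")
        out[pos:pos + length] = chunk
        pos += length
    if pos != n_bytes:
        raise ValueError("segments do not add up to n_bytes")
    return out


fobj = io.BytesIO(bytes(range(20)))
shared_lock = threading.Lock()  # share one lock between threads using fobj
buf = read_segments_sketch(fobj, [[2, 3], [10, 2]], 5, lock=shared_lock)
print(bytes(buf))
```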

slice2len

nibabel.fileslice.slice2len(slicer, in_len)

Output length after slicing original length in_len with slicer

Parameters:
slicer : slice object
in_len : int

Returns:
out_len : int

Length after slicing

Notes

Returns same as len(np.arange(in_len)[slicer])
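The same length can be computed without allocating an array, using only built-ins. A minimal sketch (slice2len_sketch is a hypothetical name):

```python
def slice2len_sketch(slicer, in_len):
    """Number of elements selected by slicer on an axis of length in_len."""
    return len(range(*slicer.indices(in_len)))


for slicer in (slice(None), slice(1, 8, 3), slice(None, None, -2), slice(20, 30)):
    # agrees with actually slicing a sequence of that length
    assert slice2len_sketch(slicer, 10) == len(list(range(10))[slicer])
```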

slice2outax

nibabel.fileslice.slice2outax(ndim, sliceobj)

Return the output axis matching each axis of an input array with ndim dimensions, sliced with sliceobj

Parameters:
ndim : int

number of axes in input array

sliceobj : object

something that can be used to slice an array as in arr[sliceobj]

Returns:
out_ax_inds : tuple

Say A is a (pretend) input array of ndim dimensions. Say B = A[sliceobj]. out_ax_inds has one value per axis in A, giving the corresponding axis in B.
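The axis bookkeeping can be sketched as below for tuples of integers, slices and None (no Ellipsis). The function name is hypothetical, and marking an axis dropped by integer indexing with None is an assumption of this sketch.

```python
def slice2outax_sketch(ndim, sliceobj):
    """For each axis of A, the matching axis of B = A[sliceobj]."""
    if not isinstance(sliceobj, tuple):
        sliceobj = (sliceobj,)
    out_ax = 0  # next free axis number in B
    out_ax_inds = []
    for s in sliceobj:
        if s is None:
            out_ax += 1  # newaxis takes an axis in B, none in A
        elif isinstance(s, slice):
            out_ax_inds.append(out_ax)
            out_ax += 1
        else:
            out_ax_inds.append(None)  # integer index: axis absent from B
    while len(out_ax_inds) < ndim:  # implied trailing full slices
        out_ax_inds.append(out_ax)
        out_ax += 1
    return tuple(out_ax_inds)


print(slice2outax_sketch(3, (slice(None), 1)))
print(slice2outax_sketch(2, (None, slice(None))))
```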

slicers2segments

nibabel.fileslice.slicers2segments(read_slicers, in_shape, offset, itemsize)

Get segments from read_slicers given in_shape and memory steps

Parameters:
read_slicers : object

something that can be used to slice an array as in arr[sliceobj]. Slice objects can be assumed canonical as in canonical_slicers, and positive as in _positive_slice

in_shape : sequence

shape of underlying array on disk before reading

offset : int

offset of array data in underlying file or memory buffer

itemsize : int

element size in array (in bytes)

Returns:
segments : list

list of 2-element lists, where each list is [offset, length], giving absolute memory offset in bytes and number of bytes to read
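What the segments mean can be shown with a brute-force sketch: visit every selected element in memory order, compute its byte offset, and merge neighbours that touch. This is not how the library computes segments. It assumes ‘F’ order (first axis fastest), one integer or positive-step slice per axis, and no newaxis entries; segments_sketch is a hypothetical name.

```python
from itertools import product


def segments_sketch(read_slicers, in_shape, offset, itemsize):
    """Brute-force [offset, length] segments for an F-ordered array."""
    # Byte strides for F order: the first axis changes fastest.
    strides = []
    stride = itemsize
    for dim_len in in_shape:
        strides.append(stride)
        stride *= dim_len
    # Selected indices along each axis.
    axes = [
        range(*s.indices(n)) if isinstance(s, slice) else [s]
        for s, n in zip(read_slicers, in_shape)
    ]
    segments = []
    # product varies its last argument fastest, so pass the axes reversed
    # to make the first axis the fastest-changing one.
    for idx in product(*reversed(axes)):
        pos = offset + sum(i * st for i, st in zip(reversed(idx), strides))
        if segments and segments[-1][0] + segments[-1][1] == pos:
            segments[-1][1] += itemsize  # extend the touching segment
        else:
            segments.append([pos, itemsize])
    return segments


print(segments_sketch((slice(0, 2), slice(None)), (4, 3), offset=10, itemsize=2))
```

Slicing only the slowest axis gives one long segment, while slicing the fastest axis breaks the read into one segment per column.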

strided_scalar

nibabel.fileslice.strided_scalar(shape, scalar=0.0)

Return an array of shape shape in which all entries point to the value scalar

Parameters:
shape : sequence

Shape of output array.

scalar : scalar, optional

Scalar value with which to fill array.

Returns:
strided_arr : array

Array of shape shape for which all values == scalar, built by setting all strides of strided_arr to 0, so the scalar is broadcast out to the full array shape. strided_arr is flagged as not writeable.

The array is set read-only to avoid a numpy error when broadcasting - see https://github.com/numpy/numpy/issues/6491
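The zero-stride construction described above can be reproduced with NumPy's as_strided. This is a sketch of the technique under that assumption (strided_scalar_sketch is a hypothetical name), not a copy of the library function.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def strided_scalar_sketch(shape, scalar=0.0):
    """Read-only array of the given shape whose entries all alias one scalar."""
    base = np.array(scalar)  # 0-d array holding the single value
    shape = tuple(shape)
    # All strides are 0, so every index maps to the same memory location.
    return as_strided(base, shape=shape, strides=(0,) * len(shape), writeable=False)


arr = strided_scalar_sketch((2, 3), 5.0)
print(arr.shape, arr.strides, arr.flags.writeable)
```

Because every element shares one memory location, the array costs the same memory whatever its shape, which is why it must not be written to.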

threshold_heuristic

nibabel.fileslice.threshold_heuristic(slicer, dim_len, stride, skip_thresh=256)

Whether to force full axis read or contiguous read of stepped slice

Allows fileslice() to sometimes read memory that it will throw away in order to get maximum speed. In other words, trade memory for fewer disk reads.

Parameters:
slicer : slice object, or int

If slice, can be assumed to be full as in fill_slicer

dim_len : int

length of axis being sliced

stride : int

memory distance between elements on this axis

skip_thresh : int, optional

Memory gap threshold in bytes above which to prefer skipping memory rather than reading it and later discarding.

Returns:
action : {‘full’, ‘contiguous’, None}

Gives the suggested optimization for reading the data

  • ‘full’ - read whole axis

  • ‘contiguous’ - read all elements between start and stop

  • None - read only memory needed for output

Notes

Let’s say we are in the middle of reading a file at the start of some memory length $B$ bytes. We don’t need the memory, and we are considering whether to read it anyway (then throw it away) (READ) or stop reading, skip $B$ bytes and restart reading from there (SKIP).

After trying some fancier algorithms, a hard threshold (skip_thresh) for the maximum skip distance seemed to work well, as measured by timings from nibabel.benchmarks.bench_fileslice.
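One way such a hard threshold could be applied is sketched below. The decision rules are an assumption made for illustration (compare the gap that a READ would waste against skip_thresh) and handle only integers and positive-step slices; heuristic_sketch is a hypothetical name and may not match the library's exact logic.

```python
def heuristic_sketch(slicer, dim_len, stride, skip_thresh=256):
    """Suggest 'full', 'contiguous' or None using a hard byte threshold."""
    if isinstance(slicer, int):
        # Reading the whole axis for one element wastes the rest of it.
        gap = (dim_len - 1) * stride
        return "full" if gap <= skip_thresh else None
    start, stop, step = slicer.indices(dim_len)
    if step * stride > skip_thresh:
        return None  # wanted elements are far apart: prefer SKIP
    # Gaps between elements are small, so read at least contiguously.
    read_len = stop - start
    gap = (dim_len - read_len) * stride
    return "full" if gap <= skip_thresh else "contiguous"


print(heuristic_sketch(slice(0, 100, 2), 100, 4))
print(heuristic_sketch(slice(0, 10, 2), 1000, 4))
print(heuristic_sketch(slice(0, 100, 2), 100, 1024))
```

A callable with this signature (slicer, dim_len, stride) could be passed as the heuristic argument of fileslice() in place of the default.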