SPM DICOM conversion

These are some notes on the algorithms that SPM uses to convert from DICOM to nifti. There are other notes in Siemens mosaic format.

The relevant SPM files are spm_dicom_headers.m, spm_dicom_dict.mat and spm_dicom_convert.m. These notes refer the version in SPM8, as of around January 2010.

spm_dicom_dict.mat

This is obviously a Matlab .mat file. It contains variables group and element, and values, where values is a struct array, one element per (group, element) pair, with fields name and vr (the last a cell array).

spm_dicom_headers.m

Reads the given DICOM files into a struct. It looks like this was written by John Ahsburner (JA). Relevant fixes are:

File opening

When opening the DICOM file, SPM (subfunction readdicomfile)

  1. opens as little endian

  2. reads 4 characters starting at pos 128

  3. checks if these are DICM; if so then continues file read; otherwise, tests to see if this is what SPM calls truncated DICOM file format - lacking 128 byte lead in and DICM string:

    1. Seeks to beginning of file

    2. Reads two unsigned short values into group and tag

    3. If the (group, element) pair exist in spm_dicom_dict.mat, then set file pointer to 0 and continue read with read_dicom subfunction..

    4. If group == 8 and element == 0, this is apparently the signature for a ‘GE Twin+excite’ for which JA notes there is no documentation; set file pointer to 0 and continue read with read_dicom subfunction.

    5. Otherwise - crash out with error saying that this is not DICOM file.

tag read for Philips Integra

The read_dicom subfunction reads a tag, then has a loop during which the tag is processed (by setting values into the return structure). At the end of the loop, it reads the next tag. The loop breaks when the current tag is empty, or is the item delimitation tag (group=FFFE, element=E00D).

After it has broken out of the loop, if the last tag was (FFFE, E00D) (item delimitation tag), and the tag length was not 0, then SPM sets the file pointer back by 4 bytes from the current position. JA comments that he didn’t find that in the standard, but that it seemed to be needed for the Philips Integra.

Tag length

Tag lengths as read in read_tag subfunction. If current format is explicit (as in ‘explicit little endian’):

  1. For VR of x00x00, then group, element must be (FFFE, E00D) (item delimitation tag). JA comments that GE ‘ImageDelimitationItem’ has no VR, just 4 0 bytes. In this case the tag length is zero, and we read another two bytes ahead.

There’s a check for not-even tag length. If not even:

  1. 4294967295 appears to be OK - and decoded as Inf for tag length.

  2. 13 appears to mean 10 and is reset to be 10

  3. Any other odd number is not valid and gives a tag length of 0

SQ VR type (Sequence of items type)

tag length of 13 set to tag length 10.

spm_dicom_convert.m

Written by John Ashburner and Jesper Andersson.

File categorization

SPM makes a special case of Siemens ‘spectroscopy images’. These are images that have ‘SOPClassUID’ == ‘1.3.12.2.1107.5.9.1’ and the private tag of (29, 1210); for these it pulls out the affine, and writes a volume of ones corresponding to the acquisition planes.

For images that are not spectroscopy:

  • Discards images that do not have any of (‘MR’, ‘PT’, ‘CT’) in ‘Modality’ field.

  • Discards images lacking any of ‘StartOfPixelData’, ‘SamplesperPixel’, ‘Rows’, ‘Columns’, ‘BitsAllocated’, ‘BitsStored’, ‘HighBit’, ‘PixelRespresentation’

  • Discards images lacking any of ‘PixelSpacing’, ‘ImagePositionPatient’, ‘ImageOrientationPatient’ - presumably on the basis that SPM cannot reconstruct the affine.

  • Fields ‘SeriesNumber’, ‘AcquisitionNumber’ and ‘InstanceNumber’ are set to 1 if absent.

Next SPM distinguishes between Siemens mosaic format and standard DICOM.

Mosaic images are those with the Siemens private tag:

(0029, 1009) [CSA Image Header Version]          OB: '20100114'

and a readable CSA header (see Siemens mosaic format), and with non-empty fields from that header of ‘AcquisitionMatrixText’, ‘NumberOfImagesInMosaic’, and with non-zero ‘NumberOfImagesInMosaic’. The rest are standard DICOM.

For converting mosaic format, see Siemens mosaic format. The rest of this page refers to standard (slice by slice) DICOMs.

Sorting files into volumes

First pass

Take first header, put as start of first volume. For each subsequent header:

  1. Get ICE_Dims if present. Look for Siemens ‘CSAImageHeaderInfo’, check it has a ‘name’ field, then pull dimensions out of ‘ICE_Dims’ field in form of 9 integers separated by ‘_’, where ‘X’ in this string replaced by ‘-1’ - giving ‘ICE1’

Then, for each currently identified volume:

  1. If we have ICE1 above, and we do have ‘CSAIMageHeaderInfo’, with a ‘name’, in the first header in this volume, then extract ICE dims in the same way as above, for the first header in this volume, and check whether all but ICE1[6:8] are the same as ICE2. Set flag that all ICE dims are identical for this volume. Set this flag to True if we did not have ICE1 or CSA information.

  2. Match the current header to the current volume iff the following match:

    1. SeriesNumber

    2. Rows

    3. Columns

    4. ImageOrientationPatient (to tolerance of sum squared difference 1e-4)

    5. PixelSpacing (to tolerance of sum squared difference 1e-4)

    6. ICE dims as defined above

    7. ImageType (iff imagetype exists in both)

    8. SequenceName (iff sequencename exists in both)

    9. SeriesInstanceUID (iff exists in both)

    10. EchoNumbers (iff exists in both)

  3. If the current header matches the current volume, insert it there, otherwise make a new volume for this header

Second pass

We now have a list of volumes, where each volume is a list of headers that may match.

For each volume:

  1. Estimate the z direction cosine by (effectively) finding the cross product of the x and y direction cosines contained in ‘ImageOrientationPatient’ - call this z_dir_cos

  2. For each header in this volume, get the z coordinate by taking the dot product of the ‘ImagePositionPatient’ vector and z_dir_cos (see Working out the Z coordinates for a set of slices).

  3. Sort the headers according to this estimated z coordinate.

  4. If this volume is more than one slice, and there are any slices with the same z coordinate (as defined above), run the Possible volume resort on this volume - on the basis that it may have caught more than one volume-worth of slices. Return one or more volume’s worth of lists.

Final check

For each volume, recalculate z coordinate as above. Calculate the z gaps. Subtract the mean of the z gaps from all z gaps. If the average of the (gap-mean(gap)) is greater than 1e-4, then print a warning that there are missing DICOM files.

Possible volume resort

This step happens if there were volumes with slices having the same z coordinate in the Second pass step above. The resort is on the set of DICOM headers that were in the volume, for which there were slices with identical z coordinates. We’ll call the list of headers that the routine is still working on - work_list.

  1. If there is no ‘InstanceNumber’ field for the first header in work_list, bail out.

  2. Print a message about the ‘AcquisitionNumber’ not changing from volume to volume. This may be a relic from previous code, because this version of SPM does not use the ‘AcquisitionNumber’ field except for making filenames.

  3. Calculate the z coordinate as for Second pass, for each DICOM header.

  4. Sort the headers by ‘InstanceNumber’

  5. If any headers have the same ‘InstanceNumber’, then discard all but the first header with the same number. At this point the remaining headers in work_list will have different ‘InstanceNumber’s, but may have the same z coordinate.

  6. Now sort by z coordinate

  7. If there are N headers, make a N length vector of flags is_processed, for which all values == False

  8. Make an output list of header lists, call it hdr_vol_out, set to empty.

  9. While there are still any False elements in is_processed:

    1. Find first header for which corresponding is_processed is False - call this hdr_to_check

    2. Collect indices (in work_list) of headers which have the same z coordinate as hdr_to_check, call this list z_same_indices.

    3. Sort work_list[z_same_indices] by ‘InstanceNumber’

    4. For each index in z_same_indices such that i indexes the indices, and zsind is z_same_indices[i]: append header corresponding to zsind to hdr_vol_out[i]. This assumes that the original work_list contained two or more volumes, each with an identical set of z coordinates.

    5. Set corresponding is_processed flag to True for all z_same_indices.

  10. Finally, if the headers in work_list have ‘InstanceNumber’s that cannot be sorted to a sequence ascending in units of 1, or if any of the lists in hdr_vol_out have different lengths, emit a warning about missing DICOM files.

Writing DICOM volumes

This means - writing DICOM volumes from standard (slice by slice) DICOM datasets rather than Siemens mosaic format.

Making the affine

We need the (4,4) affine $A$ going from voxel (array) coordinates in the DICOM pixel data, to mm coordinates in the DICOM patient coordinate system.

This section tries to explain how SPM achieves this, but I don’t completely understand their method. See Getting a 3D affine from a DICOM slice or list of slices for what I believe to be a simpler explanation.

First define the constants, matrices and vectors as in DICOM affine Definitions.

$N$ is the number of slices in the volume.

Then define the following matrices:

\[ \begin{align}\begin{aligned}\begin{split}R = \left(\begin{smallmatrix}1 & a & 1 & 0\\1 & b & 0 & 1\\1 & c & 0 & 0\\1 & d & 0 & 0\end{smallmatrix}\right)\end{split}\\\begin{split}L = \left(\begin{smallmatrix}T^{1}_{{1}} & e & F_{{11}} \Delta{r} & F_{{12}} \Delta{c}\\T^{1}_{{2}} & f & F_{{21}} \Delta{r} & F_{{22}} \Delta{c}\\T^{1}_{{3}} & g & F_{{31}} \Delta{r} & F_{{32}} \Delta{c}\\1 & h & 0 & 0\end{smallmatrix}\right)\end{split}\end{aligned}\end{align} \]

For a volume with more than one slice (header), then $a=1; b=1, c=N, d=1$. $e, f, g$ are the values from $T^N$, and $h == 1$.

For a volume with only one slice (header) $a=0, b=0, c=1, d=0$ and $e, f, g, h$ are $n_1 Delta{s}, n_2 Delta{s}, n_3 Delta{s}, 0$.

The full transform appears to be $A_{spm} = R L^{-1}$.

Now, SPM, don’t forget, is working in terms of Matlab array indexing, which starts at (1,1,1) for a three dimensional array, whereas DICOM expects a (0,0,0) start (see DICOM affine formula). In this particular part of the SPM DICOM code, somewhat confusingly, the (0,0,0) to (1,1,1) indexing is dealt with in the $A$ transform, rather than the analyze_to_dicom transformation used by SPM in other places. So, the transform $A_{spm}$ goes from (1,1,1) based voxel indices to mm. To get the (0, 0, 0)-based transform we want, we need to pre-apply the transform to take 0-based voxel indices to 1-based voxel indices:

\[\begin{split}A = R L^{-1} \left(\begin{smallmatrix}1 & 0 & 0 & 1\\0 & 1 & 0 & 1\\0 & 0 & 1 & 1\\0 & 0 & 0 & 1\end{smallmatrix}\right)\end{split}\]

This formula with the definitions above result in the single and multi slice formulae in 3D affine formulae.

See derivations/spm_dicom_orient.py for the derivations and some explanations.

Writing the voxel data

Just apply scaling and offset from ‘RescaleSlope’ and ‘RescaleIntercept’ for each slice and write volume.