SPM DICOM conversion¶
These are some notes on the algorithms that SPM uses to convert from DICOM to nifti. There are other notes in Siemens mosaic format.
The relevant SPM files are spm_dicom_headers.m
,
spm_dicom_dict.mat
and spm_dicom_convert.m
. These notes refer
the version in SPM8, as of around January 2010.
spm_dicom_dict.mat
¶
This is obviously a Matlab .mat
file. It contains variables
group
and element
, and values
, where values
is a struct
array, one element per (group, element) pair, with fields name
and
vr
(the last a cell array).
spm_dicom_headers.m
¶
Reads the given DICOM files into a struct. It looks like this was written by John Ahsburner (JA). Relevant fixes are:
File opening¶
When opening the DICOM file, SPM (subfunction readdicomfile
)
opens as little endian
reads 4 characters starting at pos 128
checks if these are
DICM
; if so then continues file read; otherwise, tests to see if this is what SPM calls truncated DICOM file format - lacking 128 byte lead in andDICM
string:Seeks to beginning of file
Reads two unsigned short values into
group
andtag
If the (
group
,element
) pair exist inspm_dicom_dict.mat
, then set file pointer to 0 and continue read withread_dicom
subfunction..If
group
== 8 andelement
== 0, this is apparently the signature for a ‘GE Twin+excite’ for which JA notes there is no documentation; set file pointer to 0 and continue read withread_dicom
subfunction.Otherwise - crash out with error saying that this is not DICOM file.
tag read for Philips Integra¶
The read_dicom
subfunction reads a tag, then has a loop during which
the tag is processed (by setting values into the return structure). At
the end of the loop, it reads the next tag. The loop breaks when the
current tag is empty, or is the item delimitation tag (group=FFFE,
element=E00D).
After it has broken out of the loop, if the last tag was (FFFE, E00D) (item delimitation tag), and the tag length was not 0, then SPM sets the file pointer back by 4 bytes from the current position. JA comments that he didn’t find that in the standard, but that it seemed to be needed for the Philips Integra.
Tag length¶
Tag lengths as read in read_tag
subfunction. If current format is
explicit (as in ‘explicit little endian’):
For VR of x00x00, then group, element must be (FFFE, E00D) (item delimitation tag). JA comments that GE ‘ImageDelimitationItem’ has no VR, just 4 0 bytes. In this case the tag length is zero, and we read another two bytes ahead.
There’s a check for not-even tag length. If not even:
4294967295 appears to be OK - and decoded as Inf for tag length.
13 appears to mean 10 and is reset to be 10
Any other odd number is not valid and gives a tag length of 0
SQ
VR type (Sequence of items type)¶
tag length of 13 set to tag length 10.
spm_dicom_convert.m
¶
Written by John Ashburner and Jesper Andersson.
File categorization¶
SPM makes a special case of Siemens ‘spectroscopy images’. These are images that have ‘SOPClassUID’ == ‘1.3.12.2.1107.5.9.1’ and the private tag of (29, 1210); for these it pulls out the affine, and writes a volume of ones corresponding to the acquisition planes.
For images that are not spectroscopy:
Discards images that do not have any of (‘MR’, ‘PT’, ‘CT’) in ‘Modality’ field.
Discards images lacking any of ‘StartOfPixelData’, ‘SamplesperPixel’, ‘Rows’, ‘Columns’, ‘BitsAllocated’, ‘BitsStored’, ‘HighBit’, ‘PixelRespresentation’
Discards images lacking any of ‘PixelSpacing’, ‘ImagePositionPatient’, ‘ImageOrientationPatient’ - presumably on the basis that SPM cannot reconstruct the affine.
Fields ‘SeriesNumber’, ‘AcquisitionNumber’ and ‘InstanceNumber’ are set to 1 if absent.
Next SPM distinguishes between Siemens mosaic format and standard DICOM.
Mosaic images are those with the Siemens private tag:
(0029, 1009) [CSA Image Header Version] OB: '20100114'
and a readable CSA header (see Siemens mosaic format), and with non-empty fields from that header of ‘AcquisitionMatrixText’, ‘NumberOfImagesInMosaic’, and with non-zero ‘NumberOfImagesInMosaic’. The rest are standard DICOM.
For converting mosaic format, see Siemens mosaic format. The rest of this page refers to standard (slice by slice) DICOMs.
Sorting files into volumes¶
First pass¶
Take first header, put as start of first volume. For each subsequent header:
Get
ICE_Dims
if present. Look for Siemens ‘CSAImageHeaderInfo’, check it has a ‘name’ field, then pull dimensions out of ‘ICE_Dims’ field in form of 9 integers separated by ‘_’, where ‘X’ in this string replaced by ‘-1’ - giving ‘ICE1’
Then, for each currently identified volume:
If we have ICE1 above, and we do have ‘CSAIMageHeaderInfo’, with a ‘name’, in the first header in this volume, then extract ICE dims in the same way as above, for the first header in this volume, and check whether all but ICE1[6:8] are the same as ICE2. Set flag that all ICE dims are identical for this volume. Set this flag to True if we did not have ICE1 or CSA information.
Match the current header to the current volume iff the following match:
SeriesNumber
Rows
Columns
ImageOrientationPatient (to tolerance of sum squared difference 1e-4)
PixelSpacing (to tolerance of sum squared difference 1e-4)
ICE dims as defined above
ImageType (iff imagetype exists in both)
SequenceName (iff sequencename exists in both)
SeriesInstanceUID (iff exists in both)
EchoNumbers (iff exists in both)
If the current header matches the current volume, insert it there, otherwise make a new volume for this header
Second pass¶
We now have a list of volumes, where each volume is a list of headers that may match.
For each volume:
Estimate the z direction cosine by (effectively) finding the cross product of the x and y direction cosines contained in ‘ImageOrientationPatient’ - call this
z_dir_cos
For each header in this volume, get the z coordinate by taking the dot product of the ‘ImagePositionPatient’ vector and
z_dir_cos
(see Working out the Z coordinates for a set of slices).Sort the headers according to this estimated z coordinate.
If this volume is more than one slice, and there are any slices with the same z coordinate (as defined above), run the Possible volume resort on this volume - on the basis that it may have caught more than one volume-worth of slices. Return one or more volume’s worth of lists.
Final check¶
For each volume, recalculate z coordinate as above. Calculate the z gaps. Subtract the mean of the z gaps from all z gaps. If the average of the (gap-mean(gap)) is greater than 1e-4, then print a warning that there are missing DICOM files.
Possible volume resort¶
This step happens if there were volumes with slices having the same z
coordinate in the Second pass step above. The resort is on the
set of DICOM headers that were in the volume, for which there were
slices with identical z coordinates. We’ll call the list of headers
that the routine is still working on - work_list
.
If there is no ‘InstanceNumber’ field for the first header in
work_list
, bail out.Print a message about the ‘AcquisitionNumber’ not changing from volume to volume. This may be a relic from previous code, because this version of SPM does not use the ‘AcquisitionNumber’ field except for making filenames.
Calculate the z coordinate as for Second pass, for each DICOM header.
Sort the headers by ‘InstanceNumber’
If any headers have the same ‘InstanceNumber’, then discard all but the first header with the same number. At this point the remaining headers in
work_list
will have different ‘InstanceNumber’s, but may have the same z coordinate.Now sort by z coordinate
If there are
N
headers, make aN
length vector of flagsis_processed
, for which all values == FalseMake an output list of header lists, call it
hdr_vol_out
, set to empty.While there are still any False elements in
is_processed
:Find first header for which corresponding
is_processed
is False - call thishdr_to_check
Collect indices (in
work_list
) of headers which have the same z coordinate ashdr_to_check
, call this listz_same_indices
.Sort
work_list[z_same_indices]
by ‘InstanceNumber’For each index in
z_same_indices
such thati
indexes the indices, andzsind
isz_same_indices[i]
: append header corresponding tozsind
tohdr_vol_out[i]
. This assumes that the originalwork_list
contained two or more volumes, each with an identical set of z coordinates.Set corresponding
is_processed
flag to True for allz_same_indices
.
Finally, if the headers in
work_list
have ‘InstanceNumber’s that cannot be sorted to a sequence ascending in units of 1, or if any of the lists inhdr_vol_out
have different lengths, emit a warning about missing DICOM files.
Writing DICOM volumes¶
This means - writing DICOM volumes from standard (slice by slice) DICOM datasets rather than Siemens mosaic format.
Making the affine¶
We need the (4,4) affine $A$ going from voxel (array) coordinates in the DICOM pixel data, to mm coordinates in the DICOM patient coordinate system.
This section tries to explain how SPM achieves this, but I don’t completely understand their method. See Getting a 3D affine from a DICOM slice or list of slices for what I believe to be a simpler explanation.
First define the constants, matrices and vectors as in DICOM affine Definitions.
$N$ is the number of slices in the volume.
Then define the following matrices:
For a volume with more than one slice (header), then $a=1; b=1, c=N, d=1$. $e, f, g$ are the values from $T^N$, and $h == 1$.
For a volume with only one slice (header) $a=0, b=0, c=1, d=0$ and $e, f, g, h$ are $n_1 Delta{s}, n_2 Delta{s}, n_3 Delta{s}, 0$.
The full transform appears to be $A_{spm} = R L^{-1}$.
Now, SPM, don’t forget, is working in terms of Matlab array indexing,
which starts at (1,1,1) for a three dimensional array, whereas DICOM
expects a (0,0,0) start (see DICOM affine formula). In this
particular part of the SPM DICOM code, somewhat confusingly, the (0,0,0)
to (1,1,1) indexing is dealt with in the $A$ transform, rather than the
analyze_to_dicom
transformation used by SPM in other places. So, the
transform $A_{spm}$ goes from (1,1,1) based voxel indices to mm. To
get the (0, 0, 0)-based transform we want, we need to pre-apply the
transform to take 0-based voxel indices to 1-based voxel indices:
This formula with the definitions above result in the single and multi slice formulae in 3D affine formulae.
See derivations/spm_dicom_orient.py
for the derivations and
some explanations.
Writing the voxel data¶
Just apply scaling and offset from ‘RescaleSlope’ and ‘RescaleIntercept’ for each slice and write volume.