This is a Python implementation of virtual disk format inspection routines, gathered from various public specification documents as well as the QEMU disk driver code. It attempts to store and parse only the minimum amount of data required, and to do so in a streaming-friendly manner, in order to collect metadata about complex-format images.
Bases: object
Represents a region of a file we want to capture.
A region of a file we want to capture is defined by a byte offset into the file and a length. It is expected to be used by a data-processing loop that calls capture() with the most recently read chunk. This class handles the task of grabbing the desired region of data across potentially multiple fractional and unaligned reads.
offset – Byte offset into the file starting the region
length – The length of the region
Process a chunk of data.
This should be called for each chunk in the read loop, at least until complete returns True.
chunk – A chunk of bytes in the file
current_position – The position in the file that the read loop has processed so far. Note that this is the position in the file after the chunk being presented.
Returns True when we have captured the desired data.
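The overlap arithmetic behind capturing a region across unaligned reads can be sketched as follows. This is a minimal illustration, not the module's actual code: the class name CaptureRegionSketch, the data attribute, and the read_region helper are assumptions made for the example. Only capture(), complete, offset, length, chunk and current_position come from the description above.

```python
import io


class CaptureRegionSketch:
    """Illustrative stand-in for a capture region (hypothetical name)."""

    def __init__(self, offset, length):
        self.offset = offset
        self.length = length
        self.data = b""  # assumed attribute holding the captured bytes

    @property
    def complete(self):
        # Done once every byte of the requested region has been stored.
        return len(self.data) == self.length

    def capture(self, chunk, current_position):
        # current_position is the file position *after* this chunk, so the
        # chunk itself starts at current_position - len(chunk).
        chunk_start = current_position - len(chunk)
        region_end = self.offset + self.length
        # Intersect [chunk_start, current_position) with
        # [offset, region_end), both in file coordinates.
        lo = max(chunk_start, self.offset)
        hi = min(current_position, region_end)
        if lo < hi:
            self.data += chunk[lo - chunk_start:hi - chunk_start]


def read_region(stream, offset, length, chunk_size=7):
    """Hypothetical read loop that feeds a region until it is complete."""
    region = CaptureRegionSketch(offset, length)
    position = 0
    while not region.complete:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of file before the region was filled
        position += len(chunk)
        region.capture(chunk, position)
    return region.data


# Usage: a deliberately odd chunk size forces unaligned, partial captures.
captured = read_region(io.BytesIO(bytes(range(100))), offset=10, length=20)
```

Because chunks arrive in file order, appending each overlapping slice is enough to reassemble the region; a loop that delivered chunks out of order would need a different storage scheme.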
Bases: object
A stream-based disk image inspector.
This base class works on raw images and is subclassed for more complex types. It is to be presented with the file to be examined one chunk at a time during read processing, and will store only as much data as necessary to determine the required attributes of the file.
Returns the total size of the file.
This is usually smaller than virtual_size. NOTE: this will only be accurate if the entire file is read and processed.
Returns True if we have all the information needed.
Return info on amount of data held in memory for auditing.
This is a dict of region:sizeinbytes items that the inspector uses to examine the file.
Call this to present chunks of the file to the inspector.
Returns True if the file appears to be the expected format.
Read as much of a file as necessary to complete inspection.
NOTE: Because we only read as much of the file as necessary, the actual_size property will not reflect the size of the file, but the amount of data we read before we satisfied the inspector.
Raises ImageFormatError if we cannot parse the file.
Returns True if named region has been defined.
Add a new CaptureRegion by name.
Post-read hook to process what has been read so far.
This will be called after each chunk is read and potentially captured by the defined regions. If any regions are defined by this call, those regions will be presented with the current chunk in case it is within one of the new regions.
Get a CaptureRegion by name.
Perform some checks to determine if this file is safe.
Returns True if safe, False otherwise. It may raise ImageFormatError if safety cannot be guaranteed because of parsing or other errors.
Returns the virtual size of the disk image, or zero if unknown.
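The chunk-at-a-time protocol, and the reason actual_size can fall short of the real file size when reading stops early, can be illustrated with a small sketch. Everything here is hypothetical: the class RawInspectorSketch, the method name eat_chunk, the four-byte header requirement, and the inspect_stream helper are assumptions for the example, not the module's API.

```python
import io


class RawInspectorSketch:
    """Hypothetical minimal inspector that needs only the first 4 bytes."""

    HEADER_LEN = 4  # assumed value, chosen only for the example

    def __init__(self):
        self._header = b""
        self._total = 0

    def eat_chunk(self, chunk):
        # Count every byte presented, but store only what is still needed.
        self._total += len(chunk)
        need = self.HEADER_LEN - len(self._header)
        if need > 0:
            self._header += chunk[:need]

    @property
    def complete(self):
        return len(self._header) >= self.HEADER_LEN

    @property
    def actual_size(self):
        # Only bytes that were actually presented are counted.
        return self._total


def inspect_stream(stream, inspector, chunk_size=16):
    """Read only until the inspector reports it has what it needs."""
    while not inspector.complete:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        inspector.eat_chunk(chunk)
    return inspector


# Usage: a 100-byte stream is abandoned after the first 16-byte chunk.
inspector = inspect_stream(io.BytesIO(b"\x00" * 100), RawInspectorSketch())
```

In this sketch the inspector is satisfied after one chunk, so its actual_size reflects the 16 bytes read rather than the 100 bytes in the stream.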
Bases: Exception
An unrecoverable image format error that aborts the process.
Bases: object
A file-like object that wraps another and updates a format inspector.
This passes chunks to the format inspector while reading. If the inspector fails, it logs the error and stops calling it, but continues proxying data from the source to its user.
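The proxy-and-tolerate-failure behaviour described above could look roughly like this. It is a sketch under stated assumptions: InspectingReader, eat_chunk, and the FailsOnSecondChunk test double are hypothetical names introduced for illustration, and only the read() path is shown.

```python
import io
import logging

LOG = logging.getLogger(__name__)


class InspectingReader:
    """Hypothetical file-like wrapper; names are illustrative only."""

    def __init__(self, source, inspector):
        self._source = source
        self._inspector = inspector
        self._inspector_failed = False

    def read(self, size=-1):
        chunk = self._source.read(size)
        if chunk and not self._inspector_failed:
            try:
                self._inspector.eat_chunk(chunk)
            except Exception:
                # Log once, then stop feeding the inspector; the caller
                # still receives every byte from the source.
                LOG.warning("inspector failed; disabling it", exc_info=True)
                self._inspector_failed = True
        return chunk


class FailsOnSecondChunk:
    """Test double: raises the second time it is fed data."""

    def __init__(self):
        self.calls = 0

    def eat_chunk(self, chunk):
        self.calls += 1
        if self.calls == 2:
            raise ValueError("simulated parse error")


# Usage: the consumer sees all 40 bytes even though inspection broke.
broken = FailsOnSecondChunk()
reader = InspectingReader(io.BytesIO(b"x" * 40), broken)
received = b"".join(iter(lambda: reader.read(10), b""))
```

The design choice worth noting is that inspection is treated as best-effort: a parsing failure must never interrupt the data transfer the wrapper is sitting in front of.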
Bases: FileInspector
Returns True if the file appears to be the expected format.
Perform some checks to determine if this file is safe.
Returns True if safe, False otherwise. It may raise ImageFormatError if safety cannot be guaranteed because of parsing or other errors.
Bases: FileInspector
QEMU QCOW2 Format
This should only require about 32 bytes of the beginning of the file to determine the virtual size, and 104 bytes to perform the safety check.
Returns True if the file appears to be the expected format.
Perform some checks to determine if this file is safe.
Returns True if safe, False otherwise. It may raise ImageFormatError if safety cannot be guaranteed because of parsing or other errors.
Returns the virtual size of the disk image, or zero if unknown.
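To show why roughly 32 bytes suffice for the virtual size: in the published QCOW2 header layout, the magic occupies bytes 0-3 and the virtual size is a big-endian 64-bit integer at bytes 24-31. The sketch below reads just those fields. The function name and the synthetic header are assumptions for the example; this is not the inspector's own code and it performs none of the safety checks.

```python
import struct

QCOW2_MAGIC = b"QFI\xfb"


def qcow2_virtual_size(header):
    """Return the virtual size from a QCOW2 header, or 0 if unknown."""
    if len(header) < 32 or header[:4] != QCOW2_MAGIC:
        return 0
    # Bytes 24..31: virtual disk size in bytes, big-endian uint64.
    (size,) = struct.unpack(">Q", header[24:32])
    return size


# Synthetic 32-byte header for illustration only:
# magic, version, backing_file_offset, backing_file_size,
# cluster_bits, size.
example_header = struct.pack(
    ">4sIQIIQ", QCOW2_MAGIC, 3, 0, 0, 16, 10 * 1024 ** 3
)
example_size = qcow2_virtual_size(example_header)
```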
Bases: object
A logger-like thing that swallows tracing when we do not want it.
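A no-op object of this kind is usually only a few lines. The following is a hypothetical sketch: NullTracer and pick_tracer are names invented here, and the set of methods the real class exposes is not stated in this document.

```python
import logging


class NullTracer:
    """Hypothetical no-op stand-in for a logger's tracing methods."""

    def debug(self, *args, **kwargs):
        # Accept anything a logger call would take and discard it.
        return None

    info = debug
    warning = debug
    error = debug


def pick_tracer(real_logger, tracing_enabled):
    # Hand back the real logger only when tracing is wanted.
    return real_logger if tracing_enabled else NullTracer()


# Usage: callers log unconditionally and the tracer decides what happens.
trace = pick_tracer(logging.getLogger("inspector"), tracing_enabled=False)
trace.debug("read %d bytes", 512)
```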
Bases: FileInspector
VirtualBox VDI format
This only needs to store the first 512 bytes of the image.
Returns True if the file appears to be the expected format.
Returns the virtual size of the disk image, or zero if unknown.
Bases: FileInspector
Connectix/MS VPC VHD Format
This should only require about 512 bytes of the beginning of the file to determine the virtual size.
Returns True if the file appears to be the expected format.
Returns the virtual size of the disk image, or zero if unknown.
Bases: FileInspector
MS VHDX Format
This requires some complex parsing of the stream. The first 256KiB of the image is stored to get the header and region information. We then capture the first metadata region to read those records, find the location of the virtual size data, and parse it. This needs to store the metadata table entries up until the VDS record, which may consist of up to 2047 32-byte entries. Finally, it must store a chunk of data at the offset of the actual VDS uint64.
Returns True if the file appears to be the expected format.
Post-read hook to process what has been read so far.
This will be called after each chunk is read and potentially captured by the defined regions. If any regions are defined by this call, those regions will be presented with the current chunk in case it is within one of the new regions.
Returns the virtual size of the disk image, or zero if unknown.
Bases: FileInspector
VMware VMDK format (monolithicSparse and streamOptimized variants only)
This needs to store the 512-byte header and the descriptor region, which should be just after that. The descriptor region is some variable number of 512-byte sectors, but is just text defining the layout of the disk.
Returns True if the file appears to be the expected format.
Post-read hook to process what has been read so far.
This will be called after each chunk is read and potentially captured by the defined regions. If any regions are defined by this call, those regions will be presented with the current chunk in case it is within one of the new regions.
Perform some checks to determine if this file is safe.
Returns True if safe, False otherwise. It may raise ImageFormatError if safety cannot be guaranteed because of parsing or other errors.
Returns the virtual size of the disk image, or zero if unknown.
Attempts to detect the format of a file.
This runs through a file one time, running all the known inspectors in parallel. It stops reading the file once one of them matches or all of them are sure they don’t match.
Returns the FileInspector that matched, if any, or None if the file appears to be ‘raw’.
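The single-pass detection strategy can be sketched as a loop that feeds each chunk to every inspector still in play. This is an illustration under assumptions: MagicInspector, detect, eat_chunk, and the complete and format_match properties as used here are stand-ins written for the example, and matching on a fixed magic prefix is a simplification of what real format inspectors do.

```python
import io


class MagicInspector:
    """Hypothetical inspector that matches on a fixed magic prefix."""

    def __init__(self, name, magic):
        self.name = name
        self.magic = magic
        self._seen = b""

    def eat_chunk(self, chunk):
        need = len(self.magic) - len(self._seen)
        if need > 0:
            self._seen += chunk[:need]

    @property
    def complete(self):
        return len(self._seen) >= len(self.magic)

    @property
    def format_match(self):
        return self.complete and self._seen == self.magic


def detect(stream, inspectors, chunk_size=4):
    """Feed one pass of the stream to all inspectors; stop early."""
    pending = list(inspectors)
    while pending:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        for inspector in pending:
            inspector.eat_chunk(chunk)
            if inspector.format_match:
                return inspector
        # Drop inspectors that have seen enough to know they do not match.
        pending = [i for i in pending if not i.complete]
    return None  # nothing matched; treat the file as raw


def candidates():
    # Fresh inspectors per file; the magics are used only as examples.
    return [MagicInspector("qcow2", b"QFI\xfb"),
            MagicInspector("vmdk", b"KDMV")]


# Usage: one file with a recognised prefix, one without.
found = detect(io.BytesIO(b"KDMV" + b"\x00" * 60), candidates())
not_found = detect(io.BytesIO(b"\x00" * 64), candidates())
```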
Returns a FormatInspector class based on the given name.
format_name – The name of the disk_format (raw, qcow2, etc).
A FormatInspector or None if unsupported.
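A name-to-class lookup of this sort is commonly a dictionary with None as the fallback. The sketch below assumes that shape; the class names, the registry contents, and the function name are all hypothetical and say nothing about how the module actually stores its mapping.

```python
class RawSketch:
    """Hypothetical placeholder for the raw inspector class."""


class Qcow2Sketch(RawSketch):
    """Hypothetical placeholder for the QCOW2 inspector class."""


# Assumed registry; the real module may support more formats.
_INSPECTORS = {
    "raw": RawSketch,
    "qcow2": Qcow2Sketch,
}


def get_inspector_sketch(format_name):
    # dict.get returns None for names that are not registered.
    return _INSPECTORS.get(format_name)


# Usage: the caller instantiates the class only if one was found.
inspector_class = get_inspector_sketch("qcow2")
inspector = inspector_class() if inspector_class is not None else None
```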
Except where otherwise noted, this document is licensed under Creative Commons Attribution 3.0 License.