GDAL export example

This tutorial explains how to use PyEPR to generate a file in GDAL Virtual Format (VRT) that can be used to access data with the powerful and popular GDAL library. GDAL already has support for ENVISAT products but this example is interesting for two reasons:

  • it exploits some low-level features (e.g. offset management) that are rarely used but can be very useful in some cases

  • the generated VRT file uses raw raster access and it can be opened in update mode to modify the ENVISAT product data. This feature is not supported by the native ENVISAT driver of the GDAL library

export_gdalvrt module

The complete code of the example is available in the examples/export_gdalvrt.py file. It is organized so that it can also be imported as a module in any program.

The export_gdalvrt module provides two functions:

export_gdalvrt.epr2gdal_band(band, vrt)

Takes an epr.Band object and a VRT dataset as input and adds a GDAL band to the VRT dataset.

export_gdalvrt.epr2gdal(product, vrt, overwrite_existing=False)

Takes a PyEPR Product (or a filename) and the file name of the output VRT file as input, and generates the VRT file. The result contains one band for each epr.Band present in the original epr.Product, together with the associated metadata.

The epr2gdal() function first creates the VRT dataset

    if isinstance(product, str):
        filename = product
        product = epr.open(filename)

    ysize = product.get_scene_height()
    xsize = product.get_scene_width()

    if os.path.exists(vrt) and not overwrite_existing:
        raise ValueError(f"unable to create {vrt!r}. Already exists")

    driver = gdal.GetDriverByName("VRT")
    if driver is None:
        raise RuntimeError("unable to get driver 'VRT'")

    gdal_ds = driver.Create(vrt, xsize, ysize, 0)
    if gdal_ds is None:
        raise RuntimeError(f"unable to create {vrt!r} dataset")

and then loops over all the epr.Bands of the PyEPR epr.Product, calling the epr2gdal_band() function on each of them:

    for band in product.bands():
        epr2gdal_band(band, gdal_ds)

The export_gdalvrt module also provides an epr_to_gdal_type mapping between EPR and GDAL data type identifiers.

Generating VRTRawRasterBands

The core of the example is the part of the code in the epr2gdal_band() function that generates the GDAL VRTRawRasterBand. It is a description of a raster file that the GDAL library uses for low level data access. Of course the entire machinery works because data in epr.Bands and epr.Datasets of ENVISAT products are stored as contiguous rasters.

    filename = os.sep.join(product.file_path.split("/"))  # denormalize
    offset = dataset.get_dsd().ds_offset + field.get_offset()
    line_offset = record.tot_size
    pixel_offset = epr.get_data_type_size(field.get_type())

    if band.sample_model == epr.E_SMOD_1OF2:
        pixel_offset *= 2
    elif band.sample_model == epr.E_SMOD_2OF2:
        offset += pixel_offset
        pixel_offset *= 2

    options = [
        "subClass=VRTRawRasterBand",
        f"SourceFilename={filename}",
        f"ImageOffset={offset}",
        f"LineOffset={line_offset}",
        f"PixelOffset={pixel_offset}",
        "ByteOrder=MSB",
    ]

    gtype = epr_to_gdal_type[field.get_type()]
    ret = gdal_ds.AddBand(gtype, options=options)
    if ret != gdal.CE_None:
        raise RuntimeError(f"unable to add VRTRawRasterBand to {vrt!r}")

The fundamental part is the computation of the following three parameters:

ImageOffset:

the offset in bytes to the beginning of the first pixel of data with respect to the beginning of the file.

In the example it is computed using

offset = dataset.get_dsd().ds_offset + field.get_offset()

LineOffset:

the offset in bytes between the beginning of one scanline of data and the beginning of the next. In the example it is set to the epr.Record size in bytes using the epr.Record.tot_size attribute:

line_offset = record.tot_size

PixelOffset:

the offset in bytes from the beginning of one pixel and the next on the same line. Usually it corresponds to the size in bytes of the elementary data type. It is set using the epr.Field.get_type() method and the epr.get_data_type_size() function:

pixel_offset = epr.get_data_type_size(field.get_type())
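To make the role of the three offsets concrete, the following minimal sketch builds a small synthetic buffer in memory and reads pixels back with the addressing rule implied by the definitions above: the byte position of pixel (row, col) is ImageOffset + row * LineOffset + col * PixelOffset. The buffer layout (a 16 byte file header, a 4 byte header in front of each record, big-endian unsigned 16 bit samples) and the helper name read_pixel are assumptions made for illustration only; they do not describe a real ENVISAT product.

```python
import struct

# Hypothetical layout, for illustration only
FILE_HEADER_SIZE = 16    # bytes preceding the first record
RECORD_HEADER_SIZE = 4   # bytes preceding the pixel data in each record
SAMPLE_SIZE = 2          # big-endian unsigned 16 bit samples
XSIZE, YSIZE = 3, 2

rows = [[10, 11, 12], [20, 21, 22]]

# Build the synthetic "file": header, then one record per image line
buf = bytearray(b"\x00" * FILE_HEADER_SIZE)
for row in rows:
    buf += b"\xff" * RECORD_HEADER_SIZE
    buf += struct.pack(f">{XSIZE}H", *row)

# The three parameters discussed in the text
image_offset = FILE_HEADER_SIZE + RECORD_HEADER_SIZE  # first pixel
line_offset = RECORD_HEADER_SIZE + XSIZE * SAMPLE_SIZE  # full record size
pixel_offset = SAMPLE_SIZE


def read_pixel(data, row, col):
    """Read one pixel using raw raster addressing."""
    pos = image_offset + row * line_offset + col * pixel_offset
    return struct.unpack_from(">H", data, pos)[0]


decoded = [[read_pixel(buf, r, c) for c in range(XSIZE)] for r in range(YSIZE)]
print(decoded)
```

Note that the line offset is the size of the whole record, not just of the pixel data, which is why the example uses epr.Record.tot_size.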

The size of the GDAL bands, in lines and columns, is fixed at the GDAL dataset level when the dataset is created:

        gdal_ds = driver.Create(vrt, xsize, ysize, 0)
        if gdal_ds is None:
            raise RuntimeError(f"unable to create {vrt!r} dataset")
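For orientation, the sketch below uses only the Python standard library to build an XML description with the element names that the GDAL documentation uses for a VRTRawRasterBand. It is not the code used by the example, which lets GDAL write the file, and the function name raw_band_vrt, the file name and the numeric values are placeholders chosen for illustration.

```python
import xml.etree.ElementTree as ET


def raw_band_vrt(xsize, ysize, filename, data_type,
                 image_offset, line_offset, pixel_offset):
    """Return VRT XML text describing a single raw raster band."""
    root = ET.Element(
        "VRTDataset",
        {"rasterXSize": str(xsize), "rasterYSize": str(ysize)},
    )
    band = ET.SubElement(
        root,
        "VRTRasterBand",
        {"dataType": data_type, "band": "1", "subClass": "VRTRawRasterBand"},
    )
    source = ET.SubElement(band, "SourceFilename", {"relativetoVRT": "0"})
    source.text = filename
    for tag, value in (
        ("ImageOffset", image_offset),
        ("PixelOffset", pixel_offset),
        ("LineOffset", line_offset),
        ("ByteOrder", "MSB"),
    ):
        ET.SubElement(band, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Placeholder values, not taken from a real product
vrt_xml = raw_band_vrt(4, 3, "product.N1", "UInt16",
                       image_offset=20, line_offset=12, pixel_offset=2)
print(vrt_xml)
```

The raster size appears once, as attributes of the dataset element, while the offsets are stored per band.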

Please note that for epr.Datasets storing complex values, such as the MDS1 epr.Dataset of ASAR IMS epr.Products, the real and imaginary parts of each pixel are interleaved. To represent the epr.Bands of the two components, the pixel offset has to be doubled, and an additional offset (one pixel) must be added to the ImageOffset of the epr.Band representing the imaginary part:

    if band.sample_model == epr.E_SMOD_1OF2:
        pixel_offset *= 2
    elif band.sample_model == epr.E_SMOD_2OF2:
        offset += pixel_offset
        pixel_offset *= 2
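The effect of this adjustment can be checked on a synthetic line of interleaved samples. The sketch below is an illustration under stated assumptions: the sample values, the big-endian signed 16 bit component type and the helper name read_component are made up, and no ENVISAT file is involved.

```python
import struct

SAMPLE_SIZE = 2  # size in bytes of one component (signed 16 bit, assumed)

# One image line of interleaved (real, imaginary) pairs
pairs = [(1, -1), (2, -2), (3, -3)]
line = b"".join(struct.pack(">hh", re, im) for re, im in pairs)


def read_component(data, offset, pixel_offset, count):
    """Read `count` samples starting at `offset`, `pixel_offset` bytes apart."""
    return [
        struct.unpack_from(">h", data, offset + n * pixel_offset)[0]
        for n in range(count)
    ]


offset = 0
pixel_offset = SAMPLE_SIZE

# First component of each pair (the E_SMOD_1OF2 case): doubled pixel offset
real = read_component(line, offset, pixel_offset * 2, len(pairs))

# Second component (the E_SMOD_2OF2 case): start one sample later,
# then use the same doubled pixel offset
imag = read_component(line, offset + pixel_offset, pixel_offset * 2, len(pairs))

print(real, imag)
```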

Note

The PyEPR API does not support complex Bands. epr.Datasets containing complex data, like the MDS1 epr.Dataset of ASAR IMS epr.Products, are associated with two distinct epr.Bands containing the real (I) and the imaginary (Q) components respectively.

GDAL, instead, supports complex data types, so it is possible to map a complex ENVISAT epr.Dataset onto a single GDAL band with a complex data type.

This case is not handled in the example.
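As a rough idea of what such a mapping could look like, the sketch below builds the option list for a single band whose pixels are I/Q pairs. It is not part of the example: the helper name complex_raw_band_options and the sample values are hypothetical, and the approach assumes that one complex pixel spans two consecutive components of equal size.

```python
def complex_raw_band_options(filename, real_offset, line_offset, component_size):
    """Return VRTRawRasterBand options for one band holding I/Q pairs.

    Hypothetical helper: `real_offset` is the byte offset of the first
    real (I) sample and `component_size` is the size in bytes of a single
    I or Q value.
    """
    return [
        "subClass=VRTRawRasterBand",
        f"SourceFilename={filename}",
        f"ImageOffset={real_offset}",
        f"LineOffset={line_offset}",
        # one complex pixel covers both components
        f"PixelOffset={2 * component_size}",
        "ByteOrder=MSB",
    ]


# Placeholder values, not taken from a real product
options = complex_raw_band_options("product.N1", 1000, 4096, 2)
print(options)

# With GDAL available, the list could then be passed to AddBand together
# with a complex data type, e.g. gdal.GDT_CInt16 for 16 bit integer
# components (an assumption about the product, to be verified case by case).
```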

Metadata

The epr2gdal_band() function also stores a small set of metadata for each epr.Band:

    gdal_band.SetDescription(band.description)
    metadata = {
        "name": band.get_name(),
        "dataset_name": dataset.get_name(),
        "dataset_description": dataset.description,
        "lines_mirrored": str(band.lines_mirrored),
        "sample_model": epr.get_sample_model_name(band.sample_model),
        "scaling_factor": str(band.scaling_factor),
        "scaling_offset": str(band.scaling_offset),
        "scaling_method": epr.get_scaling_method_name(band.scaling_method),
        "spectr_band_index": str(band.spectr_band_index),
        "unit": band.unit if band.unit else "",
        "bm_expr": band.bm_expr if band.bm_expr else "",
    }
    gdal_band.SetMetadata(metadata)

Metadata are also stored at the GDAL dataset level by the epr2gdal() function:

    metadata = {
        "id_string": product.id_string,
        "meris_iodd_version": str(product.meris_iodd_version),
        "dataset_names": ",".join(product.get_dataset_names()),
        "num_datasets": str(product.get_num_datasets()),
        "num_dsds": str(product.get_num_dsds()),
    }
    gdal_ds.SetMetadata(metadata)

The epr2gdal() function also stores the contents of the MPH and the SPH records as GDAL dataset metadata in custom domains:

    mph = product.get_mph()
    metadata = str(mph).replace(" = ", "=").split("\n")
    gdal_ds.SetMetadata(metadata, "MPH")

    sph = product.get_sph()
    metadata = str(sph).replace(" = ", "=").split("\n")
    gdal_ds.SetMetadata(metadata, "SPH")
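The string manipulation used here turns text with one "NAME = VALUE" entry per line into a list of "NAME=VALUE" items, the form in which the example passes metadata to SetMetadata(). The following sketch applies the same expression to a made-up header text; the field names and values are invented for illustration and are not taken from a real MPH.

```python
# Made-up header text, for illustration only
header_text = 'PRODUCT = "EXAMPLE.N1"\nPROC_STAGE = N\nTOT_SIZE = 12345'

# Same expression used in the example for the MPH and SPH records
items = header_text.replace(" = ", "=").split("\n")
print(items)

# Splitting on the first "=" only keeps values that contain "=" intact
as_dict = dict(item.split("=", 1) for item in items)
print(as_dict["TOT_SIZE"])
```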

Complete listing

#!/usr/bin/env python3

import os
import epr
from osgeo import gdal


epr_to_gdal_type = {
    epr.E_TID_UNKNOWN: gdal.GDT_Unknown,
    epr.E_TID_UCHAR: gdal.GDT_Byte,
    epr.E_TID_CHAR: gdal.GDT_Byte,
    epr.E_TID_USHORT: gdal.GDT_UInt16,
    epr.E_TID_SHORT: gdal.GDT_Int16,
    epr.E_TID_UINT: gdal.GDT_UInt32,
    epr.E_TID_INT: gdal.GDT_Int32,
    epr.E_TID_FLOAT: gdal.GDT_Float32,
    epr.E_TID_DOUBLE: gdal.GDT_Float64,
    # epr.E_TID_STRING: gdal.GDT_Unknown,
    # epr.E_TID_SPARE: gdal.GDT_Unknown,
    # epr.E_TID_TIME: gdal.GDT_Unknown,
}


def epr2gdal_band(band, vrt):
    product = band.product
    dataset = band.dataset
    record = dataset.read_record(0)
    field = record.get_field_at(band._field_index - 1)

    ysize = product.get_scene_height()
    xsize = product.get_scene_width()

    if isinstance(vrt, gdal.Dataset):
        if (vrt.RasterYSize, vrt.RasterXSize) != (ysize, xsize):
            raise ValueError("dataset size do not match")
        gdal_ds = vrt
    elif os.path.exists(vrt):
        gdal_ds = gdal.Open(vrt, gdal.GA_Update)
        if gdal_ds is None:
            raise RuntimeError(f"unable to open {vrt!r}")
        driver = gdal_ds.GetDriver()
        if driver.ShortName != "VRT":
            raise TypeError(f"unexpected GDAL driver ({driver.ShortName}). "
                            f"VRT driver expected")
    else:
        driver = gdal.GetDriverByName("VRT")
        if driver is None:
            raise RuntimeError("unable to get driver 'VRT'")

        gdal_ds = driver.Create(vrt, xsize, ysize, 0)
        if gdal_ds is None:
            raise RuntimeError(f"unable to create {vrt!r} dataset")

    filename = os.sep.join(product.file_path.split("/"))  # denormalize
    offset = dataset.get_dsd().ds_offset + field.get_offset()
    line_offset = record.tot_size
    pixel_offset = epr.get_data_type_size(field.get_type())

    if band.sample_model == epr.E_SMOD_1OF2:
        pixel_offset *= 2
    elif band.sample_model == epr.E_SMOD_2OF2:
        offset += pixel_offset
        pixel_offset *= 2

    options = [
        "subClass=VRTRawRasterBand",
        f"SourceFilename={filename}",
        f"ImageOffset={offset}",
        f"LineOffset={line_offset}",
        f"PixelOffset={pixel_offset}",
        "ByteOrder=MSB",
    ]

    gtype = epr_to_gdal_type[field.get_type()]
    ret = gdal_ds.AddBand(gtype, options=options)
    if ret != gdal.CE_None:
        raise RuntimeError(f"unable to add VRTRawRasterBand to {vrt!r}")

    gdal_band = gdal_ds.GetRasterBand(gdal_ds.RasterCount)
    gdal_band.SetDescription(band.description)
    metadata = {
        "name": band.get_name(),
        "dataset_name": dataset.get_name(),
        "dataset_description": dataset.description,
        "lines_mirrored": str(band.lines_mirrored),
        "sample_model": epr.get_sample_model_name(band.sample_model),
        "scaling_factor": str(band.scaling_factor),
        "scaling_offset": str(band.scaling_offset),
        "scaling_method": epr.get_scaling_method_name(band.scaling_method),
        "spectr_band_index": str(band.spectr_band_index),
        "unit": band.unit if band.unit else "",
        "bm_expr": band.bm_expr if band.bm_expr else "",
    }
    gdal_band.SetMetadata(metadata)

    return gdal_ds


def epr2gdal(product, vrt, overwrite_existing=False):
    if isinstance(product, str):
        filename = product
        product = epr.open(filename)

    ysize = product.get_scene_height()
    xsize = product.get_scene_width()

    if os.path.exists(vrt) and not overwrite_existing:
        raise ValueError(f"unable to create {vrt!r}. Already exists")

    driver = gdal.GetDriverByName("VRT")
    if driver is None:
        raise RuntimeError("unable to get driver 'VRT'")

    gdal_ds = driver.Create(vrt, xsize, ysize, 0)
    if gdal_ds is None:
        raise RuntimeError(f"unable to create {vrt!r} dataset")

    metadata = {
        "id_string": product.id_string,
        "meris_iodd_version": str(product.meris_iodd_version),
        "dataset_names": ",".join(product.get_dataset_names()),
        "num_datasets": str(product.get_num_datasets()),
        "num_dsds": str(product.get_num_dsds()),
    }
    gdal_ds.SetMetadata(metadata)

    mph = product.get_mph()
    metadata = str(mph).replace(" = ", "=").split("\n")
    gdal_ds.SetMetadata(metadata, "MPH")

    sph = product.get_sph()
    metadata = str(sph).replace(" = ", "=").split("\n")
    gdal_ds.SetMetadata(metadata, "SPH")

    for band in product.bands():
        epr2gdal_band(band, gdal_ds)

    # @TODO: set geographic info

    return gdal_ds


if __name__ == "__main__":
    filename = "MER_LRC_2PTGMV20000620_104318_00000104X000_00000_00000_0001.N1"
    vrtfilename = os.path.splitext(filename)[0] + ".vrt"

    gdal_ds = epr2gdal(filename, vrtfilename)

    with epr.open(filename) as product:
        band_index = product.get_band_names().index("water_vapour")
        band = product.get_band("water_vapour")
        eprdata = band.read_as_array()
        unit = band.unit
        lines_mirrored = band.lines_mirrored
        scaling_offset = band.scaling_offset
        scaling_factor = band.scaling_factor

    gdal_band = gdal_ds.GetRasterBand(band_index + 1)
    vrtdata = gdal_band.ReadAsArray()

    if lines_mirrored:
        vrtdata = vrtdata[:, ::-1]

    vrtdata = vrtdata * scaling_factor + scaling_offset

    print("Max absolute error:", abs(vrtdata - eprdata).max())

    # plot
    from matplotlib import pyplot as plt

    plt.figure()
    plt.subplot(2, 1, 1)
    plt.imshow(eprdata)
    plt.grid(True)
    cb = plt.colorbar()
    cb.set_label(unit)
    plt.title("EPR data")
    plt.subplot(2, 1, 2)
    plt.imshow(vrtdata)
    plt.grid(True)
    cb = plt.colorbar()
    cb.set_label(unit)
    plt.title("VRT data")
    plt.show()