GDAL export example
This tutorial explains how to use PyEPR to generate a file in GDAL Virtual Format (VRT) that can be used to access data with the powerful and popular GDAL library. GDAL already supports ENVISAT products, but this example is interesting for two reasons:
- it exploits some low-level features (e.g. offset management) that are rarely used but can be very useful in some cases;
- the generated VRT file uses raw raster access, so it can be opened in update mode to modify the ENVISAT product data. This feature is not supported by the native ENVISAT driver of the GDAL library.
export_gdalvrt module
The complete code of the example is available in the examples/export_gdalvrt.py file.
It is organized so that it can also be imported as a module in any program.
The export_gdalvrt module provides two functions:
- export_gdalvrt.epr2gdal_band(band, vrt)
  Takes as input an epr.Band object and a VRT dataset (or the name of a VRT file) and adds a GDAL band to the VRT dataset.
- export_gdalvrt.epr2gdal(product, vrt, overwrite_existing=False)
  Takes as input a PyEPR epr.Product (or a filename) and the file name of the output VRT file, and generates the VRT file itself, containing a band for each epr.Band present in the original epr.Product together with the associated metadata.
The epr2gdal() function first creates the VRT dataset:
if isinstance(product, str):
    filename = product
    product = epr.open(filename)

ysize = product.get_scene_height()
xsize = product.get_scene_width()

if os.path.exists(vrt) and not overwrite_existing:
    raise ValueError(f"unable to create {vrt!r}. Already exists")

driver = gdal.GetDriverByName("VRT")
if driver is None:
    raise RuntimeError("unable to get driver 'VRT'")

gdal_ds = driver.Create(vrt, xsize, ysize, 0)
if gdal_ds is None:
    raise RuntimeError(f"unable to create {vrt!r} dataset")
and then loops over all the epr.Band objects of the PyEPR epr.Product, calling the epr2gdal_band() function on each of them:
for band in product.bands():
    epr2gdal_band(band, gdal_ds)
The export_gdalvrt module also provides an epr_to_gdal_type mapping between EPR and GDAL data type identifiers.
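ENVISAT products store their data in big-endian (MSB) byte order, which is why the generated raw bands declare ByteOrder=MSB. As a rough, stdlib-only illustration of what each GDAL type used in epr_to_gdal_type means at the byte level, the sketch below pairs GDAL type names with big-endian struct format codes. GDAL_TYPE_TO_STRUCT, element_size() and decode_msb() are hypothetical and not part of export_gdalvrt; the sizes should correspond to what epr.get_data_type_size() reports for the matching EPR types, but that correspondence is assumed here rather than checked against PyEPR.

```python
import struct

# Hypothetical table: GDAL data type names -> big-endian struct codes.
# The ">" prefix selects MSB (big-endian) byte order, as used by ENVISAT.
GDAL_TYPE_TO_STRUCT = {
    "Byte": ">B",
    "Int16": ">h",
    "UInt16": ">H",
    "Int32": ">i",
    "UInt32": ">I",
    "Float32": ">f",
    "Float64": ">d",
}


def element_size(gdal_type_name):
    """Size in bytes of one element of the given GDAL type."""
    return struct.calcsize(GDAL_TYPE_TO_STRUCT[gdal_type_name])


def decode_msb(gdal_type_name, raw):
    """Decode one big-endian element of the given GDAL type from raw bytes."""
    (value,) = struct.unpack(GDAL_TYPE_TO_STRUCT[gdal_type_name], raw)
    return value


print(element_size("UInt16"))             # 2
print(decode_msb("UInt16", b"\x01\x02"))  # 258
print(decode_msb("Int16", b"\xff\xfe"))   # -2
```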
Generating VRTRawRasterBands
The core of the example is the part of the epr2gdal_band() function that generates the GDAL VRTRawRasterBand, i.e. a description of a raw raster file that the GDAL library uses for low-level data access.
The entire machinery works because the data in the epr.Band and epr.Dataset objects of ENVISAT products are stored as contiguous rasters.
# denormalize: convert "/" separators to the native path separator
filename = os.sep.join(product.file_path.split("/"))
offset = dataset.get_dsd().ds_offset + field.get_offset()
line_offset = record.tot_size
pixel_offset = epr.get_data_type_size(field.get_type())

if band.sample_model == epr.E_SMOD_1OF2:
    pixel_offset *= 2
elif band.sample_model == epr.E_SMOD_2OF2:
    offset += pixel_offset
    pixel_offset *= 2

options = [
    "subClass=VRTRawRasterBand",
    f"SourceFilename={filename}",
    f"ImageOffset={offset}",
    f"LineOffset={line_offset}",
    f"PixelOffset={pixel_offset}",
    "ByteOrder=MSB",
]

gtype = epr_to_gdal_type[field.get_type()]
ret = gdal_ds.AddBand(gtype, options=options)
if ret != gdal.CE_None:
    raise RuntimeError(f"unable to add VRTRawRasterBand to {vrt!r}")
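For reference, the VRT driver serializes such a band as XML. The sketch below builds a comparable document with xml.etree.ElementTree, following the element names described in GDAL's VRT format documentation (a VRTRasterBand with subClass="VRTRawRasterBand" and SourceFilename, ImageOffset, PixelOffset, LineOffset and ByteOrder children). It is an approximation under that assumption, not byte-for-byte what GDAL writes; raw_band_xml() is a hypothetical helper and the values in the usage line are arbitrary.

```python
import xml.etree.ElementTree as ET


def raw_band_xml(xsize, ysize, data_type, filename, image_offset,
                 line_offset, pixel_offset, byte_order="MSB"):
    """Return a minimal VRT document (as a string) with one raw band.

    Illustrative only: element and attribute names follow the GDAL VRT
    format description, but GDAL's own output may differ in details.
    """
    ds = ET.Element("VRTDataset",
                    rasterXSize=str(xsize), rasterYSize=str(ysize))
    band = ET.SubElement(ds, "VRTRasterBand", dataType=data_type,
                         band="1", subClass="VRTRawRasterBand")
    source = ET.SubElement(band, "SourceFilename", relativeToVRT="0")
    source.text = filename
    # Offsets and byte order are stored as child elements of the band.
    for tag, value in (("ImageOffset", image_offset),
                       ("PixelOffset", pixel_offset),
                       ("LineOffset", line_offset),
                       ("ByteOrder", byte_order)):
        ET.SubElement(band, tag).text = str(value)
    return ET.tostring(ds, encoding="unicode")


# Arbitrary example values (not taken from a real product).
print(raw_band_xml(4, 3, "UInt16", "/data/example.N1", 100, 20, 2))
```

Writing such XML by hand can help when inspecting offsets, but letting GDAL generate it through AddBand(), as export_gdalvrt does, avoids depending on the exact schema.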
The fundamental part is the computation of the following parameters:
- ImageOffset: the offset in bytes of the first pixel of data with respect to the beginning of the file. In the example it is computed using the epr.DSD.ds_offset attribute, which represents the offset in bytes of the epr.Dataset from the beginning of the file, and the epr.Field.get_offset() method, which returns the offset in bytes of the epr.Field containing the epr.Band data from the beginning of the epr.Record:
  offset = dataset.get_dsd().ds_offset + field.get_offset()
- LineOffset: the offset in bytes from the beginning of one scanline of data to the beginning of the next one. In the example it is set to the epr.Record size in bytes using the epr.Record.tot_size attribute:
  line_offset = record.tot_size
- PixelOffset: the offset in bytes from the beginning of one pixel to the beginning of the next one on the same line. Usually it corresponds to the size in bytes of the elementary data type. It is set using the epr.Field.get_type() method and the epr.get_data_type_size() function:
  pixel_offset = epr.get_data_type_size(field.get_type())
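Put together, the three parameters locate the pixel at a given line and column at byte position ImageOffset + line * LineOffset + column * PixelOffset. The sketch below applies that formula to a small synthetic in-memory "product": a 16-byte header standing in for everything that precedes the dataset, followed by fixed-size records, each with a 4-byte leading field and a big-endian uint16 array. The layout and the helper names are invented for illustration; real ENVISAT records are larger and more varied.

```python
import io
import struct


def pixel_position(line, col, image_offset, line_offset, pixel_offset):
    """Byte position of pixel (line, col) in a raw raster file."""
    return image_offset + line * line_offset + col * pixel_offset


def read_pixel(fileobj, line, col, image_offset, line_offset, pixel_offset,
               fmt=">H"):
    """Read one big-endian pixel at the position given by the raw-band offsets."""
    fileobj.seek(pixel_position(line, col, image_offset, line_offset,
                                pixel_offset))
    (value,) = struct.unpack(fmt, fileobj.read(struct.calcsize(fmt)))
    return value


# Synthetic layout (invented for illustration):
#   16-byte header (plays the role of everything before the dataset),
#   then 3 records, each with a 4-byte leading field followed by
#   5 big-endian uint16 samples whose value is 100 * line + col.
ds_offset, field_offset, xsize, ysize = 16, 4, 5, 3
record_size = field_offset + 2 * xsize  # plays the role of record.tot_size
buf = bytearray(ds_offset)
for line in range(ysize):
    buf += struct.pack(">I", 0xDEADBEEF)
    buf += struct.pack(f">{xsize}H", *(100 * line + col for col in range(xsize)))
f = io.BytesIO(bytes(buf))

image_offset = ds_offset + field_offset  # like ds_offset + field.get_offset()
pixel_offset = struct.calcsize(">H")     # like get_data_type_size(...)
print(read_pixel(f, 2, 3, image_offset, record_size, pixel_offset))  # 203
```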
The size in lines and columns of the GDAL bands is fixed at the GDAL dataset level, when the dataset is created:
gdal_ds = driver.Create(vrt, xsize, ysize, 0)
if gdal_ds is None:
    raise RuntimeError(f"unable to create {vrt!r} dataset")
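Because the raster size comes from the scene dimensions of the product while the offsets come from the record layout, a cheap sanity check is to verify that the last pixel of the band falls inside the source file before adding the band. raw_band_fits() is a hypothetical helper, not part of export_gdalvrt; in practice file_size could be obtained with os.path.getsize() on the product file.

```python
def raw_band_fits(file_size, xsize, ysize, image_offset, line_offset,
                  pixel_offset, element_size):
    """Return True if the last pixel of an xsize-by-ysize raw band,
    described by the given offsets, lies entirely inside the file."""
    last_pixel = (image_offset + (ysize - 1) * line_offset
                  + (xsize - 1) * pixel_offset)
    return last_pixel + element_size <= file_size


# Tiny synthetic layout: 16-byte header + 3 records of 14 bytes = 58 bytes.
print(raw_band_fits(58, 5, 3, 20, 14, 2, 2))  # True
print(raw_band_fits(58, 5, 4, 20, 14, 2, 2))  # False (one line too many)
```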
Please note that in the case of epr.Dataset objects storing complex values, like the MDS1 epr.Dataset of ASAR IMS epr.Product objects, the pixels of the real and imaginary parts are interleaved. To represent the epr.Band objects of the two components, the pixel offset therefore has to be doubled, and an additional offset (one pixel) must be added to the ImageOffset of the epr.Band representing the imaginary part:
if band.sample_model == epr.E_SMOD_1OF2:
    pixel_offset *= 2
elif band.sample_model == epr.E_SMOD_2OF2:
    offset += pixel_offset
    pixel_offset *= 2
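The effect of the two sample models on the offsets can be reproduced on a single synthetic line of interleaved big-endian int16 I/Q samples. In this sketch plain strings ("1OF2", "2OF2") stand in for the epr.E_SMOD_1OF2 and epr.E_SMOD_2OF2 constants, and band_layout() and extract() are hypothetical helpers that mirror the logic of the snippet above.

```python
import struct


def band_layout(sample_model, elem_size, base_offset):
    """Return (image_offset, pixel_offset) for one component of an
    interleaved complex dataset.

    sample_model is a plain string standing in for the epr.E_SMOD_*
    constants: "1OF2" for the real (I) part, "2OF2" for the imaginary (Q) part.
    """
    offset, pixel_offset = base_offset, elem_size
    if sample_model == "1OF2":
        pixel_offset *= 2           # skip the Q sample between two I samples
    elif sample_model == "2OF2":
        offset += pixel_offset      # the first Q sample follows the first I
        pixel_offset *= 2
    return offset, pixel_offset


def extract(raw, sample_model, count, elem_size=2):
    """Extract `count` big-endian int16 samples of one component."""
    offset, step = band_layout(sample_model, elem_size, 0)
    return [struct.unpack_from(">h", raw, offset + i * step)[0]
            for i in range(count)]


# One synthetic line of 3 complex samples stored as I0 Q0 I1 Q1 I2 Q2.
raw_line = struct.pack(">6h", 1, -1, 2, -2, 3, -3)
print(extract(raw_line, "1OF2", 3))  # [1, 2, 3]
print(extract(raw_line, "2OF2", 3))  # [-1, -2, -3]
```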
Note
The PyEPR API does not support complex bands.
epr.Dataset objects containing complex data, like the MDS1 epr.Dataset of ASAR IMS epr.Product objects, are associated with two distinct epr.Band objects containing the real (I) and the imaginary (Q) component respectively.
GDAL, instead, supports complex data types, so it is possible to map a complex ENVISAT epr.Dataset onto a single GDAL band with a complex data type.
This case is not handled in the example.
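Mapping such a dataset onto a single complex band would presumably use a GDAL complex type such as gdal.GDT_CInt16, with a PixelOffset equal to the size of one (I, Q) pair and no extra ImageOffset shift. The stdlib sketch below only illustrates that byte layout by decoding interleaved pairs into Python complex numbers; read_complex_line() is hypothetical and the sample values are made up.

```python
import struct


def read_complex_line(raw, count, code="h"):
    """Decode `count` interleaved big-endian (I, Q) pairs as complex numbers.

    Each complex element spans one full (I, Q) pair, so its pixel offset is
    twice the component size and no extra image offset is needed.
    """
    pair = struct.Struct(">" + code * 2)  # one (I, Q) pair
    return [complex(*pair.unpack_from(raw, i * pair.size))
            for i in range(count)]


# Synthetic line: I0 Q0 I1 Q1 I2 Q2 as big-endian int16.
raw_line = struct.pack(">6h", 1, -1, 2, -2, 3, -3)
print(read_complex_line(raw_line, 3))  # [(1-1j), (2-2j), (3-3j)]
```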
Metadata
The epr2gdal_band() function also stores a small set of metadata for each epr.Band:
# the band just added is the last one of the dataset
gdal_band = gdal_ds.GetRasterBand(gdal_ds.RasterCount)
gdal_band.SetDescription(band.description)
metadata = {
    "name": band.get_name(),
    "dataset_name": dataset.get_name(),
    "dataset_description": dataset.description,
    "lines_mirrored": str(band.lines_mirrored),
    "sample_model": epr.get_sample_model_name(band.sample_model),
    "scaling_factor": str(band.scaling_factor),
    "scaling_offset": str(band.scaling_offset),
    "scaling_method": epr.get_scaling_method_name(band.scaling_method),
    "spectr_band_index": str(band.spectr_band_index),
    "unit": band.unit if band.unit else "",
    "bm_expr": band.bm_expr if band.bm_expr else "",
}
gdal_band.SetMetadata(metadata)
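GDAL metadata items are string key/value pairs, which is why every non-string attribute above goes through str() and missing values become empty strings. A small hypothetical helper, to_gdal_metadata(), could centralize that conversion; note that it maps only None to "", whereas the code above replaces any falsy unit or bm_expr with "".

```python
def to_gdal_metadata(values):
    """Convert arbitrary values to GDAL-style string metadata.

    None becomes an empty string; everything else goes through str().
    """
    return {key: "" if value is None else str(value)
            for key, value in values.items()}


print(to_gdal_metadata({"scaling_factor": 0.5, "lines_mirrored": True,
                        "unit": None}))
# {'scaling_factor': '0.5', 'lines_mirrored': 'True', 'unit': ''}
```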
Metadata are also stored at the GDAL dataset level by the epr2gdal() function:
metadata = {
    "id_string": product.id_string,
    "meris_iodd_version": str(product.meris_iodd_version),
    "dataset_names": ",".join(product.get_dataset_names()),
    "num_datasets": str(product.get_num_datasets()),
    "num_dsds": str(product.get_num_dsds()),
}
gdal_ds.SetMetadata(metadata)
The epr2gdal() function also stores the contents of the MPH and SPH records as GDAL dataset metadata in custom domains:
mph = product.get_mph()
metadata = str(mph).replace(" = ", "=").split("\n")
gdal_ds.SetMetadata(metadata, "MPH")

sph = product.get_sph()
metadata = str(sph).replace(" = ", "=").split("\n")
gdal_ds.SetMetadata(metadata, "SPH")
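This conversion relies on the textual representation of the records having one NAME = value item per line, which is then passed to GDAL as a list of NAME=value strings. The sketch below reproduces the idea on a made-up sample text; unlike the snippet above, it skips blank lines and replaces only the first " = " on each line, so that values containing that sequence are kept intact. record_text_to_metadata() and metadata_to_dict() are hypothetical helpers.

```python
def record_text_to_metadata(text):
    """Turn 'NAME = value' lines into GDAL 'NAME=value' items.

    Blank lines are skipped and only the first ' = ' of each line is
    replaced, so values containing that sequence survive unchanged.
    """
    return [line.replace(" = ", "=", 1)
            for line in text.split("\n") if line.strip()]


def metadata_to_dict(items):
    """Split 'NAME=value' items back into a dictionary."""
    return dict(item.split("=", 1) for item in items)


# Made-up sample text in the assumed 'NAME = value' layout.
sample = 'PRODUCT = "EXAMPLE.N1"\nPROC_STAGE = N\n\n'
items = record_text_to_metadata(sample)
print(items)                    # ['PRODUCT="EXAMPLE.N1"', 'PROC_STAGE=N']
print(metadata_to_dict(items))  # {'PRODUCT': '"EXAMPLE.N1"', 'PROC_STAGE': 'N'}
```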
Complete listing
#!/usr/bin/env python3

import os

import epr
from osgeo import gdal


epr_to_gdal_type = {
    epr.E_TID_UNKNOWN: gdal.GDT_Unknown,
    epr.E_TID_UCHAR: gdal.GDT_Byte,
    epr.E_TID_CHAR: gdal.GDT_Byte,
    epr.E_TID_USHORT: gdal.GDT_UInt16,
    epr.E_TID_SHORT: gdal.GDT_Int16,
    epr.E_TID_UINT: gdal.GDT_UInt32,
    epr.E_TID_INT: gdal.GDT_Int32,
    epr.E_TID_FLOAT: gdal.GDT_Float32,
    epr.E_TID_DOUBLE: gdal.GDT_Float64,
    # epr.E_TID_STRING: gdal.GDT_Unknown,
    # epr.E_TID_SPARE: gdal.GDT_Unknown,
    # epr.E_TID_TIME: gdal.GDT_Unknown,
}


def epr2gdal_band(band, vrt):
    """Add a VRTRawRasterBand mapping `band` to the VRT dataset `vrt`.

    `vrt` can be an open GDAL VRT dataset or the name of a VRT file.
    """
    product = band.product
    dataset = band.dataset
    record = dataset.read_record(0)
    field = record.get_field_at(band._field_index - 1)

    ysize = product.get_scene_height()
    xsize = product.get_scene_width()

    # Get (or create) the target VRT dataset
    if isinstance(vrt, gdal.Dataset):
        if (vrt.RasterYSize, vrt.RasterXSize) != (ysize, xsize):
            raise ValueError("dataset size does not match")
        gdal_ds = vrt
    elif os.path.exists(vrt):
        gdal_ds = gdal.Open(vrt, gdal.GA_Update)
        if gdal_ds is None:
            raise RuntimeError(f"unable to open {vrt!r}")
        driver = gdal_ds.GetDriver()
        if driver.ShortName != "VRT":
            raise TypeError(f"unexpected GDAL driver ({driver.ShortName}). "
                            "VRT driver expected")
    else:
        driver = gdal.GetDriverByName("VRT")
        if driver is None:
            raise RuntimeError("unable to get driver 'VRT'")
        gdal_ds = driver.Create(vrt, xsize, ysize, 0)
        if gdal_ds is None:
            raise RuntimeError(f"unable to create {vrt!r} dataset")

    # Raw raster layout of the band inside the ENVISAT product file.
    # denormalize: convert "/" separators to the native path separator
    filename = os.sep.join(product.file_path.split("/"))
    offset = dataset.get_dsd().ds_offset + field.get_offset()
    line_offset = record.tot_size
    pixel_offset = epr.get_data_type_size(field.get_type())

    # Interleaved complex data: real (1OF2) and imaginary (2OF2) parts
    if band.sample_model == epr.E_SMOD_1OF2:
        pixel_offset *= 2
    elif band.sample_model == epr.E_SMOD_2OF2:
        offset += pixel_offset
        pixel_offset *= 2

    options = [
        "subClass=VRTRawRasterBand",
        f"SourceFilename={filename}",
        f"ImageOffset={offset}",
        f"LineOffset={line_offset}",
        f"PixelOffset={pixel_offset}",
        "ByteOrder=MSB",
    ]

    gtype = epr_to_gdal_type[field.get_type()]
    ret = gdal_ds.AddBand(gtype, options=options)
    if ret != gdal.CE_None:
        raise RuntimeError(f"unable to add VRTRawRasterBand to {vrt!r}")

    # Band-level metadata (the band just added is the last one)
    gdal_band = gdal_ds.GetRasterBand(gdal_ds.RasterCount)
    gdal_band.SetDescription(band.description)
    metadata = {
        "name": band.get_name(),
        "dataset_name": dataset.get_name(),
        "dataset_description": dataset.description,
        "lines_mirrored": str(band.lines_mirrored),
        "sample_model": epr.get_sample_model_name(band.sample_model),
        "scaling_factor": str(band.scaling_factor),
        "scaling_offset": str(band.scaling_offset),
        "scaling_method": epr.get_scaling_method_name(band.scaling_method),
        "spectr_band_index": str(band.spectr_band_index),
        "unit": band.unit if band.unit else "",
        "bm_expr": band.bm_expr if band.bm_expr else "",
    }
    gdal_band.SetMetadata(metadata)

    return gdal_ds


def epr2gdal(product, vrt, overwrite_existing=False):
    """Export all the bands of an ENVISAT product to a new VRT file.

    `product` can be an epr.Product or the name of an ENVISAT file.
    """
    if isinstance(product, str):
        filename = product
        product = epr.open(filename)

    ysize = product.get_scene_height()
    xsize = product.get_scene_width()

    if os.path.exists(vrt) and not overwrite_existing:
        raise ValueError(f"unable to create {vrt!r}. Already exists")

    driver = gdal.GetDriverByName("VRT")
    if driver is None:
        raise RuntimeError("unable to get driver 'VRT'")

    gdal_ds = driver.Create(vrt, xsize, ysize, 0)
    if gdal_ds is None:
        raise RuntimeError(f"unable to create {vrt!r} dataset")

    # Product-level metadata (default domain)
    metadata = {
        "id_string": product.id_string,
        "meris_iodd_version": str(product.meris_iodd_version),
        "dataset_names": ",".join(product.get_dataset_names()),
        "num_datasets": str(product.get_num_datasets()),
        "num_dsds": str(product.get_num_dsds()),
    }
    gdal_ds.SetMetadata(metadata)

    # MPH and SPH contents in custom metadata domains
    mph = product.get_mph()
    metadata = str(mph).replace(" = ", "=").split("\n")
    gdal_ds.SetMetadata(metadata, "MPH")

    sph = product.get_sph()
    metadata = str(sph).replace(" = ", "=").split("\n")
    gdal_ds.SetMetadata(metadata, "SPH")

    # One VRTRawRasterBand for each EPR band
    for band in product.bands():
        epr2gdal_band(band, gdal_ds)

    # @TODO: set geographic info

    return gdal_ds


if __name__ == "__main__":
    filename = "MER_LRC_2PTGMV20000620_104318_00000104X000_00000_00000_0001.N1"
    vrtfilename = os.path.splitext(filename)[0] + ".vrt"

    gdal_ds = epr2gdal(filename, vrtfilename)

    # Read the same band with PyEPR (scaled values) for comparison
    with epr.open(filename) as product:
        band_index = product.get_band_names().index("water_vapour")
        band = product.get_band("water_vapour")
        eprdata = band.read_as_array()
        unit = band.unit
        lines_mirrored = band.lines_mirrored
        scaling_offset = band.scaling_offset
        scaling_factor = band.scaling_factor

    # Read raw values through the VRT, then undo mirroring and apply scaling
    gdal_band = gdal_ds.GetRasterBand(band_index + 1)
    vrtdata = gdal_band.ReadAsArray()

    if lines_mirrored:
        vrtdata = vrtdata[:, ::-1]

    vrtdata = vrtdata * scaling_factor + scaling_offset

    print("Max absolute error:", abs(vrtdata - eprdata).max())

    # plot
    from matplotlib import pyplot as plt

    plt.figure()
    plt.subplot(2, 1, 1)
    plt.imshow(eprdata)
    plt.grid(True)
    cb = plt.colorbar()
    cb.set_label(unit)
    plt.title("EPR data")

    plt.subplot(2, 1, 2)
    plt.imshow(vrtdata)
    plt.grid(True)
    cb = plt.colorbar()
    cb.set_label(unit)
    plt.title("VRT data")

    plt.show()
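The check at the end of the listing undoes the line mirroring and applies the linear scaling (value * scaling_factor + scaling_offset) to the raw values read through the VRT before comparing them with band.read_as_array(). The following stdlib-only sketch reproduces those two steps on nested lists; like the listing, it assumes a linear scaling method, and the function names and data are illustrative.

```python
def vrt_to_physical(raw_rows, scaling_factor, scaling_offset, lines_mirrored):
    """Undo line mirroring (reverse each row) and apply linear scaling."""
    rows = [row[::-1] for row in raw_rows] if lines_mirrored else raw_rows
    return [[value * scaling_factor + scaling_offset for value in row]
            for row in rows]


def max_abs_error(a, b):
    """Largest absolute element-wise difference between two nested lists."""
    return max(abs(x - y) for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))


raw = [[0, 10, 20], [30, 40, 50]]           # illustrative raw values
reference = [[10.0, 5.0, 0.0], [25.0, 20.0, 15.0]]
physical = vrt_to_physical(raw, 0.5, 0.0, lines_mirrored=True)
print(max_abs_error(physical, reference))   # 0.0
```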