(GDAL >= 1.10.0)
With some file formats, particularly those whose drivers rely on third-party (potentially closed-source) libraries, it is difficult to ensure that the third-party library is robust against hostile or corrupted datasources.
The implemented solution is a (private) API_PROXY driver that exposes a GDALClientDataset object. This object forwards all GDAL API calls to another process (the "server"), where the real driver actually runs. This way, if the server aborts due to a fatal error, the calling process is unaffected and reports a clean error instead of aborting itself.
The API_PROXY mechanism can be enabled by setting the GDAL_API_PROXY config option to YES. The option can also be set to a comma-separated list of file extensions, in which case only files with those extensions trigger the mechanism (e.g. GDAL_API_PROXY=ecw,sid).
When the mechanism is enabled, datasets are opened or created as usual with GDALOpen(), GDALCreate() or GDALCreateCopy(), using their nominal filename (or connection string).
Alternatively, the API_PROXY mechanism can be used selectively on a datasource by prefixing its name with API_PROXY:, for example GDALOpen("API_PROXY:foo.tif", GA_ReadOnly).
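The selection rules above can be modelled in a few lines. The following is a minimal sketch and not GDAL's implementation: the function name proxy_requested is hypothetical, extension matching is assumed to be case-insensitive, and only the YES and NO spellings of the option are modelled.

```python
import os

PROXY_PREFIX = "API_PROXY:"

def proxy_requested(filename, option_value):
    """Return True if this model would route `filename` through the proxy.

    `option_value` stands for the GDAL_API_PROXY config option
    (None when the option is unset).
    """
    # An explicit prefix selects the proxy regardless of the option.
    if filename.upper().startswith(PROXY_PREFIX):
        return True
    if option_value is None:
        return False
    value = option_value.strip()
    if value.upper() == "YES":
        return True
    if value == "" or value.upper() == "NO":
        return False
    # Otherwise treat the value as a comma-separated list of extensions.
    # Assumption: matching ignores case and any leading dot.
    wanted = {ext.strip().lstrip(".").lower() for ext in value.split(",")}
    wanted.discard("")
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    return ext in wanted
```

With GDAL_API_PROXY=ecw,sid this model selects image.ecw but not foo.tif, while API_PROXY:foo.tif is selected whatever the option value.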
For now, the server launched on Windows is the gdalserver executable. On Unix, the default behaviour is to just fork() the current process. It is also possible to launch the gdalserver executable instead by setting GDAL_API_PROXY_SERVER=YES. The full filename of the gdalserver executable can also be specified in the GDAL_API_PROXY_SERVER config option.
It is also possible to connect to a gdalserver over TCP, possibly on a remote host. In that case, gdalserver must be launched on the host with "gdalserver -tcpserver the_tcp_port", and the client must set GDAL_API_PROXY_SERVER="hostname:the_tcp_port", where hostname is a host name or an IP address.
On Unix, gdalserver can also be launched on a Unix socket, with "gdalserver -unixserver /a/filename". Clients should then set GDAL_API_PROXY_SERVER to "/a/filename".
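To summarise the forms that GDAL_API_PROXY_SERVER can take, here is a small classifier. It is only an illustrative sketch: the function name and the mode labels are hypothetical, and the rule used to recognise "hostname:port" (a trailing numeric port) is an assumption rather than GDAL's actual parsing. A plain path is reported as-is, since telling an executable from a Unix socket would require inspecting the filesystem.

```python
def classify_server_option(value):
    """Map a GDAL_API_PROXY_SERVER value to a (mode, detail) pair.

    `value` is None when the option is unset.
    """
    if value is None:
        # Default: gdalserver executable on Windows, fork() on Unix.
        return ("default", None)
    if value.upper() == "YES":
        # Force launching the gdalserver executable.
        return ("spawn", "gdalserver")
    host, sep, port = value.rpartition(":")
    # Assumption: a TCP target is recognised by a trailing numeric port.
    if sep and host and port.isdecimal():
        return ("tcp", (host, int(port)))
    # Either the full filename of gdalserver or a Unix socket path.
    return ("path", value)
```

For example, "localhost:4321" is classified as a TCP target and "/a/filename" as a path. A Windows path such as "C:\gdal\gdalserver.exe" is also classified as a path, because the text after its colon is not numeric.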
When many datasets are opened or created, a pool of unused connections is maintained to avoid the cost of repeated process forking. Each time a dataset is closed, the associated connection is pushed into the pool (if there is a free slot). When another dataset is subsequently opened, one of those connections is reused. This behaviour is controlled with the GDAL_API_PROXY_CONN_POOL config option, which is set to YES by default and keeps a maximum of 4 unused connections. GDAL_API_PROXY_CONN_POOL can be set to an integer value to specify the maximum number of unused connections.
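The pooling behaviour can be sketched as follows. This is a simplified model, not GDAL's code: the names ConnectionPool and parse_pool_option are hypothetical, connections are represented by arbitrary objects, and treating NO as "keep zero unused connections" is an assumption.

```python
DEFAULT_MAX_UNUSED = 4

def parse_pool_option(value):
    """Translate a GDAL_API_PROXY_CONN_POOL value into a maximum pool size."""
    if value is None or value.upper() == "YES":
        return DEFAULT_MAX_UNUSED
    if value.upper() == "NO":
        return 0  # assumption: NO disables pooling
    return max(0, int(value))

class ConnectionPool:
    def __init__(self, option_value="YES"):
        self.max_unused = parse_pool_option(option_value)
        self._unused = []

    def acquire(self, connect):
        """Reuse an idle connection if one exists, else create a new one."""
        if self._unused:
            return self._unused.pop()
        return connect()

    def release(self, conn, close):
        """Called when a dataset is closed: keep the connection if there is room."""
        if len(self._unused) < self.max_unused:
            self._unused.append(conn)
        else:
            close(conn)
```

With the default setting, closing five datasets in a row keeps four connections idle and really closes the fifth. The next open then reuses an idle connection instead of starting a new server process.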
Datasets stored in the memory virtual file system (/vsimem) or handled by the MEM driver are excluded from the API Proxy mechanism.
Additionally, for GDALCreate() or GDALCreateCopy(), the VRT driver is also excluded from that mechanism.
Currently, the client dataset returned is not protected by a mutex, so it is unsafe to use it concurrently from multiple threads. However, it is safe to use several client datasets from multiple threads.
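If one client dataset really must be shared between threads, the caller has to serialise access itself. The wrapper below is a generic sketch of that idea: LockedDataset is a hypothetical helper, not part of GDAL, and it works with any object whose methods are called by name.

```python
import threading

class LockedDataset:
    """Serialise every call made on one shared dataset-like object."""

    def __init__(self, dataset):
        self._dataset = dataset
        self._lock = threading.Lock()

    def call(self, method_name, *args, **kwargs):
        # Only one thread at a time may talk to the underlying dataset.
        with self._lock:
            method = getattr(self._dataset, method_name)
            return method(*args, **kwargs)
```

The simpler alternative, which the text already states is safe, is to give each thread its own client dataset and avoid sharing altogether.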
Starting with GDAL 2.1 (Unix only), if the gdalserver executable is launched in TCP (or Unix socket) mode with the -nofork flag, clients that open the same dataset name through the API Proxy are associated with the same dataset object on the server, thus enabling, for example, safe "concurrent" writes from several clients.
In that mode, however, the server uses only one thread, which reduces scalability and client isolation. Furthermore, some operations, like "gdal_translate api_proxy:in.tif api_proxy:out.tif", are not possible, since they would deadlock the server.