ESRI Shapefile / DBF

All varieties of ESRI Shapefiles should be available for reading, and simple 3D files can be created. The driver can also handle standalone DBF files without associated .shp files.

Normally the OGR Shapefile driver treats a whole directory of shapefiles as a dataset, and a single shapefile within that directory as a layer. In this case the directory name should be used as the dataset name. However, it is also possible to use one of the files (.shp, .shx or .dbf) in a shapefile set as the dataset name, and then it will be treated as a dataset with one layer.
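The naming rule above can be sketched in plain Python. This is only an illustration of the directory-versus-file convention, not the driver's actual logic: the `layer_names` helper is hypothetical, and it simplifies by looking only at .shp files in a directory.

```python
from pathlib import Path

# Extensions that the text above accepts as a single-layer dataset name.
SHAPEFILE_SUFFIXES = {".shp", ".shx", ".dbf"}


def layer_names(dataset_name):
    """Return the layer names a dataset name would expose (hypothetical helper).

    A directory yields one layer per .shp file it contains; a single
    .shp/.shx/.dbf file yields exactly one layer named after the file.
    """
    path = Path(dataset_name)
    if path.is_dir():
        return sorted(p.stem for p in path.iterdir() if p.suffix.lower() == ".shp")
    if path.suffix.lower() in SHAPEFILE_SUFFIXES:
        return [path.stem]
    raise ValueError(f"not a shapefile dataset name: {dataset_name}")
```

Called on a directory holding roads.shp and rivers.shp, the helper lists both layers; called on roads.dbf, it lists only roads.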

Note that when reading a Shapefile of type SHPT_ARC, the corresponding layer will be reported as of type wkbLineString, but depending on the number of parts of each geometry, the actual type of the geometry for each feature can be either OGRLineString or OGRMultiLineString. The same applies for SHPT_POLYGON shapefiles, reported as layers of type wkbPolygon, but depending on the number of parts of each geometry, the actual type can be either OGRPolygon or OGRMultiPolygon.

Starting with GDAL 2.1, measures (M coordinate) are supported. A Shapefile with measures is created if the specified geometry type is measured or an appropriate layer creation option is used. When a shapefile that may have measured geometries is opened, the first shape is examined and, if it uses measures, the geometry type of the layer is set accordingly. This behaviour can be changed with the ADJUST_GEOM_TYPE open option.

MultiPatch files are read and each patch geometry is turned into a multi-polygon representation with one polygon per triangle in triangle fans and meshes.
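As a rough illustration of "one polygon per triangle", the sketch below splits a vertex list into triangles. It assumes that the "meshes" mentioned above behave like triangle strips; the function names are hypothetical and this is not the driver's code.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Triangle = Tuple[Point, Point, Point]


def fan_to_triangles(vertices: Sequence[Point]) -> List[Triangle]:
    """Triangle fan: every triangle shares the first vertex."""
    return [(vertices[0], vertices[i], vertices[i + 1])
            for i in range(1, len(vertices) - 1)]


def strip_to_triangles(vertices: Sequence[Point]) -> List[Triangle]:
    """Triangle strip: each consecutive triple of vertices forms a triangle."""
    return [(vertices[i], vertices[i + 1], vertices[i + 2])
            for i in range(len(vertices) - 2)]
```

In both cases a part with n vertices produces n - 2 triangles, each of which would become one polygon of the resulting multi-polygon.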

If a .prj file in old Arc/Info style or new ESRI OGC WKT style is present, it will be read and used to associate a projection with features. Starting with GDAL 2.3, a match will be attempted against the EPSG database to identify the SRS of the .prj with an entry in the catalog.

The read driver assumes that multipart polygons follow the specification, that is, the vertices of outer rings are oriented clockwise on the X/Y plane and those of inner rings counterclockwise. If a Shapefile violates that rule, the configuration option OGR_ORGANIZE_POLYGONS=DEFAULT can be set to perform a full analysis based on the topological relationships of the polygon parts, so that the resulting polygons are correctly defined in the OGC Simple Feature convention.
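The winding rule can be checked with the shoelace formula. The sketch below is a generic orientation test, not the driver's implementation; `signed_area` and `is_clockwise` are hypothetical names.

```python
def signed_area(ring):
    """Shoelace formula: positive for counterclockwise rings, negative for clockwise.

    `ring` is a sequence of (x, y) pairs; it may or may not repeat its
    first vertex at the end.
    """
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def is_clockwise(ring):
    """True if the ring is wound clockwise on the X/Y plane."""
    return signed_area(ring) < 0
```

Under the convention described above, a ring for which `is_clockwise` returns True would be read as an outer ring, and a counterclockwise ring as an inner ring.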

An attempt is made to read the code page setting in the .cpg file, or as a fallback in the LDID/codepage setting from the .dbf file, and use it to translate string fields to UTF-8 on read, and back when writing. LDID "87 / 0x57" is treated as ISO-8859-1, which may not be appropriate. The SHAPE_ENCODING configuration option may be used to override the encoding interpretation of the shapefile with any encoding supported by CPLRecode, or set to "" to avoid any recoding. (Recoding support is new for GDAL/OGR 1.9.0.)
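The recoding step can be pictured with Python's own codecs. This is a sketch only: the helper names are hypothetical, and the set of encoding names Python accepts is not the same as the set supported by CPLRecode.

```python
def recode_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Translate a raw .dbf string value to UTF-8, as done on read.

    An empty encoding name means "no recoding": the bytes pass through.
    """
    if source_encoding == "":
        return raw
    return raw.decode(source_encoding).encode("utf-8")


def recode_from_utf8(utf8: bytes, target_encoding: str) -> bytes:
    """Translate a UTF-8 value back to the file encoding, as done on write."""
    if target_encoding == "":
        return utf8
    return utf8.decode("utf-8").encode(target_encoding)
```

For example, the ISO-8859-1 bytes for "Zürich" occupy six bytes in the file and seven bytes once recoded to UTF-8, which is why byte-based field widths matter for non-ASCII text.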

Open options

Starting with GDAL 2.0, the following open options are available.

Spatial and Attribute Indexing

The OGR Shapefile driver supports spatial indexing and a limited form of attribute indexing.

The spatial indexing uses the same .qix quadtree spatial index files that are used by UMN MapServer. Spatial indexing can accelerate spatially filtered passes through large datasets to pick out a small area quite dramatically.

Starting with OGR 1.10, it can also use the ESRI spatial index files (.sbn / .sbx), but writing them is not supported currently.

To create a spatial index (in .qix format), issue a SQL command of the form

CREATE SPATIAL INDEX ON tablename [DEPTH N]

where the optional DEPTH specifier controls the number of index tree levels generated. If DEPTH is omitted, the tree depth is estimated from the number of features in the shapefile, and its value ranges from 1 to 12.

To delete a spatial index, issue a command of the form

DROP SPATIAL INDEX ON tablename

Otherwise, the MapServer shptree utility can be used:

shptree <shpfile> [<depth>] [<index_format>]

More information about this utility is available on the MapServer shptree page.

Currently the OGR Shapefile driver only supports attribute indexes for looking up specific values in a unique key column. To create an attribute index for a column, issue an SQL command of the form "CREATE INDEX ON tablename USING fieldname". To drop the attribute indexes, issue a command of the form "DROP INDEX ON tablename". The attribute index will accelerate WHERE clause searches of the form "fieldname = value". The attribute index is actually stored as a MapInfo format index and is not compatible with any other shapefile applications.
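The index commands described in this section can be assembled as plain strings before being handed to the datasource ExecuteSQL() method. The helpers below are hypothetical conveniences that only build the command text quoted above; they do not call GDAL and do not validate the layer or field names.

```python
def create_spatial_index_sql(layer, depth=None):
    """Build 'CREATE SPATIAL INDEX ON <layer> [DEPTH N]'."""
    sql = f"CREATE SPATIAL INDEX ON {layer}"
    if depth is not None:
        sql += f" DEPTH {int(depth)}"
    return sql


def drop_spatial_index_sql(layer):
    """Build 'DROP SPATIAL INDEX ON <layer>'."""
    return f"DROP SPATIAL INDEX ON {layer}"


def create_attribute_index_sql(layer, field):
    """Build 'CREATE INDEX ON <layer> USING <field>'."""
    return f"CREATE INDEX ON {layer} USING {field}"


def drop_attribute_index_sql(layer):
    """Build 'DROP INDEX ON <layer>'."""
    return f"DROP INDEX ON {layer}"
```

The layer name "roads" and field name "road_id" used when trying these helpers are placeholders, not names defined by the driver.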

Creation Issues

The Shapefile driver treats a directory as a dataset, and each Shapefile set (.shp, .shx, and .dbf) as a layer. The dataset name will be treated as a directory name. If the directory already exists it is used and existing files in the directory are ignored. If the directory does not exist it will be created.

As a special case, attempts to create a new dataset with the extension .shp will result in a single file set being created instead of a directory.

ESRI shapefiles can only store one kind of geometry per layer (shapefile). On creation this may be set based on the source file (if a uniform geometry type is known from the source driver), or it may be set directly by the user with the layer creation option SHPT (shown below). If not set, the layer creation will fail. If geometries of incompatible types are written to the layer, the output will be terminated with an error.

Note that this can make it very difficult to translate a mixed geometry layer from another format into Shapefile format using ogr2ogr, since ogr2ogr has no support for separating out geometries from a source layer. See the FAQ for a solution.

Shapefile feature attributes are stored in an associated .dbf file, and so attributes suffer a number of limitations:

Also, .dbf files are required to have at least one field. If none are created by the application an "FID" field will be automatically created and populated with the record number.

The OGR shapefile driver supports rewriting existing shapes in a shapefile as well as deleting shapes. Deleted shapes are marked for deletion in the .dbf file, and then ignored by OGR. To actually remove them permanently (resulting in renumbering of FIDs) invoke the SQL 'REPACK <tablename>' via the datasource ExecuteSQL() method.
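The effect of deletion marks and of REPACK on feature IDs can be shown with a toy model. The sketch assumes that a FID is simply the record's position in the file; the record layout and function names are hypothetical and are not the .dbf format.

```python
def visible_fids(records):
    """FIDs of the records that are not marked as deleted."""
    return [fid for fid, (_, deleted) in enumerate(records) if not deleted]


def repack(records):
    """Physically drop deleted records; survivors get new consecutive FIDs."""
    return [(value, False) for value, deleted in records if not deleted]


# Each record is (attribute value, deleted flag); the list index is the FID.
records = [("a", False), ("b", True), ("c", False)]
packed = repack(records)
```

Before repacking, the surviving features keep FIDs 0 and 2 with a gap at 1; after repacking they are renumbered to 0 and 1.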

Starting with GDAL 2.0, REPACK will also result in the .shp file being rewritten if a feature geometry has been modified with SetFeature() and this changed the size of the binary encoding of the geometry in the .shp file.

Starting with GDAL 2.2, REPACK is also done automatically at file closing, or at FlushCache()/SyncToDisk() time, since shapefiles with holes can cause interoperability issues with other software.

Field sizes

Starting with GDAL/OGR 1.10, the driver auto-extends string and integer fields (up to the 255-byte limit imposed by the DBF format) to dynamically accommodate the length of the data to be inserted.

It is also possible to force a resize of the fields to the optimal width by issuing a SQL 'RESIZE <tablename>' via the datasource ExecuteSQL() method. This is convenient in situations where the default column width (80 characters for a string field) is bigger than necessary.
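A minimal sketch of the two width rules for string fields follows. It assumes widths are counted in encoded bytes and that "optimal" means the smallest width that holds every value; the function names are hypothetical, and only the 255-byte cap and the 80-character default come from the text above.

```python
DBF_MAX_FIELD_WIDTH = 255  # limit imposed by the DBF format
DEFAULT_STRING_WIDTH = 80  # default column width for a string field


def extended_width(current_width, value, encoding="utf-8"):
    """Width after auto-extension: grows to fit the value, capped at 255."""
    needed = len(value.encode(encoding))
    return max(current_width, min(needed, DBF_MAX_FIELD_WIDTH))


def optimal_width(values, encoding="utf-8"):
    """Smallest width that holds every value, as a RESIZE-like pass might pick."""
    longest = max((len(v.encode(encoding)) for v in values), default=0)
    return min(longest, DBF_MAX_FIELD_WIDTH)
```

With values such as "Oslo" and "Reykjavik", the default width of 80 is far larger than the optimal width of 9, which is the situation where RESIZE is convenient.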

Spatial extent

Shapefiles store the layer spatial extent in the .SHP file. The layer spatial extent is automatically updated when inserting a new feature in a shapefile. However when updating an existing feature, if its previous shape was touching the bounding box of the layer extent but the updated shape does not touch the new extent, the computed extent will not be correct. It is then necessary to force a recomputation by invoking the SQL 'RECOMPUTE EXTENT ON <tablename>' via the datasource ExecuteSQL() method. The same applies for the deletion of a shape.
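The reason the stored extent can become stale is that incremental updates only ever enlarge it. The toy model below illustrates this; the `ToyLayer` class is hypothetical and boxes are (minx, miny, maxx, maxy) tuples.

```python
from typing import Dict, Optional, Tuple

Box = Tuple[float, float, float, float]


def union(a: Box, b: Box) -> Box:
    """Smallest box covering both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


class ToyLayer:
    def __init__(self) -> None:
        self.features: Dict[int, Box] = {}
        self.extent: Optional[Box] = None

    def _grow(self, box: Box) -> None:
        self.extent = box if self.extent is None else union(self.extent, box)

    def insert(self, box: Box) -> int:
        fid = len(self.features)
        self.features[fid] = box
        self._grow(box)  # inserting keeps the extent correct
        return fid

    def update(self, fid: int, box: Box) -> None:
        self.features[fid] = box
        self._grow(box)  # the extent can grow here but never shrinks

    def recompute_extent(self) -> None:
        """Full scan, analogous in spirit to RECOMPUTE EXTENT ON."""
        self.extent = None
        for box in self.features.values():
            self._grow(box)
```

If the feature that defined the layer's bounding box is replaced by a smaller shape, `extent` keeps the old, larger box until `recompute_extent` is called.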

Note: RECOMPUTE EXTENT ON is available in OGR >= 1.9.0.

Size Issues

Geometry: The Shapefile format explicitly uses 32-bit offsets and so cannot go over 8GB (it actually uses 32-bit offsets to 16-bit words), but the OGR shapefile implementation is limited to 4GB.

Attributes: The dbf format does not have any offsets in it, so it can be arbitrarily large.

However, for compatibility with other software implementations, it is not recommended to let either the .SHP or the .DBF file grow beyond 2GB.

Starting with OGR 1.11, the 2GB_LIMIT=YES layer creation option can be used to strictly enforce that limit. For update mode, the SHAPE_2GB_LIMIT configuration option can be set to YES for similar effect. If nothing is set, a warning will be emitted when the 2GB limit is reached.
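The three size figures in this section follow from simple arithmetic, sketched below. The sketch assumes that "GB" means 2^30 bytes and treats the offsets as unsigned; the `exceeded_limits` helper is hypothetical.

```python
GIB = 2 ** 30

# 32-bit offsets that count 16-bit (2-byte) words: 2^32 words * 2 bytes.
FORMAT_LIMIT = 2 ** 32 * 2
OGR_LIMIT = 4 * GIB            # limit of the OGR implementation
COMPAT_LIMIT = 2 * GIB         # recommended ceiling for other software


def exceeded_limits(size_bytes):
    """Names of the limits that a file of the given size goes over."""
    limits = [
        ("2GB compatibility", COMPAT_LIMIT),
        ("4GB OGR", OGR_LIMIT),
        ("8GB format", FORMAT_LIMIT),
    ]
    return [name for name, limit in limits if size_bytes > limit]
```

A 3GB .shp file, for instance, is still within what OGR can handle but is already past the size recommended for interoperability.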

Dataset Creation Options

None

Layer Creation Options

VSI Virtual File System API support

The driver supports reading from files managed by the VSI Virtual File System API, which includes "regular" files as well as files in the /vsizip/, /vsigzip/ and /vsicurl/ domains.

Examples

Advanced topics

(GDAL >= 2.0) The SHAPE_REWIND_ON_WRITE configuration option/environment variable can be set to NO to prevent the shapefile writer from correcting the winding order of exterior/interior rings to conform to the one mandated by the Shapefile specification. This can be useful in situations where a MultiPolygon passed to the shapefile writer is not really a compliant Simple Features polygon but originates, for example, from a MultiPatch object (from a Shapefile/FileGDB/PGeo datasource).

(GDAL >= 2.1) The SHAPE_RESTORE_SHX configuration option/environment variable can be set to YES (default NO) to restore a broken or absent .shx file from the associated .shp file during opening.

See Also