- Return JSON fields (as identified by GDAL) as dicts/lists in `read_dataframe`; these were previously returned as strings (#556).
- Drop support for GDAL 3.4 and 3.5 (#584).
- Add listing of GDAL data types and subtypes to `read_info` (#556).
- Add support to read list fields without arrow (#558).
- Fix decode error reading an SQLite file on Windows (#568).
- Fix wrong layer name when creating .gpkg.zip file (#570).
- Fix segfault on providing an invalid value for `layer` in `read_info` (#564).
- The GDAL library included in the wheels is upgraded from 3.10.3 to 3.11.4 (#578).
- Add libkml driver to the wheels for more recent Linux platforms supported by manylinux_2_28, MacOS, and Windows (#561).
- Add libspatialite to the wheels (#546).
- Minimum required Python version is now 3.10 (#557).
- Wheels are now available for Python 3.14 (#579).
- Initial support for free-threaded Python builds, with the extension module declaring free-threaded support and wheels for Python 3.13t and 3.14t being built (#562).
- Compatibility with Shapely >= 2.1 to avoid triggering a deprecation warning at import (#542).
- Fix reading with a `skip_features` larger than the available number of features to ensure this consistently returns an empty result for all file formats (#550).
- Capture all errors logged by GDAL when opening a file fails (#495).
- Add support to read and write ".gpkg.zip" (GDAL >= 3.7), ".shp.zip", and ".shz" files (#527).
- Compatibility with the string dtype in the upcoming pandas 3.0 release (#493).
- Fix WKB writing on big-endian systems (#497).
- Fix writing fids to e.g. GPKG file with `use_arrow` (#511).
- Fix error in `write_dataframe` when writing an empty or all-None object column with `use_arrow` (#512).
- The GDAL library included in the wheels is upgraded from 3.9.2 to 3.10.3 (#499).
- Add support to read, write, list, and remove `/vsimem/` files (#457).
- Raise specific error when trying to read non-UTF-8 file with `use_arrow=True` (#490).
- Silence warning from `write_dataframe` with `GeoSeries.notna()` (#435).
- Enable mask & bbox filter when geometry column not read (#431).
- Raise `NotImplementedError` when user attempts to write to an open file handle (#442).
- Prevent seek on read from compressed inputs (#443).
- For the conda-forge package, change the dependency from `libgdal` to `libgdal-core`. This package is significantly smaller as it doesn't contain some large GDAL plugins. Extra plugins can be installed as separate conda packages if needed: more info here. This also leads to `pyproj` becoming an optional dependency; you will need to install `pyproj` in order to support spatial reference systems (#452).
- The GDAL library included in the wheels is updated from 3.8.5 to GDAL 3.9.2 (#466).
- pyogrio now requires a minimum version of Python >= 3.9 (#473).
- Wheels are now available for Python 3.13.
- Add `on_invalid` parameter to `read_dataframe` (#422).
- Fixed bug transposing longitude and latitude when writing files with coordinate transformation from EPSG:4326 (#421).
- Fix bug preventing reading from file paths containing hashes in `read_dataframe` (#412).
- MacOS wheels are now only available for macOS 12+. For older unsupported macOS versions, pyogrio can still be built from source (requires GDAL to be installed) (#417).
- Remove usage of deprecated `distutils` in `setup.py` (#416).
- Support for writing based on Arrow as the transfer mechanism of the data from Python to GDAL (requires GDAL >= 3.8). This is provided through the new `pyogrio.raw.write_arrow` function, or by using the `use_arrow=True` option in `pyogrio.write_dataframe` (#314, #346).
- Add support for `fids` filter to `read_arrow` and `open_arrow`, and to `read_dataframe` with `use_arrow=True` (#304).
- Add some missing properties to `read_info`, including layer name, geometry name and FID column name (#365).
- `read_arrow` and `open_arrow` now provide GeoArrow-compliant extension metadata, including the CRS, when using GDAL 3.8 or higher (#366).
- The `open_arrow` function can now be used without a `pyarrow` dependency. By default, it will now return a stream object implementing the Arrow PyCapsule Protocol (i.e. having an `__arrow_c_stream__` method). This object can then be consumed by your Arrow implementation of choice that supports this protocol. To keep the previous behaviour of returning a `pyarrow.RecordBatchReader`, specify `use_pyarrow=True` (#349).
- Warn when reading from a multilayer file without specifying a layer (#362).
- Allow writing to a new in-memory datasource using io.BytesIO object (#397).
- Fix error in `write_dataframe` if input has a date column and non-consecutive index values (#325).
- Fix encoding issues on Windows for some formats (e.g. ".csv") and always write ESRI Shapefiles using UTF-8 by default on all platforms (#361).
- Raise exception in `read_arrow` or `read_dataframe(..., use_arrow=True)` if a boolean column is detected due to error in GDAL reading boolean values for FlatGeobuf / GPKG drivers (#335, #387); this has been fixed in GDAL >= 3.8.3.
- Properly ignore fields not listed in `columns` parameter when reading from the data source not using the Arrow API (#391).
- Properly handle decoding of ESRI Shapefiles with user-provided `encoding` option for `read`, `read_dataframe`, and `open_arrow`, and correctly encode Shapefile field names and text values to the user-provided `encoding` for `write` and `write_dataframe` (#384).
- Fixed bug preventing reading from bytes or file-like in `read_arrow` / `open_arrow` (#407).
- The GDAL library included in the wheels is updated from 3.7.2 to GDAL 3.8.5.
- Using a `where` expression combined with a list of `columns` that does not include the column referenced in the expression is not recommended and will now return results based on driver-dependent behavior, which may include either returning empty results (even if non-empty results are expected from the `where` parameter) or raising an exception (#391). Previous versions of pyogrio incorrectly set ignored fields against the data source, allowing it to return non-empty results in these cases.
- Add `packaging` as a dependency (#320).
- Fix conversion of WKB to geometries with missing values when using `pandas.ArrowDtype` (#321).
- Fix unspecified dependency on `packaging` (#318).
- Support reading and writing datetimes with timezones (#253).
- Support writing dataframes without geometry column (#267).
- Calculate feature count by iterating over features if GDAL returns an unknown count for a data layer (e.g., OSM driver); this may have significant performance impacts for some data sources that would otherwise return an unknown count (count is used in `read_info`, `read`, `read_dataframe`) (#271).
- Add `arrow_to_pandas_kwargs` parameter to `read_dataframe` + reduce memory usage with `use_arrow=True` (#273).
- In `read_info`, the result now also contains the `total_bounds` of the layer as well as some extra `capabilities` of the data source driver (#281).
- Raise error if `read` or `read_dataframe` is called with parameters to read no columns, geometry, or fids (#280).
- Automatically detect supported driver by extension for all available write drivers and addition of `detect_write_driver` (#270).
- Addition of `mask` parameter to `open_arrow`, `read`, `read_dataframe`, and `read_bounds` functions to select only the features in the dataset that intersect the mask geometry (#285). Note: GDAL < 3.8.0 returns features that intersect the bounding box of the mask when using the Arrow interface for some drivers; this has been fixed in GDAL 3.8.0.
- Removed warning when no features are read from the data source (#299).
- Add support for `force_2d=True` with `use_arrow=True` in `read_dataframe` (#300).
- test suite requires Shapely >= 2.0
- using `skip_features` greater than the number of features available in a data layer now returns empty arrays for `read` and an empty DataFrame for `read_dataframe` instead of raising a `ValueError` (#282).
- enabled `skip_features` and `max_features` for `read_arrow` and `read_dataframe(path, use_arrow=True)`. Note that this incurs overhead because all features up to the next batch size above `max_features` (or size of data layer) will be read prior to slicing out the requested range of features (#282).
- The `use_arrow=True` option can be enabled globally for testing using the `PYOGRIO_USE_ARROW=1` environment variable (#296).
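
The overhead note on `skip_features` / `max_features` with Arrow can be made concrete with a little arithmetic. This is a minimal sketch, not pyogrio's implementation: the helper name `rows_read` and the batch size of 65,536 rows are assumptions for illustration, and it assumes whole batches are read until the end of the requested range (`skip_features + max_features`) is covered.

```python
import math


def rows_read(layer_size, max_features, skip_features=0, batch_size=65_536):
    """Estimate how many features are read before slicing (hypothetical helper).

    batch_size is an assumed value; the real batch size depends on GDAL
    and the options in use.
    """
    # Last feature index (exclusive) needed, capped at the layer size.
    end = min(skip_features + max_features, layer_size)
    # Whole batches are read, so round up to the next batch boundary.
    batches = math.ceil(end / batch_size)
    # A layer smaller than the batch boundary is simply read in full.
    return min(batches * batch_size, layer_size)


# Requesting 10 features from a large layer still reads one full batch.
print(rows_read(layer_size=1_000_000, max_features=10))
# A small layer is read completely.
print(rows_read(layer_size=1_000, max_features=10))
```

Under these assumptions, asking for a handful of features costs at least one full batch, which is why the overhead matters mostly for small `max_features` values on large layers.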
- Fix int32 overflow when reading int64 columns (#260)
- Fix `fid_as_index=True` not setting the fid as index when using `read_dataframe` with `use_arrow=True` (#265)
- Fix errors reading OSM data due to invalid feature count and incorrect reading of OSM layers beyond the first layer (#271)
- Always raise an exception if there is an error when writing a data source (#284)
- In `read_info` (#281):
  - the `features` property in the result will now be -1 if calculating the feature count is an expensive operation for this driver. You can force it to be calculated using the `force_feature_count` parameter.
  - for boolean values in the `capabilities` property, the values will now be booleans instead of 1 or 0.
- The GDAL library included in the wheels is updated from 3.6.4 to GDAL 3.7.2.
- Add automatic detection of 3D geometries in `write_dataframe` (#223, #229)
- Add "driver" property to `read_info` result (#224)
- Add support for dataset open options to `read`, `read_dataframe`, and `read_info` (#233)
- Add support for pandas' nullable data types in `write_dataframe`, or specifying a mask manually for missing values in `write` (#219)
- Standardized 3-dimensional geometry type labels from "2.5D " to " Z" for consistency with well-known text (WKT) formats (#234)
- Failure error messages from GDAL are no longer printed to stderr (they were already translated into Python exceptions as well) (#236).
- Failure and warning error messages from GDAL are no longer printed to stderr: failures were already translated into Python exceptions and warning messages are now translated into Python warnings (#236, #242).
- Add access to low-level pyarrow `RecordBatchReader` via `pyogrio.raw.open_arrow`, which allows iterating over batches of Arrow tables (#205).
- Add support for writing dataset and layer metadata (where supported by driver) to `write` and `write_dataframe`, and add support for reading dataset and layer metadata in `read_info` (#237).
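
The change of 3-dimensional geometry type labels from a "2.5D " prefix to a " Z" suffix (#234) amounts to a simple string rewrite. The sketch below only illustrates the naming change; `to_wkt_style_label` is a hypothetical helper, not part of pyogrio, and may be handy when updating code that compared against the old labels.

```python
def to_wkt_style_label(label: str) -> str:
    """Rewrite an old-style '2.5D <Type>' label as '<Type> Z' (hypothetical helper)."""
    prefix = "2.5D "
    if label.startswith(prefix):
        # Drop the old prefix and append the WKT-style suffix.
        return label[len(prefix):] + " Z"
    # 2D labels are unchanged.
    return label


print(to_wkt_style_label("2.5D Point"))
print(to_wkt_style_label("Polygon"))
```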
- The GDAL library included in the wheels is updated from 3.6.2 to GDAL 3.6.4.
- Wheels are now available for Linux aarch64 / arm64.
- Fix memory leak in reading files (#207)
- Fix to only use transactions for writing records when supported by the driver (#203)
- Support for reading based on Arrow as the transfer mechanism of the data from GDAL to Python (requires GDAL >= 3.6 and `pyarrow` to be installed). This can be enabled by passing `use_arrow=True` to `pyogrio.read_dataframe` (or by using `pyogrio.raw.read_arrow` directly), and provides a further speed-up (#155, #191).
- Support for appending to an existing data source when supported by GDAL by passing `append=True` to `pyogrio.write_dataframe` (#197).
- In floating point columns, NaN values are now by default written as "null" instead of NaN, but with an option to control this (pass `nan_as_null=False` to keep the previous behaviour) (#190).
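
The `nan_as_null` behaviour for floating point columns can be pictured as follows. This is a minimal sketch of the semantics only: `prepare_float_values` is a hypothetical helper, `None` stands in for a null field value, and pyogrio performs the actual conversion internally when writing.

```python
import math


def prepare_float_values(values, nan_as_null=True):
    """Map NaN to None (null) unless nan_as_null is False (hypothetical helper)."""
    if not nan_as_null:
        # Previous behaviour: NaN values are passed through unchanged.
        return list(values)
    # Default behaviour: NaN becomes a null value.
    return [None if math.isnan(v) else v for v in values]


print(prepare_float_values([1.5, float("nan")]))
print(prepare_float_values([1.5, float("nan")], nan_as_null=False))
```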
- It is now possible to pass GDAL's dataset creation options in addition to layer creation options in `pyogrio.write_dataframe` (#189).
- When specifying a subset of `columns` to read, unnecessary IO or parsing is now avoided (#195).
- The GDAL library included in the wheels is updated from 3.4 to GDAL 3.6.2, and is now built with GEOS and sqlite with rtree support enabled (which allows writing a spatial index for GeoPackage).
- Wheels are now available for Python 3.11.
- Wheels are now available for MacOS arm64.
- new `get_gdal_data_path()` utility function to check the path of the data directory detected by GDAL (#160)
- register GDAL drivers during initial import of pyogrio (#145)
- support writing "not a time" (NaT) values in a datetime column (#146)
- fixes an error when reading GPKG with bbox filter (#150)
- properly raises error when invalid where clause is used on a GPKG (#150)
- avoid duplicate count of available features (#151)
- use user-provided `encoding` when reading files instead of using default encoding of data source type (#139)
- always convert curve or surface geometry types to linear geometry types, such as lines or polygons (#140)
- support for reading from file-like objects and in-memory buffers (#25)
- index of GeoDataFrame created by `read_dataframe` can now optionally be set to the FID of the features that are read, as `int64` dtype. Note that some drivers start FID numbering at 0 whereas others start numbering at 1.
- generalize check for VSI files from `/vsizip` to `/vsi` (#29)
- add dtype for each field to `read_info` (#30)
- support writing empty GeoDataFrames (#38)
- support URI schemes (`zip://`, `s3://`) (#43)
- add keyword to promote mixed singular/multi geometry column to multi geometry type (#56)
- Python wheels built for Windows, MacOS (x86_64), and Linux (x86_64) (#49, #55, #57, #61, #63)
- automatically prefix zip files with URI scheme (#68)
- support use of a sql statement in read_dataframe (#70)
- correctly write geometry type for layer when dataset has multiple geometry types (#82)
- support reading `bool`, `int16`, `float32` into correct dtypes (#83)
- add `geometry_type` to `write_dataframe` to set geometry type for layer (#85)
- Use certifi to set `GDAL_CURL_CA_BUNDLE` / `PROJ_CURL_CA_BUNDLE` defaults (#97)
- automatically detect driver for `.geojson`, `.geojsonl` and `.geojsons` files (#101)
- read DateTime fields with millisecond accuracy (#111)
- support writing object columns with np.nan values (#118)
- add support to write object columns that contain types different than string (#125)
- support writing datetime columns (#120)
- support for writing missing (null) geometries (#59)
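
Automatic driver detection for `.geojson`, `.geojsonl` and `.geojsons` files (#101) is essentially a lookup by file extension. The sketch below illustrates the idea only: `guess_driver` and `EXTENSION_TO_DRIVER` are hypothetical names, the table is a deliberately tiny subset, and the mapping of the line-delimited extensions to the `GeoJSONSeq` driver reflects GDAL's driver naming as understood here rather than anything stated in these notes.

```python
from pathlib import Path

# Illustrative subset only (assumed mapping); the real lookup lives in pyogrio/GDAL.
EXTENSION_TO_DRIVER = {
    ".geojson": "GeoJSON",
    ".geojsonl": "GeoJSONSeq",
    ".geojsons": "GeoJSONSeq",
}


def guess_driver(path):
    """Return a driver name for the path's extension, or None if unknown."""
    # Compare case-insensitively so "ROADS.GEOJSON" is also recognised.
    return EXTENSION_TO_DRIVER.get(Path(path).suffix.lower())


print(guess_driver("data/roads.geojsonl"))
print(guess_driver("data/roads.unknown"))
```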
- `read` now also returns an optional FIDs ndarray in addition to meta, geometries, and fields; this is the 2nd item in the returned tuple.
- Consolidated error handling to better use GDAL error messages and specific exception classes (#39). Note that this is a breaking change only if you are relying on specific error classes to be emitted.
- by default, writing GeoDataFrames with mixed singular and multi geometry types will automatically promote to the multi type if the driver does not support mixed geometry types (e.g., `FGB`, though it can write mixed geometry types if `geometry_type` is set to `"Unknown"`)
- the geometry type of datasets with multiple geometry types will be set to `"Unknown"` unless overridden using `geometry_type`. Note: `"Unknown"` may be ignored by some drivers (e.g., shapefile)
- use dtype `object` instead of `numpy.object` to eliminate deprecation warnings (#34)
- raise error if layer cannot be opened (#35)
- fix passing gdal creation parameters in `write_dataframe` (#62)
- fix passing kwargs to GDAL in `write_dataframe` (#67)
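
The handling of mixed geometry types when writing (promotion to the multi type, otherwise `"Unknown"`) can be sketched as a small decision function. This is an illustration under stated assumptions, not pyogrio's code: `infer_layer_geometry_type` and its `promote_to_multi` flag are hypothetical, and geometry types are represented as plain strings such as `"Polygon"` and `"MultiPolygon"`.

```python
def infer_layer_geometry_type(geometry_types, promote_to_multi=True):
    """Pick a layer geometry type from the types present (hypothetical helper)."""
    unique = set(geometry_types)
    if len(unique) == 1:
        # A single geometry type is used as-is.
        return next(iter(unique))
    if promote_to_multi:
        # Strip a leading "Multi" to compare singular and multi variants.
        base = {t.removeprefix("Multi") for t in unique}
        if len(base) == 1:
            # Only singular/multi variants of one type: promote to multi.
            return "Multi" + base.pop()
    # Anything else (or no promotion) is reported as "Unknown".
    return "Unknown"


print(infer_layer_geometry_type(["Polygon", "MultiPolygon"]))
print(infer_layer_geometry_type(["Point", "LineString"]))
```

Passing an explicit `geometry_type` to `write_dataframe` overrides whatever type would otherwise be chosen.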
- `layer_geometry_type` introduced in 0.4.0a1 was renamed to `geometry_type` for consistency
People with a “+” by their names contributed a patch for the first time.
- Brendan Ward
- Joris Van den Bossche
- Martin Fleischmann
- Pieter Roggemans +
- Wei Ji Leong +
- Auto-discovery of `GDAL_VERSION` on Windows, if `gdalinfo.exe` is discoverable on the `PATH`.
- Addition of `read_bounds` function to read the bounds of each feature.
- Addition of a `fids` keyword to `read` and `read_dataframe` to selectively read features based on a list of the FIDs.
- initial support for building on Windows.
- Windows: enabled search for GDAL dll directory for Python >= 3.8.
- Addition of `where` parameter to `read` and `read_dataframe` to enable GDAL-compatible SQL WHERE queries to filter data sources.
- Addition of `force_2d` parameter to `read` and `read_dataframe` to force coordinates to always be returned as 2 dimensional, dropping the 3rd dimension if present.
- Addition of `bbox` parameter to `read` and `read_dataframe` to select only the features in the dataset that intersect the bbox.
- Addition of `set_gdal_config_options` to set GDAL configuration options and `get_gdal_config_option` to get a GDAL configuration option.
- Addition of `pyogrio.__gdal_version__` attribute to return GDAL version tuple and `__gdal_version_string__` to return string version.
- Addition of `list_drivers` function to list all available GDAL drivers.
- Addition of read and write support for `FlatGeobuf` driver when available in GDAL.
- Addition of `list_layers` to list layers in a data source.
- Addition of `read_info` to read basic information for a layer.
- Addition of `read_dataframe` to read from supported file formats (Shapefile, GeoPackage, GeoJSON) into GeoDataFrames.
- Addition of `write_dataframe` to write GeoDataFrames into supported file formats.