Commit d2baaa3

ENH: improve support for datetime columns (#486)
Co-authored-by: Joris Van den Bossche
1 parent 8291f60 commit d2baaa3

9 files changed, +806 -82 lines changed

CHANGES.md

Lines changed: 4 additions & 1 deletion
@@ -10,6 +10,9 @@

 ### Improvements

+- Add `datetime_as_string` and `mixed_offsets_as_utc` parameters to `read_dataframe`
+  to choose the way datetime columns are returned + several fixes when reading and
+  writing datetimes (#486).
 - Add listing of GDAL data types and subtypes to `read_info` (#556).
 - Add support to read list fields without arrow (#558, #597).

@@ -183,7 +186,7 @@

 ### Improvements

-- Support reading and writing datetimes with timezones (#253).
+- Support reading and writing datetimes with time zones (#253).
 - Support writing dataframes without geometry column (#267).
 - Calculate feature count by iterating over features if GDAL returns an
   unknown count for a data layer (e.g., OSM driver); this may have signficant

docs/source/introduction.md

Lines changed: 11 additions & 7 deletions
@@ -481,13 +481,17 @@ Not all file formats have dedicated support to store datetime data, like ESRI
 Shapefile. For such formats, or if you require precision > ms, a workaround is to
 convert the datetimes to string.

-Timezone information is preserved where possible, however GDAL only represents
-time zones as UTC offsets, whilst pandas uses IANA time zones (via `pytz` or
-`zoneinfo`). This means that dataframes with columns containing multiple offsets
-(e.g. when switching from standard time to summer time) will be written correctly,
-but when read via `pyogrio.read_dataframe()` will be returned as a UTC datetime
-column, as there is no way to reconstruct the original timezone from the individual
-offsets present.
+When you have datetime columns with time zone information, it is important to
+note that GDAL only represents time zones as UTC offsets, whilst pandas uses
+IANA time zones (via `pytz` or `zoneinfo`). As a result, even if a column in a
+DataFrame contains datetimes in a single time zone, this will often still result
+in mixed time zone offsets being written for time zones where daylight saving
+time is used (e.g. +01:00 and +02:00 offsets for time zone Europe/Brussels).
+When roundtripping through GDAL, the information about the original time zone
+is lost, only the offsets can be preserved. By default,
+{func}`pyogrio.read_dataframe()` will convert columns with mixed offsets to UTC
+to return a datetime64 column. If you want to preserve the original offsets,
+you can use `datetime_as_string=True` or `mixed_offsets_as_utc=False`.

 ## Dataset and layer creation options

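Reviewer note: the documentation change above describes mixed UTC offsets within one column and the default conversion to UTC. The sketch below illustrates that idea using only the Python standard library; the sample timestamps and the helper name `read_keeping_offsets` are illustrative assumptions, not part of the commit. Only the `datetime_as_string` parameter of `pyogrio.read_dataframe` is taken from the diff.

```python
from datetime import datetime, timezone

# Illustrative values (not from the commit): ISO 8601 strings that carry only
# UTC offsets, as a column originally in Europe/Brussels might be stored.
# Winter uses +01:00 and summer uses +02:00.
raw = ["2024-01-15T12:00:00+01:00", "2024-07-15T12:00:00+02:00"]

parsed = [datetime.fromisoformat(s) for s in raw]

# More than one distinct offset means the column has mixed offsets.
offsets = {dt.utcoffset() for dt in parsed}
has_mixed_offsets = len(offsets) > 1

# Normalising to UTC yields a single-offset column, at the cost of the
# original local offsets.
as_utc = [dt.astimezone(timezone.utc) for dt in parsed]


def read_keeping_offsets(path):
    """Hypothetical helper: ask pyogrio to return datetime columns as strings.

    Assumes a pyogrio version that includes this commit.
    """
    import pyogrio  # imported lazily so the sketch runs without pyogrio

    return pyogrio.read_dataframe(path, datetime_as_string=True)
```

Here the two timestamps share the same local wall-clock time but map to different UTC instants, which is why a single datetime64 column can only hold them after conversion to UTC, or as strings if the offsets must survive.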

pyogrio/_io.pyx

Lines changed: 17 additions & 4 deletions
@@ -980,10 +980,16 @@ cdef process_fields(

             if datetime_as_string:
                 # defer datetime parsing to user/ pandas layer
-                # Update to OGR_F_GetFieldAsISO8601DateTime when GDAL 3.7+ only
-                data[i] = get_string(
-                    OGR_F_GetFieldAsString(ogr_feature, field_index), encoding=encoding
-                )
+                IF CTE_GDAL_VERSION >= (3, 7, 0):
+                    data[i] = get_string(
+                        OGR_F_GetFieldAsISO8601DateTime(ogr_feature, field_index, NULL),
+                        encoding=encoding,
+                    )
+                ELSE:
+                    data[i] = get_string(
+                        OGR_F_GetFieldAsString(ogr_feature, field_index),
+                        encoding=encoding,
+                    )
             else:
                 success = OGR_F_GetFieldAsDateTimeEx(
                     ogr_feature,
@@ -1602,6 +1608,7 @@ def ogr_open_arrow(
     int return_fids=False,
     int batch_size=0,
     use_pyarrow=False,
+    datetime_as_string=False,
 ):

     cdef int err = 0
@@ -1819,6 +1826,12 @@ def ogr_open_arrow(
                 "GEOARROW".encode("UTF-8")
             )

+        # Read DateTime fields as strings, as the Arrow DateTime column type is
+        # quite limited regarding support for mixed time zones,...
+        IF CTE_GDAL_VERSION >= (3, 11, 0):
+            if datetime_as_string:
+                options = CSLSetNameValue(options, "DATETIME_AS_STRING", "YES")
+
         # make sure layer is read from beginning
         OGR_L_ResetReading(ogr_layer)

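Reviewer note: the two version gates in `pyogrio/_io.pyx` can be summarised with a small Python model. This is only a sketch of the branching logic visible in the hunks above. In the Cython source the `IF` blocks are resolved at compile time against `CTE_GDAL_VERSION`, whereas the functions below take the version as a runtime argument, and the names `datetime_string_getter` and `arrow_stream_options` are hypothetical.

```python
def datetime_string_getter(gdal_version):
    """Name of the OGR function the diff selects for string datetimes.

    Hypothetical model of the compile-time gate in process_fields.
    """
    if gdal_version >= (3, 7, 0):
        return "OGR_F_GetFieldAsISO8601DateTime"
    return "OGR_F_GetFieldAsString"


def arrow_stream_options(gdal_version, datetime_as_string=False):
    """Arrow stream options affected by the gate in ogr_open_arrow.

    Hypothetical model: a plain dict stands in for the CSL string list.
    """
    options = {}
    # The option is only set when built against GDAL >= 3.11 and the caller
    # asked for datetimes as strings.
    if gdal_version >= (3, 11, 0) and datetime_as_string:
        options["DATETIME_AS_STRING"] = "YES"
    return options
```

One consequence visible in this model: when built against GDAL older than 3.11, this hunk does not pass `DATETIME_AS_STRING` to the Arrow stream even if `datetime_as_string` is true. How that case is handled elsewhere is not shown in the hunks quoted here.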

pyogrio/_ogr.pxd

Lines changed: 8 additions & 0 deletions
@@ -423,6 +423,14 @@ cdef extern from "ogr_api.h":
         OGRLayerH hLayer, ArrowArrayStream *out_stream, char** papszOptions
     )

+IF CTE_GDAL_VERSION >= (3, 7, 0):
+
+    cdef extern from "ogr_api.h":
+        const char* OGR_F_GetFieldAsISO8601DateTime(
+            OGRFeatureH feature, int n, char** papszOptions
+        )
+
+
 IF CTE_GDAL_VERSION >= (3, 8, 0):

     cdef extern from "ogr_api.h":
