Support reading SPSS .sav files and add an option to remove compressed files after extraction. Changes: bump package version to 1.0.2, update CHANGELOG, import pyreadstat and implement _read_sav, add delete_zip_after parameter (default False) to Extractor with docs and attribute, register '.sav' handler, and add logic to track and delete zip/compressed files after extraction (with logging on success/failure). Updated tests to enable delete_zip_after=True.
src/socio4health/extractor.py
38 additions & 4 deletions
@@ -15,6 +15,7 @@
 import os
 import pandas as pd
 import geopandas as gpd
+import pyreadstat
 import dask.dataframe as dd
 from tqdm import tqdm
 import glob
@@ -68,7 +69,7 @@ class Extractor:
     colspecs : list
         Column specifications for fixed-width files, defining the widths of each column. Required if ``is_fwf`` is ``True``.
     sep : str
-        The separator to use when reading ``CSV`` files. Defaults to ``','``.
+        The separator to use when reading ``CSV`` files. Defaults to ``,``.
     ddtype : Union[str, Dict]
         The data type to use when reading files. Can be a single type or a dictionary mapping column names to types. Defaults to ``object``.
     dtype : Union[str, Dict]
@@ -79,6 +80,8 @@ class Extractor:
         The name or index of the Excel sheet to read. Can also be a list to read multiple sheets or ``None`` to read all sheets. Defaults to the first sheet (``0``).
     geodriver : str
         The driver to use for reading geospatial files with ``geopandas.read_file()`` (e.g., ``'ESRI Shapefile'``, ``'KML'``, etc.). Optional.
+    delete_zip_after : bool
+        If True, delete zip/compressed files after extraction. Defaults to False.
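
The behavior documented for `delete_zip_after` can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the code from this commit: the `extract_and_cleanup` function and its signature are hypothetical, and only the parameter name, its default of `False`, and the logging on success or failure are taken from the commit description.

```python
import logging
import os
import zipfile
from typing import List

logger = logging.getLogger(__name__)


def extract_and_cleanup(
    zip_path: str, dest_dir: str, delete_zip_after: bool = False
) -> List[str]:
    """Extract zip_path into dest_dir and optionally delete the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        # Collect extracted file paths, skipping directory entries.
        extracted = [
            os.path.join(dest_dir, name)
            for name in archive.namelist()
            if not name.endswith("/")
        ]

    # The archive is closed at this point, so it is safe to remove it.
    if delete_zip_after:
        try:
            os.remove(zip_path)
            logger.info("Deleted compressed file: %s", zip_path)
        except OSError as exc:
            # A failed cleanup is logged but does not discard the extracted data.
            logger.warning("Could not delete %s: %s", zip_path, exc)

    return extracted
```

Deleting only after the archive has been closed and fully extracted, and treating a failed deletion as a warning instead of an error, keeps the extracted files usable even when cleanup is not possible.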