Commit 478af3b

Support for reading SAS xport files
1 parent 5a4d60f commit 478af3b

12 files changed: +19420 -1 lines changed

doc/source/api.rst

Lines changed: 9 additions & 0 deletions
@@ -82,6 +82,15 @@ HDFStore: PyTables (HDF5)
    HDFStore.get
    HDFStore.select

+SAS
+~~~
+
+.. autosummary::
+   :toctree: generated/
+
+   read_sas
+   XportReader
+
 SQL
 ~~~

doc/source/io.rst

Lines changed: 41 additions & 0 deletions
@@ -41,6 +41,7 @@ object.
 * :ref:`read_html<io.read_html>`
 * :ref:`read_gbq<io.bigquery>` (experimental)
 * :ref:`read_stata<io.stata_reader>`
+* :ref:`read_sas<io.sas>`
 * :ref:`read_clipboard<io.clipboard>`
 * :ref:`read_pickle<io.pickle>`

@@ -4120,6 +4121,46 @@ easy conversion to and from pandas.

 .. _xray: http://xray.readthedocs.org/

+.. _io.sas:
+
+SAS Format
+----------
+
+.. versionadded:: 0.17.0
+
+The top-level function :func:`read_sas` can currently read (but
+not write) SAS xport (.XPT) format files. Pandas cannot currently
+handle SAS7BDAT files.
+
+XPORT files only contain two value types: ASCII text and double
+precision numeric values. There is no automatic type conversion to
+integers, dates, or categoricals. By default the whole file is read
+and returned as a ``DataFrame``.
+
+Specify a ``chunksize`` or use ``iterator=True`` to obtain an
+``XportReader`` object for incrementally reading the file. The
+``XportReader`` object also has attributes that contain additional
+information about the file and its variables.
+
+Read a SAS XPORT file:
+
+.. code-block:: python
+
+    df = pd.read_sas('sas_xport.xpt')
+
+Obtain an iterator and read an XPORT file 100,000 rows at a time:
+
+.. code-block:: python
+
+    rdr = pd.read_sas('sas_xport.xpt', chunksize=100000)
+    for chunk in rdr:
+        do_something(chunk)
+
+The specification_ for the xport file format is available from the SAS
+web site.
+
+.. _specification: https://support.sas.com/techsup/technote/ts140.pdf
+
 .. _io.perf:

 Performance Considerations
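
Because XPORT files hold only text and double-precision numbers, columns that are logically integers, dates, or categories come back from `read_sas` as floats or strings. The following is a minimal sketch of converting them after the read, not part of this commit. The frame is built in memory as a stand-in for what `read_sas` might return, the column names are hypothetical, and the date column is assumed to hold SAS date values (days since 1960-01-01), which is a SAS convention rather than something the documentation above states.

```python
import pandas as pd

# Stand-in for the result of pd.read_sas(...): numeric columns are float64.
# Column names and values are hypothetical.
df = pd.DataFrame({
    "id": [1.0, 2.0, 3.0],
    "visit_date": [0.0, 366.0, 20000.0],  # assumed: SAS dates, days since 1960-01-01
    "site": ["A", "B", "A"],
})

# Whole-number floats can be cast to integers (this raises if NaN is present).
df["id"] = df["id"].astype("int64")

# Convert SAS day counts to timestamps by adding them to the SAS epoch.
sas_epoch = pd.Timestamp("1960-01-01")
df["visit_date"] = sas_epoch + pd.to_timedelta(df["visit_date"], unit="D")

# Low-cardinality text columns can be made categorical.
df["site"] = df["site"].astype("category")
```

Which columns need which conversion depends on the data set, so the conversions have to be chosen per file rather than applied blindly.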

doc/source/whatsnew/v0.17.0.txt

Lines changed: 8 additions & 1 deletion
@@ -20,6 +20,7 @@ Highlights include:
   if they are all ``NaN``, see :ref:`here <whatsnew_0170.api_breaking.hdf_dropna>`
 - Support for ``Series.dt.strftime`` to generate formatted strings for datetime-likes, see :ref:`here <whatsnew_0170.strftime>`
 - Development installed versions of pandas will now have ``PEP440`` compliant version strings (:issue:`9518`)
+- Support for reading SAS xport files, see :func:`~pandas.read_sas`.

 Check the :ref:`API Changes <whatsnew_0170.api>` and :ref:`deprecations <whatsnew_0170.deprecations>` before updating.

@@ -37,7 +38,6 @@ New features
 - Enable writing complex values to HDF stores when using table format (:issue:`10447`)
 - Enable reading gzip compressed files via URL, either by explicitly setting the compression parameter or by inferring from the presence of the HTTP Content-Encoding header in the response (:issue:`8685`)

-
 .. _whatsnew_0170.gil:

 Releasing the GIL
@@ -94,6 +94,13 @@ Other enhancements

 - Enable `read_hdf` to be used without specifying a key when the HDF file contains a single dataset (:issue:`10443`)

+- :func:`~pandas.read_sas` provides support for reading SAS XPORT format files:
+
+    df = pd.read_sas('sas_xport.xpt')
+
+  It is also possible to obtain an iterator and read an XPORT file
+  incrementally.
+
 - ``DatetimeIndex`` can be instantiated using strings containing ``NaT`` (:issue:`7599`)
 - The string parsing of ``to_datetime``, ``Timestamp`` and ``DatetimeIndex`` has been made consistent. (:issue:`7599`)
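
The enhancement entry above notes that an XPORT file can be read incrementally. As a rough sketch of why that helps, the hypothetical helper below (not part of pandas or of this commit) accumulates per-column sums over any iterable of `DataFrame` chunks, so only one chunk needs to be in memory at a time. It is exercised here on in-memory frames that stand in for the chunks a reader would yield.

```python
import pandas as pd


def column_totals(chunks):
    """Sum the numeric columns across an iterable of DataFrame chunks.

    Hypothetical helper for illustration; not part of pandas.
    """
    totals = None
    n_rows = 0
    for chunk in chunks:
        # Sum only the numeric columns of this chunk.
        part = chunk.select_dtypes(include="number").sum()
        # Merge into the running totals, treating missing columns as 0.
        totals = part if totals is None else totals.add(part, fill_value=0)
        n_rows += len(chunk)
    return totals, n_rows


# In-memory stand-ins for the chunks an XportReader would yield.
demo_chunks = [
    pd.DataFrame({"x": [1.0, 2.0], "label": ["a", "b"]}),
    pd.DataFrame({"x": [3.0], "label": ["c"]}),
]
totals, n_rows = column_totals(demo_chunks)
```

With a file on disk, the same helper would presumably be called as `column_totals(pd.read_sas('sas_xport.xpt', chunksize=100000))`, assuming the returned reader is iterable as the documentation in this commit describes.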

pandas/io/api.py

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@
 from pandas.io.json import read_json
 from pandas.io.html import read_html
 from pandas.io.sql import read_sql, read_sql_table, read_sql_query
+from pandas.io.sas import read_sas
 from pandas.io.stata import read_stata
 from pandas.io.pickle import read_pickle, to_pickle
 from pandas.io.packers import read_msgpack, to_msgpack
