Commit b2bc0a0

Merge pull request #9711 from kshedden/sas_xport
SAS xport file reader
2 parents a2ac432 + 4694a42 commit b2bc0a0

12 files changed: +19438 -1 lines changed

doc/source/api.rst

Lines changed: 9 additions & 0 deletions
@@ -82,6 +82,15 @@ HDFStore: PyTables (HDF5)
    HDFStore.get
    HDFStore.select

+SAS
+~~~
+
+.. autosummary::
+   :toctree: generated/
+
+   read_sas
+   XportReader
+
 SQL
 ~~~

doc/source/io.rst

Lines changed: 41 additions & 0 deletions
@@ -41,6 +41,7 @@ object.
 * :ref:`read_html<io.read_html>`
 * :ref:`read_gbq<io.bigquery>` (experimental)
 * :ref:`read_stata<io.stata_reader>`
+* :ref:`read_sas<io.sas>`
 * :ref:`read_clipboard<io.clipboard>`
 * :ref:`read_pickle<io.pickle>`

@@ -4120,6 +4121,46 @@ easy conversion to and from pandas.

 .. _xray: http://xray.readthedocs.org/

+.. _io.sas:
+
+SAS Format
+----------
+
+.. versionadded:: 0.17.0
+
+The top-level function :func:`read_sas` currently can read (but
+not write) SAS xport (.XPT) format files. Pandas cannot currently
+handle SAS7BDAT files.
+
+XPORT files only contain two value types: ASCII text and double
+precision numeric values. There is no automatic type conversion to
+integers, dates, or categoricals. By default the whole file is read
+and returned as a ``DataFrame``.
+
+Specify a ``chunksize`` or use ``iterator=True`` to obtain an
+``XportReader`` object for incrementally reading the file. The
+``XportReader`` object also has attributes that contain additional
+information about the file and its variables.
+
+Read a SAS XPORT file:
+
+.. code-block:: python
+
+   df = pd.read_sas('sas_xport.xpt')
+
+Obtain an iterator and read an XPORT file 100,000 lines at a time:
+
+.. code-block:: python
+
+   rdr = pd.read_sas('sas_xport.xpt', chunksize=100000)
+   for chunk in rdr:
+       do_something(chunk)
+
+The specification_ for the xport file format is available from the SAS
+web site.
+
+.. _specification: https://support.sas.com/techsup/technote/ts140.pdf
+
 .. _io.perf:

 Performance Considerations
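
The added section says that XPORT numeric values come back as plain doubles, with no automatic conversion to integers or dates. The following is a minimal sketch of doing that conversion by hand, using only the standard library. It assumes the date column holds SAS date values, which count days from 1960-01-01. The helper names and the sample values are hypothetical and are not part of this commit or of the pandas API.

```python
import math
from datetime import date, timedelta

# SAS date values count days from this date.
SAS_EPOCH = date(1960, 1, 1)


def sas_date_to_date(value):
    """Convert a SAS date number (days since 1960-01-01) to a datetime.date.

    Missing values (None or NaN) are returned as None.
    """
    if value is None or math.isnan(value):
        return None
    return SAS_EPOCH + timedelta(days=int(value))


def as_int_if_integral(value):
    """Return int(value) when the double holds a whole number, else return it unchanged."""
    if value is not None and not math.isnan(value) and float(value).is_integer():
        return int(value)
    return value


# Hypothetical column values, shaped like the doubles an XPORT file would yield.
visit_days = [0.0, 20454.0, float("nan")]
subject_ids = [1.0, 2.0, 3.5]

visit_dates = [sas_date_to_date(v) for v in visit_days]
ids = [as_int_if_integral(v) for v in subject_ids]
```

On a real ``DataFrame`` the same conversion would normally be applied column-wise after ``read_sas`` returns, or to each chunk when reading incrementally.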

doc/source/whatsnew/v0.17.0.txt

Lines changed: 13 additions & 1 deletion
@@ -20,6 +20,7 @@ Highlights include:
   if they are all ``NaN``, see :ref:`here <whatsnew_0170.api_breaking.hdf_dropna>`
 - Support for ``Series.dt.strftime`` to generate formatted strings for datetime-likes, see :ref:`here <whatsnew_0170.strftime>`
 - Development installed versions of pandas will now have ``PEP440`` compliant version strings (:issue:`9518`)
+- Support for reading SAS xport files, see :ref:`here <whatsnew_0170.enhancements.sas_xport>`

 Check the :ref:`API Changes <whatsnew_0170.api>` and :ref:`deprecations <whatsnew_0170.deprecations>` before updating.

@@ -37,7 +38,6 @@ New features
 - Enable writing complex values to HDF stores when using table format (:issue:`10447`)
 - Enable reading gzip compressed files via URL, either by explicitly setting the compression parameter or by inferring from the presence of the HTTP Content-Encoding header in the response (:issue:`8685`)

-
 .. _whatsnew_0170.gil:

 Releasing the GIL
@@ -85,6 +85,18 @@ We are now supporting a ``Series.dt.strftime`` method for datetime-likes to gene

 The string format is as the python standard library and details can be found `here <https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior>`_

+.. _whatsnew_0170.enhancements.sas_xport:
+
+Support for SAS XPORT files
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+:func:`~pandas.read_sas` provides support for reading SAS XPORT format files::
+
+    df = pd.read_sas('sas_xport.xpt')
+
+It is also possible to obtain an iterator and read an XPORT file
+incrementally.
+
 .. _whatsnew_0170.enhancements.other:

 Other enhancements

pandas/io/api.py

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@
 from pandas.io.json import read_json
 from pandas.io.html import read_html
 from pandas.io.sql import read_sql, read_sql_table, read_sql_query
+from pandas.io.sas import read_sas
 from pandas.io.stata import read_stata
 from pandas.io.pickle import read_pickle, to_pickle
 from pandas.io.packers import read_msgpack, to_msgpack
