bashtage
Contributor

Add explicit error checking for out-of-range doubles when writing Stata files

closes #14618

DOUBLE_MAX = struct.unpack('<d', b'\x00\x00\x00\x00\x00\x00\xe0\x7f')[0]
for col in data:
    if data[col].dtype == np.double:
        value = data[col].max()
Contributor

Use pandas.types.common.is_floating_dtype (or is_float_dtype, I forget which).

@bashtage
Contributor Author

@jreback That doesn't work since it is True for np.float32.
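
A small illustration of the point above, not code from the PR: an exact comparison against np.double matches only 64-bit columns, while a generic floating check matches both widths. Here np.issubdtype stands in for the pandas helper mentioned in the review, which is an assumption made for the sake of a NumPy-only example.

```python
import numpy as np

float32_dtype = np.dtype("float32")
float64_dtype = np.dtype("float64")

# Exact comparison: only the 64-bit dtype matches np.double.
exact_matches = [dt == np.double for dt in (float32_dtype, float64_dtype)]

# Generic floating check: both widths match, which is too broad when
# the range test is meant for 64-bit doubles only.
generic_matches = [np.issubdtype(dt, np.floating)
                   for dt in (float32_dtype, float64_dtype)]

print(exact_matches)    # [False, True]
print(generic_matches)  # [True, True]
```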

@bashtage bashtage force-pushed the stata-max-value-check branch from b6f6432 to db89413 Compare November 10, 2016 23:48
@bashtage
Contributor Author

Changed the approach and also added a range check for float32 columns, upcasting to float64 if needed.

@jorisvandenbossche jorisvandenbossche added Bug IO Stata read_stata, to_stata labels Nov 11, 2016
@bashtage bashtage force-pushed the stata-max-value-check branch 2 times, most recently from 90d65fe to af41353 Compare November 14, 2016 11:23
@codecov-io

codecov-io commented Nov 14, 2016

Current coverage is 85.28% (diff: 100%)

Merging #14637 into master will increase coverage by <.01%

@@             master     #14637   diff @@
==========================================
  Files           140        140          
  Lines         50693      50706    +13   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43235      43247    +12   
- Misses         7458       7459     +1   
  Partials          0          0          

Powered by Codecov. Last update 726efc7...55a98f5

Bug Fixes
~~~~~~~~~

- Explicit check in ``to_stata`` and ````StataWriter `` for out-of-range values when writing doubles (:issue:`14618`)
Contributor

move to 0.19.2; too many quotes on StataWriter


def test_out_of_range_double(self):
    # GH 14618
    df = DataFrame({'ColumnOk': [0.0,
Contributor

Can you throw some infs (and -infs) in here as well (unless that screws up the test)?

@bashtage
Contributor Author

Need to check behavior of infs/-inf, as well as NaN in Stata. It might support these values.

@bashtage bashtage force-pushed the stata-max-value-check branch from af41353 to f057d03 Compare November 15, 2016 12:31

Bug Fixes
~~~~~~~~~

Contributor

you can move to 0.19.2

Add explicit error checking for out-of-range doubles when writing Stata files
Upcasts float32 to float64 if out-of-range values encountered
Tests for infinite values and raises if found

closes pandas-dev#14618
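
The three behaviours listed in the commit message can be sketched roughly as follows. This is a minimal standard-library illustration, not the code that was merged: plan_float_column is a hypothetical helper, columns are plain Python lists, and the two cutoff byte patterns are assumptions about the largest non-missing float and double values Stata accepts.

```python
import math
import struct

# Assumed cutoffs: the largest non-missing float and double values,
# decoded from little-endian byte patterns.
FLOAT32_MAX = struct.unpack("<f", b"\xff\xff\xff\x7e")[0]
FLOAT64_MAX = struct.unpack("<d", b"\xff\xff\xff\xff\xff\xff\xdf\x7f")[0]


def plan_float_column(values, dtype):
    """Hypothetical helper: return the dtype to write, or raise ValueError.

    `dtype` is the string "float32" or "float64".
    """
    observed = [v for v in values if not math.isnan(v)]
    if not observed:
        return dtype  # all-NaN or empty columns need no range check
    largest = max(observed)
    if math.isinf(largest):
        raise ValueError("column maximum is infinite")
    if dtype == "float32" and largest > FLOAT32_MAX:
        return "float64"  # upcast instead of failing
    if dtype == "float64" and largest > FLOAT64_MAX:
        raise ValueError(
            "column maximum %r exceeds %r" % (largest, FLOAT64_MAX))
    return dtype
```

Under these assumptions, plan_float_column([0.0, 2e38], "float32") returns "float64", while a float64 column containing 8.99e307 or +inf raises ValueError. Only the column maximum is inspected, so a -inf entry passes through.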
@bashtage bashtage force-pushed the stata-max-value-check branch from f057d03 to 55a98f5 Compare November 17, 2016 11:24
@jreback
Contributor

jreback commented Nov 17, 2016

So infs are not allowed in Stata at all?

@jreback jreback added this to the 0.19.2 milestone Nov 17, 2016
@bashtage
Contributor Author

+inf is a missing value; it appears in Stata the same as NaN (denoted with a .). Basically, Stata always uses the largest representable numbers as missing values, and everything above the upper cutoff for a double is a missing value. I think users who wish to express a missing value in Stata should be forced to use NaN, which exports fine.

-inf is allowed
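
To make the cutoff concrete, the byte pattern quoted in the diff at the top of this thread can be decoded with the standard library. The sketch below assumes that this value (2**1023) is where Stata's missing-value range for doubles begins and that the boundary is inclusive; reads_back_as_missing is a hypothetical helper for illustration only.

```python
import math
import struct

# Byte pattern quoted in the diff, decoded as a little-endian double.
CUTOFF = struct.unpack("<d", b"\x00\x00\x00\x00\x00\x00\xe0\x7f")[0]


def reads_back_as_missing(value):
    """Hypothetical helper: True if `value` would show up as missing.

    Assumes everything at or above CUTOFF is treated as a missing value.
    """
    return math.isnan(value) or value >= CUTOFF


print(CUTOFF == 2.0 ** 1023)                 # True
print(reads_back_as_missing(float("inf")))   # True
print(reads_back_as_missing(float("-inf")))  # False
```

This matches the behaviour described above: +inf lands in the missing range, while -inf is an ordinary (very small) value.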

@jreback
Contributor

jreback commented Nov 17, 2016

ok, that's fine then.

@jreback jreback closed this in fe555db Nov 17, 2016
@jreback
Contributor

jreback commented Nov 17, 2016

thanks!

jorisvandenbossche pushed a commit to jorisvandenbossche/pandas that referenced this pull request Dec 14, 2016
Add explicit error checking for out-of-range doubles when writing Stata files
Upcasts float32 to float64 if out-of-range values encountered
Tests for infinite values and raises if found

closes pandas-dev#14618
closes pandas-dev#14637

(cherry picked from commit fe555db)
@bashtage bashtage deleted the stata-max-value-check branch January 24, 2017 21:30


Successfully merging this pull request may close these issues.

to_stata + read_stata results in NaNs (close to double precision limit)