Commit ce61b3f

rkern authored and jreback committed
ENH: Fine-grained errstate handling
closes #13109
closes #13135

The precise strategy to be taken here is open for discussion. I tried to be reasonably fine-grained rather than slap a generic decorator over everything because it's easier to go that direction than the reverse.

The `errstate()` blocks in the tests were added *after* fixing all of the library code. Unfortunately, these are less fine-grained than I would like because some of the tests have many lines of the form `assert_array_equal(pandas_expression_to_test, expected_raw_numpy_expression)` where `expected_raw_numpy_expression` is what is triggering the warning. It was tedious to try to rewrite all of that to wrap just `expected_raw_numpy_expression`.

I think I got everything exercised by the test suite except for parts of the test suite that are skipped on my machine due to dependencies. We'll see how things go in the CI.

I haven't added any new tests yet. Could do if requested.

Author: Robert Kern <[email protected]>
Author: Robert Kern <[email protected]>

Closes #13145 from rkern/fix/errstate and squashes the following commits:

ef9c001 [Robert Kern] BUG: whoops, wrong function.
7fd2e86 [Robert Kern] ENH: More whatsnew documentation.
44805db [Robert Kern] ENH: Rearrange expression to avoid generating a warning that would need to be silenced.
1fe1bc2 [Robert Kern] pep8
bf1f662 [Robert Kern] BUG: New fixes after master rebase.
e7adc03 [Robert Kern] BUG: wrong function.
a59cfa7 [Robert Kern] ENH: Avoiding the bounds error is better than silencing the warning.
0e1ea81 [Robert Kern] BUG: A few more stragglers.
863ac93 [Robert Kern] TST: Add a new test to ensure that boolean comparisons are errstate-protected.
6932851 [Robert Kern] TST: Basic check that the global errstate remains unchanged.
c9df7b3 [Robert Kern] BUG: removed debugging print
3b12f08 [Robert Kern] ENH: Silence numpy warnings from certain expressions computed during tests.
eca512c [Robert Kern] BUG: Handle NaT explicitly.
6fbc9ce [Robert Kern] BUG: First pass at fine-grained errstate.
1 parent 51b20de commit ce61b3f

35 files changed, +449 -314 lines changed

doc/source/whatsnew/v0.19.0.txt

Lines changed: 13 additions & 0 deletions
@@ -7,6 +7,10 @@ This is a major release from 0.18.1 and includes a small number of API changes,
 enhancements, and performance improvements along with a large number of bug fixes. We recommend that all
 users upgrade to this version.
 
+.. warning::
+
+   pandas >= 0.19.0 will no longer silence numpy ufunc warnings upon import, see :ref:`here <whatsnew_0190.errstate>`. (:issue:`13109`, :issue:`13145`)
+
 Highlights include:
 
 - :func:`merge_asof` for asof-style time-series joining, see :ref:`here <whatsnew_0190.enhancements.asof_merge>`
@@ -357,6 +361,15 @@ Google BigQuery Enhancements
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 - The :func:`pandas.io.gbq.read_gbq` method has gained the ``dialect`` argument to allow users to specify whether to use BigQuery's legacy SQL or BigQuery's standard SQL. See the :ref:`docs <io.bigquery_reader>` for more details (:issue:`13615`).
 
+.. _whatsnew_0190.errstate:
+
+Fine-grained numpy errstate
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Previous versions of pandas would permanently silence numpy's ufunc error handling when ``pandas`` was imported (:issue:`13109`). Pandas did this in order to silence the warnings that would arise from using numpy ufuncs on missing data, which are usually represented as NaNs. Unfortunately, this silenced legitimate warnings arising in non-pandas code in the application. Starting with 0.19.0, pandas will use the ``numpy.errstate`` context manager to silence these warnings in a more fine-grained manner only around where these operations are actually used in the pandas codebase.
+
+After upgrading pandas, you may see "new" ``RuntimeWarnings`` being issued from your code. These are likely legitimate, and the underlying cause likely existed in the code when using previous versions of pandas that simply silenced the warning. Use `numpy.errstate <http://docs.scipy.org/doc/numpy/reference/generated/numpy.errstate.html>`__ around the source of the ``RuntimeWarning`` to control how these conditions are handled.
+
 .. _whatsnew_0190.enhancements.other:
 
 Other enhancements
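
As a rough illustration of the advice in the whatsnew entry above, the sketch below wraps a single division in `numpy.errstate`, once to silence the condition and once to turn it into an exception. The helper names `safe_ratio` and `strict_ratio` are hypothetical and are not part of pandas or of this commit.

```python
import numpy as np


def safe_ratio(numer, denom):
    """Hypothetical helper: divide element-wise, silencing x/0 and 0/0."""
    numer = np.asarray(numer, dtype=float)
    denom = np.asarray(denom, dtype=float)
    # Only the 'divide' and 'invalid' categories are silenced, and only
    # for the duration of this block.
    with np.errstate(divide='ignore', invalid='ignore'):
        return numer / denom


def strict_ratio(numer, denom):
    """Hypothetical helper: turn the same conditions into exceptions."""
    numer = np.asarray(numer, dtype=float)
    denom = np.asarray(denom, dtype=float)
    with np.errstate(divide='raise', invalid='raise'):
        return numer / denom


# 1.0 / 0.0 gives inf and 0.0 / 0.0 gives nan, without a RuntimeWarning.
quiet = safe_ratio([1.0, 0.0], [0.0, 0.0])
```

Calling `strict_ratio([1.0], [0.0])` raises `FloatingPointError`, which may be the more useful behaviour while tracking down where a "new" warning comes from.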

pandas/compat/numpy/__init__.py

Lines changed: 0 additions & 2 deletions
@@ -5,8 +5,6 @@
 from distutils.version import LooseVersion
 from pandas.compat import string_types, string_and_binary_types
 
-# turn off all numpy warnings
-np.seterr(all='ignore')
 
 # numpy versioning
 _np_version = np.version.short_version
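
To make the difference between the removed global call and the scoped replacement concrete, here is a small sketch that inspects NumPy's error settings with `np.geterr()`. It manually restores the global state after `np.seterr` so that the example does not leak, which is precisely the step the removed import-time call never performed.

```python
import numpy as np

before = np.geterr()

# Global approach (what the removed line did at import time): the change
# persists for every later NumPy call in the process until it is undone.
old = np.seterr(all='ignore')
after_seterr = np.geterr()
np.seterr(**old)  # manual restore, needed only because seterr is global

# Scoped approach used throughout this commit: the previous settings come
# back automatically when the block exits.
with np.errstate(all='ignore'):
    inside = np.geterr()
after_block = np.geterr()
```

Here `inside` reports every category as `'ignore'`, while `after_block` matches `before`.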

pandas/computation/align.py

Lines changed: 1 addition & 1 deletion
@@ -95,7 +95,7 @@ def _align_core(terms):
                 term_axis_size = len(ti.axes[axis])
                 reindexer_size = len(reindexer)
 
-                ordm = np.log10(abs(reindexer_size - term_axis_size))
+                ordm = np.log10(max(1, abs(reindexer_size - term_axis_size)))
                 if ordm >= 1 and reindexer_size >= 10000:
                     warnings.warn('Alignment difference on axis {0} is larger '
                                   'than an order of magnitude on term {1!r}, '
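
The change above avoids the warning rather than silencing it: when both sizes are equal, the old expression evaluated `np.log10(0)`. A minimal sketch of the same clamping idea, using a hypothetical standalone function `order_of_magnitude` in place of the inline expression:

```python
import warnings

import numpy as np


def order_of_magnitude(reindexer_size, term_axis_size):
    """Hypothetical stand-in for the `ordm` expression in `_align_core`."""
    diff = abs(reindexer_size - term_axis_size)
    # Clamp to 1 so that identical sizes give log10(1) == 0.0 instead of
    # log10(0), which is -inf and emits a divide-by-zero RuntimeWarning.
    return np.log10(max(1, diff))


with warnings.catch_warnings():
    warnings.simplefilter("error")  # any warning here would raise
    same_size = order_of_magnitude(100, 100)
```

Because the result for equal sizes is `0.0`, the later `ordm >= 1` check is false for that case, as it was with `-inf`.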

pandas/computation/expressions.py

Lines changed: 2 additions & 1 deletion
@@ -59,7 +59,8 @@ def _evaluate_standard(op, op_str, a, b, raise_on_error=True, **eval_kwargs):
     """ standard evaluation """
     if _TEST_MODE:
         _store_test_result(False)
-    return op(a, b)
+    with np.errstate(all='ignore'):
+        return op(a, b)
 
 
 def _can_use_numexpr(op, op_str, a, b, dtype_check):

pandas/computation/ops.py

Lines changed: 2 additions & 1 deletion
@@ -523,7 +523,8 @@ def __init__(self, func, args):
 
     def __call__(self, env):
         operands = [op(env) for op in self.operands]
-        return self.func.func(*operands)
+        with np.errstate(all='ignore'):
+            return self.func.func(*operands)
 
     def __unicode__(self):
         operands = map(str, self.operands)

pandas/computation/tests/test_eval.py

Lines changed: 4 additions & 2 deletions
@@ -1613,7 +1613,8 @@ def test_unary_functions(self):
         for fn in self.unary_fns:
             expr = "{0}(a)".format(fn)
             got = self.eval(expr)
-            expect = getattr(np, fn)(a)
+            with np.errstate(all='ignore'):
+                expect = getattr(np, fn)(a)
             tm.assert_series_equal(got, expect, check_names=False)
 
     def test_binary_functions(self):
@@ -1624,7 +1625,8 @@ def test_binary_functions(self):
         for fn in self.binary_fns:
             expr = "{0}(a, b)".format(fn)
             got = self.eval(expr)
-            expect = getattr(np, fn)(a, b)
+            with np.errstate(all='ignore'):
+                expect = getattr(np, fn)(a, b)
             tm.assert_almost_equal(got, expect, check_names=False)
 
     def test_df_use_case(self):

pandas/core/frame.py

Lines changed: 6 additions & 3 deletions
@@ -3810,7 +3810,8 @@ def update(self, other, join='left', overwrite=True, filter_func=None,
             this = self[col].values
             that = other[col].values
             if filter_func is not None:
-                mask = ~filter_func(this) | isnull(that)
+                with np.errstate(all='ignore'):
+                    mask = ~filter_func(this) | isnull(that)
             else:
                 if raise_conflict:
                     mask_this = notnull(that)
@@ -4105,7 +4106,8 @@ def f(x):
             return self._apply_empty_result(func, axis, reduce, *args, **kwds)
 
         if isinstance(f, np.ufunc):
-            results = f(self.values)
+            with np.errstate(all='ignore'):
+                results = f(self.values)
             return self._constructor(data=results, index=self.index,
                                      columns=self.columns, copy=False)
         else:
@@ -4931,7 +4933,8 @@ def f(x):
                                         "type %s not implemented." %
                                         filter_type)
                 raise_with_traceback(e)
-            result = f(data.values)
+            with np.errstate(all='ignore'):
+                result = f(data.values)
             labels = data._get_agg_axis(axis)
         else:
             if numeric_only:

pandas/core/groupby.py

Lines changed: 9 additions & 3 deletions
@@ -678,7 +678,8 @@ def apply(self, func, *args, **kwargs):
 
                 @wraps(func)
                 def f(g):
-                    return func(g, *args, **kwargs)
+                    with np.errstate(all='ignore'):
+                        return func(g, *args, **kwargs)
             else:
                 raise ValueError('func must be a callable if args or '
                                  'kwargs are supplied')
@@ -4126,7 +4127,10 @@ def loop(labels, shape):
         out = stride * labels[0].astype('i8', subok=False, copy=False)
 
         for i in range(1, nlev):
-            stride //= shape[i]
+            if shape[i] == 0:
+                stride = 0
+            else:
+                stride //= shape[i]
             out += labels[i] * stride
 
         if xnull:  # exclude nulls
@@ -4365,7 +4369,9 @@ def _get_group_index_sorter(group_index, ngroups):
     count = len(group_index)
     alpha = 0.0  # taking complexities literally; there may be
     beta = 1.0  # some room for fine-tuning these parameters
-    if alpha + beta * ngroups < count * np.log(count):
+    do_groupsort = (count > 0 and ((alpha + beta * ngroups) <
+                                   (count * np.log(count))))
+    if do_groupsort:
         sorter, _ = _algos.groupsort_indexer(_ensure_int64(group_index),
                                              ngroups)
         return _ensure_platform_int(sorter)
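
The last hunk above relies on short-circuit evaluation: with `count > 0` tested first, `np.log(count)` is never evaluated for an empty index. The sketch below isolates that predicate in a hypothetical function `should_use_groupsort` (the name and the keyword defaults are illustrative, taken from the `alpha` and `beta` values shown in the diff) and runs it with all floating-point errors set to raise, to show that the guarded form stays silent.

```python
import numpy as np


def should_use_groupsort(count, ngroups, alpha=0.0, beta=1.0):
    """Hypothetical stand-in for the `do_groupsort` expression."""
    # `and` short-circuits: when count == 0 the right-hand side, and
    # therefore np.log(0), is never evaluated.
    return bool(count > 0 and
                (alpha + beta * ngroups) < (count * np.log(count)))


# With every floating-point error category set to raise, an evaluation
# of np.log(0) would surface as FloatingPointError.
with np.errstate(all='raise'):
    empty_case = should_use_groupsort(0, 0)
```

For an empty input the old expression compared against `0 * -inf`, which is `nan`, so the comparison was already false; the guard appears to keep that outcome while avoiding the warnings.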

pandas/core/internals.py

Lines changed: 4 additions & 2 deletions
@@ -348,7 +348,8 @@ def apply(self, func, mgr=None, **kwargs):
         """ apply the function to my values; return a block if we are not
         one
         """
-        result = func(self.values, **kwargs)
+        with np.errstate(all='ignore'):
+            result = func(self.values, **kwargs)
         if not isinstance(result, Block):
             result = self.make_block(values=_block_shape(result,
                                                          ndim=self.ndim))
@@ -1156,7 +1157,8 @@ def handle_error():
 
         # get the result
         try:
-            result = get_result(other)
+            with np.errstate(all='ignore'):
+                result = get_result(other)
 
         # if we have an invalid shape/broadcast error
         # GH4576, so raise instead of allowing to pass through

pandas/core/nanops.py

Lines changed: 15 additions & 9 deletions
@@ -45,7 +45,8 @@ def _f(*args, **kwargs):
                                 'this dtype'.format(
                                     f.__name__.replace('nan', '')))
             try:
-                return f(*args, **kwargs)
+                with np.errstate(invalid='ignore'):
+                    return f(*args, **kwargs)
             except ValueError as e:
                 # we want to transform an object array
                 # ValueError message to the more typical TypeError
@@ -513,7 +514,8 @@ def nanskew(values, axis=None, skipna=True):
     m2 = _zero_out_fperr(m2)
     m3 = _zero_out_fperr(m3)
 
-    result = (count * (count - 1) ** 0.5 / (count - 2)) * (m3 / m2 ** 1.5)
+    with np.errstate(invalid='ignore', divide='ignore'):
+        result = (count * (count - 1) ** 0.5 / (count - 2)) * (m3 / m2 ** 1.5)
 
     dtype = values.dtype
     if is_float_dtype(dtype):
@@ -562,10 +564,11 @@ def nankurt(values, axis=None, skipna=True):
     m2 = adjusted2.sum(axis, dtype=np.float64)
     m4 = adjusted4.sum(axis, dtype=np.float64)
 
-    adj = 3 * (count - 1) ** 2 / ((count - 2) * (count - 3))
-    numer = count * (count + 1) * (count - 1) * m4
-    denom = (count - 2) * (count - 3) * m2**2
-    result = numer / denom - adj
+    with np.errstate(invalid='ignore', divide='ignore'):
+        adj = 3 * (count - 1) ** 2 / ((count - 2) * (count - 3))
+        numer = count * (count + 1) * (count - 1) * m4
+        denom = (count - 2) * (count - 3) * m2**2
+        result = numer / denom - adj
 
     # floating point error
     numer = _zero_out_fperr(numer)
@@ -579,7 +582,8 @@ def nankurt(values, axis=None, skipna=True):
         if denom == 0:
             return 0
 
-    result = numer / denom - adj
+    with np.errstate(invalid='ignore', divide='ignore'):
+        result = numer / denom - adj
 
     dtype = values.dtype
     if is_float_dtype(dtype):
@@ -658,7 +662,8 @@ def _maybe_null_out(result, axis, mask):
 
 def _zero_out_fperr(arg):
     if isinstance(arg, np.ndarray):
-        return np.where(np.abs(arg) < 1e-14, 0, arg)
+        with np.errstate(invalid='ignore'):
+            return np.where(np.abs(arg) < 1e-14, 0, arg)
     else:
         return arg.dtype.type(0) if np.abs(arg) < 1e-14 else arg
 
@@ -760,7 +765,8 @@ def f(x, y):
         ymask = isnull(y)
         mask = xmask | ymask
 
-        result = op(x, y)
+        with np.errstate(all='ignore'):
+            result = op(x, y)
 
         if mask.any():
             if is_bool_dtype(result):
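
Several of the nanops hunks above name specific categories (`invalid`, `divide`) instead of using `all='ignore'`. The sketch below reuses the skew expression from the diff inside a hypothetical function `skew_factor`, with made-up inputs, to show that the inner block overrides only those two categories even when the caller has asked for every category to raise.

```python
import numpy as np


def skew_factor(count, m2, m3):
    """Hypothetical wrapper around the expression guarded in `nanskew`."""
    # Only 'invalid' (0/0) and 'divide' (x/0) are relaxed here; overflow
    # and underflow keep whatever handling the caller has configured.
    with np.errstate(invalid='ignore', divide='ignore'):
        return (count * (count - 1) ** 0.5 / (count - 2)) * (m3 / m2 ** 1.5)


# Illustrative inputs: the first element hits both count - 2 == 0 and
# m2 == 0, the second is an ordinary case.
count = np.array([2.0, 4.0])
m2 = np.array([0.0, 1.0])
m3 = np.array([0.0, 1.0])

with np.errstate(all='raise'):
    result = skew_factor(count, m2, m3)
```

The degenerate element comes back as `nan` and the ordinary one as `2 * sqrt(3)`, with no exception despite the strict outer setting.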
