@rhshadrach
Member

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Adds an experimental option to return Python scalars instead of NumPy scalars across the API. This is not yet fully implemented everywhere, e.g. Series.__getitem__, but I'm hoping reductions are a substantial chunk.
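As a rough illustration of what returning Python scalars instead of NumPy scalars means, here is a minimal sketch. The helper name `unbox_scalar` is hypothetical and this is not the pandas implementation; it only assumes that NumPy scalars derive from `np.generic` and expose `.item()`.

```python
import numpy as np


def unbox_scalar(value):
    """Hypothetical helper: return a Python scalar for a NumPy scalar input.

    Illustration only; this is not the pandas implementation.
    """
    if isinstance(value, np.generic):
        # np.generic is the base class of all NumPy scalar types;
        # .item() converts to the closest built-in Python type.
        return value.item()
    # Anything else (already a Python scalar, None, etc.) passes through.
    return value


print(type(unbox_scalar(np.int64(2))))      # <class 'int'>
print(type(unbox_scalar(np.float64(1.5))))  # <class 'float'>
print(type(unbox_scalar(2)))                # <class 'int'>
```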

This is complicated by #62988, where it was found that many of our doctests are not running. We run those doctests using NumPy>=2, and to get them to pass as-is we would need to change the NumPy reprs from e.g. 2 to np.int64(2). If we then changed reductions et al. to return Python scalars, we would change all those reprs back from e.g. np.int64(2) to 2. So instead I think we can:

  • Merge this experimental option, not yet advertising it to users.
  • Merge (after some work) DOC: Run all doctests #62988 where we run doctests with the experimental option enabled. This would reduce churn in the documentation.
  • Finish work on this option, expose to users in pandas 3.x and start deprecation process for changing the default.
  • Change default of future.python_scalars to True in 4.0, deprecate the future option.
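The repr churn described above can be seen without pandas at all. The following is a small sketch using only NumPy; the exact output of the first print depends on the installed NumPy version.

```python
import numpy as np

value = np.int64(2)

# Under NumPy>=2 this prints np.int64(2); under NumPy 1.x it prints 2.
print(repr(value))

# The built-in int has the same repr on every NumPy version.
print(repr(value.item()))  # 2
```

Doctests written against Python scalars would therefore not need to change when the NumPy repr does.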

@jbrockmendel
Member

Perf impact?

possible xref #13468, #23106, #29738, #20791, #21256

@rhshadrach
Member Author

> Perf impact?

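I plan to run a full set of ASVs next week. In the meantime, here are some microbenchmarks:

# Run in IPython/Jupyter; %timeit is an IPython magic.
import numpy as np
import pandas as pd

from pandas.core.dtypes.cast import maybe_unbox_numpy_scalar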

with pd.option_context("python_scalars", True):
    %timeit maybe_unbox_numpy_scalar(np.int64(2))
    # 828 ns ± 9.91 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
    %timeit maybe_unbox_numpy_scalar(2)
    # 161 ns ± 0.414 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

ser = pd.Series([1, 2, 3] * 10_000)
with pd.option_context("python_scalars", True):
    %timeit ser.sum()
    # 9.42 μs ± 423 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
with pd.option_context("python_scalars", False):
    %timeit ser.sum()
    # 8.28 μs ± 137 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
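For anyone wanting a comparable measurement outside IPython, here is a sketch using the standard-library `timeit` module. The `unbox` function is a hypothetical stand-in for the conversion step, not `maybe_unbox_numpy_scalar`, and `per_call_ns` is a helper made up for this example, so the absolute numbers will not match the ones above.

```python
import timeit

import numpy as np


def unbox(value):
    # Hypothetical stand-in for the conversion step, not the pandas function.
    return value.item() if isinstance(value, np.generic) else value


def per_call_ns(func, number=200_000, repeat=5):
    # Best-of-N timing, reported in nanoseconds per call.
    best = min(timeit.repeat(func, number=number, repeat=repeat))
    return best / number * 1e9


numpy_input = np.int64(2)
python_input = 2

print(f"NumPy scalar input:  {per_call_ns(lambda: unbox(numpy_input)):.0f} ns")
print(f"Python scalar input: {per_call_ns(lambda: unbox(python_input)):.0f} ns")
```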

@rhshadrach
Member Author

rhshadrach commented Nov 13, 2025

Full ASVs are below, showing only benchmarks that regressed by 10% or more. Of the full list, only the following two actually hit the function maybe_unbox_numpy_scalar.

| Change   | Before [c3bace88] <main>   | After [0896a2f6] <enh_python_scalars>   |   Ratio | Benchmark (Parameter)                                                                                        |
|----------|----------------------------|-----------------------------------------|---------|--------------------------------------------------------------------------------------------------------------|
| +        | 4.20±0.01μs                | 4.77±0.3μs                              |    1.14 | series_methods.NanOps.time_func('max', 1000, 'float64')                                                      |
| +        | 4.19±0.01μs                | 4.85±0.4μs                              |    1.16 | series_methods.NanOps.time_func('max', 1000, 'int32')                                                        |
Full list
| Change   | Before [c3bace88] <main>   | After [0896a2f6] <enh_python_scalars>   |   Ratio | Benchmark (Parameter)                                                                                        |
|----------|----------------------------|-----------------------------------------|---------|--------------------------------------------------------------------------------------------------------------|
| +        | 14.0±0.09μs                | 15.7±0.9μs                              |    1.12 | arithmetic.CategoricalComparisons.time_categorical_op('__ge__')                                              |  
| +        | 440±10μs                   | 580±20μs                                |    1.32 | arithmetic.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 5.0, <built-in function add>) |  
| +        | 751±6μs                    | 839±10μs                                |    1.12 | arithmetic.OffsetArrayArithmetic.time_add_dti_offset(<BusinessDay>)                                          |  
| +        | 290±10μs                   | 325±2μs                                 |    1.12 | categoricals.Concat.time_concat                                                                              |  
| +        | 13.3±0.08ms                | 16.1±0.6ms                              |    1.21 | frame_methods.Fillna.time_fillna(True, 'Float64')                                                            |  
| +        | 182±3μs                    | 211±1μs                                 |    1.16 | frame_methods.MemoryUsage.time_memory_usage                                                                  |
| +        | 797±6μs                    | 906±2μs                                 |    1.14 | frame_methods.NSort.time_nlargest_one_column('last')                                                         |  
| +        | 989±8μs                    | 1.11±0.01ms                             |    1.12 | frame_methods.NSort.time_nsmallest_one_column('all')                                                         |  
| +        | 991±2μs                    | 1.10±0ms                                |    1.11 | frame_methods.NSort.time_nsmallest_one_column('first')                                                       |  
| +        | 1.25±0ms                   | 1.38±0ms                                |    1.1  | frame_methods.NSort.time_nsmallest_two_columns('all')                                                        |  
| +        | 1.26±0.01ms                | 1.39±0ms                                |    1.1  | frame_methods.NSort.time_nsmallest_two_columns('first')                                                      |  
| +        | 11.6±0.2ms                 | 15.1±0.2ms                              |    1.3  | groupby.TransformEngine.time_series_cython(False)                                                            |
| +        | 11.7±0.4ms                 | 14.9±0.09ms                             |    1.28 | groupby.TransformEngine.time_series_cython(True)                                                             |
| +        | 691±50ns                   | 812±50ns                                |    1.18 | index_cached_properties.IndexCache.time_inferred_type('TimedeltaIndex')                                      |
| +        | 34.1±0.6ms                 | 39.1±0.1ms                              |    1.15 | io.csv.ReadCSVCategorical.time_convert_post('c')                                                             |
| +        | 164±1μs                    | 232±1μs                                 |    1.41 | join_merge.Concat.time_concat_mixed_ndims(1)                                                                 |
| +        | 4.20±0.01μs                | 4.77±0.3μs                              |    1.14 | series_methods.NanOps.time_func('max', 1000, 'float64')                                                      |
| +        | 4.19±0.01μs                | 4.85±0.4μs                              |    1.16 | series_methods.NanOps.time_func('max', 1000, 'int32')                                                        |
| +        | 2.16±0.02ms                | 4.35±2ms                                |    2.01 | stat_ops.FrameOps.time_op('kurt', 'float', None)                                                             |
| +        | 2.25±0.02ms                | 5.28±2ms                                |    2.34 | stat_ops.FrameOps.time_op('skew', 'float', None)                                                             |
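As a sanity check on how the Ratio column and the 10% cutoff relate, here is a small sketch that recomputes the ratio as after/before from the rounded timings. The first two rows are from the list above and the third is the 'max', 1000, 'int64' row from the NanOps breakdown in this comment. The function name and threshold constant are made up for this example, and asv works from unrounded timings, so the last digit could differ in other rows.

```python
THRESHOLD = 1.10  # "10% or more" regression, as an after/before ratio

# (benchmark, before in μs, after in μs), copied from the tables in this comment.
rows = [
    ("NanOps.time_func('max', 1000, 'float64')", 4.20, 4.77),
    ("NanOps.time_func('max', 1000, 'int32')", 4.19, 4.85),
    ("NanOps.time_func('max', 1000, 'int64')", 4.21, 4.40),
]


def regressions(rows, threshold=THRESHOLD):
    # Keep rows whose after/before ratio meets the threshold.
    return [(name, round(after / before, 2))
            for name, before, after in rows
            if after / before >= threshold]


for name, ratio in regressions(rows):
    print(f"{ratio:.2f}  {name}")
# 1.14  NanOps.time_func('max', 1000, 'float64')
# 1.16  NanOps.time_func('max', 1000, 'int32')
```

The int64 row comes out at about 1.05, which is why it does not appear in the filtered list.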

I was curious why only min/max showed up as regressions in series_methods.NanOps. Here is the full output:

series_methods.NanOps
| Change   | Before [c3bace88] <main>   | After [0896a2f6] <enh_python_scalars>   | Ratio   | Benchmark (Parameter) |
|----------|----------------------------|-----------------------------------------|---------|-----------------------|
|          | n/a                        | n/a                                     | n/a     | series_methods.NanOps.time_func('argmax', 1000, 'Int64')                                                                                                                 |
|          | n/a                        | n/a                                     | n/a     | series_methods.NanOps.time_func('argmax', 1000, 'boolean')                                                                                                               |
|          | 4.47±0.03μs                | 4.51±0.05μs                             | 1.01    | series_methods.NanOps.time_func('argmax', 1000, 'float64')                                                                                                               |
|          | 1.09±0μs                   | 1.11±0.01μs                             | 1.02    | series_methods.NanOps.time_func('argmax', 1000, 'int32')                                                                                                                 |
|          | 1.15±0μs                   | 1.15±0μs                                | 1.00    | series_methods.NanOps.time_func('argmax', 1000, 'int64')                                                                                                                 |
|          | 1.12±0μs                   | 1.12±0μs                                | 1.00    | series_methods.NanOps.time_func('argmax', 1000, 'int8')                                                                                                                  |
|          | n/a                        | n/a                                     | n/a     | series_methods.NanOps.time_func('argmax', 1000000, 'Int64')                                                                                                              |
|          | n/a                        | n/a                                     | n/a     | series_methods.NanOps.time_func('argmax', 1000000, 'boolean')                                                                                                            |
|          | 243±1μs                    | 244±4μs                                 | 1.01    | series_methods.NanOps.time_func('argmax', 1000000, 'float64')                                                                                                            |
|          | 59.0±0.2μs                 | 59.5±0.2μs                              | 1.01    | series_methods.NanOps.time_func('argmax', 1000000, 'int32')                                                                                                              |
|          | 116±0.3μs                  | 117±0.4μs                               | 1.01    | series_methods.NanOps.time_func('argmax', 1000000, 'int64')                                                                                                              |
|          | 17.7±0.06μs                | 17.9±0.06μs                             | 1.01    | series_methods.NanOps.time_func('argmax', 1000000, 'int8')                                                                                                               |
|          | 27.9±0.2μs                 | 29.1±0.2μs                              | 1.04    | series_methods.NanOps.time_func('kurt', 1000, 'Int64')                                                                                                                   |
|          | 27.9±0.06μs                | 29.2±0.4μs                              | 1.05    | series_methods.NanOps.time_func('kurt', 1000, 'boolean')                                                                                                                 |
|          | 27.8±0.3μs                 | 28.3±0.2μs                              | 1.02    | series_methods.NanOps.time_func('kurt', 1000, 'float64')                                                                                                                 |
|          | 26.4±0.2μs                 | 27.0±0.2μs                              | 1.02    | series_methods.NanOps.time_func('kurt', 1000, 'int32')                                                                                                                   |
|          | 27.6±1μs                   | 27.6±0.3μs                              | 1.00    | series_methods.NanOps.time_func('kurt', 1000, 'int64')                                                                                                                   |
|          | 26.5±0.08μs                | 27.1±0.2μs                              | 1.02    | series_methods.NanOps.time_func('kurt', 1000, 'int8')                                                                                                                    |
|          | 7.60±0.2ms                 | 7.49±0.4ms                              | 0.99    | series_methods.NanOps.time_func('kurt', 1000000, 'Int64')                                                                                                                |
|          | 7.75±0.09ms                | 7.42±0.08ms                             | 0.96    | series_methods.NanOps.time_func('kurt', 1000000, 'boolean')                                                                                                              |
|          | 8.24±0.3ms                 | 7.60±0.3ms                              | 0.92    | series_methods.NanOps.time_func('kurt', 1000000, 'float64')                                                                                                              |
|          | 7.07±0.2ms                 | 6.25±0.3ms                              | ~0.88   | series_methods.NanOps.time_func('kurt', 1000000, 'int32')                                                                                                                |
|          | 6.62±0.1ms                 | 6.30±0.08ms                             | 0.95    | series_methods.NanOps.time_func('kurt', 1000000, 'int64')                                                                                                                |
|          | 6.48±0.06ms                | 6.22±0.1ms                              | 0.96    | series_methods.NanOps.time_func('kurt', 1000000, 'int8')                                                                                                                 |
|          | 5.86±0.01μs                | 5.94±0.1μs                              | 1.01    | series_methods.NanOps.time_func('max', 1000, 'Int64')                                                                                                                    |
|          | 5.17±0.05μs                | 5.34±0.07μs                             | 1.03    | series_methods.NanOps.time_func('max', 1000, 'boolean')                                                                                                                  |
| +        | 4.20±0.01μs                | 4.77±0.3μs                              | 1.14    | series_methods.NanOps.time_func('max', 1000, 'float64')                                                                                                                  |
| +        | 4.19±0.01μs                | 4.85±0.4μs                              | 1.16    | series_methods.NanOps.time_func('max', 1000, 'int32')                                                                                                                    |
|          | 4.21±0.02μs                | 4.40±0.02μs                             | 1.05    | series_methods.NanOps.time_func('max', 1000, 'int64')                                                                                                                    |
|          | 9.03±0.02μs                | 9.32±0.06μs                             | 1.03    | series_methods.NanOps.time_func('max', 1000, 'int8')                                                                                                                     |
|          | 513±20μs                   | 512±20μs                                | 1.00    | series_methods.NanOps.time_func('max', 1000000, 'Int64')                                                                                                                 |
|          | 295±3μs                    | 305±9μs                                 | 1.04    | series_methods.NanOps.time_func('max', 1000000, 'boolean')                                                                                                               |
|          | 419±0.8μs                  | 422±2μs                                 | 1.01    | series_methods.NanOps.time_func('max', 1000000, 'float64')                                                                                                               |
|          | 417±1μs                    | 418±1μs                                 | 1.00    | series_methods.NanOps.time_func('max', 1000000, 'int32')                                                                                                                 |
|          | 419±1μs                    | 417±0.8μs                               | 1.00    | series_methods.NanOps.time_func('max', 1000000, 'int64')                                                                                                                 |
|          | 33.3±0.2μs                 | 33.6±0.2μs                              | 1.01    | series_methods.NanOps.time_func('max', 1000000, 'int8')                                                                                                                  |
|          | 14.4±0.1μs                 | 14.5±0.05μs                             | 1.01    | series_methods.NanOps.time_func('mean', 1000, 'Int64')                                                                                                                   |
|          | 14.2±0.1μs                 | 14.5±0.04μs                             | 1.02    | series_methods.NanOps.time_func('mean', 1000, 'boolean')                                                                                                                 |
|          | 8.39±0.06μs                | 8.79±0.05μs                             | 1.05    | series_methods.NanOps.time_func('mean', 1000, 'float64')                                                                                                                 |
|          | 7.99±0.09μs                | 8.35±0.05μs                             | 1.05    | series_methods.NanOps.time_func('mean', 1000, 'int32')                                                                                                                   |
|          | 8.23±0.04μs                | 8.49±0.03μs                             | 1.03    | series_methods.NanOps.time_func('mean', 1000, 'int64')                                                                                                                   |
|          | 7.83±0.03μs                | 8.29±0.1μs                              | 1.06    | series_methods.NanOps.time_func('mean', 1000, 'int8')                                                                                                                    |
|          | 822±2μs                    | 820±1μs                                 | 1.00    | series_methods.NanOps.time_func('mean', 1000000, 'Int64')                                                                                                                |
|          | 741±3μs                    | 745±2μs                                 | 1.00    | series_methods.NanOps.time_func('mean', 1000000, 'boolean')                                                                                                              |
|          | 464±1μs                    | 469±3μs                                 | 1.01    | series_methods.NanOps.time_func('mean', 1000000, 'float64')                                                                                                              |
|          | 255±0.3μs                  | 256±0.7μs                               | 1.00    | series_methods.NanOps.time_func('mean', 1000000, 'int32')                                                                                                                |
|          | 357±0.6μs                  | 359±1μs                                 | 1.00    | series_methods.NanOps.time_func('mean', 1000000, 'int64')                                                                                                                |
|          | 253±0.6μs                  | 253±1μs                                 | 1.00    | series_methods.NanOps.time_func('mean', 1000000, 'int8')                                                                                                                 |
|          | 37.8±0.8μs                 | 39.8±2μs                                | 1.05    | series_methods.NanOps.time_func('median', 1000, 'Int64')                                                                                                                 |
|          | 37.3±0.3μs                 | 37.2±0.3μs                              | 1.00    | series_methods.NanOps.time_func('median', 1000, 'boolean')                                                                                                               |
|          | 5.98±0.01μs                | 6.11±0.04μs                             | 1.02    | series_methods.NanOps.time_func('median', 1000, 'float64')                                                                                                               |
|          | 5.60±0.03μs                | 5.64±0.02μs                             | 1.01    | series_methods.NanOps.time_func('median', 1000, 'int32')                                                                                                                 |
|          | 5.41±0.02μs                | 5.56±0.01μs                             | 1.03    | series_methods.NanOps.time_func('median', 1000, 'int64')                                                                                                                 |
|          | 21.0±0.09μs                | 21.0±0.3μs                              | 1.00    | series_methods.NanOps.time_func('median', 1000, 'int8')                                                                                                                  |
|          | 3.85±0.6ms                 | 4.37±0.7ms                              | ~1.14   | series_methods.NanOps.time_func('median', 1000000, 'Int64')                                                                                                              |
|          | 4.84±0.3ms                 | 5.03±0.3ms                              | 1.04    | series_methods.NanOps.time_func('median', 1000000, 'boolean')                                                                                                            |
|          | 2.08±0ms                   | 2.08±0.01ms                             | 1.00    | series_methods.NanOps.time_func('median', 1000000, 'float64')                                                                                                            |
|          | 1.65±0ms                   | 1.65±0ms                                | 1.00    | series_methods.NanOps.time_func('median', 1000000, 'int32')                                                                                                              |
|          | 1.46±0ms                   | 1.46±0ms                                | 1.00    | series_methods.NanOps.time_func('median', 1000000, 'int64')                                                                                                              |
|          | 831±3μs                    | 829±1μs                                 | 1.00    | series_methods.NanOps.time_func('median', 1000000, 'int8')                                                                                                               |
|          | 6.05±0.2μs                 | 5.87±0.06μs                             | 0.97    | series_methods.NanOps.time_func('min', 1000, 'Int64')                                                                                                                    |
|          | 5.14±0.02μs                | 5.34±0.01μs                             | 1.04    | series_methods.NanOps.time_func('min', 1000, 'boolean')                                                                                                                  |
|          | 4.22±0.02μs                | 4.41±0.02μs                             | 1.04    | series_methods.NanOps.time_func('min', 1000, 'float64')                                                                                                                  |
|          | 4.16±0.03μs                | 4.40±0.01μs                             | 1.06    | series_methods.NanOps.time_func('min', 1000, 'int32')                                                                                                                    |
|          | 4.21±0.01μs                | 4.37±0.02μs                             | 1.04    | series_methods.NanOps.time_func('min', 1000, 'int64')                                                                                                                    |
|          | 9.05±0.07μs                | 9.21±0.05μs                             | 1.02    | series_methods.NanOps.time_func('min', 1000, 'int8')                                                                                                                     |
|          | 530±20μs                   | 499±10μs                                | 0.94    | series_methods.NanOps.time_func('min', 1000000, 'Int64')                                                                                                                 |
|          | 330±10μs                   | 307±9μs                                 | 0.93    | series_methods.NanOps.time_func('min', 1000000, 'boolean')                                                                                                               |
|          | 421±1μs                    | 420±1μs                                 | 1.00    | series_methods.NanOps.time_func('min', 1000000, 'float64')                                                                                                               |
|          | 419±1μs                    | 417±0.6μs                               | 0.99    | series_methods.NanOps.time_func('min', 1000000, 'int32')                                                                                                                 |
|          | 421±0.6μs                  | 418±0.6μs                               | 0.99    | series_methods.NanOps.time_func('min', 1000000, 'int64')                                                                                                                 |
|          | 33.9±0.2μs                 | 34.3±0.2μs                              | 1.01    | series_methods.NanOps.time_func('min', 1000000, 'int8')                                                                                                                  |
|          | 6.15±0.03μs                | 6.37±0.02μs                             | 1.03    | series_methods.NanOps.time_func('prod', 1000, 'Int64')                                                                                                                   |
|          | 6.45±0.01μs                | 6.84±0.2μs                              | 1.06    | series_methods.NanOps.time_func('prod', 1000, 'boolean')                                                                                                                 |
|          | 6.46±0.03μs                | 6.65±0.03μs                             | 1.03    | series_methods.NanOps.time_func('prod', 1000, 'float64')                                                                                                                 |
|          | 4.78±0.02μs                | 4.94±0.03μs                             | 1.03    | series_methods.NanOps.time_func('prod', 1000, 'int32')                                                                                                                   |
|          | 4.36±0.01μs                | 4.60±0.05μs                             | 1.05    | series_methods.NanOps.time_func('prod', 1000, 'int64')                                                                                                                   |
|          | 4.73±0.03μs                | 4.94±0.08μs                             | 1.04    | series_methods.NanOps.time_func('prod', 1000, 'int8')                                                                                                                    |
|          | 857±2μs                    | 861±4μs                                 | 1.00    | series_methods.NanOps.time_func('prod', 1000000, 'Int64')                                                                                                                |
|          | 967±3μs                    | 963±2μs                                 | 1.00    | series_methods.NanOps.time_func('prod', 1000000, 'boolean')                                                                                                              |
|          | 883±10μs                   | 877±5μs                                 | 0.99    | series_methods.NanOps.time_func('prod', 1000000, 'float64')                                                                                                              |
|          | 732±3μs                    | 725±0.9μs                               | 0.99    | series_methods.NanOps.time_func('prod', 1000000, 'int32')                                                                                                                |
|          | 618±1μs                    | 621±0.9μs                               | 1.00    | series_methods.NanOps.time_func('prod', 1000000, 'int64')                                                                                                                |
|          | 722±2μs                    | 727±1μs                                 | 1.01    | series_methods.NanOps.time_func('prod', 1000000, 'int8')                                                                                                                 |
|          | 33.7±0.3μs                 | 34.7±0.4μs                              | 1.03    | series_methods.NanOps.time_func('sem', 1000, 'Int64')                                                                                                                    |
|          | 32.6±0.2μs                 | 33.4±0.2μs                              | 1.02    | series_methods.NanOps.time_func('sem', 1000, 'boolean')                                                                                                                  |
|          | 26.2±0.2μs                 | 27.3±0.1μs                              | 1.04    | series_methods.NanOps.time_func('sem', 1000, 'float64')                                                                                                                  |
|          | 18.9±0.08μs                | 19.5±0.07μs                             | 1.03    | series_methods.NanOps.time_func('sem', 1000, 'int32')                                                                                                                    |
|          | 19.2±0.02μs                | 19.3±0.05μs                             | 1.00    | series_methods.NanOps.time_func('sem', 1000, 'int64')                                                                                                                    |
|          | 44.4±0.2μs                 | 45.5±0.2μs                              | 1.02    | series_methods.NanOps.time_func('sem', 1000, 'int8')                                                                                                                     |
|          | 4.76±0.8ms                 | 5.19±0.3ms                              | 1.09    | series_methods.NanOps.time_func('sem', 1000000, 'Int64')                                                                                                                 |
|          | 5.13±0.06ms                | 5.46±0.05ms                             | 1.07    | series_methods.NanOps.time_func('sem', 1000000, 'boolean')                                                                                                               |
|          | 3.40±0.6ms                 | 3.36±0.6ms                              | 0.99    | series_methods.NanOps.time_func('sem', 1000000, 'float64')                                                                                                               |
|          | 2.62±0.01ms                | 2.61±0.01ms                             | 1.00    | series_methods.NanOps.time_func('sem', 1000000, 'int32')                                                                                                                 |
|          | 2.69±0.01ms                | 2.68±0ms                                | 1.00    | series_methods.NanOps.time_func('sem', 1000000, 'int64')                                                                                                                 |
|          | 2.19±0ms                   | 2.18±0ms                                | 1.00    | series_methods.NanOps.time_func('sem', 1000000, 'int8')                                                                                                                  |
|          | 28.9±0.3μs                 | 29.4±0.4μs                              | 1.02    | series_methods.NanOps.time_func('skew', 1000, 'Int64')                                                                                                                   |
|          | 28.5±0.2μs                 | 30.1±0.5μs                              | 1.06    | series_methods.NanOps.time_func('skew', 1000, 'boolean')                                                                                                                 |
|          | 28.9±0.09μs                | 30.1±0.3μs                              | 1.04    | series_methods.NanOps.time_func('skew', 1000, 'float64')                                                                                                                 |
|          | 27.3±0.1μs                 | 28.4±0.4μs                              | 1.04    | series_methods.NanOps.time_func('skew', 1000, 'int32')                                                                                                                   |
|          | 27.5±0.1μs                 | 28.3±0.3μs                              | 1.03    | series_methods.NanOps.time_func('skew', 1000, 'int64')                                                                                                                   |
|          | 27.5±0.2μs                 | 27.7±0.3μs                              | 1.01    | series_methods.NanOps.time_func('skew', 1000, 'int8')                                                                                                                    |
|          | 7.62±0.3ms                 | 7.36±0.3ms                              | 0.97    | series_methods.NanOps.time_func('skew', 1000000, 'Int64')                                                                                                                |
|          | 7.91±0.06ms                | 7.43±0.09ms                             | 0.94    | series_methods.NanOps.time_func('skew', 1000000, 'boolean')                                                                                                              |
|          | 8.39±0.3ms                 | 7.90±0.3ms                              | 0.94    | series_methods.NanOps.time_func('skew', 1000000, 'float64')                                                                                                              |
| -        | 7.25±0.1ms                 | 6.43±0.3ms                              | 0.89    | series_methods.NanOps.time_func('skew', 1000000, 'int32')                                                                                                                |
|          | 6.68±0.1ms                 | 6.49±0.2ms                              | 0.97    | series_methods.NanOps.time_func('skew', 1000000, 'int64')                                                                                                                |
|          | 6.78±0.06ms                | 6.47±0.07ms                             | 0.95    | series_methods.NanOps.time_func('skew', 1000000, 'int8')                                                                                                                 |
|          | 33.1±0.1μs                 | 33.4±0.3μs                              | 1.01    | series_methods.NanOps.time_func('std', 1000, 'Int64')                                                                                                                    |
|          | 34.1±0.2μs                 | 36.0±0.7μs                              | 1.05    | series_methods.NanOps.time_func('std', 1000, 'boolean')                                                                                                                  |
|          | 5.31±0.04μs                | 5.48±0.02μs                             | 1.03    | series_methods.NanOps.time_func('std', 1000, 'float64')                                                                                                                  |
|          | 5.29±0.02μs                | 5.49±0.03μs                             | 1.04    | series_methods.NanOps.time_func('std', 1000, 'int32')                                                                                                                    |
|          | 5.33±0.09μs                | 5.50±0.01μs                             | 1.03    | series_methods.NanOps.time_func('std', 1000, 'int64')                                                                                                                    |
|          | 26.0±0.2μs                 | 25.9±0.2μs                              | 1.00    | series_methods.NanOps.time_func('std', 1000, 'int8')                                                                                                                     |
|          | 1.73±0.08ms                | 1.74±0.04ms                             | 1.00    | series_methods.NanOps.time_func('std', 1000000, 'Int64')                                                                                                                 |
|          | 2.31±0.6ms                 | 2.42±0.8ms                              | 1.05    | series_methods.NanOps.time_func('std', 1000000, 'boolean')                                                                                                               |
|          | 1.24±0.01ms                | 1.23±0.01ms                             | 0.99    | series_methods.NanOps.time_func('std', 1000000, 'float64')                                                                                                               |
|          | 1.23±0.01ms                | 1.23±0ms                                | 1.00    | series_methods.NanOps.time_func('std', 1000000, 'int32')                                                                                                                 |
|          | 1.24±0.01ms                | 1.23±0ms                                | 0.99    | series_methods.NanOps.time_func('std', 1000000, 'int64')                                                                                                                 |
|          | 817±2μs                    | 815±1μs                                 | 1.00    | series_methods.NanOps.time_func('std', 1000000, 'int8')                                                                                                                  |
|          | 5.73±0.02μs                | 5.84±0.09μs                             | 1.02    | series_methods.NanOps.time_func('sum', 1000, 'Int64')                                                                                                                    |
|          | 6.06±0.02μs                | 6.16±0.04μs                             | 1.02    | series_methods.NanOps.time_func('sum', 1000, 'boolean')                                                                                                                  |
|          | 7.25±0.04μs                | 7.56±0.03μs                             | 1.04    | series_methods.NanOps.time_func('sum', 1000, 'float64')                                                                                                                  |
|          | 5.12±0.02μs                | 5.33±0.02μs                             | 1.04    | series_methods.NanOps.time_func('sum', 1000, 'int32')                                                                                                                    |
|          | 4.84±0.05μs                | 4.99±0.02μs                             | 1.03    | series_methods.NanOps.time_func('sum', 1000, 'int64')                                                                                                                    |
|          | 5.17±0.03μs                | 5.26±0.01μs                             | 1.02    | series_methods.NanOps.time_func('sum', 1000, 'int8')                                                                                                                     |
|          | 357±2μs                    | 355±1μs                                 | 0.99    | series_methods.NanOps.time_func('sum', 1000000, 'Int64')                                                                                                                 |
|          | 458±2μs                    | 456±1μs                                 | 1.00    | series_methods.NanOps.time_func('sum', 1000000, 'boolean')                                                                                                               |
|          | 260±10μs                   | 251±3μs                                 | 0.97    | series_methods.NanOps.time_func('sum', 1000000, 'float64')                                                                                                               |
|          | 230±0.8μs                  | 231±1μs                                 | 1.00    | series_methods.NanOps.time_func('sum', 1000000, 'int32')                                                                                                                 |
|          | 119±0.4μs                  | 120±0.9μs                               | 1.01    | series_methods.NanOps.time_func('sum', 1000000, 'int64')                                                                                                                 |
|          | 222±0.5μs                  | 234±5μs                                 | 1.06    | series_methods.NanOps.time_func('sum', 1000000, 'int8')                                                                                                                  |
|          | 31.2±0.09μs                | 32.3±0.06μs                             | 1.04    | series_methods.NanOps.time_func('var', 1000, 'Int64')                                                                                                                    |
|          | 32.2±0.4μs                 | 34.9±1μs                                | 1.08    | series_methods.NanOps.time_func('var', 1000, 'boolean')                                                                                                                  |
|          | 6.20±0.02μs                | 6.40±0.07μs                             | 1.03    | series_methods.NanOps.time_func('var', 1000, 'float64')                                                                                                                  |
|          | 6.13±0.03μs                | 6.36±0.05μs                             | 1.04    | series_methods.NanOps.time_func('var', 1000, 'int32')                                                                                                                    |
|          | 6.20±0.02μs                | 6.46±0.04μs                             | 1.04    | series_methods.NanOps.time_func('var', 1000, 'int64')                                                                                                                    |
|          | 27.3±0.09μs                | 27.2±0.4μs                              | 1.00    | series_methods.NanOps.time_func('var', 1000, 'int8')                                                                                                                     |
|          | 1.72±0.09ms                | 1.73±0.04ms                             | 1.01    | series_methods.NanOps.time_func('var', 1000000, 'Int64')                                                                                                                 |
|          | 2.30±0.6ms                 | 2.39±0.8ms                              | 1.04    | series_methods.NanOps.time_func('var', 1000000, 'boolean')                                                                                                               |
|          | 1.23±0ms                   | 1.23±0ms                                | 1.00    | series_methods.NanOps.time_func('var', 1000000, 'float64')                                                                                                               |
|          | 1.23±0ms                   | 1.22±0ms                                | 1.00    | series_methods.NanOps.time_func('var', 1000000, 'int32')                                                                                                                 |
|          | 1.23±0ms                   | 1.23±0ms                                | 1.00    | series_methods.NanOps.time_func('var', 1000000, 'int64')                                                                                                                 |
|          | 820±4μs                    | 814±1μs                                 | 0.99    | series_methods.NanOps.time_func('var', 1000000, 'int8')                                                                                                                  |
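
For reading the table above: the ratio column is the second timing divided by the first. A rough sketch of that arithmetic follows. The `ratio` and `mark` helpers are hypothetical names, and the 1.1 threshold factor is an assumption that happens to be consistent with the single `-` row; asv also applies a statistical significance check, which this ignores.

```python
def ratio(first: float, second: float) -> float:
    """Ratio of the second timing column to the first."""
    return second / first


def mark(r: float, factor: float = 1.1) -> str:
    """Return '+' or '-' when the ratio falls outside [1/factor, factor].

    The factor of 1.1 is an assumption made for this sketch.
    """
    if r > factor:
        return "+"
    if r < 1 / factor:
        return "-"
    return ""


# 'skew', 1000000, 'int32': 7.25 ms vs 6.43 ms
r_skew = ratio(7.25, 6.43)
# 'sem', 1000000, 'Int64': 4.76 ms vs 5.19 ms
r_sem = ratio(4.76, 5.19)

print(f"{r_skew:.2f} {mark(r_skew)!r}")
print(f"{r_sem:.2f} {mark(r_sem)!r}")
```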

@rhshadrach marked this pull request as ready for review November 13, 2025 22:11
@rhshadrach
Copy link
Member Author

@jbrockmendel - you good with the ASVs here?

@jbrockmendel
Copy link
Member

No complaints here.

else:
result = result.reshape(1)
if using_python_scalars():
result = np.array([result])
Copy link
Member

Why do this instead of calling maybe_unbox_numpy_scalar?

Copy link
Member Author

The result here prior to L1543 is already a Python scalar when future.python_scalars=True due to calling the reduction function. In this block, keepdims=True so we need to convert it to a NumPy array.
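
To illustrate the keepdims point: a Python scalar has no `reshape`, so it has to be wrapped rather than reshaped. A minimal NumPy-only sketch (not the pandas code path itself; the variable names are illustrative):

```python
import numpy as np

np_result = np.int64(3)        # a NumPy scalar, as returned without the option
py_result = np_result.item()   # a plain Python int, as returned with the option

# NumPy scalars can be reshaped into a length-1 array...
from_numpy = np_result.reshape(1)
# ...but a Python int has no reshape, so it is wrapped in a list instead.
from_python = np.array([py_result])

print(from_numpy.shape, from_python.shape)
print(hasattr(py_result, "reshape"))
```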

if isinstance(result, np.longdouble):
    result = float(result)
else:
    result = value.item()
Copy link
Member

I know this will mess up on a timedelta64:

obj = np.timedelta64(1, "ns")
assert isinstance(obj, np.generic)

>>> obj.item()
1

I don't know if there are other cases where obj.item() messes up, but I'm wary of it. Heads up.
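
For reference, what `.item()` gives back for datetime-like scalars depends on the unit: NumPy returns a `datetime` object for units that the `datetime` module can represent and falls back to a plain `int` for nanosecond resolution. A small sketch, NumPy only:

```python
import datetime as dt

import numpy as np

td_s = np.timedelta64(1, "s").item()    # seconds fit datetime.timedelta
td_ns = np.timedelta64(1, "ns").item()  # nanoseconds do not: plain int
ts_s = np.datetime64("2020-01-01T00:00:00", "s").item()
ts_ns = np.datetime64("2020-01-01T00:00:00", "ns").item()

for value in (td_s, td_ns, ts_s, ts_ns):
    print(type(value).__name__)
```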

Copy link
Member Author

Thanks - will add a test for all dtypes. Here is the full list of scalars and their corresponding item type without datetime/timedelta. Only other problematic one is complex256.

? bool <class 'bool'>
b int8 <class 'int'>
h int16 <class 'int'>
i int32 <class 'int'>
l int64 <class 'int'>
q int64 <class 'int'>
n int64 <class 'int'>
p int64 <class 'int'>
B uint8 <class 'int'>
H uint16 <class 'int'>
I uint32 <class 'int'>
L uint64 <class 'int'>
Q uint64 <class 'int'>
N uint64 <class 'int'>
P uint64 <class 'int'>
e float16 <class 'float'>
f float32 <class 'float'>
d float64 <class 'float'>
g float128 <class 'numpy.longdouble'>
F complex64 <class 'complex'>
D complex128 <class 'complex'>
G complex256 <class 'numpy.clongdouble'>
S |S1 <class 'bytes'>
U <U1 <class 'str'>
V |V0 <class 'bytes'>

For datetime/timedelta, the arrow dtypes already return pd.Timedelta and pd.Timestamp. I think we can align the NumPy dtypes to do the same.
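
A sketch of how a table like the one above can be produced (not necessarily the script used here): build a scalar for each typecode and look at the type of `.item()`. Restricting the typecode string to numeric codes is an assumption to keep the example short, and the `g`/`G` rows depend on the platform's long double.

```python
import numpy as np

# Numeric typecodes only; string, bytes, void, object and
# datetime-like codes are deliberately left out of this sketch.
TYPECODES = "?bhilqBHILQefdgFDG"

rows = []
for code in TYPECODES:
    scalar = np.dtype(code).type(1)
    rows.append((code, scalar.dtype.name, type(scalar.item())))

for code, name, item_type in rows:
    print(code, name, item_type)
```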

result = getattr(df.C, op)()
assert isinstance(result, np.float64)
if using_python_scalars:
assert isinstance(result, float)
Copy link
Member

I think np.float64 subclasses float, so this check won't exclude np.float64.
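
A quick check of this, NumPy only: `np.float64` inherits from Python's `float`, so `isinstance` passes for both, while `np.int64` does not inherit from `int`. An exact type comparison is one way to tell them apart:

```python
import numpy as np

np_float = np.float64(1.5)
py_float = 1.5

# isinstance cannot distinguish the two, because np.float64 subclasses float.
print(isinstance(np_float, float), isinstance(py_float, float))

# An exact type check can.
print(type(np_float) is float, type(py_float) is float)

# Integers behave differently: np.int64 is not a subclass of int.
print(isinstance(np.int64(1), int))
```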

if using_python_scalars:
    assert isinstance(result, int)
else:
    assert result.dtype == "uint64"
Copy link
Member

just check type rather than dtype?

assert 0 == s.skew()
assert isinstance(s.skew(), np.float64) # GH53482
if using_python_scalars:
assert isinstance(s.skew(), float)
Copy link
Member

won't exclude float64

tm.assert_series_equal(result, expected)
else:
expected = values[1]
if using_python_scalars and values.dtype.kind in ["i", "f"]:
Copy link
Member

NBD but checking for kind in "if" is very slightly faster than checking for kind in ["i", "f"].
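
The two spellings agree for single-character kinds, since `in` on a string is a substring test. A sketch showing the equivalence, plus the one caveat: the empty string also matches the string form.

```python
import numpy as np

kinds = [
    np.dtype(name).kind
    for name in ("int64", "float64", "uint8", "bool", "complex128")
]

for kind in kinds:
    # Same answer either way for one-character dtype kinds.
    assert (kind in "if") == (kind in ["i", "f"])

print([kind for kind in kinds if kind in "if"])

# Substring semantics: True for the string form, False for the list form.
print("" in "if", "" in ["i", "f"])
```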

if isinstance(all_data, Series):
    assert not (std_x < 0).any()
else:
    assert not (std_x < 0).any().any()
Copy link
Member

I guess np.bool_(True).any() returns itself? That's kind of convenient. Could go in the list Joris asked for of potential downsides.
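
Concretely, the convenience in question looks like this (NumPy only): the NumPy boolean scalar carries array-style methods and a Python `bool` does not, so a chained `.any().any()` only works on the former.

```python
import numpy as np

np_flag = np.bool_(True)
py_flag = True

# NumPy scalars expose ndarray-style reductions, so .any() can be chained.
print(np_flag.any(), np_flag.any().any())

# A Python bool has no such method.
print(hasattr(py_flag, "any"))
```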

Copy link
Member Author

@rhshadrach Nov 21, 2025

For this particular case, I'd say axis=None should be preferred. But perhaps there are others where that is useful.

if isinstance(all_data, Series):
    assert not (var_x < 0).any()
else:
    assert not (var_x < 0).any().any()
Copy link
Member

If this pattern is going to show up a lot, could we make a helper in pd._testing?

Copy link
Member Author

Changed to axis=None, no branching.
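
For anyone following along, a sketch of the `axis=None` form on made-up data: it reduces over all axes in one call, so the same line works for a Series and a DataFrame whether or not scalars are unboxed.

```python
import pandas as pd

ser = pd.Series([0.5, 1.0, 2.0])
df = pd.DataFrame({"a": [0.5, 1.0], "b": [2.0, 3.0]})

for obj in (ser, df):
    # One reduction over every axis; no chained .any().any() needed.
    any_negative = (obj < 0).any(axis=None)
    print(type(obj).__name__, bool(any_negative))
```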
