Merge branch 'main' into selectn_series_perf_impact

Jeffrharr · web-flow · commit f26a7fc397c3 · 2025-03-24T19:10:04.000-06:00
diff --git a/doc/source/development/contributing_codebase.rst b/doc/source/development/contributing_codebase.rst
@@ -537,7 +537,7 @@ Preferred ``pytest`` idioms
     test and does not check if the test will fail. If this is the behavior you desire, use ``pytest.skip`` instead.
 
 If a test is known to fail but the manner in which it fails
-is not meant to be captured, use ``pytest.mark.xfail`` It is common to use this method for a test that
+is not meant to be captured, use ``pytest.mark.xfail``. It is common to use this method for a test that
 exhibits buggy behavior or a non-implemented feature. If
 the failing test has flaky behavior, use the argument ``strict=False``. This
 will make it so pytest does not fail if the test happens to pass. Using ``strict=False`` is highly undesirable, please use it only as a last resort.
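For reference, here is a minimal sketch of the two ``xfail`` cases described in the contributing guide above. The test names and ``reason`` strings are hypothetical. ``strict`` is passed explicitly so the example does not depend on a project's ``xfail_strict`` setting.

```python
import random

import pytest


# Known bug or unimplemented feature: the test is expected to fail.
# With strict=True, an unexpected pass is reported as a failure
# (XPASS(strict)), which signals that the marker can be removed.
@pytest.mark.xfail(reason="hypothetical: feature not implemented yet", strict=True)
def test_unimplemented_feature():
    raise NotImplementedError


# Flaky failure: strict=False lets an occasional pass go through silently.
# Use this only as a last resort.
@pytest.mark.xfail(reason="hypothetical: fails intermittently", strict=False)
def test_flaky_behavior():
    assert random.random() < 0.5  # stand-in for a nondeterministic check
```

Running ``pytest -rxX`` lists xfailed and xpassed tests in the short summary. This makes it easy to spot a strict ``xfail`` that has started passing.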
diff --git a/doc/source/user_guide/basics.rst b/doc/source/user_guide/basics.rst
@@ -36,7 +36,7 @@ of elements to display is five, but you may pass a custom number.
 Attributes and underlying data
 ------------------------------
 
-pandas objects have a number of attributes enabling you to access the metadata
+pandas objects have a number of attributes enabling you to access the metadata.
 
 * **shape**: gives the axis dimensions of the object, consistent with ndarray
 * Axis labels
@@ -59,7 +59,7 @@ NumPy's type system to add support for custom arrays
 (see :ref:`basics.dtypes`).
 
 To get the actual data inside a :class:`Index` or :class:`Series`, use
-the ``.array`` property
+the ``.array`` property.
 
 .. ipython:: python
 
@@ -88,18 +88,18 @@ NumPy doesn't have a dtype to represent timezone-aware datetimes, so there
 are two possibly useful representations:
 
 1. An object-dtype :class:`numpy.ndarray` with :class:`Timestamp` objects, each
-   with the correct ``tz``
+   with the correct ``tz``.
 2. A ``datetime64[ns]`` -dtype :class:`numpy.ndarray`, where the values have
-   been converted to UTC and the timezone discarded
+   been converted to UTC and the timezone discarded.
 
-Timezones may be preserved with ``dtype=object``
+Timezones may be preserved with ``dtype=object``:
 
 .. ipython:: python
 
    ser = pd.Series(pd.date_range("2000", periods=2, tz="CET"))
    ser.to_numpy(dtype=object)
 
-Or thrown away with ``dtype='datetime64[ns]'``
+Or thrown away with ``dtype='datetime64[ns]'``:
 
 .. ipython:: python
 
diff --git a/doc/source/whatsnew/v3.0.0.rst b/doc/source/whatsnew/v3.0.0.rst
@@ -775,6 +775,7 @@ Groupby/resample/rolling
 - Bug in :meth:`.DataFrameGroupBy.quantile` when ``interpolation="nearest"`` is inconsistent with :meth:`DataFrame.quantile` (:issue:`47942`)
 - Bug in :meth:`.Resampler.interpolate` on a :class:`DataFrame` with non-uniform sampling and/or indices not aligning with the resulting resampled index would result in wrong interpolation (:issue:`21351`)
 - Bug in :meth:`DataFrame.ewm` and :meth:`Series.ewm` when passed ``times`` and aggregation functions other than mean (:issue:`51695`)
+- Bug in :meth:`DataFrame.resample` changing the index type to :class:`MultiIndex` when the DataFrame is empty and an upsample method is used (:issue:`55572`)
 - Bug in :meth:`DataFrameGroupBy.agg` that raises ``AttributeError`` when there is dictionary input and duplicated columns, instead of returning a DataFrame with the aggregation of all duplicate columns. (:issue:`55041`)
 - Bug in :meth:`DataFrameGroupBy.apply` and :meth:`SeriesGroupBy.apply` for empty data frame with ``group_keys=False`` still creating output index using group keys. (:issue:`60471`)
 - Bug in :meth:`DataFrameGroupBy.apply` that was returning a completely empty DataFrame when all return values of ``func`` were ``None`` instead of returning an empty DataFrame with the original columns and dtypes. (:issue:`57775`)
@@ -841,6 +842,7 @@ Other
 - Bug in :meth:`DataFrame.where` where using a non-bool type array in the function would return a ``ValueError`` instead of a ``TypeError`` (:issue:`56330`)
 - Bug in :meth:`Index.sort_values` when passing a key function that turns values into tuples, e.g. ``key=natsort.natsort_key``, would raise ``TypeError`` (:issue:`56081`)
 - Bug in :meth:`MultiIndex.fillna` error message was referring to ``isna`` instead of ``fillna`` (:issue:`60974`)
+- Bug in :meth:`Series.describe` where the median (50th percentile) was always included, even when the ``percentiles`` argument did not request it (:issue:`60550`)
 - Bug in :meth:`Series.diff` allowing non-integer values for the ``periods`` argument. (:issue:`56607`)
 - Bug in :meth:`Series.dt` methods in :class:`ArrowDtype` that were returning incorrect values. (:issue:`57355`)
 - Bug in :meth:`Series.isin` raising ``TypeError`` when series is large (>10**6) and ``values`` contains NA (:issue:`60678`)
diff --git a/pandas/_libs/tslibs/period.pyx b/pandas/_libs/tslibs/period.pyx
@@ -1752,9 +1752,6 @@ cdef class _Period(PeriodMixin):
     def __cinit__(self, int64_t ordinal, BaseOffset freq):
         self.ordinal = ordinal
         self.freq = freq
-        # Note: this is more performant than PeriodDtype.from_date_offset(freq)
-        #  because from_date_offset cannot be made a cdef method (until cython
-        #  supported cdef classmethods)
         self._dtype = PeriodDtypeBase(freq._period_dtype_code, freq.n)
 
     @classmethod
@@ -1913,7 +1910,7 @@ cdef class _Period(PeriodMixin):
 
         Parameters
         ----------
-        freq : str, BaseOffset
+        freq : str, DateOffset
             The target frequency to convert the Period object to.
             If a string is provided,
             it must be a valid :ref:`period alias <timeseries.period_aliases>`.
@@ -2599,7 +2596,7 @@ cdef class _Period(PeriodMixin):
 
         Parameters
         ----------
-        freq : str, BaseOffset
+        freq : str, DateOffset
             Frequency to use for the returned period.
 
         See Also
diff --git a/pandas/core/generic.py b/pandas/core/generic.py
@@ -10818,9 +10818,8 @@ def describe(
         ----------
         percentiles : list-like of numbers, optional
             The percentiles to include in the output. All should
-            fall between 0 and 1. The default is
-            ``[.25, .5, .75]``, which returns the 25th, 50th, and
-            75th percentiles.
+            fall between 0 and 1. The default, ``None``, will automatically
+            return the 25th, 50th, and 75th percentiles.
         include : 'all', list-like of dtypes or None (default), optional
             A white list of data types to include in the result. Ignored
             for ``Series``. Here are the options:
diff --git a/pandas/core/methods/describe.py b/pandas/core/methods/describe.py
@@ -229,10 +229,15 @@ def describe_numeric_1d(series: Series, percentiles: Sequence[float]) -> Series:
 
     formatted_percentiles = format_percentiles(percentiles)
 
+    if len(percentiles) == 0:
+        quantiles = []
+    else:
+        quantiles = series.quantile(percentiles).tolist()
+
     stat_index = ["count", "mean", "std", "min"] + formatted_percentiles + ["max"]
     d = (
         [series.count(), series.mean(), series.std(), series.min()]
-        + series.quantile(percentiles).tolist()
+        + quantiles
         + [series.max()]
     )
     # GH#48340 - always return float on non-complex numeric data
@@ -354,10 +359,6 @@ def _refine_percentiles(
     # get them all to be in [0, 1]
     validate_percentile(percentiles)
 
-    # median should always be included
-    if 0.5 not in percentiles:
-        percentiles.append(0.5)
-
     percentiles = np.asarray(percentiles)
 
     # sort and check for duplicates
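With the implicit median removed, the refinement step now passes user-supplied percentiles through with only validation, sorting, and a duplicate check. The following is a minimal pure-Python sketch of that post-change behaviour. ``refine_percentiles_sketch`` is a hypothetical name and is not pandas' ``_refine_percentiles``. It assumes the ``None`` default documented in ``describe`` and the "[0, 1]" and "sort and check for duplicates" steps named in the hunk's comments.

```python
from typing import Iterable, Optional


def refine_percentiles_sketch(percentiles: Optional[Iterable[float]] = None) -> list:
    """Hypothetical stand-in for the post-change percentile refinement."""
    if percentiles is None:
        # Default quartiles, as documented for describe(percentiles=None).
        return [0.25, 0.5, 0.75]
    percentiles = list(percentiles)
    # Every value must lie in [0, 1].
    if any(not 0 <= p <= 1 for p in percentiles):
        raise ValueError("percentiles should all be in the interval [0, 1]")
    # No 0.5 is appended any more: just sort and reject duplicates.
    unique = sorted(set(percentiles))
    if len(unique) < len(percentiles):
        raise ValueError("percentiles cannot contain duplicates")
    return unique


print(refine_percentiles_sketch())            # [0.25, 0.5, 0.75]
print(refine_percentiles_sketch([]))          # []
print(refine_percentiles_sketch([0.9, 0.2]))  # [0.2, 0.9]
```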
diff --git a/pandas/core/resample.py b/pandas/core/resample.py
@@ -507,22 +507,12 @@ def _wrap_result(self, result):
         """
         Potentially wrap any results.
         """
-        # GH 47705
-        obj = self.obj
-        if (
-            isinstance(result, ABCDataFrame)
-            and len(result) == 0
-            and not isinstance(result.index, PeriodIndex)
-        ):
-            result = result.set_index(
-                _asfreq_compat(obj.index[:0], freq=self.freq), append=True
-            )
-
         if isinstance(result, ABCSeries) and self._selection is not None:
             result.name = self._selection
 
         if isinstance(result, ABCSeries) and result.empty:
             # When index is all NaT, result is empty but index is not
+            obj = self.obj
             result.index = _asfreq_compat(obj.index[:0], freq=self.freq)
             result.name = getattr(obj, "name", None)
 
@@ -1756,6 +1746,17 @@ def func(x):
             return x.apply(f, *args, **kwargs)
 
         result = self._groupby.apply(func)
+
+        # GH 47705
+        if (
+            isinstance(result, ABCDataFrame)
+            and len(result) == 0
+            and not isinstance(result.index, PeriodIndex)
+        ):
+            result = result.set_index(
+                _asfreq_compat(self.obj.index[:0], freq=self.freq), append=True
+            )
+
         return self._wrap_result(result)
 
     _upsample = _apply
diff --git a/pandas/io/excel/_base.py b/pandas/io/excel/_base.py
@@ -197,7 +197,7 @@
     False otherwise. An example of a valid callable argument would be ``lambda
     x: x in [0, 2]``.
 nrows : int, default None
-    Number of rows to parse.
+    Number of rows to parse. Does not include header rows.
 na_values : scalar, str, list-like, or dict, default None
     Additional strings to recognize as NA/NaN. If dict passed, specific
     per-column NA values. By default the following values are interpreted
diff --git a/pandas/io/formats/format.py b/pandas/io/formats/format.py
@@ -1565,6 +1565,9 @@ def format_percentiles(
     >>> format_percentiles([0, 0.5, 0.02001, 0.5, 0.666666, 0.9999])
     ['0%', '50%', '2.0%', '50%', '66.67%', '99.99%']
     """
+    if len(percentiles) == 0:
+        return []
+
     percentiles = np.asarray(percentiles)
 
     # It checks for np.nan as well
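This commit adds empty-list guards to ``describe_numeric_1d`` and ``format_percentiles``. With those guards, ``percentiles=[]`` yields a summary with only count, mean, std, min, and max. The pure-Python sketch below mimics how the row labels and values are assembled.

Everything in it is a hypothetical stand-in:

* ``describe_numeric_sketch`` and ``quantile_linear`` are illustrative names, not pandas functions.
* Labels use a simplified ``{p:.0%}`` format, whereas pandas' ``format_percentiles`` picks precision adaptively.
* Quantiles use linear interpolation, pandas' default for ``Series.quantile``.
* Missing values are assumed absent.

```python
import math
import statistics


def quantile_linear(sorted_values, p):
    # Linear interpolation between the two closest ranks.
    pos = p * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def describe_numeric_sketch(values, percentiles):
    """Hypothetical analogue of describe_numeric_1d; returns {label: value}.

    Assumes `values` is non-empty and contains no missing values.
    """
    data = sorted(values)
    # An empty `percentiles` contributes no labels and no values, which is
    # the outcome the early returns added in this commit guarantee.
    labels = [f"{p:.0%}" for p in percentiles]
    quantiles = [quantile_linear(data, p) for p in percentiles]
    stat_index = ["count", "mean", "std", "min"] + labels + ["max"]
    stats = (
        [len(data), statistics.fmean(data), statistics.stdev(data), data[0]]
        + quantiles
        + [data[-1]]
    )
    return dict(zip(stat_index, stats))


print(list(describe_numeric_sketch(range(10), [])))
# ['count', 'mean', 'std', 'min', 'max']
print(describe_numeric_sketch(range(10), [0.2])["20%"])
# 1.8
```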
diff --git a/pandas/tests/frame/methods/test_describe.py b/pandas/tests/frame/methods/test_describe.py
@@ -413,3 +413,44 @@ def test_describe_exclude_pa_dtype(self):
             dtype=pd.ArrowDtype(pa.float64()),
         )
         tm.assert_frame_equal(result, expected)
+
+    @pytest.mark.parametrize("percentiles", [None, [], [0.2]])
+    def test_refine_percentiles(self, percentiles):
+        """
+        Test that the percentiles are returned correctly depending on the `percentiles`
+        argument.
+        - The default behavior is to return the 25th, 50th, and 75th percentiles
+        - If `percentiles` is an empty list, no percentiles are returned
+        - If `percentiles` is a non-empty list, only those percentiles are returned
+        """
+        # GH#60550
+        df = DataFrame({"a": np.arange(0, 10, 1)})
+
+        result = df.describe(percentiles=percentiles)
+
+        if percentiles is None:
+            percentiles = [0.25, 0.5, 0.75]
+
+        expected = DataFrame(
+            [
+                len(df.a),
+                df.a.mean(),
+                df.a.std(),
+                df.a.min(),
+                *[df.a.quantile(p) for p in percentiles],
+                df.a.max(),
+            ],
+            index=pd.Index(
+                [
+                    "count",
+                    "mean",
+                    "std",
+                    "min",
+                    *[f"{p:.0%}" for p in percentiles],
+                    "max",
+                ]
+            ),
+            columns=["a"],
+        )
+
+        tm.assert_frame_equal(result, expected)
diff --git a/pandas/tests/groupby/methods/test_describe.py b/pandas/tests/groupby/methods/test_describe.py
@@ -202,15 +202,15 @@ def test_describe_duplicate_columns():
     gb = df.groupby(df[1])
     result = gb.describe(percentiles=[])
 
-    columns = ["count", "mean", "std", "min", "50%", "max"]
+    columns = ["count", "mean", "std", "min", "max"]
     frames = [
-        DataFrame([[1.0, val, np.nan, val, val, val]], index=[1], columns=columns)
+        DataFrame([[1.0, val, np.nan, val, val]], index=[1], columns=columns)
         for val in (0.0, 2.0, 3.0)
     ]
     expected = pd.concat(frames, axis=1)
     expected.columns = MultiIndex(
         levels=[[0, 2], columns],
-        codes=[6 * [0] + 6 * [1] + 6 * [0], 3 * list(range(6))],
+        codes=[5 * [0] + 5 * [1] + 5 * [0], 3 * list(range(5))],
     )
     expected.index.names = [1]
     tm.assert_frame_equal(result, expected)
diff --git a/pandas/tests/resample/test_base.py b/pandas/tests/resample/test_base.py
@@ -438,6 +438,24 @@ def test_resample_size_empty_dataframe(freq, index):
     tm.assert_series_equal(result, expected)
 
 
+@pytest.mark.parametrize("index", [DatetimeIndex([]), TimedeltaIndex([])])
+@pytest.mark.parametrize("freq", ["D", "h"])
+@pytest.mark.parametrize(
+    "method", ["ffill", "bfill", "nearest", "asfreq", "interpolate", "mean"]
+)
+def test_resample_apply_empty_dataframe(index, freq, method):
+    # GH#55572
+    empty_frame_dti = DataFrame(index=index)
+
+    rs = empty_frame_dti.resample(freq)
+    result = rs.apply(getattr(rs, method))
+
+    expected_index = _asfreq_compat(empty_frame_dti.index, freq)
+    expected = DataFrame([], index=expected_index)
+
+    tm.assert_frame_equal(result, expected)
+
+
 @pytest.mark.parametrize(
     "index",
     [
diff --git a/web/pandas/config.yml b/web/pandas/config.yml
@@ -146,16 +146,6 @@ sponsors:
     url: https://numfocus.org/
     logo: static/img/partners/numfocus.svg
     kind: numfocus
-  - name: "Two Sigma"
-    url: https://www.twosigma.com/
-    logo: static/img/partners/two_sigma.svg
-    kind: partner
-    description: "Jeff Reback"
-  - name: "Voltron Data"
-    url: https://voltrondata.com/
-    logo: static/img/partners/voltron_data.svg
-    kind: partner
-    description: "Joris Van den Bossche"
   - name: "Coiled"
     url: https://www.coiled.io
     logo: static/img/partners/coiled.svg
@@ -171,21 +161,11 @@ sponsors:
     logo: static/img/partners/nvidia.svg
     kind: partner
     description: "Matthew Roeschke"
-  - name: "Intel"
-    url: https://www.intel.com/
-    logo: /static/img/partners/intel.svg
-    kind: partner
-    description: "Brock Mendel"
   - name: "Tidelift"
     url: https://tidelift.com
     logo: static/img/partners/tidelift.svg
     kind: regular
     description: "<i>pandas</i> is part of the <a href=\"https://tidelift.com/subscription/pkg/pypi-pandas?utm_source=pypi-pandas&utm_medium=referral&utm_campaign=readme\">Tidelift subscription</a>. You can support pandas by becoming a Tidelift subscriber."
-  - name: "Chan Zuckerberg Initiative"
-    url: https://chanzuckerberg.com/
-    logo: static/img/partners/czi.svg
-    kind: regular
-    description: "<i>pandas</i> is funded by the Essential Open Source Software for Science program of the Chan Zuckerberg Initiative. The funding is used for general maintenance, improve extension types, and a efficient string type."
   - name: "Bodo"
     url: https://www.bodo.ai/
     logo: static/img/partners/bodo.svg
diff --git a/web/pandas/index.html b/web/pandas/index.html
@@ -46,10 +46,10 @@ <h5>With the support of:</h5>
                     {% for row in sponsors.active | batch(6, "") %}
                         <div class="row mx-auto h-100">
                             {% for company in row %}
-                                <div class="col-6 col-md-2">
+                                <div class="col-6 col-md-2 d-flex align-items-center justify-content-center">
                                     {% if company %}
                                         <a href="{{ company.url }}" target="_blank">
-                                            <img class="img-fluid img-thumbnail py-5 mx-auto" alt="{{ company.name }}" src="{{ base_url }}{{ company.logo }}"/>
+                                            <img class="img-fluid w-100" alt="{{ company.name }}" src="{{ base_url }}{{ company.logo }}"/>
                                         </a>
                                     {% endif %}
                                 </div>
diff --git a/web/pandas/static/img/partners/czi.svg b/web/pandas/static/img/partners/czi.svg
diff --git a/web/pandas/static/img/partners/intel.svg b/web/pandas/static/img/partners/intel.svg
diff --git a/web/pandas/static/img/partners/two_sigma.svg b/web/pandas/static/img/partners/two_sigma.svg
diff --git a/web/pandas/static/img/partners/voltron_data.svg b/web/pandas/static/img/partners/voltron_data.svg