Commit d752ab3

Merge branch 'main' into #57512-bad-datetime-str-conversion-in-series-ctor
2 parents 16f6514 + ded256d commit d752ab3

70 files changed, with 486 additions and 1002 deletions

asv_bench/benchmarks/categoricals.py

Lines changed: 1 addition & 1 deletion
@@ -88,7 +88,7 @@ def setup(self):
         )
 
         for col in ("int", "float", "timestamp"):
-            self.df[col + "_as_str"] = self.df[col].astype(str)
+            self.df[f"{col}_as_str"] = self.df[col].astype(str)
 
         for col in self.df.columns:
             self.df[col] = self.df[col].astype("category")

asv_bench/benchmarks/join_merge.py

Lines changed: 17 additions & 0 deletions
@@ -328,6 +328,23 @@ def time_i8merge(self, how):
         merge(self.left, self.right, how=how)
 
 
+class UniqueMerge:
+    params = [4_000_000, 1_000_000]
+    param_names = ["unique_elements"]
+
+    def setup(self, unique_elements):
+        N = 1_000_000
+        self.left = DataFrame({"a": np.random.randint(1, unique_elements, (N,))})
+        self.right = DataFrame({"a": np.random.randint(1, unique_elements, (N,))})
+        uniques = self.right.a.drop_duplicates()
+        self.right["a"] = concat(
+            [uniques, Series(np.arange(0, -(N - len(uniques)), -1))], ignore_index=True
+        )
+
+    def time_unique_merge(self, unique_elements):
+        merge(self.left, self.right, how="inner")
+
+
 class MergeDatetime:
     params = [
         [
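The setup in `UniqueMerge` makes the right-hand key column unique: it keeps the first occurrence of each random key and pads the column back to `N` rows with 0, -1, -2, and so on, which cannot collide with the positive random draws. The following is a scaled-down, pure-Python sketch of that idea only. The names `make_unique_keys`, `n`, and `seed` are hypothetical and not part of the benchmark, which uses NumPy and pandas.

```python
import random


def make_unique_keys(n, unique_elements, seed=0):
    """Return n distinct integer keys, mirroring the idea in UniqueMerge.setup."""
    rng = random.Random(seed)
    # Draw n keys from [1, unique_elements), like np.random.randint(1, unique_elements, (N,)).
    draws = [rng.randrange(1, unique_elements) for _ in range(n)]
    # Keep the first occurrence of each key (the role of drop_duplicates()).
    uniques = list(dict.fromkeys(draws))
    # Pad back to length n with 0, -1, -2, ...; these never collide with the
    # positive draws, so every key in the result is distinct.
    filler = list(range(0, -(n - len(uniques)), -1))
    return uniques + filler


keys = make_unique_keys(n=1_000, unique_elements=250)
print(len(keys), len(set(keys)))  # prints: 1000 1000
```

With `unique_elements` smaller than `n`, most of the column is filler, so the join finds few matches; with `unique_elements` larger than `n`, most random draws survive deduplication.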

doc/source/development/contributing_docstring.rst

Lines changed: 1 addition & 1 deletion
@@ -940,7 +940,7 @@ Finally, docstrings can also be appended to with the ``doc`` decorator.
 
 In this example, we'll create a parent docstring normally (this is like
 ``pandas.core.generic.NDFrame``). Then we'll have two children (like
-``pandas.core.series.Series`` and ``pandas.core.frame.DataFrame``). We'll
+``pandas.core.series.Series`` and ``pandas.DataFrame``). We'll
 substitute the class names in this docstring.
 
 .. code-block:: python

doc/source/development/maintaining.rst

Lines changed: 2 additions & 2 deletions
@@ -151,15 +151,15 @@ and then run::
     git bisect start
     git bisect good v1.4.0
     git bisect bad v1.5.0
-    git bisect run bash -c "python setup.py build_ext -j 4; python t.py"
+    git bisect run bash -c "python -m pip install -ve . --no-build-isolation --config-settings editable-verbose=true; python t.py"
 
 This finds the first commit that changed the behavior. The C extensions have to be
 rebuilt at every step, so the search can take a while.
 
 Exit bisect and rebuild the current version::
 
     git bisect reset
-    python setup.py build_ext -j 4
+    python -m pip install -ve . --no-build-isolation --config-settings editable-verbose=true
 
 Report your findings under the corresponding issue and ping the commit author to get
 their input.
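The `git bisect run` command above classifies each commit by the exit status of `t.py`, because the build step and the script are joined with `;` and only the last command's status is reported. Below is a minimal sketch of how such a script could map its result to the exit codes `git bisect run` understands. The helper names `bisect_exit_code` and `behaves_as_expected` are hypothetical, and the check body is a placeholder, not a real pandas reproducer.

```python
def bisect_exit_code(check):
    """Map a check's outcome to an exit code for `git bisect run`."""
    try:
        ok = check()
    except ImportError:
        # 125 asks git bisect to skip a commit that cannot be tested,
        # for example when the build failed and pandas will not import.
        return 125
    # 0 marks the commit as good; 1 (any code in 1..127 except 125) marks it bad.
    return 0 if ok else 1


def behaves_as_expected():
    # Hypothetical placeholder: replace with the smallest reproducer
    # for the behavior change being bisected.
    return sorted([3, 1, 2]) == [1, 2, 3]


code = bisect_exit_code(behaves_as_expected)
print(code)
# In a real t.py, finish with: raise SystemExit(code)
```

Returning 125 on a failed import keeps an unbuildable commit from being recorded as bad, which would otherwise point the search at the wrong commit.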

doc/source/user_guide/enhancingperf.rst

Lines changed: 1 addition & 1 deletion
@@ -453,7 +453,7 @@ by evaluate arithmetic and boolean expression all at once for large :class:`~pan
 :func:`~pandas.eval` is many orders of magnitude slower for
 smaller expressions or objects than plain Python. A good rule of thumb is
 to only use :func:`~pandas.eval` when you have a
-:class:`.DataFrame` with more than 10,000 rows.
+:class:`~pandas.core.frame.DataFrame` with more than 10,000 rows.
 
 Supported syntax
 ~~~~~~~~~~~~~~~~

doc/source/user_guide/io.rst

Lines changed: 1 addition & 1 deletion
@@ -6400,7 +6400,7 @@ ignored.
 In [2]: df = pd.DataFrame({'A': np.random.randn(sz), 'B': [1] * sz})
 
 In [3]: df.info()
-<class 'pandas.core.frame.DataFrame'>
+<class 'pandas.DataFrame'>
 RangeIndex: 1000000 entries, 0 to 999999
 Data columns (total 2 columns):
 A 1000000 non-null float64

doc/source/whatsnew/v0.24.0.rst

Lines changed: 1 addition & 1 deletion
@@ -840,7 +840,7 @@ then all the columns are dummy-encoded, and a :class:`SparseDataFrame` was retur
 In [2]: df = pd.DataFrame({"A": [1, 2], "B": ['a', 'b'], "C": ['a', 'a']})
 
 In [3]: type(pd.get_dummies(df, sparse=True))
-Out[3]: pandas.core.frame.DataFrame
+Out[3]: pandas.DataFrame
 
 In [4]: type(pd.get_dummies(df[['B', 'C']], sparse=True))
 Out[4]: pandas.core.sparse.frame.SparseDataFrame

doc/source/whatsnew/v1.0.0.rst

Lines changed: 1 addition & 1 deletion
@@ -414,7 +414,7 @@ Extended verbose info output for :class:`~pandas.DataFrame`
 ... "text_col": ["a", "b", "c"],
 ... "float_col": [0.0, 0.1, 0.2]})
 In [2]: df.info(verbose=True)
-<class 'pandas.core.frame.DataFrame'>
+<class 'pandas.DataFrame'>
 RangeIndex: 3 entries, 0 to 2
 Data columns (total 3 columns):
 int_col 3 non-null int64

doc/source/whatsnew/v3.0.0.rst

Lines changed: 7 additions & 2 deletions
@@ -34,7 +34,6 @@ Other enhancements
 - Allow dictionaries to be passed to :meth:`pandas.Series.str.replace` via ``pat`` parameter (:issue:`51748`)
 - Support passing a :class:`Series` input to :func:`json_normalize` that retains the :class:`Series` :class:`Index` (:issue:`51452`)
 - Users can globally disable any ``PerformanceWarning`` by setting the option ``mode.performance_warnings`` to ``False`` (:issue:`56920`)
--
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_300.notable_bug_fixes:
@@ -211,6 +210,7 @@ Removal of prior version deprecations/changes
 - Enforced deprecation of strings ``T``, ``L``, ``U``, and ``N`` denoting frequencies in :class:`Minute`, :class:`Second`, :class:`Milli`, :class:`Micro`, :class:`Nano` (:issue:`57627`)
 - Enforced deprecation of strings ``T``, ``L``, ``U``, and ``N`` denoting units in :class:`Timedelta` (:issue:`57627`)
 - Enforced deprecation of the behavior of :func:`concat` when ``len(keys) != len(objs)`` would truncate to the shorter of the two. Now this raises a ``ValueError`` (:issue:`43485`)
+- Enforced deprecation of values "pad", "ffill", "bfill", and "backfill" for :meth:`Series.interpolate` and :meth:`DataFrame.interpolate` (:issue:`57869`)
 - Enforced silent-downcasting deprecation for :ref:`all relevant methods <whatsnew_220.silent_downcasting>` (:issue:`54710`)
 - In :meth:`DataFrame.stack`, the default value of ``future_stack`` is now ``True``; specifying ``False`` will raise a ``FutureWarning`` (:issue:`55448`)
 - Iterating over a :class:`.DataFrameGroupBy` or :class:`.SeriesGroupBy` will return tuples of length 1 for the groups when grouping by ``level`` a list of length 1 (:issue:`50064`)
@@ -256,14 +256,18 @@ Removal of prior version deprecations/changes
 - Removed unused arguments ``*args`` and ``**kwargs`` in :class:`Resampler` methods (:issue:`50977`)
 - Unrecognized timezones when parsing strings to datetimes now raises a ``ValueError`` (:issue:`51477`)
 - Removed the :class:`Grouper` attributes ``ax``, ``groups``, ``indexer``, and ``obj`` (:issue:`51206`, :issue:`51182`)
+- Removed deprecated keyword ``verbose`` on :func:`read_csv` and :func:`read_table` (:issue:`56556`)
+- Removed the ``method`` keyword in ``ExtensionArray.fillna``, implement ``ExtensionArray._pad_or_backfill`` instead (:issue:`53621`)
 - Removed the attribute ``dtypes`` from :class:`.DataFrameGroupBy` (:issue:`51997`)
+- Enforced deprecation of ``argmin``, ``argmax``, ``idxmin``, and ``idxmax`` returning a result when ``skipna=False`` and an NA value is encountered or all values are NA values; these operations will now raise in such cases (:issue:`33941`, :issue:`51276`)
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_300.performance:
 
 Performance improvements
 ~~~~~~~~~~~~~~~~~~~~~~~~
 - :attr:`Categorical.categories` returns a :class:`RangeIndex` columns instead of an :class:`Index` if the constructed ``values`` was a ``range``. (:issue:`57787`)
+- :class:`DataFrame` returns a :class:`RangeIndex` columns when possible when ``data`` is a ``dict`` (:issue:`57943`)
 - :func:`concat` returns a :class:`RangeIndex` level in the :class:`MultiIndex` result when ``keys`` is a ``range`` or :class:`RangeIndex` (:issue:`57542`)
 - :meth:`RangeIndex.append` returns a :class:`RangeIndex` instead of a :class:`Index` when appending values that could continue the :class:`RangeIndex` (:issue:`57467`)
 - :meth:`Series.str.extract` returns a :class:`RangeIndex` columns instead of an :class:`Index` column when possible (:issue:`57542`)
@@ -284,6 +288,7 @@ Performance improvements
 - Performance improvement in :meth:`RangeIndex.join` returning a :class:`RangeIndex` instead of a :class:`Index` when possible. (:issue:`57651`, :issue:`57752`)
 - Performance improvement in :meth:`RangeIndex.reindex` returning a :class:`RangeIndex` instead of a :class:`Index` when possible. (:issue:`57647`, :issue:`57752`)
 - Performance improvement in :meth:`RangeIndex.take` returning a :class:`RangeIndex` instead of a :class:`Index` when possible. (:issue:`57445`, :issue:`57752`)
+- Performance improvement in :func:`merge` if hash-join can be used (:issue:`57970`)
 - Performance improvement in ``DataFrameGroupBy.__len__`` and ``SeriesGroupBy.__len__`` (:issue:`57595`)
 - Performance improvement in indexing operations for string dtypes (:issue:`56997`)
 - Performance improvement in unary methods on a :class:`RangeIndex` returning a :class:`RangeIndex` instead of a :class:`Index` when possible. (:issue:`57825`)
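The :func:`merge` entry in the hunk above refers to a hash join. As a conceptual illustration only (this is not the pandas implementation, and the function name `hash_inner_join` is hypothetical), a hash join on a right side with unique keys builds a lookup table once and then probes it once per left row:

```python
def hash_inner_join(left_keys, right_keys):
    """Inner-join row positions, assuming right_keys holds no duplicates."""
    # Build phase: map each right key to its row position.
    position = {key: i for i, key in enumerate(right_keys)}
    # Probe phase: one dictionary lookup per left row; unmatched rows are dropped.
    return [(i, position[key]) for i, key in enumerate(left_keys) if key in position]


pairs = hash_inner_join([2, 5, 2, 9], [5, 2, 7])
print(pairs)  # [(0, 1), (1, 0), (2, 1)]
```

Each pair is (left row, right row). The work grows with the combined length of the two inputs, and no sorting of either side is needed.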
@@ -316,7 +321,7 @@ Datetimelike
 
 Timedelta
 ^^^^^^^^^
--
+- Accuracy improvement in :meth:`Timedelta.to_pytimedelta` to round microseconds consistently for large nanosecond based Timedelta (:issue:`57841`)
 -
 
 Timezones

pandas/__init__.py

Lines changed: 2 additions & 1 deletion
@@ -28,7 +28,8 @@
     raise ImportError(
         f"C extension: {_module} not built. If you want to import "
         "pandas from the source directory, you may need to run "
-        "'python setup.py build_ext' to build the C extensions first."
+        "'python -m pip install -ve . --no-build-isolation --config-settings "
+        "editable-verbose=true' to build the C extensions first."
     ) from _err
 
 from pandas._config import (
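The hunk above only changes the build command quoted in the error message. The surrounding pattern, re-raising an `ImportError` with build instructions while keeping the original error as its cause, can be sketched in isolation as follows. The helper `import_or_explain` and the module name `hypothetical_missing_ext` are assumptions for illustration and do not appear in pandas.

```python
import importlib

# Build command quoted from the new error message in the diff.
BUILD_COMMAND = (
    "python -m pip install -ve . --no-build-isolation "
    "--config-settings editable-verbose=true"
)


def import_or_explain(module_name, build_command):
    """Import a module, or raise an ImportError that says how to build it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        missing = err.name  # name of the module that could not be imported
        raise ImportError(
            f"C extension: {missing} not built. You may need to run "
            f"'{build_command}' to build the C extensions first."
        ) from err  # keep the original failure as __cause__


try:
    import_or_explain("hypothetical_missing_ext", BUILD_COMMAND)
except ImportError as exc:
    print(exc)
```

Chaining with `from err` keeps the original traceback visible, so the added guidance does not hide the underlying failure.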
