Commit 5fc8141

Merge branch 'pandas-dev:main' into autofilter-feature
2 parents 63966e1 + 49f4a94 commit 5fc8141

108 files changed: +929 -391 lines changed

doc/source/user_guide/io.rst

Lines changed: 2 additions & 1 deletion
@@ -343,7 +343,7 @@ on_bad_lines : {{'error', 'warn', 'skip'}}, default 'error'
   Specifies what to do upon encountering a bad line (a line with too many fields).
   Allowed values are :

-  - 'error', raise an ParserError when a bad line is encountered.
+  - 'error', raise a ParserError when a bad line is encountered.
   - 'warn', print a warning when a bad line is encountered and skip that line.
   - 'skip', skip bad lines without raising or warning when they are encountered.

@@ -3717,6 +3717,7 @@ The look and feel of Excel worksheets created from pandas can be modified using

 * ``float_format`` : Format string for floating point numbers (default ``None``).
 * ``freeze_panes`` : A tuple of two integers representing the bottommost row and rightmost column to freeze. Each of these parameters is one-based, so (1, 1) will freeze the first row and first column (default ``None``).
+* ``autofilter`` : A boolean indicating whether to add automatic filters to all columns (default ``False``).

 .. note::
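
As a rough illustration of what "automatic filters to all columns" means at the worksheet level: a spreadsheet autofilter is defined over a cell range that spans the header row and the data rows beneath it. The sketch below is not pandas code. `column_letter` and `autofilter_range` are hypothetical helpers, and the sketch assumes the table starts at cell A1 with a single header row and no index column.

```python
def column_letter(index: int) -> str:
    """Convert a zero-based column index to spreadsheet letters (0 -> A, 26 -> AA)."""
    if index < 0:
        raise ValueError("column index must be non-negative")
    letters = ""
    n = index + 1  # switch to one-based for the bijective base-26 conversion
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def autofilter_range(n_rows: int, n_cols: int) -> str:
    """A1-style range covering one header row plus n_rows data rows.

    Assumption (not taken from pandas): the table is anchored at A1.
    """
    if n_rows < 0 or n_cols < 1:
        raise ValueError("need at least one column and a non-negative row count")
    last_column = column_letter(n_cols - 1)
    last_row = n_rows + 1  # +1 accounts for the header row
    return f"A1:{last_column}{last_row}"


print(autofilter_range(n_rows=3, n_cols=3))
```

The actual range written by ``to_excel`` may differ, for example when an index column or a start offset shifts the table.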

doc/source/user_guide/timeseries.rst

Lines changed: 23 additions & 0 deletions
@@ -241,6 +241,19 @@ inferred frequency upon creation:

    pd.DatetimeIndex(["2018-01-01", "2018-01-03", "2018-01-05"], freq="infer")

+In most cases, parsing strings to datetimes (with any of :func:`to_datetime`, :class:`DatetimeIndex`, or :class:`Timestamp`) will produce objects with microsecond ("us") unit. The exception to this rule is if your strings have nanosecond precision, in which case the result will have "ns" unit:
+
+.. ipython:: python
+
+   pd.to_datetime(["2016-01-01 02:03:04"]).unit
+   pd.to_datetime(["2016-01-01 02:03:04.123"]).unit
+   pd.to_datetime(["2016-01-01 02:03:04.123456"]).unit
+   pd.to_datetime(["2016-01-01 02:03:04.123456789"]).unit
+
+.. versionchanged:: 3.0.0
+
+   Previously, :func:`to_datetime` and :class:`DatetimeIndex` would always parse strings to "ns" unit. During pandas 2.x, :class:`Timestamp` could give any of "s", "ms", "us", or "ns" depending on the specificity of the input string.
+
 .. _timeseries.converting.format:

 Providing a format argument
@@ -379,6 +392,16 @@ We subtract the epoch (midnight at January 1, 1970 UTC) and then floor divide by

    (stamps - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")

+Another common way to perform this conversion is to convert directly to an integer dtype. Note that the exact integers this produces will depend on the specific unit
+or resolution of the datetime64 dtype:
+
+.. ipython:: python
+
+   stamps.astype(np.int64)
+   stamps.astype("datetime64[s]").astype(np.int64)
+   stamps.astype("datetime64[ms]").astype(np.int64)
+
+
 .. _timeseries.origin:

 Using the ``origin`` parameter

doc/source/whatsnew/v3.0.0.rst

Lines changed: 10 additions & 2 deletions
@@ -202,6 +202,7 @@ Other enhancements
 - :class:`Holiday` has gained the constructor argument and field ``exclude_dates`` to exclude specific datetimes from a custom holiday calendar (:issue:`54382`)
 - :class:`Rolling` and :class:`Expanding` now support ``nunique`` (:issue:`26958`)
 - :class:`Rolling` and :class:`Expanding` now support aggregations ``first`` and ``last`` (:issue:`33155`)
+- :func:`DataFrame.to_excel` has a new ``autofilter`` parameter to add automatic filters to all columns (:issue:`61194`)
 - :func:`read_parquet` accepts ``to_pandas_kwargs`` which are forwarded to :meth:`pyarrow.Table.to_pandas` which enables passing additional keywords to customize the conversion to pandas, such as ``maps_as_pydicts`` to read the Parquet map data type as python dictionaries (:issue:`56842`)
 - :func:`to_numeric` on big integers converts to ``object`` datatype with python integers when not coercing. (:issue:`51295`)
 - :meth:`.DataFrameGroupBy.transform`, :meth:`.SeriesGroupBy.transform`, :meth:`.DataFrameGroupBy.agg`, :meth:`.SeriesGroupBy.agg`, :meth:`.SeriesGroupBy.apply`, :meth:`.DataFrameGroupBy.apply` now support ``kurt`` (:issue:`40139`)
@@ -232,7 +233,6 @@ Other enhancements
 - Support reading Stata 102-format (Stata 1) dta files (:issue:`58978`)
 - Support reading Stata 110-format (Stata 7) dta files (:issue:`47176`)
 - Switched wheel upload to **PyPI Trusted Publishing** (OIDC) for release-tag pushes in ``wheels.yml``. (:issue:`61718`)
--

 .. ---------------------------------------------------------------------------
 .. _whatsnew_300.notable_bug_fixes:
@@ -358,7 +358,7 @@ When passing strings, the resolution will depend on the precision of the string,
    In [5]: pd.to_datetime(["2024-03-22 11:43:01.002003004"]).dtype
    Out[5]: dtype('<M8[ns]')

-The inferred resolution now matches that of the input strings:
+The inferred resolution now matches that of the input strings for nanosecond-precision strings, otherwise defaulting to microseconds:

 .. ipython:: python

@@ -367,13 +367,17 @@ The inferred resolution now matches that of the input strings:
    In [4]: pd.to_datetime(["2024-03-22 11:43:01.002003"]).dtype
    In [5]: pd.to_datetime(["2024-03-22 11:43:01.002003004"]).dtype

+This is also a change for the :class:`Timestamp` constructor with a string input, which in version 2.x.y could give second or millisecond unit, which users generally disliked (:issue:`52653`)
+
 In cases with mixed-resolution inputs, the highest resolution is used:

 .. code-block:: ipython

    In [2]: pd.to_datetime([pd.Timestamp("2024-03-22 11:43:01"), "2024-03-22 11:43:01.002"]).dtype
    Out[2]: dtype('<M8[ns]')

+.. warning:: Many users will now get "M8[us]" dtype data in cases when they used to get "M8[ns]". For most use cases they should not notice a difference. One big exception is converting to integers, which will give integers 1000x smaller.
+
 .. _whatsnew_300.api_breaking.concat_datetime_sorting:

 :func:`concat` no longer ignores ``sort`` when all objects have a :class:`DatetimeIndex`
@@ -1111,12 +1115,14 @@ Conversion
 - Bug in :meth:`DataFrame.astype` not casting ``values`` for Arrow-based dictionary dtype correctly (:issue:`58479`)
 - Bug in :meth:`DataFrame.update` bool dtype being converted to object (:issue:`55509`)
 - Bug in :meth:`Series.astype` might modify read-only array inplace when casting to a string dtype (:issue:`57212`)
+- Bug in :meth:`Series.convert_dtypes` and :meth:`DataFrame.convert_dtypes` raising ``TypeError`` when called on data with complex dtype (:issue:`60129`)
 - Bug in :meth:`Series.convert_dtypes` and :meth:`DataFrame.convert_dtypes` removing timezone information for objects with :class:`ArrowDtype` (:issue:`60237`)
 - Bug in :meth:`Series.reindex` not maintaining ``float32`` type when a ``reindex`` introduces a missing value (:issue:`45857`)
 - Bug in :meth:`to_datetime` and :meth:`to_timedelta` with input ``None`` returning ``None`` instead of ``NaT``, inconsistent with other conversion methods (:issue:`23055`)

 Strings
 ^^^^^^^
+- Bug in :meth:`Series.str.match` failing to raise when given a compiled ``re.Pattern`` object and conflicting ``case`` or ``flags`` arguments (:issue:`62240`)
 - Bug in :meth:`Series.str.replace` raising an error on valid group references (``\1``, ``\2``, etc.) on series converted to PyArrow backend dtype (:issue:`62653`)
 - Bug in :meth:`Series.str.zfill` raising ``AttributeError`` for :class:`ArrowDtype` (:issue:`61485`)
 - Bug in :meth:`Series.value_counts` would not respect ``sort=False`` for series having ``string`` dtype (:issue:`55224`)
@@ -1127,6 +1133,7 @@ Interval
 - :meth:`Index.is_monotonic_decreasing`, :meth:`Index.is_monotonic_increasing`, and :meth:`Index.is_unique` could incorrectly be ``False`` for an ``Index`` created from a slice of another ``Index``. (:issue:`57911`)
 - Bug in :class:`Index`, :class:`Series`, :class:`DataFrame` constructors when given a sequence of :class:`Interval` subclass objects casting them to :class:`Interval` (:issue:`46945`)
 - Bug in :func:`interval_range` where start and end numeric types were always cast to 64 bit (:issue:`57268`)
+- Bug in :func:`pandas.interval_range` incorrectly inferring ``int64`` dtype when ``np.float32`` and ``int`` are used for ``start`` and ``freq`` (:issue:`58964`)
 - Bug in :meth:`IntervalIndex.get_indexer` and :meth:`IntervalIndex.drop` when one of the sides of the index is non-unique (:issue:`52245`)
 - Construction of :class:`IntervalArray` and :class:`IntervalIndex` from arrays with mismatched signed/unsigned integer dtypes (e.g., ``int64`` and ``uint64``) now raises a :exc:`TypeError` instead of proceeding silently. (:issue:`55715`)

@@ -1299,6 +1306,7 @@ ExtensionArray
 - Bug in :class:`Categorical` when constructing with an :class:`Index` with :class:`ArrowDtype` (:issue:`60563`)
 - Bug in :meth:`.arrays.ArrowExtensionArray.__setitem__` which caused wrong behavior when using an integer array with repeated values as a key (:issue:`58530`)
 - Bug in :meth:`ArrowExtensionArray.factorize` where NA values were dropped when input was dictionary-encoded even when dropna was set to False(:issue:`60567`)
+- Bug in :meth:`NDArrayBackedExtensionArray.take` which produced arrays whose dtypes didn't match their underlying data, when called with integer arrays (:issue:`62448`)
 - Bug in :meth:`api.types.is_datetime64_any_dtype` where a custom :class:`ExtensionDtype` would return ``False`` for array-likes (:issue:`57055`)
 - Bug in comparison between object with :class:`ArrowDtype` and incompatible-dtyped (e.g. string vs bool) incorrectly raising instead of returning all-``False`` (for ``==``) or all-``True`` (for ``!=``) (:issue:`59505`)
 - Bug in constructing pandas data structures when passing into ``dtype`` a string of the type followed by ``[pyarrow]`` while PyArrow is not installed would raise ``NameError`` rather than ``ImportError`` (:issue:`57928`)
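
The "integers 1000x smaller" warning in the datetime resolution notes above follows directly from counting ticks since the epoch in a coarser unit. The sketch below shows that arithmetic with the standard library only, so it does not depend on any particular pandas version. `epoch_count` is a hypothetical helper, and the timestamp is assumed to be in UTC.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch as a timezone-aware datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# One tick of each unit that the standard library can represent exactly.
TICKS = {
    "s": timedelta(seconds=1),
    "ms": timedelta(milliseconds=1),
    "us": timedelta(microseconds=1),
}


def epoch_count(stamp: datetime, unit: str) -> int:
    """Whole number of `unit` ticks between the epoch and `stamp`."""
    if unit == "ns":
        # datetime only has microsecond precision, so derive nanoseconds from it.
        return epoch_count(stamp, "us") * 1000
    return (stamp - EPOCH) // TICKS[unit]


stamp = datetime(2024, 3, 22, 11, 43, 1, tzinfo=timezone.utc)
counts = {unit: epoch_count(stamp, unit) for unit in ("s", "ms", "us", "ns")}
for unit, value in counts.items():
    print(unit, value)
```

Code that hard-codes a nanosecond assumption, for example by dividing an integer view by ``10**9`` to obtain seconds, is the kind of usage the warning is aimed at. Dividing by a unit-aware quantity avoids the dependency on the stored resolution.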

pandas/_libs/lib.pyx

Lines changed: 1 addition & 1 deletion
@@ -2593,7 +2593,7 @@ def maybe_convert_objects(ndarray[object] objects,
         Whether to convert numeric entries.
     convert_to_nullable_dtype : bool, default False
         If an array-like object contains only integer or boolean values (and NaN) is
-        encountered, whether to convert and return an Boolean/IntegerArray.
+        encountered, whether to convert and return a Boolean/IntegerArray.
     convert_non_numeric : bool, default False
         Whether to convert datetime, timedelta, period, interval types.
     dtype_if_all_nat : np.dtype, ExtensionDtype, or None, default None

pandas/_libs/tslibs/conversion.pyx

Lines changed: 4 additions & 0 deletions
@@ -623,6 +623,8 @@ cdef _TSObject convert_str_to_tsobject(str ts, tzinfo tz,
 )
 if not string_to_dts_failed:
     reso = get_supported_reso(out_bestunit)
+    if reso < NPY_FR_us:
+        reso = NPY_FR_us
     check_dts_bounds(&dts, reso)
     obj = _TSObject()
     obj.dts = dts
@@ -661,6 +663,8 @@ cdef _TSObject convert_str_to_tsobject(str ts, tzinfo tz,
     nanos=&nanos,
 )
 reso = get_supported_reso(out_bestunit)
+if reso < NPY_FR_us:
+    reso = NPY_FR_us
 return convert_datetime_to_tsobject(dt, tz, nanos=nanos, reso=reso)


pandas/_libs/tslibs/fields.pyx

Lines changed: 5 additions & 5 deletions
@@ -146,7 +146,7 @@ def get_date_name_field(
     NPY_DATETIMEUNIT reso=NPY_FR_ns,
 ):
     """
-    Given a int64-based datetime index, return array of strings of date
+    Given an int64-based datetime index, return array of strings of date
     name based on requested field (e.g. day_name)
     """
     cdef:
@@ -335,7 +335,7 @@ def get_date_field(
     NPY_DATETIMEUNIT reso=NPY_FR_ns,
 ):
     """
-    Given a int64-based datetime index, extract the year, month, etc.,
+    Given an int64-based datetime index, extract the year, month, etc.,
     field and return an array of these values.
     """
     cdef:
@@ -502,7 +502,7 @@ def get_timedelta_field(
     NPY_DATETIMEUNIT reso=NPY_FR_ns,
 ):
     """
-    Given a int64-based timedelta index, extract the days, hrs, sec.,
+    Given an int64-based timedelta index, extract the days, hrs, sec.,
     field and return an array of these values.
     """
     cdef:
@@ -555,7 +555,7 @@ def get_timedelta_days(
     NPY_DATETIMEUNIT reso=NPY_FR_ns,
 ):
     """
-    Given a int64-based timedelta index, extract the days,
+    Given an int64-based timedelta index, extract the days,
     field and return an array of these values.
     """
     cdef:
@@ -592,7 +592,7 @@ cpdef isleapyear_arr(ndarray years):
 @cython.boundscheck(False)
 def build_isocalendar_sarray(const int64_t[:] dtindex, NPY_DATETIMEUNIT reso):
     """
-    Given a int64-based datetime array, return the ISO 8601 year, week, and day
+    Given an int64-based datetime array, return the ISO 8601 year, week, and day
     as a structured array.
     """
     cdef:

pandas/_libs/tslibs/offsets.pyx

Lines changed: 1 addition & 1 deletion
@@ -827,7 +827,7 @@ cdef class BaseOffset:
     @property
     def nanos(self):
         """
-        Returns a integer of the total number of nanoseconds for fixed frequencies.
+        Returns an integer of the total number of nanoseconds for fixed frequencies.

         Raises
         ------

pandas/_libs/tslibs/strptime.pyx

Lines changed: 4 additions & 0 deletions
@@ -466,6 +466,8 @@ def array_strptime(
 # No error reported by string_to_dts, pick back up
 # where we left off
 item_reso = get_supported_reso(out_bestunit)
+if item_reso < NPY_DATETIMEUNIT.NPY_FR_us:
+    item_reso = NPY_DATETIMEUNIT.NPY_FR_us
 state.update_creso(item_reso)
 if infer_reso:
     creso = state.creso
@@ -510,6 +512,8 @@ def array_strptime(
     val, fmt, exact, format_regex, locale_time, &dts, &item_reso
 )

+if item_reso < NPY_DATETIMEUNIT.NPY_FR_us:
+    item_reso = NPY_DATETIMEUNIT.NPY_FR_us
 state.update_creso(item_reso)
 if infer_reso:
     creso = state.creso
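
The added `if ... < NPY_FR_us` checks in `conversion.pyx` and `strptime.pyx` act as a floor: whatever resolution the parser infers from the string, anything coarser than microseconds is raised to microseconds, while nanoseconds pass through. A pure-Python sketch of that rule follows. It is an illustration, not the Cython implementation: `unit_from_string`, `clamp_to_microseconds`, and `inferred_unit` are hypothetical names, and the input is assumed to be a plain `YYYY-MM-DD HH:MM:SS[.fraction]` string with no timezone suffix.

```python
# Supported units ordered from coarsest to finest.
UNIT_ORDER = ("s", "ms", "us", "ns")


def unit_from_string(text: str) -> str:
    """Coarsest unit that can hold the fractional seconds written in `text`."""
    _, _, fraction = text.partition(".")
    if fraction and not fraction.isdigit():
        raise ValueError(f"unexpected fractional part: {fraction!r}")
    digits = len(fraction)
    if digits == 0:
        return "s"
    if digits <= 3:
        return "ms"
    if digits <= 6:
        return "us"
    if digits <= 9:
        return "ns"
    raise ValueError("more than nanosecond precision is not supported")


def clamp_to_microseconds(unit: str) -> str:
    """Apply the floor: anything coarser than 'us' becomes 'us'."""
    if UNIT_ORDER.index(unit) < UNIT_ORDER.index("us"):
        return "us"
    return unit


def inferred_unit(text: str) -> str:
    return clamp_to_microseconds(unit_from_string(text))


for example in (
    "2016-01-01 02:03:04",
    "2016-01-01 02:03:04.123",
    "2016-01-01 02:03:04.123456",
    "2016-01-01 02:03:04.123456789",
):
    print(example, inferred_unit(example))
```

The four example strings are the ones used in the `timeseries.rst` change in this commit, so the sketch can be compared against the documented behavior.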

pandas/_libs/tslibs/timedeltas.pyx

Lines changed: 1 addition & 1 deletion
@@ -334,7 +334,7 @@ cdef convert_to_timedelta64(object ts, str unit):
     Handle these types of objects:
         - timedelta/Timedelta

-    Return an timedelta64[ns] object
+    Return a timedelta64[ns] object
     """
     # Caller is responsible for checking unit not in ["Y", "y", "M"]
     if isinstance(ts, _Timedelta):

pandas/_libs/tslibs/timestamps.pyx

Lines changed: 1 addition & 1 deletion
@@ -1717,7 +1717,7 @@ cdef class _Timestamp(ABCTimestamp):

     def to_period(self, freq=None):
         """
-        Return an period of which this timestamp is an observation.
+        Return a period of which this timestamp is an observation.

         This method converts the given Timestamp to a Period object,
         which represents a span of time,such as a year, month, etc.,
