Commit f9f5e3f — merge with main
2 parents 40fdb26 + 4d83483

113 files changed: +1021 −399 lines
.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 1 addition & 0 deletions

@@ -3,3 +3,4 @@
 - [ ] All [code checks passed](https://pandas.pydata.org/pandas-docs/dev/development/contributing_codebase.html#pre-commit).
 - [ ] Added [type annotations](https://pandas.pydata.org/pandas-docs/dev/development/contributing_codebase.html#type-hints) to new arguments/methods/functions.
 - [ ] Added an entry in the latest `doc/source/whatsnew/vX.X.X.rst` file if fixing a bug or adding a new feature.
+- [ ] If I used AI to develop this pull request, I prompted it to follow `AGENTS.md`.

AGENTS.md

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@ (new file; content shown without diff markers)

# pandas Agent Instructions

## Project Overview

`pandas` is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

## Purpose

- Assist contributors by suggesting code changes, tests, and documentation edits for the pandas repository while preserving stability and compatibility.

## Persona & Tone

- Concise, neutral, code-focused. Prioritize correctness, readability, and tests.

## Project Guidelines

- Be sure to follow all guidelines for contributing to the codebase specified at https://pandas.pydata.org/docs/development/contributing_codebase.html
- These guidelines are also available in the following local files, which should be loaded into context and adhered to:
  - doc/source/development/contributing_codebase.rst
  - doc/source/development/contributing_docstring.rst
  - doc/source/development/contributing_documentation.rst
  - doc/source/development/contributing.rst

## Decision heuristics

- Favor small, backward-compatible changes with tests.
- If a change would be breaking, propose it behind a deprecation path and document the rationale.
- Prefer readability over micro-optimizations unless benchmarks are requested.
- Add tests for behavioral changes; update docs only after the code change is final.

## Type hints guidance (summary)

- Prefer PEP 484 style and types in pandas._typing when appropriate.
- Avoid unnecessary use of typing.cast; prefer refactors that convey types to type checkers.
- Use builtin generics (list, dict) when possible.

## Docstring guidance (summary)

- Follow the NumPy / numpydoc conventions used across the repo: short summary, extended summary, Parameters, Returns/Yields, See Also, Notes, Examples.
- Ensure examples are deterministic, import numpy/pandas as documented, and pass the doctest rules used by docs validation.
- Preserve formatting rules: triple double-quotes, no blank line before/after the docstring, parameter formatting ("name : type, default ..."), and the types and examples conventions.

## Pull Requests (summary)

- Pull request titles should be descriptive and include one of the following prefixes:
  - ENH: Enhancement, new functionality
  - BUG: Bug fix
  - DOC: Additions/updates to documentation
  - TST: Additions/updates to tests
  - BLD: Updates to the build process/scripts
  - PERF: Performance improvement
  - TYP: Type annotations
  - CLN: Code cleanup
- Pull request descriptions should follow the template and **succinctly** describe the change being made. Usually a few sentences is sufficient.
- Pull requests that resolve an existing GitHub issue should include a link to the issue in the PR description.
- Do not add summaries or additional comments to individual commit messages. The single PR description is sufficient.
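The title-prefix rule above is mechanical, so it can be checked automatically. A minimal sketch (this helper is hypothetical and not part of the pandas repo):

```python
import re

# Hypothetical checker for the PR title prefixes listed in AGENTS.md,
# e.g. "BUG: fix off-by-one in rolling window".
PREFIXES = ("ENH", "BUG", "DOC", "TST", "BLD", "PERF", "TYP", "CLN")
TITLE_RE = re.compile(rf"^(?:{'|'.join(PREFIXES)}): .+")

def is_valid_pr_title(title: str) -> bool:
    # A valid title is "<PREFIX>: " followed by a non-empty description.
    return TITLE_RE.match(title) is not None

print(is_valid_pr_title("BUG: fix off-by-one in rolling window"))  # True
print(is_valid_pr_title("fixed some stuff"))                       # False
```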

doc/source/user_guide/io.rst

Lines changed: 2 additions & 1 deletion
@@ -343,7 +343,7 @@ on_bad_lines : {{'error', 'warn', 'skip'}}, default 'error'
     Specifies what to do upon encountering a bad line (a line with too many fields).
     Allowed values are :

-    - 'error', raise an ParserError when a bad line is encountered.
+    - 'error', raise a ParserError when a bad line is encountered.
     - 'warn', print a warning when a bad line is encountered and skip that line.
     - 'skip', skip bad lines without raising or warning when they are encountered.
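The options described in the corrected bullet above behave as follows; a quick sketch of `'skip'`:

```python
import io
import pandas as pd

# A CSV whose third row has too many fields (a "bad line").
data = io.StringIO("a,b\n1,2\n3,4,5\n6,7\n")

# on_bad_lines="skip" drops the malformed row instead of raising ParserError.
df = pd.read_csv(data, on_bad_lines="skip")
print(df)  # two rows survive: (1, 2) and (6, 7)
```

With `on_bad_lines="error"` (the default) the same input raises `pandas.errors.ParserError`.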

@@ -3717,6 +3717,7 @@ The look and feel of Excel worksheets created from pandas can be modified using

 * ``float_format`` : Format string for floating point numbers (default ``None``).
 * ``freeze_panes`` : A tuple of two integers representing the bottommost row and rightmost column to freeze. Each of these parameters is one-based, so (1, 1) will freeze the first row and first column (default ``None``).
+* ``autofilter`` : A boolean indicating whether to add automatic filters to all columns (default ``False``).

 .. note::

doc/source/user_guide/timeseries.rst

Lines changed: 23 additions & 0 deletions
@@ -241,6 +241,19 @@ inferred frequency upon creation:

    pd.DatetimeIndex(["2018-01-01", "2018-01-03", "2018-01-05"], freq="infer")

+In most cases, parsing strings to datetimes (with any of :func:`to_datetime`, :class:`DatetimeIndex`, or :class:`Timestamp`) will produce objects with microsecond ("us") unit. The exception to this rule is if your strings have nanosecond precision, in which case the result will have "ns" unit:
+
+.. ipython:: python
+
+   pd.to_datetime(["2016-01-01 02:03:04"]).unit
+   pd.to_datetime(["2016-01-01 02:03:04.123"]).unit
+   pd.to_datetime(["2016-01-01 02:03:04.123456"]).unit
+   pd.to_datetime(["2016-01-01 02:03:04.123456789"]).unit
+
+.. versionchanged:: 3.0.0
+
+   Previously, :func:`to_datetime` and :class:`DatetimeIndex` would always parse strings to "ns" unit. During pandas 2.x, :class:`Timestamp` could give any of "s", "ms", "us", or "ns" depending on the specificity of the input string.
+
 .. _timeseries.converting.format:

 Providing a format argument
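The unit inference documented in the hunk above can be checked on the resulting index. A small sketch (note the default unit shown for the first case is the pandas 3.x behavior; pandas 2.x always parsed strings to "ns"):

```python
import pandas as pd

# On pandas 3.x this prints "us"; on pandas 2.x it prints "ns".
dti = pd.to_datetime(["2016-01-01 02:03:04"])
print(dti.unit)

# A nanosecond-precision string keeps "ns" unit on both major versions.
dti_ns = pd.to_datetime(["2016-01-01 02:03:04.123456789"])
print(dti_ns.unit)  # "ns"

# Regardless of what was inferred, a unit can be requested explicitly:
print(dti.as_unit("ms").unit)  # "ms"
```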
@@ -379,6 +392,16 @@ We subtract the epoch (midnight at January 1, 1970 UTC) and then floor divide by

    (stamps - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")

+Another common way to perform this conversion is to convert directly to an integer dtype. Note that the exact integers this produces will depend on the specific unit
+or resolution of the datetime64 dtype:
+
+.. ipython:: python
+
+   stamps.astype(np.int64)
+   stamps.astype("datetime64[s]").astype(np.int64)
+   stamps.astype("datetime64[ms]").astype(np.int64)
+
 .. _timeseries.origin:

 Using the ``origin`` parameter
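The unit dependence noted in the hunk above is easy to see with timestamps near the epoch: casting through an explicit unit before converting to integers makes the result predictable (`stamps` here is an illustrative stand-in for the variable used in the docs):

```python
import numpy as np
import pandas as pd

stamps = pd.to_datetime(["1970-01-01 00:00:01", "1970-01-01 00:00:02"])

# Same instants, different integers, depending on the datetime64 unit:
secs = stamps.astype("datetime64[s]").astype(np.int64)   # seconds since epoch
ms = stamps.astype("datetime64[ms]").astype(np.int64)    # milliseconds since epoch
print(secs.tolist())  # [1, 2]
print(ms.tolist())    # [1000, 2000]
```

This is exactly why the v3.0.0 release notes warn that "us"-unit data converts to integers 1000x smaller than the old "ns"-unit data did.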

doc/source/whatsnew/v3.0.0.rst

Lines changed: 11 additions & 3 deletions
@@ -202,6 +202,7 @@ Other enhancements
 - :class:`Holiday` has gained the constructor argument and field ``exclude_dates`` to exclude specific datetimes from a custom holiday calendar (:issue:`54382`)
 - :class:`Rolling` and :class:`Expanding` now support ``nunique`` (:issue:`26958`)
 - :class:`Rolling` and :class:`Expanding` now support aggregations ``first`` and ``last`` (:issue:`33155`)
+- :func:`DataFrame.to_excel` has a new ``autofilter`` parameter to add automatic filters to all columns (:issue:`61194`)
 - :func:`read_parquet` accepts ``to_pandas_kwargs`` which are forwarded to :meth:`pyarrow.Table.to_pandas` which enables passing additional keywords to customize the conversion to pandas, such as ``maps_as_pydicts`` to read the Parquet map data type as python dictionaries (:issue:`56842`)
 - :func:`to_numeric` on big integers converts to ``object`` datatype with python integers when not coercing. (:issue:`51295`)
 - :meth:`.DataFrameGroupBy.transform`, :meth:`.SeriesGroupBy.transform`, :meth:`.DataFrameGroupBy.agg`, :meth:`.SeriesGroupBy.agg`, :meth:`.SeriesGroupBy.apply`, :meth:`.DataFrameGroupBy.apply` now support ``kurt`` (:issue:`40139`)

@@ -232,7 +233,6 @@ Other enhancements
 - Support reading Stata 102-format (Stata 1) dta files (:issue:`58978`)
 - Support reading Stata 110-format (Stata 7) dta files (:issue:`47176`)
 - Switched wheel upload to **PyPI Trusted Publishing** (OIDC) for release-tag pushes in ``wheels.yml``. (:issue:`61718`)
--

 .. ---------------------------------------------------------------------------
 .. _whatsnew_300.notable_bug_fixes:

@@ -358,7 +358,7 @@ When passing strings, the resolution will depend on the precision of the string,
    In [5]: pd.to_datetime(["2024-03-22 11:43:01.002003004"]).dtype
    Out[5]: dtype('<M8[ns]')

-The inferred resolution now matches that of the input strings:
+The inferred resolution now matches that of the input strings for nanosecond-precision strings, otherwise defaulting to microseconds:

 .. ipython:: python

@@ -367,13 +367,17 @@ The inferred resolution now matches that of the input strings:
    In [4]: pd.to_datetime(["2024-03-22 11:43:01.002003"]).dtype
    In [5]: pd.to_datetime(["2024-03-22 11:43:01.002003004"]).dtype

+This is also a change for the :class:`Timestamp` constructor with a string input, which in version 2.x.y could give second or millisecond unit, which users generally disliked (:issue:`52653`)
+
 In cases with mixed-resolution inputs, the highest resolution is used:

 .. code-block:: ipython

    In [2]: pd.to_datetime([pd.Timestamp("2024-03-22 11:43:01"), "2024-03-22 11:43:01.002"]).dtype
    Out[2]: dtype('<M8[ns]')

+.. warning:: Many users will now get "M8[us]" dtype data in cases when they used to get "M8[ns]". For most use cases they should not notice a difference. One big exception is converting to integers, which will give integers 1000x smaller.
+
 .. _whatsnew_300.api_breaking.concat_datetime_sorting:

 :func:`concat` no longer ignores ``sort`` when all objects have a :class:`DatetimeIndex`

@@ -1032,13 +1036,13 @@ Bug fixes
 Categorical
 ^^^^^^^^^^^
 - Bug in :class:`Categorical` where constructing from a pandas :class:`Series` or :class:`Index` with ``dtype='object'`` did not preserve the categories' dtype as ``object``; now the ``categories.dtype`` is preserved as ``object`` for these cases, while numpy arrays and Python sequences with ``dtype='object'`` continue to infer the most specific dtype (for example, ``str`` if all elements are strings) (:issue:`61778`)
+- Bug in :class:`pandas.Categorical` displaying string categories without quotes when using "string" dtype (:issue:`63045`)
 - Bug in :func:`Series.apply` where ``nan`` was ignored for :class:`CategoricalDtype` (:issue:`59938`)
 - Bug in :func:`bdate_range` raising ``ValueError`` with frequency ``freq="cbh"`` (:issue:`62849`)
 - Bug in :func:`testing.assert_index_equal` raising ``TypeError`` instead of ``AssertionError`` for incomparable ``CategoricalIndex`` when ``check_categorical=True`` and ``exact=False`` (:issue:`61935`)
 - Bug in :meth:`Categorical.astype` where ``copy=False`` would still trigger a copy of the codes (:issue:`62000`)
 - Bug in :meth:`DataFrame.pivot` and :meth:`DataFrame.set_index` raising an ``ArrowNotImplementedError`` for columns with pyarrow dictionary dtype (:issue:`53051`)
 - Bug in :meth:`Series.convert_dtypes` with ``dtype_backend="pyarrow"`` where empty :class:`CategoricalDtype` :class:`Series` raised an error or got converted to ``null[pyarrow]`` (:issue:`59934`)
--

 Datetimelike
 ^^^^^^^^^^^^

@@ -1111,12 +1115,14 @@ Conversion
 - Bug in :meth:`DataFrame.astype` not casting ``values`` for Arrow-based dictionary dtype correctly (:issue:`58479`)
 - Bug in :meth:`DataFrame.update` bool dtype being converted to object (:issue:`55509`)
 - Bug in :meth:`Series.astype` might modify read-only array inplace when casting to a string dtype (:issue:`57212`)
+- Bug in :meth:`Series.convert_dtypes` and :meth:`DataFrame.convert_dtypes` raising ``TypeError`` when called on data with complex dtype (:issue:`60129`)
 - Bug in :meth:`Series.convert_dtypes` and :meth:`DataFrame.convert_dtypes` removing timezone information for objects with :class:`ArrowDtype` (:issue:`60237`)
 - Bug in :meth:`Series.reindex` not maintaining ``float32`` type when a ``reindex`` introduces a missing value (:issue:`45857`)
 - Bug in :meth:`to_datetime` and :meth:`to_timedelta` with input ``None`` returning ``None`` instead of ``NaT``, inconsistent with other conversion methods (:issue:`23055`)

 Strings
 ^^^^^^^
+- Bug in :meth:`Series.str.match` failing to raise when given a compiled ``re.Pattern`` object and conflicting ``case`` or ``flags`` arguments (:issue:`62240`)
 - Bug in :meth:`Series.str.replace` raising an error on valid group references (``\1``, ``\2``, etc.) on series converted to PyArrow backend dtype (:issue:`62653`)
 - Bug in :meth:`Series.str.zfill` raising ``AttributeError`` for :class:`ArrowDtype` (:issue:`61485`)
 - Bug in :meth:`Series.value_counts` would not respect ``sort=False`` for series having ``string`` dtype (:issue:`55224`)

@@ -1127,6 +1133,7 @@ Interval
 - :meth:`Index.is_monotonic_decreasing`, :meth:`Index.is_monotonic_increasing`, and :meth:`Index.is_unique` could incorrectly be ``False`` for an ``Index`` created from a slice of another ``Index``. (:issue:`57911`)
 - Bug in :class:`Index`, :class:`Series`, :class:`DataFrame` constructors when given a sequence of :class:`Interval` subclass objects casting them to :class:`Interval` (:issue:`46945`)
 - Bug in :func:`interval_range` where start and end numeric types were always cast to 64 bit (:issue:`57268`)
+- Bug in :func:`pandas.interval_range` incorrectly inferring ``int64`` dtype when ``np.float32`` and ``int`` are used for ``start`` and ``freq`` (:issue:`58964`)
 - Bug in :meth:`IntervalIndex.get_indexer` and :meth:`IntervalIndex.drop` when one of the sides of the index is non-unique (:issue:`52245`)
 - Construction of :class:`IntervalArray` and :class:`IntervalIndex` from arrays with mismatched signed/unsigned integer dtypes (e.g., ``int64`` and ``uint64``) now raises a :exc:`TypeError` instead of proceeding silently. (:issue:`55715`)

@@ -1299,6 +1306,7 @@ ExtensionArray
 - Bug in :class:`Categorical` when constructing with an :class:`Index` with :class:`ArrowDtype` (:issue:`60563`)
 - Bug in :meth:`.arrays.ArrowExtensionArray.__setitem__` which caused wrong behavior when using an integer array with repeated values as a key (:issue:`58530`)
 - Bug in :meth:`ArrowExtensionArray.factorize` where NA values were dropped when input was dictionary-encoded even when dropna was set to False (:issue:`60567`)
+- Bug in :meth:`NDArrayBackedExtensionArray.take` which produced arrays whose dtypes didn't match their underlying data, when called with integer arrays (:issue:`62448`)
 - Bug in :meth:`api.types.is_datetime64_any_dtype` where a custom :class:`ExtensionDtype` would return ``False`` for array-likes (:issue:`57055`)
 - Bug in comparison between object with :class:`ArrowDtype` and incompatible-dtyped (e.g. string vs bool) incorrectly raising instead of returning all-``False`` (for ``==``) or all-``True`` (for ``!=``) (:issue:`59505`)
 - Bug in constructing pandas data structures when passing into ``dtype`` a string of the type followed by ``[pyarrow]`` while PyArrow is not installed would raise ``NameError`` rather than ``ImportError`` (:issue:`57928`)

pandas/_libs/lib.pyx

Lines changed: 2 additions & 2 deletions
@@ -322,7 +322,7 @@ def item_from_zerodim(val: object) -> object:
     >>> item_from_zerodim(np.array([1]))
     array([1])
     """
-    if cnp.PyArray_IsZeroDim(val):
+    if cnp.PyArray_IsZeroDim(val) and cnp.PyArray_CheckExact(val):
         return cnp.PyArray_ToScalar(cnp.PyArray_DATA(val), val)
     return val

@@ -2593,7 +2593,7 @@ def maybe_convert_objects(ndarray[object] objects,
         Whether to convert numeric entries.
     convert_to_nullable_dtype : bool, default False
         If an array-like object contains only integer or boolean values (and NaN) is
-        encountered, whether to convert and return an Boolean/IntegerArray.
+        encountered, whether to convert and return a Boolean/IntegerArray.
     convert_non_numeric : bool, default False
         Whether to convert datetime, timedelta, period, interval types.
     dtype_if_all_nat : np.dtype, ExtensionDtype, or None, default None
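The `PyArray_CheckExact` addition means the fast scalar-unwrapping path now applies only to exact `np.ndarray` objects, not to subclasses. A rough Python analogue of the changed condition (a sketch, not the actual Cython implementation):

```python
import numpy as np

def item_from_zerodim_sketch(val):
    # Mirror of the changed Cython check: unwrap to a scalar only when val
    # is a zero-dimensional *exact* np.ndarray; ndarray subclasses such as
    # np.ma.MaskedArray are now returned unchanged.
    if isinstance(val, np.ndarray) and type(val) is np.ndarray and val.ndim == 0:
        return val[()]
    return val

print(item_from_zerodim_sketch(np.array(1)))            # unwrapped scalar
print(item_from_zerodim_sketch(np.ma.masked_array(1)))  # stays a MaskedArray
print(item_from_zerodim_sketch(np.array([1])))          # 1-d array, unchanged
```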

pandas/_libs/tslibs/conversion.pyx

Lines changed: 4 additions & 0 deletions
@@ -623,6 +623,8 @@ cdef _TSObject convert_str_to_tsobject(str ts, tzinfo tz,
         )
     if not string_to_dts_failed:
         reso = get_supported_reso(out_bestunit)
+        if reso < NPY_FR_us:
+            reso = NPY_FR_us
         check_dts_bounds(&dts, reso)
         obj = _TSObject()
         obj.dts = dts

@@ -661,6 +663,8 @@ cdef _TSObject convert_str_to_tsobject(str ts, tzinfo tz,
             nanos=&nanos,
         )
         reso = get_supported_reso(out_bestunit)
+        if reso < NPY_FR_us:
+            reso = NPY_FR_us
         return convert_datetime_to_tsobject(dt, tz, nanos=nanos, reso=reso)
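Both hunks above (and the matching ones in `strptime.pyx` below) implement the same rule: the resolution inferred from a string is clamped so it is never coarser than microseconds. A small Python sketch of that clamping, assuming the coarse-to-fine ordering of `NPY_DATETIMEUNIT` values that the `<` comparison relies on:

```python
from enum import IntEnum

class Reso(IntEnum):
    # Assumed ordering, coarse to fine, mirroring how NPY_DATETIMEUNIT
    # members compare in the Cython code above.
    SEC = 0
    MS = 1
    US = 2
    NS = 3

def clamp_to_us(reso: Reso) -> Reso:
    # Anything coarser than microseconds is bumped up to "us", so string
    # parsing never yields second or millisecond unit results.
    return max(reso, Reso.US)

print(clamp_to_us(Reso.SEC).name)  # US
print(clamp_to_us(Reso.NS).name)   # NS
```

This is what makes "2016-01-01 02:03:04" parse to "us" unit rather than "s" in pandas 3.0.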

pandas/_libs/tslibs/fields.pyx

Lines changed: 5 additions & 5 deletions
@@ -146,7 +146,7 @@ def get_date_name_field(
     NPY_DATETIMEUNIT reso=NPY_FR_ns,
 ):
     """
-    Given a int64-based datetime index, return array of strings of date
+    Given an int64-based datetime index, return array of strings of date
     name based on requested field (e.g. day_name)
     """
     cdef:

@@ -335,7 +335,7 @@ def get_date_field(
     NPY_DATETIMEUNIT reso=NPY_FR_ns,
 ):
     """
-    Given a int64-based datetime index, extract the year, month, etc.,
+    Given an int64-based datetime index, extract the year, month, etc.,
     field and return an array of these values.
     """
     cdef:

@@ -502,7 +502,7 @@ def get_timedelta_field(
     NPY_DATETIMEUNIT reso=NPY_FR_ns,
 ):
     """
-    Given a int64-based timedelta index, extract the days, hrs, sec.,
+    Given an int64-based timedelta index, extract the days, hrs, sec.,
     field and return an array of these values.
     """
     cdef:

@@ -555,7 +555,7 @@ def get_timedelta_days(
     NPY_DATETIMEUNIT reso=NPY_FR_ns,
 ):
     """
-    Given a int64-based timedelta index, extract the days,
+    Given an int64-based timedelta index, extract the days,
     field and return an array of these values.
     """
     cdef:

@@ -592,7 +592,7 @@ cpdef isleapyear_arr(ndarray years):
 @cython.boundscheck(False)
 def build_isocalendar_sarray(const int64_t[:] dtindex, NPY_DATETIMEUNIT reso):
     """
-    Given a int64-based datetime array, return the ISO 8601 year, week, and day
+    Given an int64-based datetime array, return the ISO 8601 year, week, and day
     as a structured array.
     """
     cdef:

pandas/_libs/tslibs/offsets.pyx

Lines changed: 1 addition & 1 deletion
@@ -827,7 +827,7 @@ cdef class BaseOffset:
     @property
     def nanos(self):
         """
-        Returns a integer of the total number of nanoseconds for fixed frequencies.
+        Returns an integer of the total number of nanoseconds for fixed frequencies.

         Raises
         ------
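The docstring being corrected here describes `BaseOffset.nanos`, which only makes sense for fixed frequencies. A quick illustration of both the happy path and the error path:

```python
import pandas as pd

# Fixed frequencies have an exact length, so .nanos returns an integer:
print(pd.offsets.Hour(1).nanos)  # 3600000000000 (one hour in nanoseconds)

# Calendar-dependent offsets have no fixed length, so .nanos raises ValueError:
try:
    pd.offsets.MonthEnd(1).nanos
except ValueError as exc:
    print(exc)
```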

pandas/_libs/tslibs/strptime.pyx

Lines changed: 4 additions & 0 deletions
@@ -466,6 +466,8 @@ def array_strptime(
                 # No error reported by string_to_dts, pick back up
                 # where we left off
                 item_reso = get_supported_reso(out_bestunit)
+                if item_reso < NPY_DATETIMEUNIT.NPY_FR_us:
+                    item_reso = NPY_DATETIMEUNIT.NPY_FR_us
                 state.update_creso(item_reso)
                 if infer_reso:
                     creso = state.creso

@@ -510,6 +512,8 @@ def array_strptime(
                 val, fmt, exact, format_regex, locale_time, &dts, &item_reso
             )

+            if item_reso < NPY_DATETIMEUNIT.NPY_FR_us:
+                item_reso = NPY_DATETIMEUNIT.NPY_FR_us
             state.update_creso(item_reso)
             if infer_reso:
                 creso = state.creso
