Commit 7d66f64

Merge branch 'main' into replace-appender-in-stata
2 parents 42d066a + 10102e6 commit 7d66f64

File tree

89 files changed (+1730 −866 lines)


.github/workflows/broken-linkcheck.yml

Lines changed: 0 additions & 39 deletions
This file was deleted.

.github/workflows/cache-cleanup-daily.yml

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ on:
 
 jobs:
   cleanup:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-24.04
     if: github.repository_owner == 'pandas-dev'
     permissions:
       actions: write

.github/workflows/cache-cleanup.yml

Lines changed: 2 additions & 1 deletion
@@ -6,7 +6,8 @@ on:
 
 jobs:
   cleanup:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-24.04
+    if: github.repository_owner == 'pandas-dev'
     steps:
       - name: Clean Cache
         run: |

.github/workflows/comment-commands.yml

Lines changed: 0 additions & 70 deletions
@@ -9,15 +9,6 @@ permissions:
   pull-requests: write
 
 jobs:
-  issue_assign:
-    runs-on: ubuntu-24.04
-    if: (!github.event.issue.pull_request) && github.event.comment.body == 'take'
-    concurrency:
-      group: ${{ github.actor }}-issue-assign
-    steps:
-      - run: |
-          echo "Assigning issue ${{ github.event.issue.number }} to ${{ github.event.comment.user.login }}"
-          curl -H "Authorization: token ${{ secrets.GITHUB_TOKEN }}" -d '{"assignees": ["${{ github.event.comment.user.login }}"]}' https://api.github.com/repos/${{ github.repository }}/issues/${{ github.event.issue.number }}/assignees
   preview_docs:
     runs-on: ubuntu-24.04
     if: github.event.issue.pull_request && github.event.comment.body == '/preview'
@@ -28,64 +19,3 @@ jobs:
       with:
         previewer-server: "https://pandas.pydata.org/preview"
         artifact-job: "Doc Build and Upload"
-  asv_run:
-    runs-on: ubuntu-24.04
-    # TODO: Support more benchmarking options later, against different branches, against self, etc
-    if: github.event.issue.pull_request && startsWith(github.event.comment.body, '@github-actions benchmark')
-    defaults:
-      run:
-        shell: bash -el {0}
-    env:
-      ENV_FILE: environment.yml
-      COMMENT: ${{github.event.comment.body}}
-
-    concurrency:
-      # Set concurrency to prevent abuse(full runs are ~5.5 hours !!!)
-      # each user can only run one concurrent benchmark bot at a time
-      # We don't cancel in progress jobs, but if you want to benchmark multiple PRs, you're gonna have
-      # to wait
-      group: ${{ github.actor }}-asv
-      cancel-in-progress: false
-
-    steps:
-      - name: Checkout
-        uses: actions/checkout@v5
-        with:
-          fetch-depth: 0
-
-      # Although asv sets up its own env, deps are still needed
-      # during discovery process
-      - name: Set up Conda
-        uses: ./.github/actions/setup-conda
-
-      - name: Run benchmarks
-        id: bench
-        continue-on-error: true # asv will exit code 1 for regressions
-        run: |
-          # extracting the regex, see https://stackoverflow.com/a/36798723
-          REGEX=$(echo "$COMMENT" | sed -n "s/^.*-b\s*\(\S*\).*$/\1/p")
-          cd asv_bench
-          asv check -E existing
-          git remote add upstream https://github.com/pandas-dev/pandas.git
-          git fetch upstream
-          asv machine --yes
-          asv continuous -f 1.1 -b $REGEX upstream/main HEAD
-          echo 'BENCH_OUTPUT<<EOF' >> $GITHUB_ENV
-          asv compare -f 1.1 upstream/main HEAD >> $GITHUB_ENV
-          echo 'EOF' >> $GITHUB_ENV
-          echo "REGEX=$REGEX" >> $GITHUB_ENV
-
-      - uses: actions/github-script@v8
-        env:
-          BENCH_OUTPUT: ${{env.BENCH_OUTPUT}}
-          REGEX: ${{env.REGEX}}
-        with:
-          script: |
-            const ENV_VARS = process.env
-            const run_url = `https://github.com/${context.repo.owner}/${context.repo.repo}/actions/runs/${context.runId}`
-            github.rest.issues.createComment({
-              issue_number: context.issue.number,
-              owner: context.repo.owner,
-              repo: context.repo.repo,
-              body: '\nBenchmarks completed. View runner logs here.' + run_url + '\nRegex used: '+ 'regex ' + ENV_VARS["REGEX"] + '\n' + ENV_VARS["BENCH_OUTPUT"]
-            })
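The removed asv_run job pulled the benchmark pattern out of the triggering comment with sed before handing it to asv. A standalone sketch of just that extraction step (the sample comment text is invented for illustration; the \s/\S escapes assume GNU sed, as on the Linux runners):

```shell
# Extract the token that follows "-b" in a benchmark-bot comment,
# mirroring the sed call from the removed asv_run job.
COMMENT='@github-actions benchmark -b groupby.GroupByMethods'
REGEX=$(echo "$COMMENT" | sed -n "s/^.*-b\s*\(\S*\).*$/\1/p")
echo "$REGEX"   # prints: groupby.GroupByMethods
```

The `-n` plus `/p` combination means nothing is printed when the comment contains no `-b` flag, so a malformed comment yields an empty pattern rather than a sed error.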

.github/workflows/unit-tests.yml

Lines changed: 2 additions & 2 deletions
@@ -182,7 +182,7 @@ jobs:
     strategy:
       matrix:
         # Note: Don't use macOS latest since macos 14 appears to be arm64 only
-        os: [macos-13, macos-14, windows-latest]
+        os: [macos-13, macos-14, windows-2025]
         env_file: [actions-311.yaml, actions-312.yaml, actions-313.yaml]
       fail-fast: false
     runs-on: ${{ matrix.os }}
@@ -322,7 +322,7 @@ jobs:
       fail-fast: false
       matrix:
         # Separate out macOS 13 and 14, since macOS 14 is arm64 only
-        os: [ubuntu-24.04, macOS-13, macOS-14, windows-latest]
+        os: [ubuntu-24.04, macOS-13, macOS-14, windows-2025]
 
     timeout-minutes: 90
 
.github/workflows/wheels.yml

Lines changed: 3 additions & 1 deletion
@@ -229,7 +229,7 @@ jobs:
       - build_sdist
       - build_wheels
 
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-24.04
 
     environment:
       name: pypi
@@ -243,6 +243,8 @@ jobs:
         with:
           path: dist # everything lands in ./dist/**
 
+      # TODO: This step can be probably be achieved by actions/download-artifact@v5
+      # by specifying merge-multiple: true, and a glob pattern
       - name: Collect files
         run: |
           mkdir -p upload

.pre-commit-config.yaml

Lines changed: 5 additions & 5 deletions
@@ -19,7 +19,7 @@ ci:
   skip: [pyright, mypy]
 repos:
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.12.11
+    rev: v0.13.3
     hooks:
       - id: ruff
         args: [--exit-non-zero-on-fix]
@@ -46,7 +46,7 @@ repos:
       - id: codespell
         types_or: [python, rst, markdown, cython, c]
   - repo: https://github.com/MarcoGorelli/cython-lint
-    rev: v0.16.7
+    rev: v0.17.0
     hooks:
       - id: cython-lint
       - id: double-quote-cython-strings
@@ -67,7 +67,7 @@ repos:
      - id: trailing-whitespace
        args: [--markdown-linebreak-ext=md]
   - repo: https://github.com/PyCQA/isort
-    rev: 6.0.1
+    rev: 6.1.0
     hooks:
       - id: isort
   - repo: https://github.com/asottile/pyupgrade
@@ -92,14 +92,14 @@ repos:
       - id: sphinx-lint
         args: ["--enable", "all", "--disable", "line-too-long"]
   - repo: https://github.com/pre-commit/mirrors-clang-format
-    rev: v21.1.0
+    rev: v21.1.2
     hooks:
       - id: clang-format
         files: ^pandas/_libs/src|^pandas/_libs/include
         args: [-i]
         types_or: [c, c++]
   - repo: https://github.com/trim21/pre-commit-mirror-meson
-    rev: v1.9.0
+    rev: v1.9.1
     hooks:
       - id: meson-fmt
         args: ['--inplace']

doc/source/development/contributing.rst

Lines changed: 6 additions & 8 deletions
@@ -36,16 +36,14 @@ and `good first issue
 <https://github.com/pandas-dev/pandas/issues?q=is%3Aopen+sort%3Aupdated-desc+label%3A%22good+first+issue%22+no%3Aassignee>`_
 are typically good for newer contributors.
 
-Once you've found an interesting issue, it's a good idea to assign the issue to yourself,
-so nobody else duplicates the work on it. On the Github issue, a comment with the exact
-text ``take`` to automatically assign you the issue
-(this will take seconds and may require refreshing the page to see it).
+Once you've found an interesting issue, leave a comment with your intention
+to start working on it. If somebody else has
+already commented on issue but they have shown a lack of activity in the issue
+or a pull request in the past 2-3 weeks, you may take it over.
 
 If for whatever reason you are not able to continue working with the issue, please
-unassign it, so other people know it's available again. You can check the list of
-assigned issues, since people may not be working in them anymore. If you want to work on one
-that is assigned, feel free to kindly ask the current assignee if you can take it
-(please allow at least a week of inactivity before considering work in the issue discontinued).
+leave a comment on an issue, so other people know it's available again. You can check the list of
+assigned issues, since people may not be working in them anymore.
 
 We have several :ref:`contributor community <community>` communication channels, which you are
 welcome to join, and ask questions as you figure things out. Among them are regular meetings for

doc/source/getting_started/comparison/comparison_with_sql.rst

Lines changed: 36 additions & 0 deletions
@@ -270,6 +270,42 @@ column with another DataFrame's index.
     indexed_df2 = df2.set_index("key")
     pd.merge(df1, indexed_df2, left_on="key", right_index=True)
 
+:meth:`~pandas.merge` also supports joining on multiple columns by passing a list of column names.
+
+.. code-block:: sql
+
+    SELECT *
+    FROM df1_multi
+    INNER JOIN df2_multi
+    ON df1_multi.key1 = df2_multi.key1
+    AND df1_multi.key2 = df2_multi.key2;
+
+.. ipython:: python
+
+    df1_multi = pd.DataFrame({
+        "key1": ["A", "B", "C", "D"],
+        "key2": [1, 2, 3, 4],
+        "value": np.random.randn(4)
+    })
+    df2_multi = pd.DataFrame({
+        "key1": ["B", "D", "D", "E"],
+        "key2": [2, 4, 4, 5],
+        "value": np.random.randn(4)
+    })
+    pd.merge(df1_multi, df2_multi, on=["key1", "key2"])
+
+If the columns have different names between DataFrames, on can be replaced with left_on and
+right_on.
+
+.. ipython:: python
+
+    df2_multi = pd.DataFrame({
+        "key_1": ["B", "D", "D", "E"],
+        "key_2": [2, 4, 4, 5],
+        "value": np.random.randn(4)
+    })
+    pd.merge(df1_multi, df2_multi, left_on=["key1", "key2"], right_on=["key_1", "key_2"])
+
 LEFT OUTER JOIN
 ~~~~~~~~~~~~~~~
 
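The multi-key join documented above can be checked with a deterministic variant (fixed values in place of np.random.randn, so the result is reproducible):

```python
import pandas as pd

# Same shape as the doc example, with fixed values instead of random ones.
df1_multi = pd.DataFrame({
    "key1": ["A", "B", "C", "D"],
    "key2": [1, 2, 3, 4],
    "value": [10, 20, 30, 40],
})
df2_multi = pd.DataFrame({
    "key1": ["B", "D", "D", "E"],
    "key2": [2, 4, 4, 5],
    "value": [5, 6, 7, 8],
})

# Inner join on both keys: (B, 2) matches once, (D, 4) matches twice -> 3 rows.
# The overlapping "value" column is suffixed _x/_y, as with a single-key merge.
merged = pd.merge(df1_multi, df2_multi, on=["key1", "key2"])
print(merged)
```

Rows with no counterpart in both keys (A, C, E) are dropped, which mirrors the SQL INNER JOIN shown in the docs.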

doc/source/whatsnew/v3.0.0.rst

Lines changed: 5 additions & 1 deletion
@@ -948,6 +948,7 @@ Datetimelike
 - Bug in :class:`Timestamp` constructor failing to raise when given a ``np.datetime64`` object with non-standard unit (:issue:`25611`)
 - Bug in :func:`date_range` where the last valid timestamp would sometimes not be produced (:issue:`56134`)
 - Bug in :func:`date_range` where using a negative frequency value would not include all points between the start and end values (:issue:`56147`)
+- Bug in :func:`to_datetime` where passing an ``lxml.etree._ElementUnicodeResult`` together with ``format`` raised ``TypeError``. Now subclasses of ``str`` are handled. (:issue:`60933`)
 - Bug in :func:`tseries.api.guess_datetime_format` would fail to infer time format when "%Y" == "%H%M" (:issue:`57452`)
 - Bug in :func:`tseries.frequencies.to_offset` would fail to parse frequency strings starting with "LWOM" (:issue:`59218`)
 - Bug in :meth:`DataFrame.fillna` raising an ``AssertionError`` instead of ``OutOfBoundsDatetime`` when filling a ``datetime64[ns]`` column with an out-of-bounds timestamp. Now correctly raises ``OutOfBoundsDatetime``. (:issue:`61208`)
@@ -972,6 +973,7 @@ Datetimelike
 - Bug in constructing arrays with a timezone-aware :class:`ArrowDtype` from timezone-naive datetime objects incorrectly treating those as UTC times instead of wall times like :class:`DatetimeTZDtype` (:issue:`61775`)
 - Bug in setting scalar values with mismatched resolution into arrays with non-nanosecond ``datetime64``, ``timedelta64`` or :class:`DatetimeTZDtype` incorrectly truncating those scalars (:issue:`56410`)
 
+
 Timedelta
 ^^^^^^^^^
 - Accuracy improvement in :meth:`Timedelta.to_pytimedelta` to round microseconds consistently for large nanosecond based Timedelta (:issue:`57841`)
@@ -1008,8 +1010,8 @@ Conversion
 
 Strings
 ^^^^^^^
+- Bug in :meth:`Series.str.zfill` raising ``AttributeError`` for :class:`ArrowDtype` (:issue:`61485`)
 - Bug in :meth:`Series.value_counts` would not respect ``sort=False`` for series having ``string`` dtype (:issue:`55224`)
--
 
 Interval
 ^^^^^^^^
@@ -1079,6 +1081,8 @@ I/O
 - Bug in :meth:`read_csv` raising ``TypeError`` when ``index_col`` is specified and ``na_values`` is a dict containing the key ``None``. (:issue:`57547`)
 - Bug in :meth:`read_csv` raising ``TypeError`` when ``nrows`` and ``iterator`` are specified without specifying a ``chunksize``. (:issue:`59079`)
 - Bug in :meth:`read_csv` where the order of the ``na_values`` makes an inconsistency when ``na_values`` is a list non-string values. (:issue:`59303`)
+- Bug in :meth:`read_csv` with ``engine="c"`` reading big integers as strings. Now reads them as python integers. (:issue:`51295`)
+- Bug in :meth:`read_csv` with ``engine="c"`` reading large float numbers with preceding integers as strings. Now reads them as floats. (:issue:`51295`)
 - Bug in :meth:`read_csv` with ``engine="pyarrow"`` and ``dtype="Int64"`` losing precision (:issue:`56136`)
 - Bug in :meth:`read_excel` raising ``ValueError`` when passing array of boolean values when ``dtype="boolean"``. (:issue:`58159`)
 - Bug in :meth:`read_html` where ``rowspan`` in header row causes incorrect conversion to ``DataFrame``. (:issue:`60210`)
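For the Series.str.zfill entry above: zfill pads each string with leading zeros to the requested width, following Python's str.zfill sign handling. A minimal sketch using the default string dtype (the fix makes ArrowDtype-backed strings behave the same way; only the long-standing default-dtype behavior is shown here):

```python
import pandas as pd

# str.zfill semantics: pad to width 4; a leading "-" stays in front of the zeros.
s = pd.Series(["1", "22", "-3"])
padded = s.str.zfill(4)
print(padded.tolist())  # prints: ['0001', '0022', '-003']
```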
