Commit 69c35ee

Merge remote-tracking branch 'upstream/main' into perf/read-csv
2 parents: 46c9883 + 531c0e3


58 files changed: +3223 −1915 lines

.github/workflows/codeql.yml

Lines changed: 3 additions & 3 deletions
@@ -28,8 +28,8 @@ jobs:

     steps:
     - uses: actions/checkout@v5
-    - uses: github/codeql-action/init@v3
+    - uses: github/codeql-action/init@v4
       with:
         languages: ${{ matrix.language }}
-    - uses: github/codeql-action/autobuild@v3
-    - uses: github/codeql-action/analyze@v3
+    - uses: github/codeql-action/autobuild@v4
+    - uses: github/codeql-action/analyze@v4

.github/workflows/wheels.yml

Lines changed: 1 addition & 1 deletion
@@ -162,7 +162,7 @@ jobs:
         run: echo "sdist_name=$(cd ./dist && ls -d */)" >> "$GITHUB_ENV"

       - name: Build wheels
-        uses: pypa/[email protected].0
+        uses: pypa/[email protected].1
         with:
           package-dir: ./dist/${{ startsWith(matrix.buildplat[1], 'macosx') && env.sdist_name || needs.build_sdist.outputs.sdist_file }}
         env:

doc/source/whatsnew/v3.0.0.rst

Lines changed: 5 additions & 1 deletion
@@ -655,6 +655,7 @@ Other API changes
   an empty ``RangeIndex`` or empty ``Index`` with object dtype when determining
   the dtype of the resulting Index (:issue:`60797`)
 - :class:`IncompatibleFrequency` now subclasses ``TypeError`` instead of ``ValueError``. As a result, joins with mismatched frequencies now cast to object like other non-comparable joins, and arithmetic with indexes with mismatched frequencies align (:issue:`55782`)
+- :class:`Series` "flex" methods like :meth:`Series.add` no longer allow passing a :class:`DataFrame` for ``other``; use the DataFrame reversed method instead (:issue:`46179`)
 - :meth:`CategoricalIndex.append` no longer attempts to cast different-dtype indexes to the caller's dtype (:issue:`41626`)
 - :meth:`ExtensionDtype.construct_array_type` is now a regular method instead of a ``classmethod`` (:issue:`58663`)
 - Comparison operations between :class:`Index` and :class:`Series` now consistently return :class:`Series` regardless of which object is on the left or right (:issue:`36759`)
@@ -716,6 +717,7 @@ Other Deprecations
 - Deprecated using ``epoch`` date format in :meth:`DataFrame.to_json` and :meth:`Series.to_json`, use ``iso`` instead. (:issue:`57063`)
 - Deprecated allowing ``fill_value`` that cannot be held in the original dtype (excepting NA values for integer and bool dtypes) in :meth:`Series.unstack` and :meth:`DataFrame.unstack` (:issue:`12189`, :issue:`53868`)
 - Deprecated allowing ``fill_value`` that cannot be held in the original dtype (excepting NA values for integer and bool dtypes) in :meth:`Series.shift` and :meth:`DataFrame.shift` (:issue:`53802`)
+- Deprecated option "future.no_silent_downcasting", as it is no longer used. In a future version accessing this option will raise (:issue:`59502`)
 - Deprecated slicing on a :class:`Series` or :class:`DataFrame` with a :class:`DatetimeIndex` using a ``datetime.date`` object, explicitly cast to :class:`Timestamp` instead (:issue:`35830`)

 .. ---------------------------------------------------------------------------
@@ -873,6 +875,7 @@ Other Removals
 - Removed the ``method`` keyword in ``ExtensionArray.fillna``, implement ``ExtensionArray._pad_or_backfill`` instead (:issue:`53621`)
 - Removed the attribute ``dtypes`` from :class:`.DataFrameGroupBy` (:issue:`51997`)
 - Enforced deprecation of ``argmin``, ``argmax``, ``idxmin``, and ``idxmax`` returning a result when ``skipna=False`` and an NA value is encountered or all values are NA values; these operations will now raise in such cases (:issue:`33941`, :issue:`51276`)
+- Enforced deprecation of storage option "pyarrow_numpy" for :class:`StringDtype` (:issue:`60152`)
 - Removed specifying ``include_groups=True`` in :class:`.DataFrameGroupBy.apply` and :class:`.Resampler.apply` (:issue:`7155`)

 .. ---------------------------------------------------------------------------
@@ -1012,12 +1015,13 @@ Strings
 ^^^^^^^
 - Bug in :meth:`Series.str.zfill` raising ``AttributeError`` for :class:`ArrowDtype` (:issue:`61485`)
 - Bug in :meth:`Series.value_counts` would not respect ``sort=False`` for series having ``string`` dtype (:issue:`55224`)
+- Bug in multiplication with a :class:`StringDtype` incorrectly allowing multiplying by bools; explicitly cast to integers instead (:issue:`62595`)

 Interval
 ^^^^^^^^
 - :meth:`Index.is_monotonic_decreasing`, :meth:`Index.is_monotonic_increasing`, and :meth:`Index.is_unique` could incorrectly be ``False`` for an ``Index`` created from a slice of another ``Index``. (:issue:`57911`)
+- Bug in :class:`Index`, :class:`Series`, :class:`DataFrame` constructors when given a sequence of :class:`Interval` subclass objects casting them to :class:`Interval` (:issue:`46945`)
 - Bug in :func:`interval_range` where start and end numeric types were always cast to 64 bit (:issue:`57268`)
--

 Indexing
 ^^^^^^^^
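One of the string bullets above disallows multiplying a StringDtype column by bools. A plain-Python illustration (not pandas code) of the ambiguity being closed: bool subclasses int, so string-by-bool multiplication silently behaves like multiplying by 1 or 0.

```python
# Plain-Python illustration (not pandas) of the StringDtype change above:
# bool is a subclass of int, so multiplying a string by True/False silently
# behaves like multiplying by 1/0 — the implicit coercion pandas now rejects
# in favor of an explicit integer cast.
def repeat(s: str, flag: bool) -> str:
    # relies on bool being an int subclass; pandas now requires an
    # explicit integer cast rather than accepting bools here
    return s * flag


assert issubclass(bool, int)  # the root of the implicit coercion
```

For example, `repeat("ab", True)` returns `"ab"` while `repeat("ab", False)` returns `""`, with no error either way.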

pandas/_libs/include/pandas/parser/pd_parser.h

Lines changed: 6 additions & 10 deletions
@@ -37,10 +37,8 @@ typedef struct {
   int (*parser_trim_buffers)(parser_t *);
   int (*tokenize_all_rows)(parser_t *, const char *);
   int (*tokenize_nrows)(parser_t *, size_t, const char *);
-  int64_t (*str_to_int64)(const char *, int64_t, int64_t, TokenizerError *,
-                          char);
-  uint64_t (*str_to_uint64)(uint_state *, const char *, int64_t, uint64_t,
-                            TokenizerError *, char);
+  int64_t (*str_to_int64)(const char *, int *, char);
+  uint64_t (*str_to_uint64)(uint_state *, const char *, int *, char);
   double (*xstrtod)(const char *, char **, char, char, char, int, int *, int *);
   double (*precise_xstrtod)(const char *, char **, char, char, char, int, int *,
                             int *);
@@ -88,12 +86,10 @@ static PandasParser_CAPI *PandasParserAPI = NULL;
   PandasParserAPI->tokenize_all_rows((self), (encoding_errors))
 #define tokenize_nrows(self, nrows, encoding_errors) \
   PandasParserAPI->tokenize_nrows((self), (nrows), (encoding_errors))
-#define str_to_int64(p_item, int_min, int_max, error, t_sep) \
-  PandasParserAPI->str_to_int64((p_item), (int_min), (int_max), (error), \
-                                (t_sep))
-#define str_to_uint64(state, p_item, int_max, uint_max, error, t_sep) \
-  PandasParserAPI->str_to_uint64((state), (p_item), (int_max), (uint_max), \
-                                 (error), (t_sep))
+#define str_to_int64(p_item, error, t_sep) \
+  PandasParserAPI->str_to_int64((p_item), (error), (t_sep))
+#define str_to_uint64(state, p_item, error, t_sep) \
+  PandasParserAPI->str_to_uint64((state), (p_item), (error), (t_sep))
 #define xstrtod(p, q, decimal, sci, tsep, skip_trailing, error, maybe_int) \
   PandasParserAPI->xstrtod((p), (q), (decimal), (sci), (tsep), \
                            (skip_trailing), (error), (maybe_int))

pandas/_libs/include/pandas/parser/tokenizer.h

Lines changed: 7 additions & 11 deletions
@@ -14,6 +14,10 @@ See LICENSE for the license
 #define PY_SSIZE_T_CLEAN
 #include <Python.h>

+#define ERROR_NO_DIGITS 1
+#define ERROR_OVERFLOW 2
+#define ERROR_INVALID_CHARS 3
+
 #include <stdint.h>

 #define STREAM_INIT_SIZE 32
@@ -46,13 +50,6 @@ See LICENSE for the license
  * duplication of some file I/O.
  */

-typedef enum {
-  TOKENIZER_OK,
-  ERROR_NO_DIGITS,
-  ERROR_OVERFLOW,
-  ERROR_INVALID_CHARS,
-} TokenizerError;
-
 typedef enum {
   START_RECORD,
   START_FIELD,
@@ -211,10 +208,9 @@ void uint_state_init(uint_state *self);

 int uint64_conflict(uint_state *self);

-uint64_t str_to_uint64(uint_state *state, const char *p_item, int64_t int_max,
-                       uint64_t uint_max, TokenizerError *error, char tsep);
-int64_t str_to_int64(const char *p_item, int64_t int_min, int64_t int_max,
-                     TokenizerError *error, char tsep);
+uint64_t str_to_uint64(uint_state *state, const char *p_item, int *error,
+                       char tsep);
+int64_t str_to_int64(const char *p_item, int *error, char tsep);
 double xstrtod(const char *p, char **q, char decimal, char sci, char tsep,
                int skip_trailing, int *error, int *maybe_int);
 double precise_xstrtod(const char *p, char **q, char decimal, char sci,
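The header change above replaces the `TokenizerError` enum with plain `#define` integer codes (0 meaning success) and drops the `int_min`/`int_max`/`uint_max` parameters, since the int64/uint64 bounds are fixed. A rough Python sketch of that convention, not the C implementation (the sign/thousands handling here is simplified):

```python
# Sketch of the integer error-code convention from the diff above:
# 0 means success; nonzero codes mirror the new #defines, and the int64
# bounds are baked into the parser rather than passed as arguments.
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1

ERROR_NO_DIGITS = 1      # mirrors #define ERROR_NO_DIGITS 1
ERROR_OVERFLOW = 2       # mirrors #define ERROR_OVERFLOW 2
ERROR_INVALID_CHARS = 3  # mirrors #define ERROR_INVALID_CHARS 3


def str_to_int64(p_item: str, tsep: str = "\0") -> tuple[int, int]:
    """Parse a decimal string into int64 range; return (value, error_code)."""
    s = p_item.strip()
    neg = s.startswith("-")
    if s[:1] in "+-":
        s = s[1:]
    if tsep != "\0":
        s = s.replace(tsep, "")  # strip thousands separators, e.g. "1,234"
    if not s:
        return 0, ERROR_NO_DIGITS
    if not s.isdigit():
        return 0, ERROR_INVALID_CHARS
    value = -int(s) if neg else int(s)
    if value < INT64_MIN or value > INT64_MAX:
        return 0, ERROR_OVERFLOW  # out of fixed int64 range
    return value, 0
```

For example, `str_to_int64("1,234", tsep=",")` yields `(1234, 0)`, while a value past 2**63 - 1 yields `(0, ERROR_OVERFLOW)`.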

pandas/_libs/lib.pyx

Lines changed: 2 additions & 1 deletion
@@ -2255,7 +2255,8 @@ cpdef bint is_interval_array(ndarray values):
     for i in range(n):
         val = values[i]

-        if isinstance(val, Interval):
+        if type(val) is Interval:
+            # GH#46945 catch Interval exactly, excluding subclasses
             if closed is None:
                 closed = val.closed
             numeric = (
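The one-line change above swaps `isinstance()` for an exact `type()` check. An illustrative stdlib-only sketch (not pandas code) of the difference: `isinstance` matches subclasses, while `type(x) is Cls` does not, which is what GH#46945 needs so that `Interval` subclasses are not silently treated as plain `Interval`.

```python
# Stand-in classes for illustration only; pandas.Interval is a cdef class,
# but the subclass-matching semantics shown here are the same.
class Interval:
    pass


class CustomInterval(Interval):  # a hypothetical user-defined subclass
    pass


def is_exact_interval(val) -> bool:
    # what the diff switches to: exact type match, subclasses excluded
    return type(val) is Interval


def is_any_interval(val) -> bool:
    # what the old code did: subclasses also match
    return isinstance(val, Interval)
```

With these definitions, `is_any_interval(CustomInterval())` is `True` but `is_exact_interval(CustomInterval())` is `False`, so an array of subclass objects no longer takes the `Interval` fast path.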

pandas/_libs/parsers.pyx

Lines changed: 17 additions & 31 deletions
@@ -63,11 +63,6 @@ from numpy cimport (
 cnp.import_array()

 from pandas._libs cimport util
-from pandas._libs.util cimport (
-    INT64_MAX,
-    INT64_MIN,
-    UINT64_MAX,
-)

 from pandas._libs import lib

@@ -149,10 +144,7 @@ cdef extern from "pandas/parser/tokenizer.h":
     SKIP_LINE
     FINISHED

-    ctypedef enum TokenizerError:
-        TOKENIZER_OK,
-        ERROR_OVERFLOW,
-        ERROR_INVALID_CHARS
+    enum: ERROR_OVERFLOW, ERROR_INVALID_CHARS

     ctypedef enum BadLineHandleMethod:
         ERROR,
@@ -284,10 +276,8 @@ cdef extern from "pandas/parser/pd_parser.h":
     int tokenize_all_rows(parser_t *self, const char *encoding_errors) nogil
     int tokenize_nrows(parser_t *self, size_t nrows, const char *encoding_errors) nogil

-    int64_t str_to_int64(char *p_item, int64_t int_min,
-                         int64_t int_max, TokenizerError *error, char tsep) nogil
-    uint64_t str_to_uint64(uint_state *state, char *p_item, int64_t int_max,
-                           uint64_t uint_max, TokenizerError *error, char tsep) nogil
+    int64_t str_to_int64(char *p_item, int *error, char tsep) nogil
+    uint64_t str_to_uint64(uint_state *state, char *p_item, int *error, char tsep) nogil

     double xstrtod(const char *p, char **q, char decimal,
                    char sci, char tsep, int skip_trailing,
@@ -1797,7 +1787,7 @@ cdef int _try_uint64_nogil(parser_t *parser, int64_t col,
                            const kh_str_starts_t *na_hashset,
                            uint64_t *data, uint_state *state) nogil:
     cdef:
-        TokenizerError error = TOKENIZER_OK
+        int error = 0
         Py_ssize_t i, lines = line_end - line_start
         coliter_t it
         const char *word = NULL
@@ -1813,15 +1803,13 @@ cdef int _try_uint64_nogil(parser_t *parser, int64_t col,
                 data[i] = 0
                 continue

-            data[i] = str_to_uint64(state, word, INT64_MAX, UINT64_MAX,
-                                    &error, parser.thousands)
+            data[i] = str_to_uint64(state, word, &error, parser.thousands)
             if error != 0:
                 return error
     else:
         for i in range(lines):
             COLITER_NEXT(it, word)
-            data[i] = str_to_uint64(state, word, INT64_MAX, UINT64_MAX,
-                                    &error, parser.thousands)
+            data[i] = str_to_uint64(state, word, &error, parser.thousands)
             if error != 0:
                 return error

@@ -1832,7 +1820,7 @@ cdef _try_int64(parser_t *parser, int64_t col,
                 int64_t line_start, int64_t line_end,
                 bint na_filter, kh_str_starts_t *na_hashset, bint raise_on_float):
     cdef:
-        TokenizerError error = TOKENIZER_OK
+        int error = 0
         int na_count = 0
         Py_ssize_t lines
         coliter_t it
@@ -1859,13 +1847,13 @@ cdef _try_int64(parser_t *parser, int64_t col,
     return result, na_count


-cdef TokenizerError _try_int64_nogil(parser_t *parser, int64_t col,
-                                     int64_t line_start,
-                                     int64_t line_end, bint na_filter,
-                                     const kh_str_starts_t *na_hashset, int64_t NA,
-                                     int64_t *data, int *na_count) nogil:
+cdef int _try_int64_nogil(parser_t *parser, int64_t col,
+                          int64_t line_start,
+                          int64_t line_end, bint na_filter,
+                          const kh_str_starts_t *na_hashset, int64_t NA,
+                          int64_t *data, int *na_count) nogil:
     cdef:
-        TokenizerError error = TOKENIZER_OK
+        int error = 0
         Py_ssize_t i, lines = line_end - line_start
         coliter_t it
         const char *word = NULL
@@ -1882,16 +1870,14 @@ cdef TokenizerError _try_int64_nogil(parser_t *parser, int64_t col,
                 data[i] = NA
                 continue

-            data[i] = str_to_int64(word, INT64_MIN, INT64_MAX,
-                                   &error, parser.thousands)
-            if error != TOKENIZER_OK:
+            data[i] = str_to_int64(word, &error, parser.thousands)
+            if error != 0:
                 return error
     else:
         for i in range(lines):
             COLITER_NEXT(it, word)
-            data[i] = str_to_int64(word, INT64_MIN, INT64_MAX,
-                                   &error, parser.thousands)
-            if error != TOKENIZER_OK:
+            data[i] = str_to_int64(word, &error, parser.thousands)
+            if error != 0:
                 return error

     return error
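Every hunk above follows the same pattern: initialize `error = 0`, parse each token, and return the first nonzero code. A hedged pure-Python sketch of that column-filling loop (the function and parameter names here are hypothetical, not pandas internals):

```python
# Sketch of the error-propagation pattern from the Cython hunks above:
# error starts at 0, each parse may set it nonzero, and the loop bails
# out with the first nonzero code while filling the output column.
def try_int64_column(words, parse, na_values=(), na_sentinel=-1):
    """Return (column, error): error is 0 on success, else first failure code."""
    data = []
    error = 0
    for word in words:
        if word in na_values:        # the na_filter branch in the Cython code
            data.append(na_sentinel)
            continue
        value, error = parse(word)   # parse returns a nonzero code on failure
        if error != 0:
            return data, error       # stop at the first bad token
        data.append(value)
    return data, error
```

A caller supplies the token parser; for example, with a `parse` that returns `(int(word), 0)` for digit strings and `(0, 3)` otherwise, `try_int64_column(["1", "NA", "2"], parse, na_values={"NA"})` fills the NA slot with the sentinel and reports error 0.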
