clib.conversion._to_numpy: Add tests for pandas.Series with pandas numeric dtypes #3584
Merged
Changes from 18 commits
46 commits
127d8ca clib.converison._to_numpy: Add tests for numpy numeric dtypes (seisman)
6ae1ddb Add tests for pandas.Series with NumPy numeric dtypes (seisman)
a9635b5 Improve docstrings for tests (seisman)
6f966db Add tests for Python built-in types (seisman)
9fd655b Merge branch 'main' into to_numpy/numpy_numeric (seisman)
f9bf19c Refactor a few tests (seisman)
8bc2f56 Check the expected dtype (seisman)
42a0951 Define params list (seisman)
0d102a2 Merge branch 'main' into to_numpy/numpy_numeric (seisman)
933bc62 Input array now is not C-contiguous (seisman)
4edfef0 Add test for Python built-in complex dtype (seisman)
7222db2 Add tests for panda.Series with pandas numeric dtypes (seisman)
a4f15cd Add workarounds for pandas nullable dtypes prior pandas v2.1 (seisman)
b0d7903 Remove duplicated test test_vectors_to_arrays_pandas_nan (seisman)
eedb5b6 Merge branch 'main' into to_numpy/pandas_numeric (seisman)
8ca200b Use data.to_numpy(na_value=np.nan).astype(dtype=dtype) for float16 (seisman)
c5ed329 Merge branch 'main' into to_numpy/pandas_numeric (seisman)
3cc596a Further improve the workaround for pandas dtypes (seisman)
f24f62f Merge branch 'main' into to_numpy/pandas_numeric (seisman)
f3aa7b9 Merge branch 'main' into to_numpy/pandas_numeric (seisman)
a65f6ae clib.conversion._to_numpy: Add tests for pandas.Series with pyarrow n… (seisman)
6b0e8d0 Fix cases (seisman)
5aa0946 Merge all pandas-related tests into a single test (seisman)
f182efd Shorten a few test names (seisman)
0615b5b Test pandas.Series with numpy dtypes (seisman)
4061ec7 Refactor the workaround for pandas<2.2 (seisman)
cdf7c38 Merge branch 'main' into to_numpy/pandas_numeric (seisman)
0e0438b Fix the workaround for pandas<2.2 (seisman)
8883f3c Another fix for the workaround (seisman)
afeaa38 dtype defaults to an empty string, rather than None (seisman)
d3f3e5d Shortern more test names (seisman)
27a204f Merge branch 'main' into to_numpy/pandas_numeric (seisman)
80fb7a2 Revert "Shortern more test names" (seisman)
58fc74e Merge branch 'main' into to_numpy/pandas_numeric (seisman)
20c447b Revert "Shorten a few test names" (seisman)
f3d22d4 Merge branch 'main' into to_numpy/pandas_numeric (seisman)
bb1cb07 Fix test name (seisman)
949274d Merge branch 'main' into to_numpy/pandas_numeric (seisman)
b320d0b Merge branch 'main' into to_numpy/pandas_numeric (michaelgrund)
28fa954 Separate variable 'dtype' and 'numpy_dtype' for the input and result … (seisman)
456902f Merge branch 'main' into to_numpy/pandas_numeric (seisman)
388aee9 Merge branch 'main' into to_numpy/pandas_numeric (seisman)
18ed9e5 Merge branch 'main' into to_numpy/pandas_numeric (seisman)
ff1c6f5 Merge branch 'main' into to_numpy/pandas_numeric (seisman)
6ff55e9 Merge branch 'main' into to_numpy/pandas_numeric (seisman)
2fbf310 Merge branch 'main' into to_numpy/pandas_numeric (seisman)
@@ -162,19 +162,37 @@ def _to_numpy(data: Any) -> np.ndarray:
         "date64[ms][pyarrow]": np.datetime64,
     }

-    if (
-        hasattr(data, "isna")
-        and data.isna().any()
-        and Version(pd.__version__) < Version("2.2")
-    ):
-        # Workaround for dealing with pd.NA with pandas < 2.2.
-        # Bug report at: https://github.com/GenericMappingTools/pygmt/issues/2844
-        # Following SPEC0, pandas 2.1 will be dropped in 2025 Q3, so it's likely
-        # we can remove the workaround in PyGMT v0.17.0.
-        array = np.ascontiguousarray(data.astype(float))
-    else:
-        vec_dtype = str(getattr(data, "dtype", ""))
-        array = np.ascontiguousarray(data, dtype=dtypes.get(vec_dtype))
+    # pandas numeric dtypes were converted to np.object_ dtype prior pandas 2.2, and are
+    # converted to suitable numpy dtypes since pandas 2.2. Refer to the following link
+    # for details: https://pandas.pydata.org/docs/whatsnew/v2.2.0.html#to-numpy-for-numpy-nullable-and-arrow-types-converts-to-suitable-numpy-dtype
+    # Here are the workarounds for pandas < 2.2.
+    # Following SPEC 0, pandas 2.1 should be dropped in 2025 Q3, so it's likely we can
+    # remove the workaround in PyGMT v0.17.0.
+    if Version(pd.__version__) < Version("2.2"):
+        # Specify mapping from pandas nullable dtypes to suitable numpy dtypes
+        dtypes.update(
+            {
+                "Int8": np.int8,
+                "Int16": np.int16,
+                "Int32": np.int32,
+                "Int64": np.int64,
+                "UInt8": np.uint8,
+                "UInt16": np.uint16,
+                "UInt32": np.uint32,
+                "UInt64": np.uint64,
+                "Float32": np.float32,
+                "Float64": np.float64,
+            }
+        )
+        # For pandas.Index/pandas.Series, pandas/pyarrow integer dtypes with missing
+        # values should be cast to NumPy float dtypes and NaN is used as missing value
+        # indicator.
+        if getattr(data, "hasnans", False):  # pandas.Index/pandas.Series has 'hasnans'
+            dtype = np.float64 if data.dtype.kind in "iu" else data.dtype.numpy_dtype
+            data = data.to_numpy(na_value=np.nan).astype(dtype=dtype)
When dealing with missing values in pandas 2.1, there are several ways to convert a pandas array into a NumPy array. As shown below, only the third one works in all cases.
In [1]: import pandas as pd
In [2]: x = pd.Series([1, 2, pd.NA], dtype=pd.Int32Dtype())
In [3]: import numpy as np
In [4]: np.ascontiguousarray(x, dtype=np.float64)
...
ValueError: cannot convert to 'float64'-dtype NumPy array with missing values. Specify an appropriate 'na_value' for this dtype.
In [5]: x.to_numpy(dtype=np.float64, na_value=np.nan)
Out[5]: array([ 1.,  2., nan])
In [6]: x.to_numpy(na_value=np.nan).astype(np.float64)
Out[6]: array([ 1.,  2., nan])
In [7]: x = pd.Series([1, 2, pd.NA], dtype="float16[pyarrow]")
In [8]: x
Out[8]:
0     1.0
1     2.0
2    <NA>
dtype: halffloat[pyarrow]
In [9]: x.to_numpy(dtype=np.float64, na_value=np.nan)
...
ArrowTypeError: Expected np.float16 instance
In [10]: x.to_numpy(na_value=np.nan).astype(np.float64)
Out[10]: array([ 1.,  2., nan])
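The third approach from the session above can also be tried on its own. The following is a minimal sketch, not PyGMT's implementation: `nullable_to_float` is a hypothetical helper name, it assumes the input uses a pandas nullable dtype (which exposes `numpy_dtype`), and it avoids pyarrow-backed dtypes so that only pandas and NumPy are needed.

```python
import numpy as np
import pandas as pd


def nullable_to_float(series: pd.Series) -> np.ndarray:
    """Hypothetical helper mirroring the `hasnans` branch of the diff."""
    # Integer dtypes cannot represent NaN, so promote them to float64.
    # Nullable float dtypes keep their underlying NumPy dtype.
    if series.dtype.kind in "iu":
        dtype = np.float64
    else:
        dtype = series.dtype.numpy_dtype
    # Replace pd.NA with NaN first, then cast to the target dtype.
    return series.to_numpy(na_value=np.nan).astype(dtype)


ints = pd.Series([1, 2, pd.NA], dtype="Int32")
floats = pd.Series([1.5, pd.NA], dtype="Float32")
print(nullable_to_float(ints).dtype)    # float64
print(nullable_to_float(floats).dtype)  # float32
```

Splitting the conversion into `to_numpy(na_value=np.nan)` followed by `astype` is the part that matters here: the missing values are replaced before any dtype cast is attempted, which is why this form also worked for the `float16[pyarrow]` case in the session.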
+
+    vec_dtype = str(getattr(data, "dtype", ""))
+    array = np.ascontiguousarray(data, dtype=dtypes.get(vec_dtype))
     return array
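The dtype lookup at the end of the function can be illustrated in isolation. This is a sketch under stated assumptions rather than the PyGMT code: `NULLABLE_TO_NUMPY` and `to_contiguous` are hypothetical names, the mapping copies only the entries added in the diff, and the `Version(pd.__version__) < Version("2.2")` gate is left out so the example behaves the same on any pandas version.

```python
import numpy as np
import pandas as pd

# Entries copied from the diff: pandas nullable dtype name -> NumPy dtype.
NULLABLE_TO_NUMPY = {
    "Int8": np.int8,
    "Int16": np.int16,
    "Int32": np.int32,
    "Int64": np.int64,
    "UInt8": np.uint8,
    "UInt16": np.uint16,
    "UInt32": np.uint32,
    "UInt64": np.uint64,
    "Float32": np.float32,
    "Float64": np.float64,
}


def to_contiguous(data) -> np.ndarray:
    """Hypothetical stand-in for the final two lines of the function."""
    # Objects without a 'dtype' attribute (e.g. lists) yield "", which is not
    # in the mapping, so NumPy infers the dtype itself (dtype=None).
    vec_dtype = str(getattr(data, "dtype", ""))
    return np.ascontiguousarray(data, dtype=NULLABLE_TO_NUMPY.get(vec_dtype))


series = pd.Series([1, 2, 3], dtype="UInt16")
array = to_contiguous(series)
print(array.dtype)                 # uint16
print(array.flags.c_contiguous)    # True
```

Passing the mapped dtype explicitly avoids the `np.object_` result that the code comment describes for pandas nullable dtypes before pandas 2.2, while inputs whose dtype name is not in the mapping fall through to NumPy's own inference.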