update testing.integrate_kernel type hints#5137

Merged
neutrinoceros merged 3 commits into yt-project:main from chrishavlin:fix_integrate_kernel_typing
Apr 4, 2025

Conversation

@chrishavlin
Contributor

Close #5136

Member

@neutrinoceros neutrinoceros left a comment

thanks for catching this. However, note that np.ndarray is not a valid type annotation, we should use NDArray[np.floating] instead (from numpy.typing import NDArray).

Furthermore, as numpy's type annotations get more and more refined, it is becoming apparent that mypy is not the correct typechecker we should be using. Numpy devs recommend basedpyright over it. I might start a migration later, but I just wanted to raise awareness for now. See https://github.com/numpy/numpy/releases/tag/v2.2.1
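For readers unfamiliar with the distinction, here is a minimal sketch of the annotation style suggested above. The function `scale_values` is hypothetical and only illustrates the import and the subscripted form; it is not part of yt.

```python
import numpy as np
from numpy.typing import NDArray


def scale_values(x: NDArray[np.floating], factor: float) -> NDArray[np.floating]:
    # NDArray[np.floating] tells a type checker that the array holds
    # floating-point values (float32, float64, ...), which a bare
    # np.ndarray annotation does not convey.
    return x * factor


scaled = scale_values(np.array([1.0, 2.0], dtype=np.float32), 2.0)
```

The annotation has no runtime effect; it only gives the type checker information about the element type.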

@neutrinoceros neutrinoceros added this to the 4.4.1 milestone Mar 20, 2025
@chrishavlin
Contributor Author

I swear I knew that at one point :) but in this case I actually just copied the type hints from the previous function without thinking. So I also added a commit here to switch the other occurrences of np.ndarray in type hints in this file.

@neutrinoceros
Member

Looks like you accidentally committed an unrelated (and massive) cpp file

yt/testing.py Outdated
  # tested: volume integral is 1.
  def cubicspline_python(
-     x: float | np.ndarray,
+     x: float | NDArray[np.floating],
Member

I guess the return annotation should be changed as well

Contributor Author

oh ya, thought I had put that in. maybe i forgot to commit it.

Member

Did you forget again ? 😅

Member

(Or did you just not push your changes yet?)

Contributor Author

hah, sorry, didn't push any changes yet, only fixed the stray .cpp

yt/testing.py Outdated
kernelfunc: Callable[[float | NDArray[np.floating]], float | NDArray[np.floating]],
b: float,
hsml: float,
) -> float:
Member

@neutrinoceros neutrinoceros Mar 21, 2025

the return value is actually np.float64, but it'd be safer to resolve this disparity by actually returning a float. Can you update the return statement ?
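As an illustration of the disparity mentioned above, the following sketch converts a NumPy scalar to a builtin float before returning it. The function `mean_as_float` is a hypothetical stand-in, not the function under review.

```python
import numpy as np
from numpy.typing import NDArray


def mean_as_float(values: NDArray[np.floating]) -> float:
    # np.mean returns a NumPy scalar (np.float64 for float64 input);
    # wrapping it in float() yields a builtin Python float, so the
    # returned object matches the annotation exactly.
    return float(np.mean(values))


result = mean_as_float(np.array([1.0, 2.0, 3.0]))
```

This only works when the value being converted is a single scalar, which is the point taken up later in the thread.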

Contributor Author

hmm. so the typing here doesn't seem to be consistent with how integrate_kernel is used elsewhere. here's a test that uses arrays for b and hsml:

start1 = np.array((1.53, 0.53, 1.0))
end1 = np.array((1.53, 0.53, 3.0))
ray1 = ds.ray(start1, end1)
b1 = np.array([np.sqrt(2.0) * 0.03] * 2)
hsml1 = np.array([0.05] * 2)
len1 = np.sqrt(np.sum((end1 - start1) ** 2))
# for a ParticleDataset like this one, the Ray object attempts
# to generate the 't' and 'dts' fields using the grid method
ray1.field_data["t"] = ray1.ds.arr(ray1._generate_container_field_sph("t"))
ray1.field_data["dts"] = ray1.ds.arr(ray1._generate_container_field_sph("dts"))
# not demanding too much precision;
# from kernel volume integrals, the linear interpolation
# restricts you to 4 -- 5 digits precision
assert_equal(ray1["t"].shape, (2,))
assert_rel_equal(ray1["t"], np.array([0.25, 0.75]), 5)
assert_rel_equal(
    ray1["gas", "position"].v, np.array([[1.5, 0.5, 1.5], [1.5, 0.5, 2.5]]), 5
)
dl1 = integrate_kernel(kernelfunc, b1, hsml1)

Not certain if the typing for this function needs to be updated so that b and hsml are float | NDArray, or if that test is not quite right. @nastasha-w I believe you wrote that test, any thoughts?

Contributor

Yeah sorry, the annotations here are a bit of a mess. I basically just edited until the type checker stopped throwing errors; I think I tried adding that these were float arrays at some point, but I didn't get that to work. Feel free to change the types here.

Contributor

By the way, I am already editing that function for #5121, so this will be a fun rebase for me later. Current status:

def integrate_kernel(
    kernelfunc: Callable[[float | np.ndarray], np.ndarray],
    b: float | np.ndarray,
    hsml: float | np.ndarray,
    nsample: int = 500,
) -> float:

although I'm now realizing that's wrong, and I actually count on it to return arrays in some cases in a different function.

Contributor Author

> Yeah sorry, the annotations here are a bit of a mess. I basically just edited until the type checker stopped throwing errors;

No worries! I'm forever learning about proper python type checking myself...

> and I actually count on it to return arrays in some cases in a different function.

So it sounds like having b, hsml and the return value be float | NDArray is consistent with how you're using it. I'll go with that here. Also, FYI, in case you didn't read through all the discussion in this PR, this comment on np.ndarray vs np.typing.NDArray in type hints might be helpful for your changes.

pos3_i1: NDArray[np.floating],
periodic: tuple[bool, bool, bool] = (True,) * 3,
periods: np.ndarray = _zeroperiods,
) -> np.ndarray:
Member

actually here the output dtype is bound to be the same as pos3_i3, so, instead of np.floating, we should use _FloatingT = TypeVar("_FloatingT", bound=np.floating) here

Contributor Author

ok, i think i understand this -- using TypeVar("_FloatingT", bound=np.floating) would ensure the input/output precisions match? in that case though, should I use npt.NBitBase for this (https://numpy.org/doc/stable/reference/typing.html#numpy.typing.NBitBase) ?

Contributor Author

nevermind about the NBitBase. i'm learning. np.floating is the way i think.
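A minimal sketch of the TypeVar approach discussed in this thread, assuming a function whose output precision follows its inputs. The name `pairwise_offsets` and its body are hypothetical and not taken from yt; whether a given type checker accepts the body without extra casts may depend on the NumPy stub version.

```python
from typing import TypeVar

import numpy as np
from numpy.typing import NDArray

# Bound to np.floating: any concrete floating dtype is accepted, and the
# same one must appear wherever _FloatingT is used in a signature.
_FloatingT = TypeVar("_FloatingT", bound=np.floating)


def pairwise_offsets(
    pos_a: NDArray[_FloatingT], pos_b: NDArray[_FloatingT]
) -> NDArray[_FloatingT]:
    # The shared TypeVar tells the checker that the result has the same
    # floating dtype as the inputs (float32 in, float32 out).
    return pos_a[:, np.newaxis, :] - pos_b[np.newaxis, :, :]


a32 = np.zeros((2, 3), dtype=np.float32)
b32 = np.ones((4, 3), dtype=np.float32)
offsets = pairwise_offsets(a32, b32)
```

Using plain `NDArray[np.floating]` for both input and output would lose this link: the checker would only know that some floating dtype comes back.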

@chrishavlin chrishavlin force-pushed the fix_integrate_kernel_typing branch from baf44ca to ca9cda0 Compare March 21, 2025 15:27
@chrishavlin
Contributor Author

the stray .cpp is gone now.

@nastasha-w
Contributor

Sorry, I seem to have made a mess of the type annotations in those test functions. In case it helps, I think these are all the files where I use the problem function in testing.py:

  • yt/visualization/tests/test_offaxisprojection_pytestonly.py
  • yt/geometry/coordinates/tests/test_sph_pixelization.py
  • yt/geometry/coordinates/tests/test_sph_pixelization_pytestonly.py

@chrishavlin
Contributor Author

@neutrinoceros just fyi, this is ready for you to look at when you get a chance (no worries if you can't get to it immediately, just wanted to make sure you weren't waiting on me).

@nastasha-w
Contributor

Welp, I'm learning some new type annotation options here! Would it make sense to also use the _FloatingT type for the kernel integrator? That one also doesn't make assumptions on the exact float precision of the inputs.

Member

@neutrinoceros neutrinoceros left a comment

almost there. Sorry for the delay !

yt/testing.py Outdated
],
b: float | npt.NDArray[np.floating],
hsml: float | npt.NDArray[np.floating],
) -> float | npt.NDArray[np.floating]:
Member

It's often much preferable to be strict on the return type. Here I don't see any reason not to be

Suggested change
- ) -> float | npt.NDArray[np.floating]:
+ ) -> float:

yt/testing.py Outdated
Comment on lines +155 to +158
result = pre * integral
if isinstance(result, np.floating):
    return result.item()
return result
Member

Suggested change
- result = pre * integral
- if isinstance(result, np.floating):
-     return result.item()
- return result
+ return float(pre * integral)

Contributor

I don't think this is a good idea. This function can be called on arrays of b and hsml values, and float(array) will not convert an array to the np.float type. Being strict on the return type is fine, but then we should either revert to always returning an array and changing the dtype, or do that type cast before picking out the single element of a 0d-array.

Contributor

... e.g. np.float32(pre * integral) would work for arrays though. We'd need to pick which floating point precision to specify in that case though.

Member

I'm confused. Why would np.float32(...) work but not float(...) ?

Contributor

I'm not sure, but that's what worked and didn't work when I tried it out:

Python 3.13.2 (main, Feb  4 2025, 14:51:09) [Clang 16.0.0 (clang-1600.0.26.6)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> a = np.arange(3)
>>> float(a)
Traceback (most recent call last):
  File "<python-input-2>", line 1, in <module>
    float(a)
    ~~~~~^^^
TypeError: only length-1 arrays can be converted to Python scalars
>>> np.float32(a)
array([0., 1., 2.], dtype=float32)
>>> np.float(a)
Traceback (most recent call last):
  File "<python-input-4>", line 1, in <module>
    np.float(a)
    ^^^^^^^^
  File "/Users/nastasha/code/venvs/ytdev_pixav/lib/python3.13/site-packages/numpy/__init__.py", line 397, in __getattr__
    raise AttributeError(__former_attrs__[attr], name=None)
AttributeError: module 'numpy' has no attribute 'float'.
`np.float` was a deprecated alias for the builtin `float`. To avoid this error in existing code, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
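A short sketch of the behaviour shown in the session above. The likely explanation, offered here as an assumption rather than something stated in the thread, is that the builtin float() has to produce exactly one Python scalar, whereas calling a NumPy scalar type on an array acts like a dtype cast and returns an array.

```python
import numpy as np

a = np.arange(3)

# Calling a NumPy scalar type on an array returns an array of that dtype,
# as in the session above.
as_f32 = np.float32(a)

# An explicit cast expresses the same intent more directly.
as_f64 = a.astype(np.float64)

# The builtin float() works when there is exactly one value to extract,
# for example from a 0-d array.
single = float(np.array(1.5))

# With several elements there is no single scalar to return, so NumPy
# raises TypeError.
try:
    float(a)
    converted = True
except TypeError:
    converted = False
```

For code that has to handle both scalars and arrays, `astype` or `np.asarray(..., dtype=...)` avoids relying on the scalar-constructor behaviour.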

@neutrinoceros
Member

> Would it make sense to also use the _FloatingT type for the kernel integrator? That one also doesn't make assumptions on the exact float precision of the inputs.

depends.
_FloatingT, as defined here, doesn't just act as a placeholder for an unknown dtype: it is also a TypeVar and as such, it expresses a relation between input types and output types.

Can you point me to the exact function you're referring to?

@nastasha-w
Contributor

@neutrinoceros I meant the function integrate_kernel in testing.py (line 120 in my version of testing.py). As-is, the output will depend on the exact float types used for b and hsml, although it isn't obvious to me which type will be used if they aren't the same... Perhaps your explicit type-casting suggestion is a better idea here, but I would like to retain the ability to return arrays and not just single float values.

@neutrinoceros
Member

Then how is the decision to return an array taken, and would it make sense to unconditionally return an array instead ?

@nastasha-w
Contributor

Yeah, the version before this PR would always return an array; if the inputs were floats, the result would be a 0d-array. I just called np.array on the hsml and b inputs. 0d-arrays are annoying though, I'd be fine with returning a float if neither b nor hsml was input as an array. e.g.

if not (isinstance(b, np.ndarray) or isinstance(hsml, np.ndarray)):
    return result.item()
return result
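To illustrate why 0d-arrays are awkward to work with, here is a small standalone sketch. It is not code from the PR; it only shows what np.array produces for a plain float and how `.item()` recovers a builtin float.

```python
import numpy as np

# np.array on a plain float gives a 0-d array: it has no axes at all.
zero_d = np.array(2.5)
shape = zero_d.shape
ndim = zero_d.ndim

# .item() extracts the single value as a builtin Python float.
as_float = zero_d.item()

# Indexing a 0-d array with an integer fails, which is one reason
# callers tend to find them inconvenient.
try:
    zero_d[0]
    indexable = True
except IndexError:
    indexable = False
```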

@chrishavlin
Contributor Author

> would it make sense to unconditionally return an array instead ? (@neutrinoceros)

This sounds like a good solution to me. I agree that the conditional return type is not ideal if we can simplify it.

> 0d-arrays are annoying though (@nastasha-w)

How about always returning an array with np.atleast_1d (i.e., return np.atleast_1d(pre * integral))?

@chrishavlin
Contributor Author

chrishavlin commented Apr 3, 2025

Latest push updates integrate_kernel to always return an array (using np.atleast_1d so it's never a 0d array).

Don't think there was anything else left to do?
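A minimal sketch of the always-return-an-array pattern described above. The function `scaled_product` and its arithmetic are hypothetical stand-ins for the real kernel integration; only the use of np.atleast_1d mirrors what the thread describes.

```python
from typing import Union

import numpy as np
from numpy.typing import NDArray

# Hypothetical alias for "a float or a floating-point array".
FloatOrArray = Union[float, NDArray[np.floating]]


def scaled_product(b: FloatOrArray, hsml: FloatOrArray) -> NDArray[np.floating]:
    # Stand-in arithmetic, not the actual integral: the point is that
    # np.atleast_1d promotes scalars and 0-d arrays to shape (1,), so
    # callers always receive an array with at least one axis.
    return np.atleast_1d(np.asarray(b) * np.asarray(hsml))


from_floats = scaled_product(2.0, 3.0)
from_arrays = scaled_product(np.array([1.0, 2.0]), 0.5)
```

With this shape guarantee the return annotation can be a single array type instead of a float-or-array union.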

@nastasha-w
Contributor

It looks good to me, but then again, so did the initial mess I made here

@neutrinoceros
Member

Ah, now it looks like we can't merge yet because the jenkins server is apparently down ?

@neutrinoceros neutrinoceros merged commit 69acde4 into yt-project:main Apr 4, 2025
12 of 13 checks passed
meeseeksmachine pushed a commit to meeseeksmachine/yt that referenced this pull request Apr 4, 2025
Development

Successfully merging this pull request may close these issues.

mypy failure with np 2.2.4

3 participants