Skip to content

Add direct pint/pint-xarray support #41

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
May 18, 2022
Merged

Add direct pint/pint-xarray support #41

merged 15 commits into from
May 18, 2022

Conversation

DocOtak
Copy link
Owner

@DocOtak DocOtak commented Mar 22, 2022

Aims to add support for setting the correct units on inputs that are either pint.Quantities or wrappers of pint.Quantities such as xarray (and the very handy pint-xarray library).

This is a work in progress as the API needs get figured out a bit.

@DocOtak
Copy link
Owner Author

DocOtak commented Mar 22, 2022

@rcaneill some early architecture discussion would be welcome.

My thoughts so far:

  • I don't want to require pint+pint-xarray, so things will need to be wrapped with some "does this import exist" logic
  • Should we require pint-xarray for any pint type support? the upstream gsw library is very unhappy with pint.Quantity inpus (the unit aware values). So we need to feed gsw "raw" ndarrays or DataArrays that have been converted to normal ndarrays. The pint-xarray wrapper has some very convenient methods this conversion.
  • Since it was a specific asked for feature, I'd like to make this the 0.3.0 feature and push auto discovery of computable properties to later. I feel like we might end up relying on pint to make the auto discovery possible since there will need to be some unit checking involved, not just standard names.
  • We might be forced to deal with the "degree_north" incompatibility with dealing with pint (or maybe pint-xarray handles it?) at least for the pint.Quantity objects

p.s. COVID willing, I head to sea in about 3 weeks for ~2mo.

@rcaneill
Copy link
Collaborator

  • I don't want to require pint+pint-xarray, so things will need to be wrapped with some "does this import exist" logic

I agree with this

  • Should we require pint-xarray for any pint type support? the upstream gsw library is very unhappy with pint.Quantity inputs (the unit aware values). So we need to feed gsw "raw" ndarrays or DataArrays that have been converted to normal ndarrays. The pint-xarray wrapper has some very convenient methods this conversion.

Hm this is a good question. As the package is called gsw-xarray it makes sense that the users will have xarray installed, so having also pint-xarray would make sense to me. Indeed it would be nice that gsw could consume pint.Quantity inputs, but this is not the aim of our package.
So I would be in favor of forcing pint-xarray in case of using pint.

  • Since it was a specific asked for feature, I'd like to make this the 0.3.0 feature and push auto discovery of computable properties to later. I feel like we might end up relying on pint to make the auto discovery possible since there will need to be some unit checking involved, not just standard names.

Sounds good to me

@rcaneill
Copy link
Collaborator

I guess that we will need to write a long dict of the units of the inputs of the gsw functions, to be able to convert the pint inputs (e.g. Pascal to bar, etc)

I can probably modify the script in #28 to get this done

@rcaneill rcaneill added this to the Version 0.3.0 milestone Mar 23, 2022
@rcaneill
Copy link
Collaborator

rcaneill commented May 4, 2022

I think that this PR would be ready to be merged. When merged I can start working on a new PR to check / convert the units of the arguments to match the needed units of the upstream gsw

@DocOtak
Copy link
Owner Author

DocOtak commented May 4, 2022

Feel free to merge, I don't remember the state it was in when I left for my cruise... so more tests in a follow up would be welcome.

@rcaneill
Copy link
Collaborator

rcaneill commented May 9, 2022

I will continue the work from this branch (add some more tests, play a little with pint, etc) before merging

pre-commit = "^2.17.0"

[tool.poetry.extras]
docs = ["Sphinx", "furo"]
pint = ["Pint", "pint-xarray"]
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me

@rcaneill
Copy link
Collaborator

@DocOtak can you review?

I added support for pint only quantities, corrected a bug, and added few tests. I think that it is ready for merge, and it solves the issue: "If input is pint.Quantity or xarray.DataArray pint-xarray wrapped quantity, returns a quantity".

Next step (in a next PR) would be "if input is quantity, verify / convert unit"
Next - Next step could be (?) "Parse units from input dataarray when possible". This should be discussed in an issue, but I am not so sure that this is a good idea, as many dataarrays have attributes wrongly sets from operations keeping attributes, etc

@rcaneill rcaneill marked this pull request as ready for review May 10, 2022 09:05
@DocOtak
Copy link
Owner Author

DocOtak commented May 11, 2022

I didn't get though looking at all this before the rosette came up and it was time to sample. I hope I can finish it tomorrow.

@rcaneill
Copy link
Collaborator

No rush :)

@rcaneill
Copy link
Collaborator

I just thought that we should be careful with the unit registry. If we return quantities that belong to another registry as the one the user had, things won't work (cf bottom of page https://pint.readthedocs.io/en/stable/tutorial.html).
I don't have a solution yet.

@DocOtak
Copy link
Owner Author

DocOtak commented May 15, 2022

I'd like to ping @dcherian for advise on how best to handle pint unit registries. But I think what we'll want to do is call pint.get_application_registry(), which should give us the correct thing in contexts where e.g. cf-xarray is in use and it defines its own registry. My first reading of the code is that, if we just replace the UnitRegistry() call with the get_application_registry one, we would have import order problems, e.g. if cf-xarray is imported after we call pint.get_application_registry() I don't think we would have a reference to the cf-xarray application registry. I think this is fixed by moving the ureg creation stuff into the pint_compat function.

Seems my dev environment for this project is broken and I'm not sure I have the bandwidth to fix it out here (will make an attempt).

@dcherian
Copy link

But I think what we'll want to do is call pint.get_application_registry(), which should give us the correct thing in contexts where e.g. cf-xarray is in use and it defines its own registry.

I defer to @keewis for these questions :)

You could consider adding cf_xarray as a dependency for unit handling. One of its intended goals is to provide a unit registry that works for CF-type things in the xarray world. It also has no extra dependencies.

@rcaneill
Copy link
Collaborator

I think that adding cf_xarray for handling units is a reasonable choice. However, I don't think that this would solve the problem that we need to return quantities using the same registry than the one used by the quantities passed by the user.

Correct me if I'm wrong, but cf_xarray would be needed only in the case where we would like to parse the units from input dataarrays? And if the user is using the cf_xarray unit registry, it should be caught by my previous point (using pint.get_application_registry() or other solution)

@DocOtak
Copy link
Owner Author

DocOtak commented May 16, 2022

Is there any way to inspect a Quantity (or Unit) and get the registry it belongs to? Nothing is jumping out at me in the pint docs regarding this.

@DocOtak
Copy link
Owner Author

DocOtak commented May 16, 2022

I've read though the source a little, and it looks like the ApplicationRegistry object that is returned by pint.get_application_registry() is a singleton that will always point at whatever the current application unit registry is. So I think I was wrong about import order mattering. (yay)

I think the only change needed is to swap the ureg = pint.UnitRegistry() with ureg = pint.get_application_registry() and we'll inherit all the good things in cf_xarray if that is being used. Here is a toy example:

>>> import pint
>>> reg = pint.get_application_registry()
>>> reg("degree_north")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/vscode/.cache/pypoetry/virtualenvs/gsw-xarray-Apk1CQ_8-py3.8/lib/python3.8/site-packages/pint/registry.py", line 2425, in __call__
    return self._registry(*args, **kwargs)
  File "/home/vscode/.cache/pypoetry/virtualenvs/gsw-xarray-Apk1CQ_8-py3.8/lib/python3.8/site-packages/pint/registry.py", line 2372, in __call__
    return self(*args, **kwargs)
  File "/home/vscode/.cache/pypoetry/virtualenvs/gsw-xarray-Apk1CQ_8-py3.8/lib/python3.8/site-packages/pint/registry.py", line 1388, in parse_expression
    return build_eval_tree(gen).evaluate(
  File "/home/vscode/.cache/pypoetry/virtualenvs/gsw-xarray-Apk1CQ_8-py3.8/lib/python3.8/site-packages/pint/pint_eval.py", line 122, in evaluate
    return define_op(self.left)
  File "/home/vscode/.cache/pypoetry/virtualenvs/gsw-xarray-Apk1CQ_8-py3.8/lib/python3.8/site-packages/pint/registry.py", line 1389, in <lambda>
    lambda x: self._eval_token(x, case_sensitive=case_sensitive, **values)
  File "/home/vscode/.cache/pypoetry/virtualenvs/gsw-xarray-Apk1CQ_8-py3.8/lib/python3.8/site-packages/pint/registry.py", line 1275, in _eval_token
    {self.get_name(token_text, case_sensitive=case_sensitive): 1}
  File "/home/vscode/.cache/pypoetry/virtualenvs/gsw-xarray-Apk1CQ_8-py3.8/lib/python3.8/site-packages/pint/registry.py", line 722, in get_name
    raise UndefinedUnitError(name_or_alias)
pint.errors.UndefinedUnitError: 'degree_north' is not defined in the unit registry
>>> from cf_xarray import units
/home/vscode/.cache/pypoetry/virtualenvs/gsw-xarray-Apk1CQ_8-py3.8/lib/python3.8/site-packages/cf_xarray/units.py:114: UserWarning: Import(s) unavailable to set up matplotlib support...skipping this portion of the setup.
  warnings.warn(
>>> reg("degree_north")
<Quantity(1, 'degrees_north')>

@rcaneill
Copy link
Collaborator

Seems good but if the user has multiple registries, and uses the 1st one for the quantities, then using get_application_registry will return the 2nd registry I guess.

Maybe having a look at the _REGISTRY property of quantities? cf https://github.com/hgrecco/pint/blob/306907348754fd7ff90d70f6b9889a7cfd8a5a70/pint/facets/plain/quantity.py#L1485-L1490

       # Registry equality check based on util.SharedRegistryObject
       if self._REGISTRY is not other._REGISTRY:
           mess = "Cannot operate with {} and {} of different registries."
           raise ValueError(
               mess.format(self.__class__.__name__, other.__class__.__name__)
           )`

@rcaneill
Copy link
Collaborator

Hum, looking more in details in the sources, it feels like the _REGISTRY is pointing to the same thing as the get_application_registry...
https://github.com/hgrecco/pint/blob/306907348754fd7ff90d70f6b9889a7cfd8a5a70/pint/util.py#L815-L823

@rcaneill
Copy link
Collaborator

rcaneill commented May 16, 2022

So maybe instead of creating / loading the registry, we could: 1) verify that all inputs belong to the same registry in our pint_compat(or we decide something else, e.g. we use the registry of the 1st argument) (I am in favor of following pint and not allow different registries), and 2) pass the registry to our quantify function (we need to check how to give a registry to the pint_xarray quantify method).

If no one tries / tests this (European) night, I may give it a try tomorrow if I have time.

@keewis
Copy link

keewis commented May 16, 2022

this is what pint-xarray is doing: don't allow mixing multiple registries, and allow passing a registry if the application registry is not desired.

For 2: .pint.quantify has a registry parameter that will be used to convert string units

@DocOtak
Copy link
Owner Author

DocOtak commented May 16, 2022

@rcaneill managed to get my dev env back up and running!

I made an attempt to implement what's been discussed here:

  • Pull the registry off the input arguments, either arg.pint.registry in the case of a DataArray, or the _REGISTRY if a "raw" pint quantity.
  • Check to make sure everything belongs to one registry
  • Either return that registry, or None if pint isn't being used
  • I switched the call to _core.py:quantify() to always happen, but just short circuit if there is no registry
  • Made sure we were using the pint_xarray registry in the tests that use pint_xarray
  • Added a test to ensure failure when mixed registries are used

Copy link
Collaborator

@rcaneill rcaneill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing! Thanks Andrew :)
For what I see it seems ready to merge

@DocOtak
Copy link
Owner Author

DocOtak commented May 17, 2022

Let me write some docs as the final touch.

Do we want to attempt some unit input validation in this dev cycle (in a separate PR). e.g. make sure the SA input is g/kg?

@rcaneill
Copy link
Collaborator

Do we want to attempt some unit input validation in this dev cycle (in a separate PR). e.g. make sure the SA input is g/kg?

I started this, I open a PR with my (very early) WIP so we can discuss API and choices: see PR #47.
I started it from the branch of PR 41, so it may be messy to see diff until PR 41 is merged...

@DocOtak
Copy link
Owner Author

DocOtak commented May 17, 2022

Does the readme make sense? Are more docs needed? (@rcaneill merge if you think it's good)

@rcaneill
Copy link
Collaborator

This is very clear!

@rcaneill rcaneill merged commit 3b37096 into master May 18, 2022
@rcaneill rcaneill deleted the pint branch November 27, 2023 10:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants