A convention for complex numbers? #370
Replies: 45 comments
-
Re question 1: I think CF is a good place for this, however... the inertia you will encounter in getting any of these added is probably going to be large. CF is adopting netCDF4 features very slowly, e.g. strings were added just today.
When looking through the mailing list, proposals for standard names take the form "imaginary_part_of_fourier_transform_of_air_pressure_wrt_time" (and corresponding "real_part" names). None of these have made it into the actual standard name list. These would be using option 2b above.
The netcdf4-python docs use option 2a above as an example (compound types): https://unidata.github.io/netcdf4-python/netCDF4/index.html#section10 This method seems to be the way that the library authors think you should store data like this.
Option 2b seemed to be the one favored in a 2010 mailing list discussion, but there was no conclusion. This was about 2 years after netCDF4 was introduced, when option 2a was "unavailable" or at least too new. http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2010/003517.html
Option 2c is what the CFRadial folks have proposed, with an attribute which indicates that the last dim is two reals representing a complex. https://cf-trac.llnl.gov/trac/ticket/169
-
It's nice to see that somebody has tried each of these approaches before! With a compound dtype, I can use netCDF4-Python to create complex values that can be read by h5py:

    import h5py
    import netCDF4
    import numpy

    size = 3
    # Sample complex data to store
    datac = numpy.exp(1j * (1.0 + numpy.linspace(0, numpy.pi, size)))

    with netCDF4.Dataset("complex.nc", "w") as f:
        # Compound type with real ("r") and imaginary ("i") float64 fields
        complex128 = numpy.dtype([("r", numpy.float64), ("i", numpy.float64)])
        complex128_t = f.createCompoundType(complex128, "complex128")
        x_dim = f.createDimension("x_dim", None)
        v = f.createVariable("var", complex128_t, ("x_dim",))
        data = numpy.empty(size, complex128)
        data["r"] = datac.real
        data["i"] = datac.imag
        v[:] = data

    # h5py reads the compound values back as complex numbers
    with h5py.File("complex.nc", "r") as f:
        print(f["var"][...])

But unfortunately it does not seem to be possible to do it the other way around -- netCDF-C can't read the sort of custom dtypes that h5py creates. This could presumably be fixed either in netCDF-C or in h5py.
I think my favored solution would be option 2c -- the extra dimension for real/imag parts feels like the most efficient/compact solution. It's backwards compatible with netCDF3 (still useful for many purposes!) and also with newer formats like zarr (though zarr also supports complex values directly). From the perspective of a tool like xarray it avoids the issue of needing to combine attributes from two different variables into a single variable.
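For what it's worth, the in-memory side of the compound approach can be sketched with numpy alone. This is only an illustration under the assumption that the compound type is two packed float64 fields named "r" and "i", as in the example above; other files may use different field names.

```python
import numpy as np

# Compound layout assumed here: two packed float64 fields, "r" then "i".
complex128_struct = np.dtype([("r", np.float64), ("i", np.float64)])

datac = np.exp(1j * (1.0 + np.linspace(0, np.pi, 3)))

# Complex -> compound: reinterpret the same 16 bytes per element, no copy.
as_struct = datac.view(complex128_struct)

# Compound -> complex: the reverse reinterpretation, again without a copy.
as_complex = as_struct.view(np.complex128)
```

Because both directions are views, a reader that gets the compound values back from a file could hand them to complex-aware code without duplicating the data.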
-
I agree that option 2c is likely the best to maximize compatibility across different versions of netCDF, etc. Splitting a complex number into two variables is (to me) counterintuitive and counterproductive. An attribute, such as the CF-Radial
-
What are the use cases for storing complex numbers in polar form? I guess it allows you to load half the data from disk, if you only need one of the terms? From a simplicity perspective, splitting along the last dimension into real and imaginary components feels the cleanest to me -- these map directly into numeric types. This avoids the issue of specifying multiple units in a single string, which feels a little strange to me. If you need to keep track of different units for the magnitude and phase, then I think multiple variables would be appropriate. |
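To make the units point concrete, here is a small numpy sketch of the two forms. The sample values are made up; the only point is that magnitude keeps the units of the data while phase is an angle (radians here), whereas real and imaginary parts share one unit.

```python
import numpy as np

z = np.array([1 + 1j, -2j, 3.0])  # rectangular form: one unit for both parts

# Polar form: two quantities with different units.
magnitude = np.abs(z)    # same units as the data
phase = np.angle(z)      # radians

# Converting back to rectangular form recovers the original values.
z_back = magnitude * np.exp(1j * phase)
```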
-
With the caveat that none of my work involves complex numbers: I ask this because if this is an "encoding" then perhaps the units wouldn't really apply until after you do the "decode" step. The representation of the values "at rest" aren't as connected as the original CF Radial proposal would imply. Similar to how |
-
This issue is certainly of interest in the CF community and has been discussed before, most recently by CF-Radial, as mentioned. I favour option 2b, because
Whether we should add complex numbers to CF in some way also depends on there being a use-case which is strong enough for CF to adopt, since we don't add things to CF solely because we can imagine the possible need for them.
-
As a maintainer of domain agnostic software that interprets a subset of CF conventions, I would like to have a standard way to store arrays of complex values in a single netCDF file, that doesn't require adopting all of CF to be useful. In xarray's data model, we can already represent multi-dimensional labeled arrays of complex values as a single variable, and these are what users want to store to netCDF. Most of these users don't particularly care about CF conventions (e.g., they are physicists not climate scientists), but I'd like them to be able to store "more standard" data to disk, so it has a better chance of being readable by other software. So if we use multiple variables for complex values, I'd like to see standards for:
The advantage of approach 2(c) is that it side-steps all of these issues, because it's impossible to use different metadata for the different components. I'm sure there are valid use cases for using different units for real/imaginary components or storing data in polar form. I agree that for these use cases, it makes sense to store data in separate arrays, and I would encourage creating domain specific conventions for how to store particular sets of variables. But these feel like a different use case: data that could be represented as complex values, not data that always is complex valued. |
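For comparison, a rough sketch of what a reader has to do in the two-variable case. The arrays below are stand-ins for two separately stored variables (the names are hypothetical); combining them into one complex array necessarily allocates new memory.

```python
import numpy as np

# Stand-ins for two separately stored variables (hypothetical names and values).
real_part = np.array([1.0, 2.0, 3.0])
imag_part = np.array([0.5, -0.5, 0.0])

# Combining them means allocating a new complex array and copying both parts in.
combined = np.empty(real_part.shape, dtype=np.complex128)
combined.real = real_part
combined.imag = imag_part
```

Any metadata attached to the two source variables would also have to be reconciled at this point, which is the issue that a single-array layout avoids.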
-
Another point in favor of 2c, IMO, is that one can readily read in the data from a netCDF file and create a view of this data using a client language's native support for complex values with no copies. |
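A minimal numpy sketch of that zero-copy view, assuming the data arrives as a C-contiguous float64 array whose last dimension has length 2 (real, then imaginary):

```python
import numpy as np

# Data as it would be read under option 2c: trailing dimension of length 2.
stored = np.array([[1.0, 0.5], [2.0, -0.5], [3.0, 0.0]])  # shape (3, 2)

# Reinterpret each (real, imag) pair as one complex128 value, without copying.
as_complex = stored.view(np.complex128)[..., 0]  # shape (3,)

# The view shares memory, so writing through it updates the stored pairs.
as_complex[0] = 7 + 8j

# The reverse direction is also a view: complex -> trailing dimension of 2.
back = as_complex.view(np.float64).reshape(as_complex.shape + (2,))
```

This relies on the pair being contiguous in memory; a non-contiguous last dimension would need a copy first.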
-
Ping @mike-dixon and @piyushrpt, who were involved in the discussion on the CF-Trac website. |
-
Just wanted to chime in. While waiting for resolution on this, the following features have been added to GDAL primarily with SAR geospatial datasets in mind: NETCDF4 HDF5 These implement approach 2a. It allows us to work well with native numpy types in python as well as with complex values support in C++. We have been focusing more on using CF conventions with HDF5 as it allows us to use float16 for our applications.
-
I just had my attention drawn to this. By way of trying to move this along, a bit of a summary and restatement. The original suggestion from @shoyer was that there were three options: a) compound data types, b) real and imaginary parts in separate arrays, and c) real and imaginary parts in a multi-dimensional array. This was effectively because he had correctly rejected underlying library support (netCDF doesn't do it, effectively because HDF doesn't do it, yet). A couple of quick comments
Jonathan has said we need a use case, and cited cf-radial. Maybe we need to revisit that? But in any case, this feels like a real chicken and egg issue: we don't have a use case for complex numbers because we don't have support for them ... So, revisiting the options,
My take on the discussion so far is that there is agreement we should do this, and more support for (c) than the other options. We have a CF meeting in a couple of weeks, can we knock this one off there?
-
I'm a research software engineer from the plasma science community where we regularly use complex numbers, and I've seen all three conventions in use in production research software. I've also been trying to push some kind of solution forward in the last few months. I strongly support option a, using a compound datatype. This is supported in the netCDF4 API, released in 2008, so should be widely available! Option b, split variables, and option c, a new dimension, are very unlikely to make it into either netcdf or HDF5 at the library level. The h5netcdf implementation, which uses h5py to write netcdf-compliant HDF5 files directly, also uses a compound datatype. However, the downside of picking any convention at the library level is that there will still be plenty of existing files (and software) using the others, as well as variations on the chosen format (mostly different names for the real/imaginary components). This has been bothering me so much I've started work on a (proof-of-concept for now) drop-in extension to netCDF with C, C++, Fortran, and Python APIs. The idea is to handle all possible conventions for reading (and eventually appending) so that users and developers don't have to care about which one a particular file is using. It also writes native complex numbers to compound datatypes, checking if the type already exists in the file and creating it if not. The plan is to support reading as many existing representations as possible in order to make it completely painless for an application to switch convention.
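To give a flavour of what "handling several conventions on read" could mean, here is a deliberately simplified numpy sketch. It is not the extension's actual implementation: `to_complex` is a hypothetical helper, and it assumes that a two-field compound type lists the real part first and that a trailing dimension of length 2 means (real, imag).

```python
import numpy as np

def to_complex(data):
    """Return `data` as a complex array, guessing among common layouts (hypothetical helper)."""
    data = np.asarray(data)
    if np.iscomplexobj(data):
        return data  # already native complex
    names = data.dtype.names
    if names is not None and len(names) == 2:
        # Compound layout; assumption: first field is real, second is imaginary.
        re_name, im_name = names
        return data[re_name] + 1j * data[im_name]
    if data.dtype.kind == "f" and data.ndim >= 1 and data.shape[-1] == 2:
        # Trailing dimension of length 2; assumption: (real, imag) order.
        return data[..., 0] + 1j * data[..., 1]
    raise TypeError("unrecognised complex layout")

expected = np.array([1 + 2j, 3 - 4j])

compound = np.zeros(2, dtype=[("r", "f8"), ("i", "f8")])
compound["r"] = expected.real
compound["i"] = expected.imag

trailing = np.array([[1.0, 2.0], [3.0, -4.0]])
```

Note that the last branch is a guess: a plain float array that merely happens to end in a dimension of length 2 would be misread, which is one reason an explicit convention or marker in the file is still needed.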
-
A compound data type would be OK for CF metadata if it was real and imaginary parts, but not if it was polar, because the two components have different units, and would need different standard names. |
-
Yes, I had meant to reference the original trac ticket where this part of the problem was originally discussed. |
-
I am not sure we can get a nirvana here. The past is the past, and given we had no convention, I don't feel obliged to be backwards compatible: existing software will work with existing files, and new software can surely cope. I'm also not sure that we can hope to predict what HDFx will do, and when it will do it. That said, I don't have a preference for (a) or (c) at the moment, I was just trying to reflect what seemed like a preference for (c) thus far. However, that preference may have been biased by the fact that when this ticket was opened (a) would have been much harder than it is now.
-
@ZedThree You said:
This is presumably for writing to NetCDF or HDF files using compound data types under the hood? |
-
Would this tool work in conjunction, somehow, with whatever else we're using to create datasets (e.g. netCDF4-python in my case)? Thanks, |
-
@bnlawrence Yes, exactly. It should also work with the other backing file types that support the netCDF4 format. I could also make it configurable to use a complex dimension when writing.
@davidhassell Yep, the Python API is built on top of netCDF4-python, so you can literally do import nc_complex as netCDF4 and not touch the rest of your code. For other languages, it will be similar, though sometimes function names may have to be changed.
It's currently proof-of-concept, but it does work and you can try it out here. Packages coming soon!
-
Have you thought about a pull request directly into netCDF4-python? In any case, for other readers coming to this thread, you should look at this link (https://github.com/PlasmaFAIR/nc-complex) from @ZedThree above; it's a great summary of complex number issues. I think we'd be pretty silly to deviate from the analysis (and "blessed solution") he provides.
-
Hi All
FYI - I want to let you know about our decisions in CfRadial.
We are storing spectra, which can be represented as:
power - i.e. a single scalar
power and phase - i.e. 2 scalars
a complex pair, also 2 scalars
For the complex case, we chose to store these as 2 separate variables.
We are now working on a CF convention for radar time series data - the
so-called In-phase and Quadrature (I/Q) pairs.
Complex numbers apply here as well. We are adopting the same approach - 2
separate variables.
We will probably use the ancillary_variables attribute to
indicate the connection between the 2 arrays.
Thanks
Mike Dixon
…On Fri, Oct 6, 2023 at 2:32 AM Bryan Lawrence ***@***.***> wrote:
@davidhassell <https://github.com/davidhassell> Yep, the Python API is
built on top of netCDF4-python, so you can literally do import nc_complex
as netCDF4 and not touch the rest of your code. For other languages, it
will be similar, though sometimes function names may have to be changed.
Have you thought about a pull request directly into netCDF4-python?
In any case, for other readers coming to this thread, you should look at
this link <https://github.com/PlasmaFAIR/nc-complex> from @ZedThree
<https://github.com/ZedThree> above; it's a great summary of complex
number issues. I think we'd be pretty silly to deviate from the analysis
(and "blessed solution") he provides.
-
Hi Mike,
My initial thought on this is it would be better to use a different attribute, rather than to overload an existing one with a use for which it wasn't intended ... but it'd be good first to make sure that I understand what you have in mind. Could you possibly post a CDL snippet giving an example? Many thanks, |
-
Hi David
I would welcome the addition of attributes specifically for this purpose.
Mike
…On Thu, Oct 12, 2023 at 7:42 AM David Hassell ***@***.***> wrote:
Hi Mike,
We will probably use the ancillary_variables attribute to indicate the
connection between the 2 arrays.
My initial thought on this is it would be better to use a different
attribute, rather than to overload an existing one with a use for which it
wasn't intended ... but it'd be good first to make sure that I understand
what you have in mind. Could you possibly post a CDL snippet giving an
example?
Many thanks,
David
-
Dear @mike-dixon On 6th October, you wrote
I think that choice will work well in CF. @davidhassell replied on 12th
meaning a new attribute, other than Best wishes Jonathan |
-
To be clear, when you say: """ Is that two complex numbers, or one complex number (which requires 2 values to represent)? I think you mean the latter, but want to make sure. Anyway, conceptually, a complex number IS a scalar -- so to the degree possible, it would be really great to treat it as such in CF. If that is not possible to do properly (with a complex type), and we need to use two variables, I think they should be defined specifically as a complex number, rather than by overloading an existing concept. Which I think is what @davidhassell is suggesting.
-
I think the latter, too, i.e. a complex number is conceptually a scalar, but in netCDF needs two real numbers to represent it, one of which is implicitly multiplied by
Yes. Although pending @mike-dixon's CDL CF-radial example, I'm not yet sure what the new structure would look like. |
-
I'd just like to plug my nc-complex library again. The C API is now production ready, and support for it has been integrated into netcdf4-python directly and will be available in their next release. I'm currently working on the C++ and Fortran APIs, and they will be available soon. The library supports using a dimension or a compound type to represent complex numbers (and multiple different conventions), and is completely interoperable with the standard netCDF library. It's also trivial to use: switch the I haven't yet added support for complex numbers as two separate variables, as there are some difficult questions like how to represent the name of the variable, should attributes be merged, and so on. A lot more functions would also need to be wrapped in order to handle accessing two variables as a single one. @mike-dixon Is there some discussion available somewhere as to why CF-Radial has chosen to use two variables? I'd like to understand the benefits of using them over a separate dimension or a compound datatype. |
-
FYI... HDF Group has just published an implementation RFC for float16 and complex number datatypes in libhdf5. You can provide comments or suggestions at HDFGroup/hdf5#3339. |
-
I think this is great news! Given that we haven't established a CF convention for complex yet, it seems the obvious thing to do is adopt the HDF complex -- assuming it gets into netcdf soon. :-) As for CF-Radial -- I'm still confused about whether they are storing complex numbers, or 2D-values-that-have-a-magnitude-and-direction (now that I wrote that -- 2D vectors), which can be mathematically represented as complex. Which seems to me a different thing. For example, in my line of work, we often need to represent wind speed and direction -- but I've never seen anyone call that a complex number. And in CF, we don't use the speed-direction form, standardizing on the eastward-northward form instead.
-
A brief update, as a couple of things have happened recently: netcdf4-python can now read and write complex numbers. There's a new The RFC to add native complex numbers to HDF5 is progressing, and is likely to be in version 1.15. It will also have support for no-op conversions to/from compound datatype representations. This hopefully gives us some more pressure to add native complex numbers in netCDF itself. My nc-complex library is an extension that adds support for complex numbers for C and C++. I need to find some time to add Fortran support too. |
-
A PR for complex number datatypes in libhdf5: HDFGroup/hdf5#4630. It is planned to appear in the next major libhdf5 release 1.16.0. It comes with support for on-the-fly conversion of many current ways of storing complex numbers in HDF5/netCDF-4 files. Comments or testing of the PR welcome. |
-
This may not come up often for Climate and Forecast use-cases, but many physical scientists are interested in storing arrays of complex numbers. This has come up with some regularity for xarray users, e.g., see pydata/xarray#3297 for the most recent issue.
There does not appear to be a standard for how to store complex numbers in netCDF files, so scientists are currently presented with two poor options:
invalid_netcdf=True with h5netcdf: https://github.com/shoyer/h5netcdf#invalid-netcdf-files
I would love to resolve this with a standard way to store complex values in a netCDF file, so xarray doesn't have to invent its own standard or encourage writing HDF5 files that aren't valid netCDF.
Some questions:
a. Use compound data types in a single array, like h5py and HDF5.jl (see complex number support JuliaIO/HDF5.jl#558). Advantages: this is an existing standard already in use. Disadvantages: requires the HDF5 data model; netCDF-C lacks support for reading some compound data types (NetCDF unable to read some HDF5 enums Unidata/netcdf-c#267), including the convention used by h5py.
b. Store real and imaginary parts in separate arrays, with some sort of metadata convention for indicating that they are the same logical array. Advantages: easy to interpret merely by inspection; easy to access real or imaginary parts separately. Disadvantages: a new convention; reading data into complex dtypes in memory will likely require an additional copy to combine real/imaginary parts.
c. Store real and imaginary parts in a single array with an extra dimension of length 2 at the end for real and imaginary parts. Advantages: still backwards compatible with old netCDF formats; can be memory mapped without a copy. Disadvantages: slightly less self-explanatory (user needs to understand the dimension mapping); adds an extra dangling dimension of length two in the dataset.