Conversation
Codecov Report

```
@@           Coverage Diff            @@
##             main     #277    +/-  ##
==========================================
+ Coverage   84.34%   84.48%   +0.14%
==========================================
  Files          69       69
  Lines        4771     4834      +63
  Branches      656      668      +12
==========================================
+ Hits         4024     4084      +60
+ Misses        516      508       -8
- Partials      231      242      +11
==========================================
```
tests/test_hapi_codecs.py
Outdated
```python
def test_spz_getdata_to_csv(self):
    hapi_csv_codec: CodecInterface = get_codec('hapi/csv')
    spz_var = spz.get_data("cda/STA_L1_HET/Proton_Flux", "2020-10-28", "2020-10-28T01")
    hapi_csv_file = hapi_csv_codec.save_variables(variables=[spz_var], file='test_output.csv')
```
Use tmp file or dir for auto cleanup.
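A minimal sketch of the suggested approach, using the stdlib `tempfile` module (the commented-out codec call mirrors the test above and is not executed here):

```python
import os
import tempfile

# Writing into a TemporaryDirectory guarantees cleanup even if the test fails,
# unlike a fixed path such as 'test_output.csv' in the working directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "test_output.csv")
    # In the real test this would be:
    #   hapi_csv_codec.save_variables(variables=[spz_var], file=out_path)
    with open(out_path, "w") as f:
        f.write("Time,value\n")
    assert os.path.exists(out_path)

# Once the context manager exits, the directory and its contents are gone.
assert not os.path.exists(out_path)
```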
```python
raise ValueError("No variables to save")
hapi_csv_file = HapiCsvFile()
hapi_csv_file.add_parameter(_make_hapi_csv_time_axis(variables[0].time))
hapi_csv_file.add_parameter(_make_hapi_csv_time_axis(variables[0].axes[0]))
```
`time` is cleaner and shows the intent.
```python
for ax in _get_variable_axes(variable, is_time_dependent=True):
    meta = {
        "name": _time_dependent_axis_name(ax),
        "type": "double",
```
Not sure hard-coding the type here is safe; `ax.values` can be of any numerical type.
done (refactored with _create_meta())
```python
json_str = json.dumps(headers, indent=2, ensure_ascii=False)
commented = "\n".join("#" + line for line in json_str.splitlines())
dest.write((commented + "\n").encode("utf-8"))
```
This is how we prepend the JSON headers as comments on top of the .csv file.
Sure, but why binary and multi-line?
- multi-line because I find it more human-readable
- binary because it was the original format sent by `save_hapi_csv()`

Do you prefer:

- a one-line header?
- changing `save_hapi_csv()` to a text data type?
One line is faster, so yes for this one.

For text, I don't know; it just seemed suspicious that we had to do this. I want to be sure we are not missing anything that would break later.
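For reference, switching to a one-line commented header is mostly a matter of dropping `indent` from `json.dumps` (a sketch with a toy header dict, not the actual writer code):

```python
import json

# Toy header; the real writer serializes the full HAPI metadata dict.
headers = {"HAPI": "3.0", "parameters": [{"name": "Time", "type": "isotime"}]}

# Multi-line, pretty-printed variant: one '#' prefix per JSON line.
pretty = "\n".join(
    "#" + line
    for line in json.dumps(headers, indent=2, ensure_ascii=False).splitlines()
)

# One-line variant: a single commented line, cheaper to write and to skip on read.
one_line = "#" + json.dumps(headers, ensure_ascii=False)

assert "\n" not in one_line
assert json.loads(one_line[1:]) == headers  # still round-trips
```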
```python
time_dependent_axes = _get_variable_axes(variable, is_time_dependent=True)
if time_dependent_axes:
    bins.extend([
        {"name": ax.name, "unit": ax.unit, "centers": _time_dependent_axis_name(ax)}
```
Suggested change:

```diff
-        {"name": ax.name, "unit": ax.unit, "centers": _time_dependent_axis_name(ax)}
+        {"name": ax.name, "units": ax.unit, "centers": _time_dependent_axis_name(ax)}
```
tests/test_hapi_codecs.py
Outdated
```python
variables = hapi_csv_codec.load_variables(file=f, variables=['spectra_time_dependent_bins'], disable_cache=True)
self.assertIn('spectra_time_dependent_bins', variables)

def test_load_time_independant_axis(self):
```
Suggested change:

```diff
-    def test_load_time_independant_axis(self):
+    def test_load_time_independent_axis(self):
```
Pull request overview
Enhances the HAPI CSV codec to better support multi-axis (binned) variables on read and to export Speasy variables to HAPI CSV, with added regression tests and new CSV fixtures.
Changes:
- Add tests and fixtures for time-independent and time-varying bin axes in HAPI CSV.
- Extend the HAPI CSV codec to decode `bins` into `VariableAxis` objects and emit bins metadata when saving.
- Update the HAPI CSV writer to output pretty-printed commented JSON headers and to expand multi-dimensional values into CSV columns.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_hapi_codecs.py | Adds unit tests for bin-axis decoding and for saving variables back to HAPI CSV. |
| tests/resources/HAPI_ndData_TimeVarying_Axis.csv | New fixture covering time-varying bin centers/ranges. |
| tests/resources/HAPI_ndData_TimeIndependent_Axis.csv | New fixture covering time-independent bin centers/ranges. |
| speasy/core/codecs/bundled_codecs/hapi_csv/writer.py | Updates header serialization and flattens multi-dim parameters into multiple CSV columns. |
| speasy/core/codecs/bundled_codecs/hapi_csv/codec.py | Adds bins<->axes mapping for load/save and writes additional parameters for time-varying axes. |
```python
json_str = json.dumps(headers, indent=2, ensure_ascii=False)
commented = "\n".join("#" + line for line in json_str.splitlines())
dest.write((commented + "\n").encode("utf-8"))
```
The header is written as bytes (`encode('utf-8')`), but `pandas.DataFrame.to_csv` typically writes strings to the same `dest`. If `dest` is a text stream, the `dest.write(...)` call will raise `TypeError`; if `dest` is a binary stream, `to_csv` may fail when writing strings. Use a consistent text writer for both the header and the dataframe output (e.g., write `commented + '\n'` as `str`, or wrap a binary stream with a `TextIOWrapper` and pass that same wrapper to both writes).
```python
df = pds.DataFrame(data)
df.to_csv(dest, index=False, header=False, date_format='%Y-%m-%dT%H:%M:%S.%fZ', float_format='%.3g')
```

The same bytes-versus-text stream inconsistency noted above applies to this `to_csv` call.
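One way to make both writes consistent, as the comment suggests: wrap the binary destination once in an `io.TextIOWrapper` and hand the same text stream to both the header write and the data rows. This sketch uses the stdlib `csv` module as a stand-in for the dataframe write; `DataFrame.to_csv` accepts any text file object, so it could be given `text_dest` the same way:

```python
import csv
import io
import json

headers = {"HAPI": "3.0"}

dest = io.BytesIO()  # stands in for the binary file object the codec writes to
text_dest = io.TextIOWrapper(dest, encoding="utf-8", newline="")

# Header goes through the text wrapper as str, no .encode() needed.
text_dest.write("#" + json.dumps(headers) + "\n")

# Data rows use the SAME text stream, so there is no str/bytes mixing.
# In the real writer this would be df.to_csv(text_dest, ...).
csv.writer(text_dest).writerow(["1970-01-01T00:00:00Z", "1.0"])
text_dest.flush()

content = dest.getvalue().decode("utf-8")
assert content.startswith('#{"HAPI": "3.0"}')
```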
```python
vals = param.values
if vals.ndim == 1:
    data[param.name] = vals
else:
    for i in range(vals.shape[1]):
        data[f"{param.name}_{i}"] = vals[:, i]
```
Multi-dimensional values are only expanded along shape[1] and sliced with vals[:, i]. For ndim > 2, vals[:, i] remains multi-dimensional and will produce object-like columns or inconsistent CSV output. Consider reshaping values to 2D (time, -1) and generating column names for all flattened components (or iterating across all non-time indices) so ndData variables with >2 dimensions serialize predictably.
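A sketch of the reshape approach the comment describes, assuming the first axis is time (the variable `name` and the `name_i` column scheme are illustrative, mirroring the snippet above):

```python
import numpy as np

vals = np.arange(24).reshape(4, 3, 2)  # shape (time, 3, 2): a >2-D ndData variable
name = "flux"

data = {}
if vals.ndim == 1:
    data[name] = vals
else:
    # Flatten every non-time dimension so each CSV column is strictly 1-D,
    # instead of slicing only along shape[1].
    flat = vals.reshape(vals.shape[0], -1)
    for i in range(flat.shape[1]):
        data[f"{name}_{i}"] = flat[:, i]

assert len(data) == 6           # 3 * 2 flattened components
assert data["flux_0"].ndim == 1
```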
```python
        {"name": ax.name, "unit": ax.unit, "centers": ax.values.tolist()}
        for ax in time_independent_axes
    ])

    time_dependent_axes = _get_variable_axes(variable, is_time_dependent=True)
    if time_dependent_axes:
        bins.extend([
            {"name": ax.name, "unit": ax.unit, "centers": _time_dependent_axis_name(ax)}
```
The bin metadata uses the key unit, but HAPI parameter metadata (and your fixtures) use units. This will cause _decode_meta / _bin_to_axis to miss units and drop axis units on load. Rename unit to units in the created bins entries to match the HAPI schema and the rest of the codec.
Suggested change:

```diff
-        {"name": ax.name, "unit": ax.unit, "centers": ax.values.tolist()}
+        {"name": ax.name, "units": ax.unit, "centers": ax.values.tolist()}
         for ax in time_independent_axes
     ])
     time_dependent_axes = _get_variable_axes(variable, is_time_dependent=True)
     if time_dependent_axes:
         bins.extend([
-            {"name": ax.name, "unit": ax.unit, "centers": _time_dependent_axis_name(ax)}
+            {"name": ax.name, "units": ax.unit, "centers": _time_dependent_axis_name(ax)}
```
```python
def _bin_to_axis(json_bin: Dict[str, Any], hap_csv_file: HapiCsvFile) -> VariableAxis:
    centers = json_bin.get("centers")
    name = json_bin.get("name", "bin_axis")
    if centers is None:
        raise ValueError("Invalid bin specification: missing 'centers' field")
    if isinstance(centers, str):
        hapi_parameter = hap_csv_file.get_parameter(centers)
```
Parameter name hap_csv_file looks like a typo and is easy to confuse with hapi_csv_file used elsewhere. Renaming it to hapi_csv_file would improve readability and reduce the chance of mistakes when editing these helpers.
Suggested change:

```diff
-def _bin_to_axis(json_bin: Dict[str, Any], hap_csv_file: HapiCsvFile) -> VariableAxis:
+def _bin_to_axis(json_bin: Dict[str, Any], hapi_csv_file: HapiCsvFile) -> VariableAxis:
     centers = json_bin.get("centers")
     name = json_bin.get("name", "bin_axis")
     if centers is None:
         raise ValueError("Invalid bin specification: missing 'centers' field")
     if isinstance(centers, str):
-        hapi_parameter = hap_csv_file.get_parameter(centers)
+        hapi_parameter = hapi_csv_file.get_parameter(centers)
```
```python
    return variable_axis


def _bins_to_axes(json_bins: List[Dict[str, Any]], hap_csv_file: HapiCsvFile) -> List[VariableAxis]:
```
Parameter name hap_csv_file looks like a typo and is easy to confuse with hapi_csv_file used elsewhere. Renaming it to hapi_csv_file would improve readability and reduce the chance of mistakes when editing these helpers.
tests/test_hapi_codecs.py
Outdated
```python
def test_spz_getdata_to_csv(self):
    hapi_csv_codec: CodecInterface = get_codec('hapi/csv')
    spz_var = spz.get_data("cda/STA_L1_HET/Proton_Flux", "2020-10-28", "2020-10-28T01")
    hapi_csv_file = hapi_csv_codec.save_variables(variables=[spz_var], file='test_output.csv')
    self.assertTrue(hapi_csv_file)
```
This test performs a live spz.get_data(...) call (network / remote dependency) and writes to a fixed path (test_output.csv) in the working directory, which can make CI flaky and pollute the repo. Consider converting this into an integration test (skipped by default), mocking spz.get_data, and writing to a NamedTemporaryFile similar to the other save tests.
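A self-contained sketch of the mocked variant suggested here. The `_FakeSpz` class stands in for the real `speasy` module so the example runs without network access; in the actual test file, the patch target would be the module where `spz.get_data` is looked up:

```python
import os
import tempfile
import unittest
from unittest.mock import MagicMock, patch

class _FakeSpz:
    """Stand-in for `import speasy as spz`; illustrative only."""
    @staticmethod
    def get_data(product, start, stop):
        raise RuntimeError("network call -- should be mocked out in unit tests")

spz = _FakeSpz()

class TestSaveToHapiCsv(unittest.TestCase):
    def test_spz_getdata_to_csv(self):
        # Mock out the remote call and write to an auto-cleaned temp file.
        with patch.object(_FakeSpz, "get_data", return_value=MagicMock()) as m, \
             tempfile.NamedTemporaryFile(suffix=".csv") as f:
            spz_var = spz.get_data("cda/STA_L1_HET/Proton_Flux", "2020-10-28", "2020-10-28T01")
            # The real test would then call:
            #   hapi_csv_codec.save_variables(variables=[spz_var], file=f.name)
            self.assertTrue(m.called)
            self.assertTrue(os.path.exists(f.name))

result = unittest.TextTestRunner(verbosity=0).run(
    unittest.TestLoader().loadTestsFromTestCase(TestSaveToHapiCsv))
```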
tests/test_hapi_codecs.py
Outdated
```python
variables = hapi_csv_codec.load_variables(file=f, variables=['spectra_time_dependent_bins'], disable_cache=True)
self.assertIn('spectra_time_dependent_bins', variables)

def test_load_time_independant_axis(self):
```
Correct spelling: 'independant' -> 'independent' in the test name.
Suggested change:

```diff
-    def test_load_time_independant_axis(self):
+    def test_load_time_independent_axis(self):
```
```csv
#}

1970-01-01T00:00:00.000Z,0,1,0.5,0.3333333333333333,0.25,0.2,0.16666666666666666,0.14285714285714285,0.125,0.1111111111111111,11,13,15,17,19,21,23,25,27,29,10,12,12,14,14,16,16,18,18,20,20,22,22,24,24,26,26,28,28,30
1970-01-01T00:01:08.000Z,0,1,0.5,0.3333333333333333,0.25,0.2,0.16666666666666666,0.14285714285714285,0.125,0.1111111111111111,1,3,5,7,9,11,13,15,17,19
```

(no newline at end of file)
The header declares three parameters after Time: spectra_time_dependent_bins (10 values), frequency_centers_time_varying (10 values), and frequency_ranges_time_varying (10x2 = 20 values). Row 68 includes the additional 20 range values, but row 69 does not, so the CSV row length is inconsistent with the declared schema. Update row 69 to include the missing 20 values (or remove/adjust the declared ranges parameter) so the fixture can be parsed reliably.
Suggested change:

```diff
-1970-01-01T00:01:08.000Z,0,1,0.5,0.3333333333333333,0.25,0.2,0.16666666666666666,0.14285714285714285,0.125,0.1111111111111111,1,3,5,7,9,11,13,15,17,19
+1970-01-01T00:01:08.000Z,0,1,0.5,0.3333333333333333,0.25,0.2,0.16666666666666666,0.14285714285714285,0.125,0.1111111111111111,1,3,5,7,9,11,13,15,17,19,0,2,2,4,4,6,6,8,8,10,10,12,12,14,14,16,16,18,18,20
```
Thank you, Richard. A few remarks that could be discussed. Here is what I am doing:

```python
import speasy as spz

imf_data = spz.get_data(spz.inventories.tree.amda.Parameters.ACE.MFI.ace_imf_all.imf, "2008-01-01", "2008-01-02")
hapi_csv_codec = spz.core.codecs.get_codec('hapi/csv')
hapi_csv_codec.save_variables(variables=[imf_data], file='./imf_data_hapi.csv')
```

1. Missing required fields in the header

In my view, the writer should directly produce an output that complies with the HAPI specification: https://github.com/hapi-server/data-specification/blob/master/hapi-3.2.0/HAPI-data-access-spec-3.2.0.md#372-response

As stated in the spec, "the contents of the header should be the same as returned from the info endpoint." Based on this, I would expect the header to include at least the required fields listed here: https://github.com/hapi-server/data-specification/blob/master/hapi-3.2.0/HAPI-data-access-spec-3.2.0.md#362-info-response-object

Some of those required fields are missing from the current header.

2. Data precision in the written file

When inspecting the first data point directly in Python, I get:

```python
>>> imf_data.values[0]
array([-3.84 ,  0.428, -1.905], dtype=float32)
```

Compared to this, there is a noticeable loss of precision on the third component in the generated CSV file.

3. Optional header

Since the "data" endpoint allows the header to be optional, it would be useful to add a writer option to enable or disable header generation.

I'll continue running some additional tests on my side.
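The precision loss most likely comes from the writer's `float_format='%.3g'`, which keeps only three significant digits. A minimal reproduction (not the writer's code) showing that nine significant digits are enough to round-trip a `float32` value:

```python
import numpy as np

row = np.array([-3.84, 0.428, -1.905], dtype=np.float32)

# What float_format='%.3g' emits: only 3 significant digits.
three_sig = ["%.3g" % v for v in row]
assert three_sig == ["-3.84", "0.428", "-1.9"]  # third component truncated

# 9 significant digits are enough to round-trip any float32 value exactly.
nine_sig = ["%.9g" % v for v in row]
assert all(np.float32(float(s)) == v for s, v in zip(nine_sig, row))
```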
```python
elif np.issubdtype(dtype, np.floating):
    return "double"
else:
    raise ValueError(f"Unsupported data type {variable.values.dtype}")
```
`variable` is not defined.
Yes, sorry. I have my own flake8 settings on pre-commit now ;-)
HAPI CSV JSON headers may contain the 'bins' field: build a `VariableAxis` either from a parameter name or from a given list of values.

New `_bins_to_axes()` method used in `_hapi_csv_to_speasy_variable()`: if 'bins' is present in the HAPI meta, add as many `VariableAxis` objects as there are 'centers' lists or variables.

- use a temp file in tests
- use the `SpzVar.time` member
- rename the `HapiCsvParameter` `unit` member to `units`
- extract the SpzVariable dtype
Enhance HAPI CSV codec features: read multi-axis variables, and write Speasy variables to HAPI CSV.