Skip to content

MPES reader changes from mpes refactoring#203

Merged
domna merged 42 commits intomasterfrom
mpes-reader-update
Jan 22, 2024
Merged

MPES reader changes from mpes refactoring#203
domna merged 42 commits intomasterfrom
mpes-reader-update

Conversation

@domna
Copy link
Collaborator

@domna domna commented Jan 8, 2024

This

Fixes #210, Fixes #212

@domna domna changed the title MPES reader changes from the mpes refactoring MPES reader changes from mpes refactoring Jan 8, 2024
@domna domna marked this pull request as draft January 16, 2024 10:18
@domna
Copy link
Collaborator Author

domna commented Jan 17, 2024

Hey @rettigl, if you like you can already review the config_file.json. This is finished. I still need to update the regression tests and there are still some bugs in pynxtools (#210, #212) , which we need to fix before all tests pass.

I introduced a new feature that the config file can be written in a nested fashion, because I thought this is much easier to read and maintain (and we could easier show an "instance" of data at the workshop). I also added support for *{a,b,c} syntax, which basically replaces all * by the list in brackets, here a, b and c. You can see an example for the lenses, where it just expands for each of the lens labels. I checked the function such that it reproduces the old config, so we should be good. Links are now written as @link:<target> instead of a dict, as this is also more consistent with the file referencing. Except for the new link syntax everything should be compatible to the old flat json notation.

@domna domna requested a review from rettigl January 17, 2024 17:14
@coveralls
Copy link

coveralls commented Jan 18, 2024

Pull Request Test Coverage Report for Build 7585289570

  • 58 of 60 (96.67%) changed or added relevant lines in 6 files are covered.
  • 428 unchanged lines in 5 files lost coverage.
  • Overall coverage decreased (-3.7%) to 40.812%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pynxtools/dataconverter/readers/utils.py 33 35 94.29%
Files with Coverage Reduction New Missed Lines %
pynxtools/dataconverter/readers/mpes/reader.py 3 83.33%
pynxtools/dataconverter/readers/xps/reader_utils.py 10 25.32%
pynxtools/dataconverter/readers/xps/file_parser.py 18 36.21%
pynxtools/dataconverter/readers/xps/reader.py 155 15.71%
pynxtools/dataconverter/readers/xps/xml/xml_specs.py 242 7.83%
Totals Coverage Status
Change from base Build 7567202073: -3.7%
Covered Lines: 4471
Relevant Lines: 10955

💛 - Coveralls

Copy link
Collaborator

@rettigl rettigl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, this looks quite good to me. However, there are a lot of "NOT IN SCHEMA" entries in the log files, some of them new, some of them also old. I think this should be carefully checked if this is supposed to be like this...

If I run the conversion in the tutorials E1 notebook, with definitions on "fairmat" branch, I get the following errors:

The data entry corresponding to /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/ENERGYDISPERSION[energydispersion]/energy_scan_mode is required and hasn't been supplied by the reader.
[info]: Path /ENTRY[entry]/PROCESS[process]/energy_calibration/coefficients not found. Skipping the entry.
The required group, /ENTRY[entry]/SAMPLE[sample]/bias/potentiostat, hasn't been supplied while its optional parent, /ENTRY[entry]/SAMPLE[sample]/bias, is supplied.

Still, the file is being created.
If I run with "mpes-fixes" branch, I get:

The data entry corresponding to /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/ENERGYDISPERSION[energydispersion]/energy_scan_mode is required and hasn't been supplied by the reader.
[info]: Path /ENTRY[entry]/PROCESS[process]/energy_calibration/coefficients not found. Skipping the entry.

@rettigl
Copy link
Collaborator

rettigl commented Jan 18, 2024

Conversion in E2 fails with:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[36], [line 1](vscode-notebook-cell:?execution_count=36&line=1)
----> [1](vscode-notebook-cell:?execution_count=36&line=1) convert(input_file=["config_file.json", "WSe2_eln.yaml"],
      [2](vscode-notebook-cell:?execution_count=36&line=2)         objects=res_xarray,
      [3](vscode-notebook-cell:?execution_count=36&line=3)         reader='mpes',
      [4](vscode-notebook-cell:?execution_count=36&line=4)         nxdl='NXmpes',
      [5](vscode-notebook-cell:?execution_count=36&line=5)         output='WSe2.mpes.nxs')

File /mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/convert.py:269, in convert(input_file, reader, nxdl, output, generate_template, fair, undocumented, **kwargs)
    [264](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/convert.py:264)         logger.info(
    [265](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/convert.py:265)             f"NO DOCUMENTATION: The path, {path}, is being written but has no documentation."
    [266](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/convert.py:266)         )
    [268](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/convert.py:268) helpers.add_default_root_attributes(data=data, filename=os.path.basename(output))
--> [269](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/convert.py:269) Writer(data=data, nxdl_f_path=nxdl_f_path, output_path=output).write()
    [271](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/convert.py:271) logger.info(f"The output file generated: {output}.")

File /mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:339, in Writer.write(self)
    [337](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:337) """Writes the NeXus file with previously validated data from the reader with NXDL attrs."""
    [338](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:338) try:
--> [339](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:339)     self._put_data_into_hdf5()
    [340](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:340) finally:
    [341](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:341)     self.output_nexus.close()

File /mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:302, in Writer._put_data_into_hdf5(self)
    [296](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:296)         raise IOError(
    [297](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:297)             f"Unknown error occured writing the path: {path} "
    [298](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:298)             f"with the following message: {str(exc)}"
    [299](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:299)         ) from exc
    [301](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:301) for links in hdf5_links_for_later:
--> [302](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:302)     dataset = handle_dicts_entries(*links)
    [303](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:303)     if dataset is None:
    [304](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:304)         # If target of a link is invalid to be linked
    [305](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:305)         del self.data[links[-1]]

File /mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:146, in handle_dicts_entries(data, grp, entry_name, output_path, path)
    [144](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:144) elif "link" in data.keys():
    [145](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:145)     if ":/" not in data["link"]:
--> [146](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:146)         grp[entry_name] = h5py.SoftLink(path)  # internal link
    [147](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:147)     else:
    [148](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/pynxtools/pynxtools/dataconverter/writer.py:148)         grp[entry_name] = h5py.ExternalLink(file, path)  # external link

File h5py/_objects.pyx:54, in h5py._objects.with_phil.wrapper()

File h5py/_objects.pyx:55, in h5py._objects.with_phil.wrapper()

File /mnt/pcshare/users/Laurenz/AreaB/pynxtools/.pyenv/lib/python3.8/site-packages/h5py/_hl/dataset.py:929, in Dataset.__setitem__(self, args, val)
    [925](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/.pyenv/lib/python3.8/site-packages/h5py/_hl/dataset.py:925) else:
    [926](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/.pyenv/lib/python3.8/site-packages/h5py/_hl/dataset.py:926)     # If the input data is already an array, let HDF5 do the conversion.
    [927](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/.pyenv/lib/python3.8/site-packages/h5py/_hl/dataset.py:927)     # If it's a list or similar, don't make numpy guess a dtype for it.
    [928](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/.pyenv/lib/python3.8/site-packages/h5py/_hl/dataset.py:928)     dt = None if isinstance(val, numpy.ndarray) else self.dtype.base
--> [929](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/.pyenv/lib/python3.8/site-packages/h5py/_hl/dataset.py:929)     val = numpy.asarray(val, order='C', dtype=dt)
    [931](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/.pyenv/lib/python3.8/site-packages/h5py/_hl/dataset.py:931) # Check for array dtype compatibility and convert
    [932](https://vscode-remote+ssh-002dremote-002bpcr840-002egnz-002empg-002ede.vscode-resource.vscode-cdn.net/mnt/pcshare/users/Laurenz/AreaB/pynxtools/.pyenv/lib/python3.8/site-packages/h5py/_hl/dataset.py:932) if self.dtype.subdtype is not None:

TypeError: float() argument must be a string or a number, not 'SoftLink'

@rettigl
Copy link
Collaborator

rettigl commented Jan 18, 2024

This happens at
grafik
because of
grafik

I think we should add a test that the writer does not try to create a link if the link target does not exist...

@domna
Copy link
Collaborator Author

domna commented Jan 19, 2024

Generally, this looks quite good to me. However, there are a lot of "NOT IN SCHEMA" entries in the log files, some of them new, some of them also old. I think this should be carefully checked if this is supposed to be like this...

If I run the conversion in the tutorials E1 notebook, with definitions on "fairmat" branch, I get the following errors:

The data entry corresponding to /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/ENERGYDISPERSION[energydispersion]/energy_scan_mode is required and hasn't been supplied by the reader.
[info]: Path /ENTRY[entry]/PROCESS[process]/energy_calibration/coefficients not found. Skipping the entry.
The required group, /ENTRY[entry]/SAMPLE[sample]/bias/potentiostat, hasn't been supplied while its optional parent, /ENTRY[entry]/SAMPLE[sample]/bias, is supplied.

This is expected as for fairmat the config file does not match.

Still, the file is being created. If I run with "mpes-fixes" branch, I get:

The data entry corresponding to /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/ENERGYDISPERSION[energydispersion]/energy_scan_mode is required and hasn't been supplied by the reader.
[info]: Path /ENTRY[entry]/PROCESS[process]/energy_calibration/coefficients not found. Skipping the entry.

This is because it is named differently in metadata in the example file than the file we have in pynxtools. Changing this value "@attrs:metadata/energy_correction/calibration/coeffs" to this one "@attrs:metadata/energy_correction/calibration/coefficients" fixes it (L376 in config_file.json). The error about the energy_scan_mode is odd, because it should be set to optional in mpes-fixes and I also don't get this error. Can you try this one again?
Here is the associated PR with updated example files.

Sorry, I also should've mentioned that I did not yet test the examples with this. But I think E1 should be good now and I will check the error for the E2 as well.

@domna
Copy link
Collaborator Author

domna commented Jan 19, 2024

I think we should add a test that the writer does not try to create a link if the link target does not exist...

We check for that and abort if the target does not exist:

logger.warning("No path '%s' available to be linked.", path)

The problem in E2 is that it is trying to write an empty field, I think.
Does it work if you set the metadata manually w/o gathering it from the archive?

@rettigl
Copy link
Collaborator

rettigl commented Jan 19, 2024

The problem in E2 is that it is trying to write an empty field, I think.
Does it work if you set the metadata manually w/o gathering it from the archive?

I can try that. It's probably in the template at that point, but None. So this check should be included (that's what I meant).

@domna
Copy link
Collaborator Author

domna commented Jan 19, 2024

The problem in E2 is that it is trying to write an empty field, I think.
Does it work if you set the metadata manually w/o gathering it from the archive?

I can try that. It's probably in the template at that point, but None. So this check should be included (that's what I meant).

I already checked it. It also has problems when the metadata is set manually and I think it's because there are some additional keys in the associated eln file. I'm working on it rn.

Yes, with the latest changes everything is working for E2 :)

@rettigl please feel free to give it another check but it should be good to go then.

Copy link
Collaborator

@rettigl rettigl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything seems to be working now, also with the examples.

@domna domna merged commit 2831461 into master Jan 22, 2024
@domna domna deleted the mpes-reader-update branch January 22, 2024 08:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

4 participants