|
| 1 | +# Use python to create NeXus files |
| 2 | + |
| 3 | +__The goal__ |
| 4 | + |
| 5 | +Use Python to create a NeXus file (.nxs) by hardcoding its content with the Python package `h5py`. NeXus files can also be created automatically by our software [`pynxtools`](https://github.com/FAIRmat-NFDI/pynxtools), but ONLY IF a reader for the specific device/instrument/data structure exists. This How-To is intended as easy access to FAIR data structures _via_ NeXus. For static data structures (i.e., always the same type of standard measurement) or one-time examples (small data publications), this may be a feasible solution. For large-scale automated file processing, storage, and validation, it is advisable to use [`pynxtools`](https://github.com/FAIRmat-NFDI/pynxtools) and its measurement-method-specific [plugins](../reference/plugins.md). |
| 6 | + |
| 7 | +You can find the necessary file downloads [here](https://zenodo.org/records/13373909). |
| 8 | + |
| 9 | + |
| 10 | + |
| 11 | +## Make NeXus file by python |
| 12 | + |
| 13 | +Install `h5py` via `pip`: |
| 14 | +```console |
| 15 | +pip install h5py |
| 16 | +``` |
| 17 | + |
| 18 | +Then you can create a NeXus file with the Python script [h5py_nexus_file_creation.py](https://zenodo.org/records/13373909/files/h5py_nexus_file_creation.py?download=1). |
| 19 | + |
| 20 | +``` |
| 21 | +# Import h5py, to write an hdf5 file |
| 22 | +import h5py |
| 23 | +
|
| 24 | +# create a h5py file in writing mode with given name "NXopt_minimal_example", file extension "nxs" |
| 25 | +f = h5py.File("NXopt_minimal_example.nxs", "w") |
| 26 | +
|
| 27 | +# there are only 3 fundamental objects: >group<, >attribute< and >datafield<. |
| 28 | +
|
| 29 | +
|
| 30 | +# create a >group< called "entry" |
| 31 | +f.create_group('/entry') |
| 32 | +
|
| 33 | +# assign the >group< called "entry" an >attribute< |
| 34 | +# The attribute is named "NX_class", and its value is the NeXus class "NXentry" |
| 35 | +f['/entry'].attrs['NX_class'] = 'NXentry' |
| 36 | +
|
| 37 | +# create >datafield< called "definition" inside the entry, and assign it the value "NXoptical_spectroscopy" |
| 38 | +# This field is important, as it is used in validation process to identify the NeXus definition. |
| 39 | +f['/entry/definition'] = 'NXoptical_spectroscopy' |
| 40 | +``` |
| 41 | + |
| 42 | +This provides a starting point for the NeXus file. We will go through these functions in the following. |
| 43 | + |
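The snippet above leaves the file handle `f` open so that the later steps can keep writing to it. As a minimal sketch (not part of the downloadable script), the same three objects can also be written inside a `with` block, which closes the file automatically, and then read back to check the result. The file name `NXopt_sketch.nxs` and the in-memory `core` driver are assumptions chosen for illustration only, and `.asstr()` assumes h5py 3.0 or newer.

```python
import h5py

# Assumption for illustration: an in-memory file (driver="core", backing_store=False),
# so nothing is written to disk. Drop both arguments to create a real .nxs file.
with h5py.File("NXopt_sketch.nxs", "w", driver="core", backing_store=False) as f:
    # group + attribute
    entry = f.create_group("/entry")
    entry.attrs["NX_class"] = "NXentry"

    # datafield
    f["/entry/definition"] = "NXoptical_spectroscopy"

    # Read the objects back while the file is still open.
    nx_class = f["/entry"].attrs["NX_class"]
    definition = f["/entry/definition"].asstr()[()]  # decode the stored string to str
    top_level = list(f.keys())

print(nx_class, definition, top_level)
```

Reading the values back like this is a quick way to confirm that a group, attribute, or datafield ended up where you expect before moving on.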
| 44 | + |
| 45 | + |
| 46 | +## Add NeXus concepts by python |
| 47 | + |
| 48 | +Go to [FAIRmat NeXus definitions](<https://fairmat-nfdi.github.io/nexus_definitions/index.html#>) |
| 49 | + |
| 50 | +Scroll down until you see the search box named "Quick search". |
| 51 | + |
| 52 | +Type "NXoptical" and start the search. |
| 53 | + |
| 54 | +You see several search results. Select the one named "NXoptical\_spectroscopy". |
| 55 | + |
| 56 | +Then you are (ideally) on this page: [NXoptical_spectroscopy NeXus definition](<https://fairmat-nfdi.github.io/nexus_definitions/classes/contributed_definitions/NXoptical_spectroscopy.html>) |
| 57 | + |
| 58 | +You see a tree-like structure of the NeXus definition NXoptical\_spectroscopy with several tree nodes: Status, Description, Symbols, Groups\_cited, Structure. For now, only the Structure part is of interest. It contains the information that has to be written in the Python code to add fields/groups/attributes to the NeXus file. |
| 59 | + |
| 60 | +Use your browser search (CTRL+F) and search for "required". Ideally, your browser highlights all concepts which are required. You have to add those to the Python script to extend your created .nxs file. (Which fields/groups/attributes are "required" was defined by the respective scientific community to ensure that the data complies with the FAIR principles.) |
| 61 | + |
| 62 | +In the following, it will be shown how the python script has to be extended for the three fundamental objects: |
| 63 | + |
| 64 | +1. Attribute |
| 65 | + |
| 66 | +2. Datafield |
| 67 | + |
| 68 | +3. Group |
| 69 | + |
| 70 | + |
| 71 | + |
| 72 | + |
| 73 | + |
| 74 | +### Adding an attribute |
| 75 | + |
| 76 | +Search for the first concept/object in the NeXus file which is not created yet. It is: |
| 77 | + |
| 78 | +**@version**: (required) [NX\_CHAR](<https://fairmat-nfdi.github.io/nexus_definitions/nxdl-types.html#nx-char>) [⤆](<https://fairmat-nfdi.github.io/nexus_definitions/classes/base_classes/NXentry.html#nxentry-definition-version-attribute>) |
| 79 | + |
| 80 | +1. It is located in the tree at position: ENTRY/definition/ |
| 81 | + |
| 82 | +2. The "@" indicates that this is an attribute of the concept "definition". |
| 83 | + |
| 84 | +3. The name of the attribute is "version". |
| 85 | + |
| 86 | +4. Since it is "required", that means this attribute has to be added so that the resulting NeXus file is compliant with the NeXus definition "NXoptical\_spectroscopy". |
| 87 | + |
| 88 | +5. The "NX\_CHAR" indicates the datatype. This should be a string: "The preferred string representation is UTF-8" (more information see [here](<https://manual.nexusformat.org/nxdl-types.html>)) |
| 89 | + |
| 90 | + |
| 91 | + |
| 92 | +Now extend the Python script as follows: |
| 93 | + |
| 94 | +``` |
| 95 | +f['/entry/definition'].attrs['version'] = 'v2024.02' |
| 96 | +``` |
| 97 | + |
| 98 | +This h5py command adds the attribute named "version" with the value "v2024.02" to the HDF5 dataset called "/entry/definition". The same is done for the URL attribute: |
| 99 | + |
| 100 | +``` |
| 101 | +f['/entry/definition'].attrs['URL'] = 'https://github.com/FAIRmat-NFDI/nexus_definitions/blob/f75a29836431f35d68df6174e3868a0418523397/contributed_definitions/NXoptical_spectroscopy.nxdl.xml' |
| 102 | +``` |
| 103 | + |
| 104 | +For your use case, you may want to use a different version of the NeXus definitions, since these are changed over time. In the following, it is shown where to obtain the correct version and URL. |
| 105 | + |
| 106 | +__Get the values: *version* and *URL*__ |
| 107 | + |
| 108 | +At the time you create the NeXus file, go to the page of the NeXus definition you use, e.g. [NXoptical_spectroscopy](<https://fairmat-nfdi.github.io/nexus_definitions/classes/contributed_definitions/NXoptical_spectroscopy.html>). |
| 109 | + |
| 110 | +Scroll down until you find "**NXDL Source**:" and follow this link, i.e. [NXoptical_spectroscopy.nxdl.xml](<https://github.com/FAIRmat-NFDI/nexus_definitions/blob/fairmat/contributed_definitions/NXoptical_spectroscopy.nxdl.xml>) |
| 111 | + |
| 112 | +This is the GitHub page where the latest (FAIRmat) version of the NeXus definition NXoptical\_spectroscopy is stored as a NeXus Definition Language file (.nxdl.xml). The information is structured in XML format. |
| 113 | + |
| 114 | +Now you have to copy the permalink of this file. Go to the top right side of the website. Find the Menu made by 3 dots: |
| 115 | + |
| 116 | + |
| 117 | + |
| 118 | +Copy the permalink and insert it as value for the "URL" attribute (Step 1, Red box in the image) |
| 119 | + |
| 120 | +Go to "nexus\_definitions" (Step 2, Red box in the image) |
| 121 | + |
| 122 | + |
| 123 | + |
| 124 | +On the right side, below "Releases", you should see the "tags" link (red box in the image). Follow this link. |
| 125 | + |
| 126 | +Copy the latest tag, which should look similar to "v2024.02". Insert it as value for the "version" attribute. |
| 127 | + |
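If you already know the commit id, the permalink can also be assembled in code. The following is a minimal sketch: the helper name `nxdl_permalink` is hypothetical, and the URL pattern is simply read off from the permalink used earlier in this How-To, so copying the permalink from GitHub remains the authoritative way.

```python
def nxdl_permalink(commit_id: str, definition: str,
                   category: str = "contributed_definitions") -> str:
    """Build a GitHub permalink to an NXDL file pinned to one commit.

    Hypothetical helper: the pattern mirrors the permalink shown in this How-To.
    """
    base = "https://github.com/FAIRmat-NFDI/nexus_definitions/blob"
    return f"{base}/{commit_id}/{category}/{definition}.nxdl.xml"


url = nxdl_permalink(
    "f75a29836431f35d68df6174e3868a0418523397",
    "NXoptical_spectroscopy",
)
print(url)
```

The resulting string can then be assigned to the "URL" attribute in the same way as shown above.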
| 128 | +__Disclaimer__ |
| 129 | +When specifying this version tag, it would be better to include the "GitHub commit id" as well. In this way, a [pynxtools generated version tag](https://github.com/FAIRmat-NFDI/pynxtools/blob/c13716915bf8f69068c3b94d1423681b580fd437/src/pynxtools/_build_wrapper.py#L17) might look like this: |
| 130 | +`v2022.07.post1.dev1278+g1d7000f4`. If you have pynxtools installed, you can get the tag by: |
| 131 | + |
| 132 | +```python |
| 133 | +>>> from pynxtools import get_nexus_version |
| 134 | +>>> get_nexus_version() |
| 135 | +'v2022.07.post1.dev1284+gf75a2983' |
| 136 | +``` |
| 137 | + |
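Such a generated tag carries the short commit id after `+g`, where the `g` prefix follows the `git describe` convention. As a minimal sketch, the tag can be split into its release part and the short commit id with the standard library. The helper name `split_version_tag` is hypothetical, and the pattern assumes the tag has exactly the form shown above.

```python
import re


def split_version_tag(tag: str):
    """Split a tag such as 'v2022.07.post1.dev1284+gf75a2983'.

    Returns (release, short_commit). short_commit is None for a plain tag
    such as 'v2024.02'. Hypothetical helper for illustration only.
    """
    match = re.fullmatch(r"(?P<release>[^+]+)(?:\+g(?P<commit>[0-9a-f]+))?", tag)
    if match is None:
        raise ValueError(f"unexpected version tag: {tag!r}")
    return match.group("release"), match.group("commit")


release, commit = split_version_tag("v2022.07.post1.dev1284+gf75a2983")
print(release, commit)
```

The short commit id obtained this way is the beginning of the full commit id that appears in the permalink used for the "URL" attribute.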
| 138 | + |
| 139 | + |
| 140 | +### Adding a datafield |
| 141 | + |
| 142 | +Two attributes were added to "ENTRY/definition", both of which were required. By now, this part of the NeXus file fulfills the requirements of the application definition NXoptical\_spectroscopy. |
| 143 | + |
| 144 | +The next required concept of [NXoptical_spectroscopy](https://fairmat-nfdi.github.io/nexus_definitions/classes/contributed_definitions/NXoptical_spectroscopy.html) is "**experiment\_type**". |
| 145 | + |
| 146 | +**experiment\_type**: (required) [NX\_CHAR](<https://fairmat-nfdi.github.io/nexus_definitions/nxdl-types.html#nx-char>) |
| 147 | + |
| 148 | +1. It is located in the tree at position: ENTRY/ |
| 149 | + |
| 150 | +2. There is no "@" in front of "**experiment\_type**". So, this may be a group or a datafield. |
| 151 | + |
| 152 | +3. The name of this group/datafield is "**experiment\_type**". |
| 153 | + |
| 154 | +4. The "required" indicates that this group/datafield has to be added to be in line with the NeXus definition "NXoptical\_spectroscopy". |
| 155 | + |
| 156 | +5. The "NX\_CHAR" indicates the datatype. This should be a string: "The preferred string representation is UTF-8" (more information see [here](<https://manual.nexusformat.org/nxdl-types.html>)). |
| 157 | + |
| 158 | +6. The "NX\_CHAR" indicates that this is a datafield. It is NOT a group. |
| 159 | + A group is a NeXus class. "NXentry" is for example a NeXus class, while "NX_CHAR" indicates the datatype of the field. |
| 160 | + Whether an underscore "_" follows "NX" therefore indicates whether the name is a datatype (with underscore, i.e. a datafield) or a NeXus class (without underscore, i.e. a group). |
| 161 | + |
| 162 | +Read the documentation at "▶ Specify the type of the optical experiment. ..." by expanding it with a click on the triangle symbol. You should see something like this: |
| 163 | + |
| 164 | + |
| 165 | + |
| 166 | +There, the value of the datafield has to be one of the listed values, since it is an enumeration (e.g. "transmission spectroscopy"). Note that the value is case sensitive. |
| 167 | + |
| 168 | +Therefore, the python script has to be extended by: |
| 169 | + |
| 170 | +``` |
| 171 | +f['/entry/experiment_type'] = 'transmission spectroscopy' |
| 172 | +``` |
| 173 | + |
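Because the enumeration is case sensitive, it can help to check the value before writing it. The sketch below uses only the standard library. The names `ALLOWED_EXPERIMENT_TYPES` and `check_experiment_type` are hypothetical, and the set deliberately contains only the one value quoted in this How-To; the remaining allowed values have to be copied from the NXoptical\_spectroscopy page.

```python
# Assumption: only "transmission spectroscopy" is taken from this How-To.
# Copy the remaining allowed values from the NXoptical_spectroscopy definition page.
ALLOWED_EXPERIMENT_TYPES = {"transmission spectroscopy"}


def check_experiment_type(value: str) -> str:
    """Return the value unchanged if it is an allowed enumeration item, else raise."""
    if value not in ALLOWED_EXPERIMENT_TYPES:
        raise ValueError(f"{value!r} is not an allowed experiment_type")
    return value


experiment_type = check_experiment_type("transmission spectroscopy")
print(experiment_type)
```

A differently capitalized value such as "Transmission Spectroscopy" is rejected by this check, which mirrors the case sensitivity of the enumeration.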
| 174 | + |
| 175 | + |
| 176 | + |
| 177 | + |
| 178 | +### Adding a group |
| 179 | + |
| 180 | +The first required group in NXoptical\_spectroscopy on the "ENTRY/" level is "**INSTRUMENT**: (required) [NXinstrument](<https://fairmat-nfdi.github.io/nexus_definitions/classes/base_classes/NXinstrument.html#nxinstrument>) [⤆](<https://fairmat-nfdi.github.io/nexus_definitions/classes/base_classes/NXentry.html#nxentry-instrument-group>)". |
| 181 | + |
| 182 | +1. It is located in the tree at position: ENTRY/ |
| 183 | + |
| 184 | +2. There is no "@" in front of "**INSTRUMENT**", and because "NXinstrument" is a NeXus class, this has to be implemented as a group in the Python script. |
| 185 | + |
| 186 | +3. The "required" indicates that this group has to be added to be in line with the NeXus definition "NXoptical\_spectroscopy". |
| 187 | + |
| 188 | +4. The "NXinstrument" indicates that it is a NeXus class (or group in Python), as it starts with "NX" without an underscore "_". It is also not listed among the [data types](https://manual.nexusformat.org/nxdl-types.html#data-types-allowed-in-nxdl-specifications). |
| 189 | + |
| 190 | +5. As this is a group, attributes may be assigned to it. Unlike a datafield, a group holds no value of its own. |
| 191 | + |
| 192 | +6. As this is a group, it can contain many datafields or groups. |
| 193 | + |
| 194 | +7. The uppercase notation of "**INSTRUMENT**" means: |
| 195 | + |
| 196 | + 1. You can give INSTRUMENT [almost](https://manual.nexusformat.org/datarules.html) any name, such as "abc" or "Raman\_setup" (see "regex" or regular expression). |
| 197 | + |
| 198 | + 2. You can create as many groups with the class NXinstrument as you want. Their names have to be different. |
| 199 | + |
| 200 | + 3. For more information see the [NeXus rules](../learn/nexus-rules.md) |
| 201 | + |
| 202 | +The respective Python code to implement an NXinstrument class (or, equivalently, a group in Python) with the name "experiment\_setup\_1" is: |
| 203 | + |
| 204 | +``` |
| 205 | +f.create_group('/entry/experiment_setup_1') |
| 206 | +f['/entry/experiment_setup_1'].attrs['NX_class'] = 'NXinstrument' |
| 207 | +``` |
| 208 | + |
| 209 | +The first line creates the group with the name "experiment\_setup\_1". |
| 210 | + |
| 211 | +The second line assigns this group the attribute with the name "NX\_class" and its value "NXinstrument". |
| 212 | + |
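Since every NeXus group needs these same two steps, they can be wrapped in a small helper. This is a minimal sketch: the function name `create_nexus_group`, the second instrument name "experiment\_setup\_2", the file name, and the in-memory `core` driver are assumptions for illustration and are not part of the downloadable script.

```python
import h5py


def create_nexus_group(parent, name, nx_class):
    """Create a group below `parent` and tag it with its NeXus class."""
    group = parent.create_group(name)
    group.attrs["NX_class"] = nx_class
    return group


# Assumption for illustration: in-memory file, nothing is written to disk.
with h5py.File("NXopt_group_sketch.nxs", "w", driver="core", backing_store=False) as f:
    entry = create_nexus_group(f, "entry", "NXentry")

    # Several groups of the same class are allowed as long as their names differ.
    create_nexus_group(entry, "experiment_setup_1", "NXinstrument")
    create_nexus_group(entry, "experiment_setup_2", "NXinstrument")  # hypothetical name

    # Collect all groups below the entry that carry the class NXinstrument.
    instruments = sorted(
        name for name, item in entry.items()
        if item.attrs.get("NX_class") == "NXinstrument"
    )

print(instruments)
```

Keeping the group creation and the `NX_class` assignment together in one function makes it harder to forget the class attribute, without which the group is not recognized as a NeXus class.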
| 213 | + |
| 214 | + |
| 215 | + |
| 216 | + |
| 217 | +### Finishing the NeXus file |
| 218 | + |
| 219 | +To finish the file, use the respective NeXus definition website: |
| 220 | + |
| 221 | +[NXoptical_spectroscopy](<https://fairmat-nfdi.github.io/nexus_definitions/classes/contributed_definitions/NXoptical_spectroscopy.html>) |
| 222 | + |
| 223 | +Search this page for all "required" entries. The next required entries are located inside the NXinstrument class: |
| 224 | + |
| 225 | +1. **beam\_TYPE**: (required) [NXbeam](<https://fairmat-nfdi.github.io/nexus_definitions/classes/base_classes/NXbeam.html#nxbeam>) [⤆](<https://fairmat-nfdi.github.io/nexus_definitions/classes/base_classes/NXinstrument.html#nxinstrument-beam-group>) |
| 226 | + |
| 227 | +2. **detector\_TYPE**: (required) [NXdetector](<https://fairmat-nfdi.github.io/nexus_definitions/classes/base_classes/NXdetector.html#nxdetector>) [⤆](<https://fairmat-nfdi.github.io/nexus_definitions/classes/base_classes/NXinstrument.html#nxinstrument-detector-group>) |
| 228 | + |
| 229 | +Both are groups. "**beam\_TYPE**" could be named "beam\_abc" or "beam\_Raman\_setup". Use the knowledge above to extend the Python script to create those NeXus file entries. |
| 230 | + |
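One possible way to add these two groups is sketched below, together with a simple check that the expected paths exist. The name "beam\_abc" follows the naming example above; "detector\_abc", the file name, the `expected` list, and the in-memory `core` driver are assumptions for illustration. The check only looks at paths and does not replace a proper validation of the file.

```python
import h5py

# Assumption for illustration: in-memory file, nothing is written to disk.
with h5py.File("NXopt_required_sketch.nxs", "w", driver="core", backing_store=False) as f:
    f.create_group("/entry").attrs["NX_class"] = "NXentry"
    f.create_group("/entry/experiment_setup_1").attrs["NX_class"] = "NXinstrument"

    # beam_TYPE and detector_TYPE: the uppercase part of the name is chosen freely here.
    f.create_group("/entry/experiment_setup_1/beam_abc").attrs["NX_class"] = "NXbeam"
    f.create_group("/entry/experiment_setup_1/detector_abc").attrs["NX_class"] = "NXdetector"

    # Paths this sketch expects to exist; extend the list with your own required entries.
    expected = [
        "/entry/experiment_setup_1/beam_abc",
        "/entry/experiment_setup_1/detector_abc",
        "/entry/experiment_type",  # not created in this sketch, so it is reported
    ]
    missing = [path for path in expected if path not in f]

print(missing)
```

Here the check reports "/entry/experiment\_type" because this sketch does not create that datafield; in the full script it is written in the datafield step above.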
| 231 | +__Note for required NeXus concepts__ |
| 232 | + |
| 233 | +In the definition of NXoptical\_spectroscopy above, you may also find a required entry "**depends\_on**: (required) [NX\_CHAR](<https://fairmat-nfdi.github.io/nexus_definitions/nxdl-types.html#nx-char>) [⤆](<https://fairmat-nfdi.github.io/nexus_definitions/classes/contributed_definitions/NXcoordinate_system.html#nxcoordinate-system-depends-on-field>)". It is located at the level "ENTRY/reference\_frames/beam\_ref\_frame". A required entry inside an optional group is only required if that group exists. If your file does not contain the group "**beam\_ref\_frame**", which is "optional", then you do not need this field either. |
| 234 | + |
| 235 | + |
| 236 | + |
| 237 | +[_Continue by validating the NeXus file_](validate-nexus-file.md) |
| 238 | + |
| 239 | +## Feedback and contact |
| 240 | + |
| 241 | +1. The best way is to contact the FAIRmat team directly by creating a [GitHub issue](https://github.com/FAIRmat-NFDI/nexus_definitions/issues/new). |
| 242 | + |
| 243 | +2. ron.hildebrandt(at)physik.hu-berlin.de |
| 244 | + |
| 245 | + |
| 246 | + |