Commit a055210

Update comp docs (#45)
* started faqs, removed h5md
* start new comp structure
* first draft of comp basics
* middle of combining workflows from comp and howto-cust
* new custom workflows structure started
* filling with bullets
* referencing tasks in different uploads
* refining custom workflows simple workflow
* added nested workflows in multiple entries
* copied over ELN sections from nomad-tutorial-workflow
* rewrote howto>manage>eln, started custom tasks section of workflows
* started to add complex workflow example
* move complex workflow up
* more complex example
* full draft of customization > define workflows
* refinement of the remaining comp docs for now
* update examples overview
* review comments + fixed links
* added tool to find all unused assets
* add remove unused assets
* added trash default and empty trash functions
* removed print icons
* removed unused assets
* add tool instructions to README
* final checks

---------

Co-authored-by: jrudz <rudzinski@mpip-mainz.mpg.de>
1 parent a09e232 commit a055210

File tree

96 files changed

+1052
-2977
lines changed

README.md

Lines changed: 31 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -73,6 +73,37 @@ This will install all requirements in a virtual environment and start the local
7373
uv run --extra dev pytest
7474
```
7575

76+
77+
### How to check and remove unused assets
78+
79+
This repository includes a utility to help keep the documentation tree clean by detecting and safely handling **unreferenced assets** such as files in `images/` and `data/` directories.
80+
81+
#### Tool: `utils/find_unused_assets.py`
82+
83+
This script checks whether files stored in `images/` or `data/` subdirectories (under `docs/`) are actually referenced by any Markdown files in the **same folder**. If not, they are considered unreferenced.
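The check described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of `utils/find_unused_assets.py`: the function name `find_unreferenced` and the rule that a reference is the asset's relative path appearing as plain text in a Markdown file of the same folder are assumptions made for the example.

```python
from pathlib import Path

ASSET_DIRS = ("images", "data")  # asset subdirectory names mentioned in the text


def find_unreferenced(docs_root: Path) -> list[Path]:
    """Return asset files not mentioned in any Markdown file of the same folder."""
    unreferenced = []
    for asset_dir in sorted(docs_root.rglob("*")):
        if not (asset_dir.is_dir() and asset_dir.name in ASSET_DIRS):
            continue
        folder = asset_dir.parent
        # Concatenate the Markdown files that sit next to the asset directory.
        markdown = "\n".join(
            md.read_text(encoding="utf-8") for md in sorted(folder.glob("*.md"))
        )
        for asset in sorted(p for p in asset_dir.rglob("*") if p.is_file()):
            # Assumption: a reference looks like "images/<name>" in the page text.
            relative = asset.relative_to(folder).as_posix()
            if relative not in markdown:
                unreferenced.append(asset)
    return unreferenced
```

A plain substring match is deliberately crude: it can miss references written with a different relative path, so the real tool may apply stricter or looser rules.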
84+
85+
#### Usage
86+
87+
From the root of the repository:
88+
89+
```bash
# List all unreferenced assets
python utils/find_unused_assets.py
```
93+
94+
```bash
# Move unreferenced assets to .trash/ (safe mode)
python utils/find_unused_assets.py --remove
```
98+
> By default, unreferenced files are **not deleted**. They are moved to a `.trash/` folder at the project root so you can review and recover them if needed.
99+
100+
Before deleting anything permanently, check that moving these files to `.trash/` has not broken any links.
101+
102+
```bash
# Permanently delete all assets in .trash/
python utils/find_unused_assets.py --empty-trash
```
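The two-stage removal (move to `.trash/` first, delete later) can be pictured with the following sketch. It is not the tool's implementation: the helper names `move_to_trash` and `empty_trash` are hypothetical, and keeping each file's relative path inside `.trash/` is an assumption chosen so that files are easy to restore.

```python
import shutil
from pathlib import Path


def move_to_trash(asset: Path, project_root: Path) -> Path:
    """Move one file into <project_root>/.trash/, keeping its relative path."""
    # Assumption: the relative path is preserved so the file can be put back.
    target = project_root / ".trash" / asset.relative_to(project_root)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(asset), str(target))
    return target


def empty_trash(project_root: Path) -> None:
    """Permanently delete the .trash/ folder and everything in it."""
    shutil.rmtree(project_root / ".trash", ignore_errors=True)
```

Separating the two steps leaves a window in which the documentation can be rebuilt and checked before anything is lost for good.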
106+
76107
---
77108
## Appendix
78109

Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
1+
# NOMAD Basics: A Computational Perspective
2+
3+
## What you will learn
4+
5+
- An overview of how NOMAD processes and organizes computational data.
6+
7+
## Recommended preparation
8+
9+
- [Tutorial > Navigating to NOMAD](../../tutorial/nomad_repo.md)
10+
- [Tutorial > Uploading and publishing data](../../tutorial/upload_publish.md)
11+
- [Tutorial > Exploring data](../../tutorial/explore.md)
12+
13+
## Further resources
14+
15+
- [Tutorial > Managing workflows and projects](../../tutorial/workflows_projects.md)
16+
- [How-to guides > Programmatic use > Publish data using Python](../../howto/programmatic/publish_python.md)
17+
- [How-to guides > Customization > Define Workflows](../../howto/customization/workflows.md)
18+
19+
## Processing of supported simulation data
20+
21+
NOMAD ingests the raw input and output files from standard simulation software by first identifying a representative file (denoted the **mainfile**) and then employing a [Parser](../../reference/glossary.md#parser) to extract relevant (meta)data from the mainfile and from the other files associated with that simulation (the **auxiliary files**).
22+
23+
<div class="click-zoom">
  <label>
    <input type="checkbox">
    <img src="./images/parsing_illustration.png" alt="" width="80%" title="Click to zoom in">
  </label>
</div>
29+
30+
The extracted (meta)data are stored within a structured schema&mdash;the NOMAD [Metainfo](../../reference/glossary.md/#metainfo)&mdash;to provide context for each quantity, enabling interoperability and comparison between, e.g., simulation software. The Metainfo is constructed from [Sections and Subsections](../../reference/glossary.md#section-and-subsection) and [Quantities](../../reference/glossary.md#quantity), which can be conveniently browsed by users with the [Metainfo Browser](https://nomad-lab.eu/prod/v1/gui/analyze/metainfo){:target="_blank"}:
31+
32+
<div class="click-zoom">
  <label>
    <input type="checkbox">
    <img src="./images/nomad_metainfo.png" alt="" width="100%" title="Click to zoom in">
  </label>
</div>
38+
39+
40+
In the same upload, there might be multiple mainfiles and auxiliary files organized in a folder tree structure. A separate [Entry](../../reference/glossary.md/#entry) will be created for each mainfile identified. For each entry, an [Archive](../../reference/glossary.md#archive) is created that contains all the extracted (meta)data in a _structured_, _well-defined_, and _machine-readable_ format. This **metadata** provides context to the raw data, i.e., which methodological input parameters were used, on which material the calculation was performed, etc.
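To make the "one entry per mainfile" idea concrete, here is a toy sketch. It does not reproduce NOMAD's matching logic: the rule that a file named `run.out` is a mainfile, the function name `group_upload`, and the choice to treat all other files in the same folder as auxiliary files are purely hypothetical.

```python
import re
from pathlib import PurePosixPath

# Hypothetical rule: a file named "run.out" marks a simulation.
# Real parsers define their own matching criteria.
MAINFILE_PATTERN = re.compile(r"^run\.out$")


def group_upload(paths: list[str]) -> dict[str, list[str]]:
    """Map each mainfile to the other files in its folder (candidate auxiliary files)."""
    files = [PurePosixPath(p) for p in paths]
    groups: dict[str, list[str]] = {}
    for f in files:
        if MAINFILE_PATTERN.match(f.name):
            groups[str(f)] = sorted(
                str(other)
                for other in files
                if other.parent == f.parent and other != f
            )
    return groups


upload = ["sim_a/run.out", "sim_a/input.in", "sim_b/run.out", "sim_b/traj.xyz", "README.md"]
entries = group_upload(upload)  # two mainfiles, hence two entries
```

In this toy upload, `README.md` matches no rule and sits next to no mainfile, so it belongs to no entry.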
41+
42+
43+
44+
See the explanation pages [From files to data](../../explanation/basics.md), [Data structure](../../explanation/data.md) and [Processing](../../explanation/processing.md) for a more general description of NOMAD processing.
45+
46+
<!--TODO: add our own supported parsers list with improved info-->
47+
See [Supported Parsers](https://nomad-lab.eu/prod/v1/staging/docs/reference/parsers.html){:target="_blank"} for a full list of supported codes, mainfiles, auxiliary files, etc.
48+
49+
## Archive sections relevant for computational data
50+
51+
Under the [`Entry` section of the metainfo browser](https://nomad-lab.eu/prod/v1/gui/analyze/metainfo/nomad.datamodel.datamodel.EntryArchive){:target="_blank"}, there are several sections and quantities being populated by the parsers. For computational data, the relevant sections are:
52+
53+
- `metadata`: contains general, non-code-specific metadata. This is mainly information about authors, the entry creation time, identifiers (id), etc.
54+
- `run`: contains the [Parsed](../../explanation/processing.md#parsing) and [Normalized](../../explanation/processing.md#normalizing) raw data, according to the *legacy* NOMAD simulation schema, [`runschema`](https://nomad-lab.eu/prod/v1/gui/analyze/metainfo/runschema).
55+
- `data`: contains the [Parsed](../../explanation/processing.md#parsing) and [Normalized](../../explanation/processing.md#normalizing) raw data, according to the *new* NOMAD simulation schema, [`nomad_simulations`](https://nomad-lab.eu/prod/v1/gui/analyze/metainfo/nomad_simulations).
56+
- `workflow2`: contains metadata about the specific workflow performed within the entry. This is mainly a set of well-defined workflows, e.g., `GeometryOptimization`, and their parameters.
57+
- `results`: contains the [Normalized](../../explanation/processing.md#normalizing) and [Search Indexed](../../explanation/basics.md#storing-and-indexing) metadata. This is mainly relevant for searching, filtering, and visualizing data in NOMAD.
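As a rough picture of the sections listed above, a downloaded archive can be treated as a nested dictionary with these top-level keys. The sketch below uses a mock archive: every value is an invented placeholder, the inner structure of each section is omitted, and the helper `populated_sections` is hypothetical.

```python
# Top-level sections named in the list above.
COMPUTATIONAL_SECTIONS = ("metadata", "run", "data", "workflow2", "results")

# Mock archive of an entry parsed with the legacy schema (no `data` section).
# All values are placeholders, not real NOMAD content.
archive = {
    "metadata": {"entry_id": "hypothetical-entry-id"},
    "run": ["<parsed and normalized data, legacy schema>"],
    "workflow2": {"placeholder": "<workflow metadata>"},
    "results": {"placeholder": "<normalized, search-indexed metadata>"},
}


def populated_sections(archive: dict) -> list[str]:
    """List which of the computational sections are present and non-empty."""
    return [name for name in COMPUTATIONAL_SECTIONS if archive.get(name)]


print(populated_sections(archive))
```

Checking which sections are populated is a quick way to see whether an entry was processed with the legacy schema, the new schema, or both.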
58+
59+
60+
### Normalization
61+
62+
The parser code reads the code-specific mainfile and auxiliary files and populates the `run` and `workflow2` sections of the `archive`. Subsequently, a cascade of additional code is executed, which varies depending on the exact sections and quantities populated by the parser. This code is responsible for: 1. normalizing or _homogenizing_ certain metadata parsed from different codes, and 2. populating the `results` section.
63+
64+
65+
### Search indexing
66+
67+
Only a fraction of the stored metadata is made available for search. Among the parsed and normalized quantities, those stored in the `results` section are searchable. These metadata can be used to filter the database via the GUI or API.
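A search through the API amounts to sending a filter on `results` quantities. The sketch below builds such a request with the Python standard library only. The base URL, the `entries/query` endpoint, and the quantity `results.material.elements` are stated here as assumptions to be checked against the API documentation of your NOMAD deployment; the function names are hypothetical.

```python
import json
import urllib.request

# Assumption: base URL of the public NOMAD API; adjust for your deployment.
BASE_URL = "https://nomad-lab.eu/prod/v1/api/v1"


def build_query(elements: list[str], page_size: int = 5) -> dict:
    """Build a search payload that filters on a quantity in the `results` section."""
    return {
        "query": {"results.material.elements": {"all": elements}},
        "pagination": {"page_size": page_size},
        "required": {"include": ["entry_id"]},
    }


def search_entries(payload: dict) -> dict:
    """POST the payload to the (assumed) entries query endpoint and decode the reply."""
    request = urllib.request.Request(
        f"{BASE_URL}/entries/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `search_entries(build_query(["Si"]))` would send the request; nothing is sent when the module is merely imported.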
68+
69+
## Organization in NOMAD
70+
71+
### Entries
72+
73+
All (meta)data obtained from processing a single mainfile are compiled into an entry, the fundamental unit of storage within the NOMAD database. An entry includes the simulation input/output, author information, and additional general metadata (e.g., references or comments), as well as a unique identifier, the `entry_id`.
74+
75+
Once processing has finished, the uploads page shows whether the processing of each mainfile ended in `SUCCESS` or `FAILURE`. The entry information can be browsed by clicking on the :fontawesome-solid-arrow-right: icon. The GUI provides the following structure for navigating an entry:
76+
77+
**OVERVIEW tab**
78+
79+
![Overview page](images/overview_page.png){.screenshot}
80+
81+
The overview page contains a summary of the parsed metadata, e.g., tabular information about the material and methodology of the calculation (in the example, a G0W0 calculation done with the [exciting](https://www.exciting-code.org/){:target="_blank"} code for bulk Si<sub>2</sub>), along with a visualization of the system and some relevant properties.
82+
83+
**FILES tab**
84+
85+
The files page contains a browser for the uploaded file structure, with tools for viewing both the processed and raw data.
86+
87+
<!-- TODO - Add image -->
88+
89+
**DATA tab**
90+
91+
The `DATA` page contains a browser for searching through the metadata stored for the entry, according to the NOMAD Metainfo structure. A downloadable JSON version of the archive can be accessed by clicking on the :fontawesome-solid-cloud-arrow-down: icon.
92+
93+
![Data page](images/data_page.png){.screenshot}
94+
95+
96+
**LOGS tab**
97+
98+
The `LOGS` page contains a list of info, warning, and error messages from the processing codes (i.e., parsers and normalizers). These provide insight into any potential issues with the upload, especially in the case that the entry displays the `FAILURE` processing status. Please help improve NOMAD by reporting any major issues that you find: [NOMAD > Support](https://nomad-lab.eu/nomad-lab/support.html){:target="_blank"}.
99+
100+
![Logs page](images/logs_page.png){.screenshot}
101+
102+
103+
### Uploads
104+
NOMAD entries can be organized hierarchically into uploads. Because parsing relies on the automated identification of mainfiles, users are free to group simulations arbitrarily within a single upload; a separate entry is then created for each simulation. The group of entries as a whole receives an additional unique identifier, the `upload_id`. Although the grouping of entries into an upload is not necessarily scientifically meaningful, it is practically useful for submitting batches of files from multiple simulations to NOMAD.
105+
106+
### Workflows
107+
NOMAD offers flexibility in the construction of workflows. Users can create custom workflows, which are completely general directed graphs that link NOMAD entries with one another in order to record the provenance of the simulation data. Custom workflows are contained within their own entries and thus have their own unique identifiers. To create a custom workflow, the user uploads a workflow YAML file describing the inputs and outputs of each entry within the workflow, with respect to sections of the NOMAD Metainfo schema.
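The notion of a workflow as a directed graph that records provenance can be illustrated without any NOMAD-specific format. In the sketch below, the three task names and the `depends_on` mapping are hypothetical, and the code says nothing about the layout of the workflow YAML file itself.

```python
# Hypothetical workflow: each task lists the tasks whose outputs it consumes.
depends_on = {
    "geometry_optimization": set(),
    "ground_state": {"geometry_optimization"},
    "band_structure": {"ground_state"},
}


def provenance(task: str, depends_on: dict[str, set[str]]) -> set[str]:
    """Return every task that `task` depends on, directly or indirectly."""
    seen: set[str] = set()
    stack = [task]
    while stack:
        for parent in depends_on.get(stack.pop(), set()):
            if parent not in seen:  # the visited set also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


print(provenance("band_structure", depends_on))
```

Walking the graph upstream from a result answers the provenance question directly: which calculations had to run before this one could be produced.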
108+
109+
### Datasets
110+
At the highest level, NOMAD groups entries into datasets. A NOMAD dataset allows the user to group a large number of entries without specifying any links between individual entries. A DOI is generated when a dataset is published, providing a convenient way to reference all data used for a particular investigation within a publication.
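The relationship between entries, uploads, and datasets can be summarized in a small data model. This is only a reading aid under simplifying assumptions: the class and field names other than `entry_id` and `upload_id` are hypothetical, and the identifiers are invented.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Entry:
    entry_id: str   # unique per processed mainfile
    upload_id: str  # shared by all entries of one upload
    mainfile: str


@dataclass
class Dataset:
    name: str
    entry_ids: set[str] = field(default_factory=set)
    doi: Optional[str] = None  # only set once the dataset is published


def entries_by_upload(entries: list[Entry]) -> dict[str, list[str]]:
    """Group entry ids by the upload they were submitted in."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for entry in entries:
        grouped[entry.upload_id].append(entry.entry_id)
    return dict(grouped)


entries = [
    Entry("e1", "u1", "sim_a/run.out"),
    Entry("e2", "u1", "sim_b/run.out"),
    Entry("e3", "u2", "sim_c/run.out"),
]
# A dataset may collect entries from different uploads.
paper_data = Dataset("hypothetical-paper-data", entry_ids={"e1", "e3"})
```

The example shows the two independent groupings: uploads follow how the files were submitted, whereas a dataset follows how the data are used.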
111+
112+
<!-- TODO - add some diagrams to explain the organization and remove anything that is not necessary to explain here? -->
-68.3 KB
Binary file not shown.
-38.1 MB
Binary file not shown.
-2.47 KB
Binary file not shown.
Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
1+
# Frequently Asked Questions
2+
3+
<!--
4+
Briefly explain the purpose of the FAQ:
5+
- who is it for (e.g., developers, end-users, admins)?
6+
- provide guidance on how to search for answers effectively
7+
-->
8+
9+
## General Questions
10+
11+
!!! Warning
12+
13+
Coming soon ...
14+
15+
<!--
16+
- What is Nomad?
17+
- Who is it for?
18+
- How to get started?
19+
- What are the system requirements?
20+
-->
21+
22+
## Installation & Setup
23+
24+
!!! Warning
25+
26+
Coming soon ...
27+
28+
<!--
29+
- installation how-to for required code
30+
- dependencies
31+
- how to update
32+
- how to uninstall
33+
-->
34+
35+
## Usage & Features
36+
37+
!!! Warning
38+
39+
Coming soon ...
40+
41+
<!--
42+
- what are common tasks, how are they executed
43+
- what are the key features
44+
- how to customize settings
45+
-->
46+
47+
<!-- Here are the current nomad deployments (links at bottom of nomad-lab.eu (attached):
48+
49+
Versioning
50+
official (prod): updated most infrequently (no exact timeline),
51+
beta (staging): updated more frequently than prod (no exact timeline),
52+
test: linked to either prod or beta version (unclear to me),
53+
develop: updated nightly (link not on website https://nomad-lab.eu/prod/v1/develop/gui/about/information),
54+
example oasis: updated nightly,
55+
56+
Other Info
57+
official, beta, and develop share a database.,
58+
test has its own database that is wiped occasionally, such that one can test publishing there.,
59+
example oasis also has its own database. it does not appear that there is a clear data-wiping strategy since it is mainly intended for testing plugins -->
60+
61+
62+
## Troubleshooting
63+
64+
!!! Warning
65+
66+
Coming soon ...
67+
68+
<!-- ### Getting Help
69+
70+
### Finding resources? -->
71+
72+
73+
<!--
74+
- Why am I getting [specific error message]?
75+
- How do I reset/reconfigure?
76+
- Where can I find logs for debugging?
77+
- How do I report a bug?
78+
-->
79+
80+
## Licensing & Support
81+
82+
!!! Warning
83+
84+
Coming soon ...
85+
86+
<!--
87+
- Is it open-source or proprietary?
88+
- How do I contact support?
89+
- Where can I find the documentation/community forum?
90+
-->
91+
92+
## Advanced Topics
93+
94+
!!! Warning
95+
96+
Coming soon ...
97+
98+
<!--
99+
- API integration
100+
- Customization & extensions
101+
- Performance optimization
102+
-->
103+
104+
## Preparing and Managing Raw Data
105+
106+
??? info "What happens to the VASP POTCAR upon upload?"
107+
For VASP data, NOMAD complies with the licensing of the `POTCAR` files. In agreement with [Georg Kresse](https://www.vasp.at/info/team/){:target="_blank"}, NOMAD extracts the most important information from the `POTCAR` file and stores it in a stripped version called `POTCAR.stripped`. The `POTCAR` files are then automatically removed from the upload, so that you can safely publish your data.
108+
109+
??? info "Can I upload large MD trajectories?"
110+
NOMAD has a file size limit of 30 GB per upload. We additionally advise users to trim their trajectories for efficient use of the platform tools. In general, it is best to upload a representative set of trajectory frames (depending on the use case), so that the entry is findable and understandable to other researchers, and then link the entry to the full raw trajectory within your own (local) storage solution, so that it can be easily accessed upon request. Please see the relevant guide for more information: [`nomad-simulation-parsers` > Guide to preparing Gromacs trajectories for upload to NOMAD](https://fairmat-nfdi.github.io/nomad-parser-plugins-simulation/parsers/gromacs/gromacs_about.html){:target="_blank"}.
111+
<!-- #TODO - Add sub-section link -->
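One simple way to pick a representative set of frames is to keep evenly spaced frames up to a chosen maximum. The sketch below only computes frame indices and is independent of any MD package; the function name `representative_frames` and the even-spacing strategy are assumptions, and the right selection depends on your use case.

```python
def representative_frames(n_frames: int, max_frames: int) -> list[int]:
    """Pick at most `max_frames` evenly spaced frame indices from a trajectory."""
    if n_frames <= 0 or max_frames <= 0:
        return []
    # Ceiling division, so the stride is large enough to respect the limit.
    stride = max(1, -(-n_frames // max_frames))
    return list(range(0, n_frames, stride))


# Example: reduce a 10-frame trajectory to at most 4 frames.
print(representative_frames(10, 4))  # [0, 3, 6, 9]
```

The resulting indices can then be passed to the trajectory tool of your choice to write the reduced file for upload.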
112+
113+
??? info "What do I do if my MD engine is not supported?"
114+
The most robust approach for integrating your data into NOMAD is via a standardized parser plugin. However, many modern simulation engines that use fully-flexible scriptable input and non-fixed output files challenge or prevent this approach. For these cases, we provide the `H5MD-NOMAD` specification (i.e., schema and file format) that enables users to self-organize and upload data from any MD software package. See [`nomad-simulation-parsers` > H5MD > About](https://fairmat-nfdi.github.io/nomad-parser-plugins-simulation/parsers/h5md/h5md_about.html){:target="_blank"} for details.
115+
<!-- TODO Add sub-page links on the parsers overview page for H5MD -->
116+
117+
??? info "How should I organize my files for upload?"
118+
119+
We recommend keeping the folder structure and files generated by the simulation code, while staying within the [upload limits](../../howto/manage/upload.md#upload-limits).
120+
<!-- TODO add some more specifics about constructing uploads -->
121+
122+
## Customization and Development
123+
124+
!!! Warning
125+
126+
Coming soon ...
127+
128+
129+
## Additional Resources
130+
131+
!!! Warning
132+
133+
Coming soon ...
134+
135+
<!--
136+
- Links to tutorials, guides, or forums
137+
- Contact information
138+
-->
