FAIRmat-NFDI
diff --git a/‎README.md‎
Lines changed: 31 additions & 0 deletions b/‎README.md‎
diff --git a/‎docs/examples/computational_data/basics.md‎
Lines changed: 112 additions & 0 deletions b/‎docs/examples/computational_data/basics.md‎
diff --git a/‎docs/examples/computational_data/data/Si_gw.zip‎
-68.3 KB b/‎docs/examples/computational_data/data/Si_gw.zip‎
diff --git a/‎docs/examples/computational_data/data/example_files.zip‎
-38.1 MB b/‎docs/examples/computational_data/data/example_files.zip‎
diff --git a/‎docs/examples/computational_data/data/workflowyaml_files.zip‎
-2.47 KB b/‎docs/examples/computational_data/data/workflowyaml_files.zip‎
diff --git a/‎docs/examples/computational_data/faqs.md‎
Lines changed: 138 additions & 0 deletions b/‎docs/examples/computational_data/faqs.md‎
@@ -73,6 +73,37 @@ This will install all requirements in a virtual environment and start the local
 uv run --extra dev pytest
 ```
 
+
+### How to check and remove unused assets
+
+This repository includes a utility to help keep the documentation tree clean by detecting and safely handling **unreferenced assets** such as files in `images/` and `data/` directories.
+
+#### Tool: `utils/find_unused_assets.py`
+
+This script checks whether files stored in `images/` or `data/` subdirectories (under `docs/`) are actually referenced by any Markdown files in the **same folder**. If not, they are considered unreferenced.
+
+#### Usage
+
+From the root of the repository:
+
+```bash
+# List all unreferenced assets
+python utils/find_unused_assets.py
+```
+
+```bash
+# Move unreferenced assets to .trash/ (safe mode)
+python utils/find_unused_assets.py --remove
+```
+> By default, unreferenced files are **not deleted**. They are moved to a `.trash/` folder at the project root so you can review and recover them if needed.
+
+Before deleting anything permanently, review the contents of `.trash/` and check that moving the files has not broken any links in the documentation.
+
+```bash
+# Permanently delete all assets in .trash/
+python utils/find_unused_assets.py --empty-trash
+```
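
The check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual code of `utils/find_unused_assets.py`: the function name `find_unreferenced` is hypothetical, and a file counts as referenced if its relative path (for example `images/plot.png`) appears anywhere in a Markdown file of the same folder.

```python
import tempfile
from pathlib import Path

ASSET_DIRS = ("images", "data")  # asset subdirectory names mentioned in the text


def find_unreferenced(docs_root: Path) -> list[Path]:
    """Return asset files that no Markdown file in the same folder mentions.

    Hypothetical helper: plain substring matching stands in for real link parsing.
    """
    unreferenced = []
    for asset_dir in sorted(docs_root.rglob("*")):
        if not (asset_dir.is_dir() and asset_dir.name in ASSET_DIRS):
            continue
        folder = asset_dir.parent
        # Join the text of every Markdown file that sits next to the asset directory.
        markdown = "\n".join(
            page.read_text(encoding="utf-8") for page in sorted(folder.glob("*.md"))
        )
        for asset in sorted(asset_dir.rglob("*")):
            if not asset.is_file():
                continue
            relative = asset.relative_to(folder).as_posix()  # e.g. "images/plot.png"
            if relative not in markdown:
                unreferenced.append(asset)
    return unreferenced


if __name__ == "__main__":
    # Build a throwaway docs tree with one referenced and one unreferenced image.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "images").mkdir()
        (root / "images" / "used.png").write_bytes(b"")
        (root / "images" / "unused.png").write_bytes(b"")
        (root / "page.md").write_text("![plot](images/used.png)", encoding="utf-8")
        for path in find_unreferenced(root):
            print(path.relative_to(root))
```

Substring matching keeps the sketch short, but it can miss links written with a different relative prefix, which is one reason to review `.trash/` before emptying it.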
+
 ---
 ## Appendix
 
 
@@ -0,0 +1,112 @@
+# NOMAD Basics: A Computational Perspective
+
+## What you will learn
+
+- An overview of how NOMAD processes and organizes computational data.
+
+## Recommended preparation
+
+- [Tutorial > Navigating to NOMAD](../../tutorial/nomad_repo.md)
+- [Tutorial > Uploading and publishing data](../../tutorial/upload_publish.md)
+- [Tutorial > Exploring data](../../tutorial/explore.md)
+
+## Further resources
+
+- [Tutorial > Managing workflows and projects](../../tutorial/workflows_projects.md)
+- [How-to guides > Programmatic use > Publish data using Python](../../howto/programmatic/publish_python.md)
+- [How-to guides > Customization > Define Workflows](../../howto/customization/workflows.md)
+
+## Processing of supported simulation data
+
+NOMAD ingests the raw input and output files from standard simulation software by first identifying a representative file (denoted the **mainfile**) and then employing a [Parser](../../reference/glossary.md#parser) to extract relevant (meta)data, not only from the mainfile but also from other files (**auxiliary files**) that the parser associates with that simulation.
+
+<div class="click-zoom">
+    <label>
+        <input type="checkbox">
+        <img src="./images/parsing_illustration.png" alt="" width="80%" title="Click to zoom in">
+    </label>
+</div>
+
+The extracted (meta)data are stored within a structured schema, the NOMAD [Metainfo](../../reference/glossary.md#metainfo), which provides context for each quantity and enables interoperability and comparison between, e.g., simulation software. The Metainfo is constructed from [Sections and Subsections](../../reference/glossary.md#section-and-subsection) and [Quantities](../../reference/glossary.md#quantity), which can be conveniently browsed with the [Metainfo Browser](https://nomad-lab.eu/prod/v1/gui/analyze/metainfo){:target="_blank"}:
+
+<div class="click-zoom">
+    <label>
+        <input type="checkbox">
+        <img src="./images/nomad_metainfo.png" alt="" width="100%" title="Click to zoom in">
+    </label>
+</div>
+
+
+In the same upload, there might be multiple mainfiles and auxiliary files organized in a folder tree structure. A separate [Entry](../../reference/glossary.md#entry) will be created for each mainfile identified. For each entry, an [Archive](../../reference/glossary.md#archive) is created that contains all the extracted (meta)data in a _structured_, _well-defined_, and _machine-readable_ format. This **metadata** provides context to the raw data, i.e., which methodological parameters were used as input, on which material the calculation was performed, etc.
+
+
+
+See the explanation pages [From files to data](../../explanation/basics.md), [Data structure](../../explanation/data.md) and [Processing](../../explanation/processing.md) for a more general description of NOMAD processing.
+
+<!--TODO: add our own supported parsers list with improved info-->
+See [Supported Parsers](https://nomad-lab.eu/prod/v1/staging/docs/reference/parsers.html){:target="_blank"} for a full list of supported codes, mainfiles, auxiliary files, etc.
+
+## Archive sections relevant for computational data
+
+Under the [`Entry` section of the metainfo browser](https://nomad-lab.eu/prod/v1/gui/analyze/metainfo/nomad.datamodel.datamodel.EntryArchive){:target="_blank"}, there are several sections and quantities being populated by the parsers. For computational data, the relevant sections are:
+
+- `metadata`: contains general, non-code-specific metadata. This is mainly information about authors, the entry creation time, identifiers (id), etc.
+- `run`: contains the [Parsed](../../explanation/processing.md#parsing) and [Normalized](../../explanation/processing.md#normalizing) raw data, according to the *legacy* NOMAD simulation schema, [`runschema`](https://nomad-lab.eu/prod/v1/gui/analyze/metainfo/runschema).
+- `data`: contains the [Parsed](../../explanation/processing.md#parsing) and [Normalized](../../explanation/processing.md#normalizing) raw data, according to the *new* NOMAD simulation schema, [`nomad_simulations`](https://nomad-lab.eu/prod/v1/gui/analyze/metainfo/nomad_simulations).
+- `workflow2`: contains metadata about the specific workflow performed within the entry. This is mainly a set of well-defined workflows, e.g., `GeometryOptimization`, and their parameters.
+- `results`: contains the [Normalized](../../explanation/processing.md#normalizing) and [Search Indexed](../../explanation/basics.md#storing-and-indexing) metadata. This is mainly relevant for searching, filtering, and visualizing data in NOMAD.
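
As a quick illustration of the list above, the following sketch reports which of these top-level sections are present in an archive that has been loaded into a Python dictionary (for example from a downloaded archive JSON file). The helper name `summarize_archive` and the toy archive content, including the `entry_id` value, are hypothetical; a real archive holds far more data.

```python
import json

# Top-level archive sections listed above as relevant for computational data.
COMPUTATIONAL_SECTIONS = ("metadata", "run", "data", "workflow2", "results")


def summarize_archive(archive: dict) -> dict[str, bool]:
    """Map each relevant section name to whether the archive contains it."""
    return {name: name in archive for name in COMPUTATIONAL_SECTIONS}


if __name__ == "__main__":
    # Toy stand-in for an archive JSON; the values are placeholders only.
    example = json.loads(
        '{"metadata": {"entry_id": "hypothetical-id"}, "run": [{}], '
        '"workflow2": {}, "results": {}}'
    )
    for section, present in summarize_archive(example).items():
        print(f"{section}: {'present' if present else 'missing'}")
```

An entry parsed with the legacy schema would typically show `run` but not `data`, which is what the toy archive mimics.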
+
+
+### Normalization
+
+The parser reads the code-specific mainfile and auxiliary files and populates the `run` and `workflow2` sections of the `archive`. Subsequently, a cascade of additional code is executed, which varies depending on the exact sections and quantities populated by the parser. This code is responsible for (1) normalizing, or _homogenizing_, certain metadata parsed from different codes, and (2) populating the `results` section.
+
+
+### Search indexing
+
+Only a fraction of the stored metadata is made available for search. Of the parsed and normalized quantities, those stored in the `results` section are the searchable ones. These metadata can be used to filter the database via the GUI or API.
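
To make the idea of filtering on `results` quantities concrete, the sketch below builds a JSON payload for an entries query. Everything specific in it is an assumption to verify against the NOMAD API documentation before use: the quantity names `results.material.elements` and `results.method.simulation.program_name`, the `all` operator, the `pagination` block, and the idea of sending the payload with a POST request to an `entries/query` endpoint. The function name `build_entries_query` is hypothetical. The sketch only constructs and prints the payload; it sends nothing.

```python
import json


def build_entries_query(elements: list[str], program_name: str, page_size: int = 10) -> dict:
    """Build a query payload that filters on quantities under `results`.

    The quantity names and payload layout are assumptions, not confirmed
    by this document.
    """
    return {
        "query": {
            # Assumed searchable quantity: chemical elements of the material.
            "results.material.elements": {"all": elements},
            # Assumed searchable quantity: simulation code that produced the data.
            "results.method.simulation.program_name": program_name,
        },
        "pagination": {"page_size": page_size},
    }


if __name__ == "__main__":
    payload = build_entries_query(["Si"], "exciting", page_size=5)
    print(json.dumps(payload, indent=2))
```

Keeping the payload construction separate from the request makes it easy to inspect the filter before any data is fetched.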
+
+## Organization in NOMAD
+
+### Entries
+
+The compilation of all (meta)data obtained from processing of a single mainfile forms an entry&mdash;the fundamental unit of storage within the NOMAD database&mdash;including simulation input/output, author information, and additional general overarching metadata (e.g., references or comments), as well as an `entry_id`&mdash;a unique identifier.
+
+Once the processing is finished, the uploads page shows whether the processing of each mainfile ended in `SUCCESS` or `FAILURE`. The entry information can be browsed by clicking on the :fontawesome-solid-arrow-right: icon. The GUI provides the following structure for navigating an entry:
+
+**OVERVIEW tab**
+
+![Overview page](images/overview_page.png){.screenshot}
+
+The overview page contains a summary of the parsed metadata, e.g., tabular information about the material and methodology of the calculation (in the example, a G0W0 calculation done with the [exciting](https://www.exciting-code.org/){:target="_blank"} code for bulk Si<sub>2</sub>), along with a visualization of the system and some relevant properties.
+
+**FILES tab**
+
+The files page contains a browser for the uploaded file structure, with tools for viewing both the processed and raw data.
+
+<!-- TODO - Add image -->
+
+**DATA tab**
+
+The `DATA` page contains a browser for searching through the metadata stored for the entry, according to the NOMAD Metainfo structure. A downloadable JSON version of the archive can be accessed by clicking on the :fontawesome-solid-cloud-arrow-down: icon.
+
+![Data page](images/data_page.png){.screenshot}
+
+
+**LOGS tab**
+
+The `LOGS` page contains a list of info, warning, and error messages from the processing codes (i.e., parsers and normalizers). These provide insight into any potential issues with the upload, especially in the case that the entry displays the `FAILURE` processing status. Please help improve NOMAD by reporting any major issues that you find: [NOMAD > Support](https://nomad-lab.eu/nomad-lab/support.html){:target="_blank"}.
+
+![Logs page](images/logs_page.png){.screenshot}
+
+
+### Uploads
+NOMAD entries can be organized hierarchically into uploads. Since the parsing execution is dependent on automated identification of representative files, users are free to arbitrarily group simulations together upon upload. In this case, multiple entries will be created with the corresponding simulation data. An additional unique identifier, `upload_id`, will be provided for this group of entries. Although the grouping of entries into an upload is not necessarily scientifically meaningful, it is practically useful for submitting batches of files from multiple simulations to NOMAD.
+
+### Workflows
+NOMAD offers flexibility in the construction of workflows: users can create custom workflows, which are completely general directed graphs that link NOMAD entries with one another in order to provide the provenance of the simulation data. Custom workflows are contained within their own entries and, thus, have their own set of unique identifiers. To create a custom workflow, the user is required to upload a workflow YAML file describing the inputs and outputs of each entry within the workflow, with respect to sections of the NOMAD Metainfo schema.
+
+### Datasets
+At the highest level, NOMAD groups entries into datasets. A NOMAD dataset allows the user to group a large number of entries without specifying any links between individual entries. A DOI is also generated when a dataset is published, providing a convenient route for referencing all data used for a particular investigation within a publication.
+
+<!-- TODO - add some diagrams to explain the organization and remove anything that is not necessary to explain here? -->
@@ -0,0 +1,138 @@
+# Frequently Asked Questions
+
+<!--
+Briefly explain the purpose of the FAQ:
+- who is it for (e.g., developers, end-users, admins)?
+- provide guidance on how to search for answers effectively
+-->
+
+## General Questions
+
+!!! Warning
+
+    Coming soon ...
+
+<!--
+- What is NOMAD?
+- Who is it for?
+- How to get started?
+- What are the system requirements?
+-->
+
+## Installation & Setup
+
+!!! Warning
+
+    Coming soon ...
+
+<!--
+- installation how-to for required code
+- dependencies
+- how to update
+- how to uninstall
+-->
+
+## Usage & Features
+
+!!! Warning
+
+    Coming soon ...
+
+<!--
+- what are common tasks, how are they executed
+- what are the key features
+- how to customize settings
+-->
+
+<!-- Here are the current nomad deployments (links at bottom of nomad-lab.eu (attached):
+
+Versioning
+official (prod): updated most infrequently (no exact timeline),
+beta (staging): updated more frequently than prod (no exact timeline),
+test: linked to either prod or beta version (unclear to me),
+develop: updated nightly (link not on website https://nomad-lab.eu/prod/v1/develop/gui/about/information),
+example oasis: update nightly,
+
+Other Info
+official, beta, and develop share a database.,
+test has its own database that is wiped occasionally, such that one can test publishing there.,
+example oasis also has its own database. it does not appear that there is a clear data-wiping strategy since it is mainly intended for testing plugins -->
+
+
+## Troubleshooting
+
+!!! Warning
+
+    Coming soon ...
+
+<!-- ### Getting Help
+
+### Finding resources? -->
+
+
+<!--
+- Why am I getting [specific error message]?
+- How do I reset/reconfigure?
+- Where can I find logs for debugging?
+- How do I report a bug?
+-->
+
+## Licensing & Support
+
+!!! Warning
+
+    Coming soon ...
+
+<!--
+- Is it open-source or proprietary?
+- How do I contact support?
+- Where can I find the documentation/community forum?
+-->
+
+## Advanced Topics
+
+!!! Warning
+
+    Coming soon ...
+
+<!--
+- API integration
+- Customization & extensions
+- Performance optimization
+-->
+
+## Preparing and Managing Raw Data
+
+??? info "What happens to the VASP POTCAR upon upload?"
+    For VASP data, NOMAD complies with the licensing of the `POTCAR` files. In agreement with [Georg Kresse](https://www.vasp.at/info/team/){:target="_blank"}, NOMAD extracts the most important information from the `POTCAR` file and stores it in a stripped version called `POTCAR.stripped`. The `POTCAR` files are then automatically removed from the upload, so that you can safely publish your data.
+
+??? info "Can I upload large MD trajectories?"
+    NOMAD has a file size limit of 30 GB per upload. We additionally advise users to trim their trajectories further for efficient use of the platform tools. In general, it is best to upload a representative set of trajectory frames (depending on the use case), so that the data are findable and understandable to other researchers, and then link the entry to the full raw trajectory within your own (local) storage solution, so that it can be easily accessed upon request. Please see the relevant guide for more information: [`nomad-simulation-parsers` > Guide to preparing Gromacs trajectories for upload to NOMAD](https://fairmat-nfdi.github.io/nomad-parser-plugins-simulation/parsers/gromacs/gromacs_about.html){:target="_blank"}
+    <!-- #TODO - Add sub-section link -->
+
+??? info "What do I do if my MD engine is not supported?"
+    The most robust approach for integrating your data into NOMAD is via a standardized parser plugin. However, many modern simulation engines that use fully-flexible scriptable input and non-fixed output files challenge or prevent this approach. For these cases, we provide the `H5MD-NOMAD` specification (i.e., schema and file format) that enables users to self-organize and upload data from any MD software package. See [`nomad-simulation-parsers` > H5MD > About](https://fairmat-nfdi.github.io/nomad-parser-plugins-simulation/parsers/h5md/h5md_about.html){:target="_blank"} for details.
+    <!-- TODO Add sub-page links on the parsers overview page for H5MD  -->
+
+??? info "How should I organize my files for upload?"
+
+    We recommend keeping the folder structure and files generated by the simulation code, while staying within the [upload limits](../../howto/manage/upload.md#upload-limits).
+    <!-- TODO add some more specifics about constructing uploads -->
+
+## Customization and Development
+
+!!! Warning
+
+    Coming soon ...
+
+
+## Additional Resources
+
+!!! Warning
+
+    Coming soon ...
+
+<!--
+- Links to tutorials, guides, or forums
+- Contact information
+-->