
Commit e65c41c

Tiny fixes and README.md.
1 parent eca81c7 commit e65c41c


6 files changed: +441 -58 lines changed


pyproject.toml

Lines changed: 13 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -36,6 +36,8 @@ dependencies = [
3636
"Parent Project" = "https://github.com/FAIRmat-NFDI"
3737

3838
[project.optional-dependencies]
39+
nomad = ["nomad-lab >= 1.4.0"]
40+
3941
dev = [
4042
"mypy",
4143
"ruff>=0.14.0",
@@ -52,6 +54,11 @@ docs = [
5254
"mkdocs-simple-hooks",
5355
]
5456

57+
nomad_api = [
58+
"requests",
59+
"nomad-lab >= 1.4.0",
60+
]
61+
5562
[project.entry-points."pynxtools.reader"]
5663
spm = "pynxtools_spm.reader:SPMReader"
5764

@@ -70,7 +77,7 @@ version_scheme = "no-guess-dev"
7077
local_scheme = "node-and-date"
7178

7279
[tool.ruff]
73-
include = ["src/pynxtools_spm/*.py", "tests/*.py"]
80+
include = ["src/pynxtools_spm/**/*.py", "tests/**/*.py"]
7481
lint.select = [
7582
"E", # pycodestyle
7683
"W", # pycodestyle
@@ -101,3 +108,8 @@ ignore_missing_imports = true
101108
follow_imports = "silent"
102109
no_strict_optional = true
103110
disable_error_code = "import, annotation-unchecked"
111+
112+
[tool.uv]
113+
extra-index-url = [
114+
"https://gitlab.mpcdf.mpg.de/api/v4/projects/2187/packages/pypi/simple",
115+
]

run_uploader.sh

File mode changed: 100644 → 100755
Lines changed: 2 additions & 2 deletions
@@ -4,8 +4,8 @@
44
set -e
55

66
current_dir=$(pwd)
7-
uploader_script="/home/rubel/NOMAD-FAIRmat/nomad-distro-dev-RM/packages/pynxtools-spm/src/pynxtools_spm/nomad_uploader/example_upload_script.py"
8-
venv="/home/rubel/NOMAD-FAIRmat/nomad-distro-dev-RM/.venv"
7+
uploader_script="/home/rubel/NOMAD-FAIRmat/GH/pynxtools-spm/src/pynxtools_spm/nomad_uploader/example_upload_script.py"
8+
venv="/home/rubel/NOMAD-FAIRmat/GH/pynxtools-spm/.venv"
99
python_3="$venv/bin/python3"
1010
echo "Running uploader script..."
1111
"$python_3" "$uploader_script" > "$current_dir/debug.txt" 2>&1
Lines changed: 342 additions & 0 deletions
@@ -0,0 +1,342 @@
1+
# NOMAD Uploader
2+
3+
A Python tool for automated conversion and upload of SPM (Scanning Probe Microscopy) experimental data to NOMAD, a FAIR data management platform. This module converts raw SPM data files (STS, STM, AFM) to NeXus format and uploads them to NOMAD with metadata management.
4+
5+
## Features
6+
7+
- **Automated SPM Data Conversion**: Converts raw SPM files (`.dat`, `.sxm`) to NeXus format (`NXsts`, `NXstm`, `NXafm`)
8+
- **Batch Processing**: Process multiple files in parallel using multiprocessing
9+
- **NOMAD Integration**: Direct upload to NOMAD with OAuth2 authentication
10+
- **Metadata Management**: Modify and manage upload metadata before publishing
11+
- **Status Tracking**: Real-time monitoring of upload and processing status
12+
- **Automatic Decompression**: Handles compressed files automatically
13+
- **Logging**: Comprehensive logging for debugging and tracking progress
14+
- **Error Handling**: Robust error handling with retry mechanisms
15+
16+
## Directory Structure
17+
18+
```
19+
nomad_uploader/
20+
├── README.md # This file
21+
├── uploader.py # Main uploader orchestration
22+
├── nomad_upload_api.py # NOMAD API client
23+
├── reader_config_setup.py # SPM conversion configuration
24+
├── example_upload_script.py # Example usage script
25+
├── helper.py # Utility functions
26+
└── files_movers.py # File management utilities
27+
```
28+
29+
## Module Overview
30+
31+
### `uploader.py`
32+
Main orchestration module containing:
33+
- `NOMADSettings`: Configuration for NOMAD connection and authentication
34+
- `DataProcessingSettings`: Configuration for SPM data processing
35+
- `run_uploader_with()`: Main entry point for uploading data
36+
37+
### `nomad_upload_api.py`
38+
NOMAD REST API client with functions:
39+
- `get_authentication_token()`: OAuth2 authentication
40+
- `upload_to_NOMAD()`: Upload files to NOMAD
41+
- `check_upload_status()`: Monitor upload/processing status
42+
- `publish_upload()`: Publish uploads to NOMAD
43+
- `edit_upload_metadata()`: Modify upload metadata
44+
- `delete_upload()`: Delete failed uploads
45+
- `create_dataset()`: Group uploads into datasets
46+
- `trigger_reprocess_upload()`: Trigger NOMAD reprocessing
47+
48+
### `reader_config_setup.py`
49+
SPM data conversion module:
50+
- `SPMConvertInputParameters`: Configuration for SPM conversion
51+
- `convert_spm_experiments()`: Convert raw SPM data to NeXus format
52+
53+
## Quick Start
54+
55+
### Basic Usage
56+
57+
```python
58+
from pynxtools_spm.nomad_uploader.uploader import (
59+
    run_uploader_with,
60+
    NOMADSettings,
61+
    DataProcessingSettings,
62+
)
63+
from pathlib import Path
64+
65+
# Configure NOMAD connection
66+
nomad_settings = NOMADSettings(
67+
    url_protocol="https",
68+
    url_domain="nomad-lab.eu",
69+
    url_version="prod/v1/develop/api/v1/",
70+
    username="your_username",
71+
    password="your_password",
72+
    token="",  # Will be auto-generated
73+
    modify_upload_metadata=True,
74+
    publish_to_nomad=False,
75+
)
76+
77+
# Configure data processing
78+
data_proc_settings = DataProcessingSettings(
79+
    raw_file_exts=(".dat", ".sxm"),
80+
    single_batch_processing_time=90,  # seconds
81+
    logger_dir=Path("./logs"),
82+
    src_dir=Path("/path/to/spm/data"),
83+
    sts_eln=Path("/path/to/sts_eln.yaml"),
84+
    stm_eln=Path("/path/to/stm_eln.yaml"),
85+
    afm_eln=Path("/path/to/afm_eln.yaml"),
86+
    number_of_uploads=10,
87+
    create_pseudo_file=True,
88+
    pseudo_exts=".done",
89+
)
90+
91+
# Run uploader
92+
if __name__ == "__main__":
93+
    run_uploader_with(
94+
        nomad_settings=nomad_settings,
95+
        data_proc_settings=data_proc_settings,
96+
    )
97+
```
98+
99+
## Configuration
100+
101+
### NOMADSettings
102+
103+
| Parameter | Type | Required | Default | Description |
104+
|-----------|------|----------|---------|-------------|
105+
| `url_protocol` | str | Yes | - | Protocol (`https` or `http`) |
106+
| `url_domain` | str | Yes | - | NOMAD domain (e.g., `nomad-lab.eu`) |
107+
| `url_version` | str | Yes | - | API version path (e.g., `prod/v1/develop/api/v1/`) |
108+
| `username` | str | Yes | - | NOMAD username |
109+
| `password` | str | Yes | - | NOMAD password |
110+
| `token` | str | Yes | - | OAuth2 token; pass `""` to have it generated on first run |
111+
| `url` | str | No | Auto | Full API URL (auto-constructed if not provided) |
112+
| `modify_upload_metadata` | bool | No | False | Whether to modify metadata before publish |
113+
| `publish_to_nomad` | bool | No | False | Automatically publish uploads to NOMAD |
114+
| `max_upload_attempt` | int | No | 20 | Max retry attempts for upload status check |
115+
| `nomad_processing_time` | int | No | 3 | Wait time (seconds) between status checks |
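
The table notes that `url` is constructed automatically when it is not provided. A minimal sketch of how the three URL parts could be joined, assuming plain concatenation; `build_api_url` is a hypothetical helper for illustration and is not taken from `uploader.py`:

```python
def build_api_url(url_protocol: str, url_domain: str, url_version: str) -> str:
    """Join protocol, domain, and version path into one base URL (illustrative)."""
    # Strip stray slashes so each part can be given with or without them.
    domain = url_domain.strip("/")
    version = url_version.strip("/")
    # Keep a trailing slash so endpoint paths can be appended directly.
    return f"{url_protocol}://{domain}/{version}/"


api_url = build_api_url("https", "nomad-lab.eu", "prod/v1/develop/api/v1/")
print(api_url)
```

With the Quick Start values this yields a base URL ending in `api/v1/`, to which endpoint paths such as `uploads` could be appended.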
116+
117+
### DataProcessingSettings
118+
119+
| Parameter | Type | Required | Default | Description |
120+
|-----------|------|----------|---------|-------------|
121+
| `raw_file_exts` | tuple | Yes | - | Supported file extensions (e.g., `.dat`, `.sxm`) |
122+
| `single_batch_processing_time` | int | Yes | - | Processing timeout per batch (seconds) |
123+
| `logger_dir` | Path | Yes | - | Directory for log files |
124+
| `src_dir` | Path | Yes | - | Source directory with raw SPM files |
125+
| `sts_eln` | Path | Yes | - | Path to STS ELN (Electronic Lab Notebook) file |
126+
| `stm_eln` | Path | Yes | - | Path to STM ELN file |
127+
| `afm_eln` | Path | Yes | - | Path to AFM ELN file |
128+
| `spm_params_obj_l` | List | No | [] | List of conversion parameters (auto-populated) |
129+
| `dst_dir` | Path | No | None | Destination for processed files |
130+
| `create_pseudo_file` | bool | No | True | Create marker file after successful upload |
131+
| `pseudo_exts` | str | No | `.done` | Extension for marker file |
132+
| `sts_config` | Path | No | None | Optional STS-specific config |
133+
| `stm_config` | Path | No | None | Optional STM-specific config |
134+
| `afm_config` | Path | No | None | Optional AFM-specific config |
135+
| `number_of_uploads` | int | No | 10 | Max files to process per batch |
136+
| `delete_failed_uploads` | bool | No | False | Delete uploads on timeout |
137+
| `upload_metadata` | dict | No | None | Metadata to apply to all uploads |
138+
| `file_specific_eln` | dict | No | None | Map filenames to specific ELN files |
139+
140+
## Metadata Management
141+
142+
### Modifying Upload Metadata
143+
144+
```python
145+
# Example metadata structure
146+
metadata = {
147+
    "metadata": {
148+
        "upload_name": "My SPM Experiment",
149+
        "coauthors": ["colleague@institution.edu"],
150+
        "references": ["https://doi.org/10.xxxx/xxxxx"],
151+
        "datasets": "dataset_id",
152+
        "embargo_length": 0,  # 0 = public, >0 = months of embargo
153+
        "comment": "Description of the experiment"
154+
    }
155+
}
156+
157+
# Apply to settings
158+
data_proc_settings.upload_metadata = metadata
159+
```
160+
161+
## Logging
162+
163+
The uploader generates detailed logs in the specified `logger_dir`:
164+
165+
- **`upload.log`**: Upload operations, status checks, API interactions
166+
- **`converter.log`**: SPM data conversion progress and NeXus generation
167+
168+
### Log Levels
169+
170+
- `INFO`: Standard operation messages
171+
- `ERROR`: Failed operations and errors
172+
- `DEBUG`: Detailed debugging information
173+
174+
### Example Log Entry
175+
176+
```
177+
2024-02-06 10:45:23,456 - uploader - INFO - Upload request with Upload ID (7BWDvsn7TmeNyOBHTcgpwA) corresponding to (...)
178+
2024-02-06 10:45:25,789 - uploader - INFO - Upload status for 7BWDvsn7TmeNyOBHTcgpwA: Process process_upload completed successfully
179+
```
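
The example entries follow an `asctime - name - level - message` layout. A minimal sketch of a standard-library logging setup that produces lines in that shape; `make_logger` and `LOG_FORMAT` are hypothetical names, and the actual uploader may configure its loggers differently:

```python
import logging
from pathlib import Path

# Layout assumed from the example entries: timestamp - logger name - level - message
LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"


def make_logger(name: str, log_file: Path) -> logging.Logger:
    """Return a logger that appends formatted records to log_file."""
    log_file.parent.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # let DEBUG, INFO, and ERROR records through
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.FileHandler(log_file)
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        logger.addHandler(handler)
    return logger
```

Calling `make_logger("uploader", logger_dir / "upload.log")` and then `.info(...)` on the result would write lines shaped like the entries above.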
180+
181+
## File Processing Workflow
182+
183+
```
184+
Raw SPM File (.dat/.sxm)
185+
186+
[Automatic Detection] (STS/STM/AFM)
187+
188+
[NeXus Conversion] (pynxtools reader)
189+
190+
Intermediate Files
191+
├── NeXus file (.nxs)
192+
└── Metadata file
193+
194+
[Zip Creation]
195+
196+
ZIP Archive
197+
198+
[NOMAD Upload] (OAuth2)
199+
200+
Upload ID assigned
201+
202+
[Status Monitoring] (polling)
203+
204+
[Processing Complete]
205+
206+
[Optional Metadata Edit]
207+
208+
[Optional Publishing]
209+
210+
Marker File Created (.done)
211+
```
212+
213+
## API Workflow
214+
215+
### Authentication Flow
216+
217+
```
218+
NOMADSettings
219+
220+
[OAuth2 Password Grant]
221+
222+
Access Token
223+
224+
API Requests
225+
```
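
The flow above can be sketched offline with the standard library. This example only prepares the request and parses a decoded response; it does not contact NOMAD. The form fields follow the generic OAuth2 password grant, and the `access_token` response field is an assumption based on that convention; `build_token_request` and `extract_token` are hypothetical helpers, not functions from `nomad_upload_api.py`:

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request


def build_token_request(api_url: str, username: str, password: str) -> Request:
    """Prepare (but do not send) a POST to <api_url>auth/token."""
    form = urlencode(
        {"grant_type": "password", "username": username, "password": password}
    ).encode()
    # Supplying a body makes urllib treat this as a POST request.
    return Request(
        urljoin(api_url, "auth/token"),
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def extract_token(response_json: dict) -> str:
    """Pull the access token out of a decoded JSON response."""
    token = response_json.get("access_token")  # field name is an assumption
    if not token:
        raise ValueError("Authentication token not found in response")
    return token
```

The token would then be sent with later API requests, typically in an `Authorization: Bearer <token>` header.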
226+
227+
### Upload Status States
228+
229+
1. **Adding files**: Files being uploaded to NOMAD
230+
2. **Process process_upload completed successfully**: Data converted to standardized format
231+
3. **Process process_publish_upload completed successfully**: Published to NOMAD
232+
233+
## Error Handling
234+
235+
### Common Issues and Solutions
236+
237+
#### Authentication Failed
238+
```
239+
Error: Authentication token not found in response
240+
Solution: Verify username, password, and NOMAD API endpoint
241+
```
242+
243+
#### Upload Status Message Not Found
244+
```
245+
Error: Upload status message not found in response
246+
Solution: Check response structure from NOMAD API
247+
```
248+
249+
#### Timeout During Processing
250+
```
251+
Error: Upload is time out for upload ID: XXX
252+
Solution: Increase `max_upload_attempt` or `nomad_processing_time` in NOMADSettings
253+
```
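
To see why raising either setting helps, here is a minimal sketch of a polling loop, assuming the uploader checks the status up to `max_upload_attempt` times and sleeps `nomad_processing_time` seconds between checks. `wait_for_processing` and `get_status` are hypothetical names; the real loop in the uploader may differ:

```python
import time
from typing import Callable

DONE_MESSAGE = "Process process_upload completed successfully"


def wait_for_processing(
    get_status: Callable[[], str],
    max_upload_attempt: int = 20,
    nomad_processing_time: float = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_status until processing finishes or the attempts run out."""
    for attempt in range(max_upload_attempt):
        if get_status() == DONE_MESSAGE:
            return True
        if attempt < max_upload_attempt - 1:
            sleep(nomad_processing_time)  # wait before the next status check
    return False  # the caller would report a timeout here


# Simulated status sequence instead of a real NOMAD call.
statuses = iter(["Adding files", "Adding files", DONE_MESSAGE])
finished = wait_for_processing(lambda: next(statuses), nomad_processing_time=0)
print(finished)
```

Under this assumption the total wait budget is roughly `max_upload_attempt × nomad_processing_time` seconds, so the defaults of 20 and 3 allow about one minute before a timeout.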
254+
255+
## Advanced Usage
256+
257+
### Processing Only Unprocessed Files
258+
259+
The uploader automatically creates `.done` marker files for successful uploads. On subsequent runs, it only processes files without corresponding `.done` markers.
260+
261+
```python
262+
# First run - processes all files
263+
# Creates: file.dat.done for each successful upload
264+
265+
# Second run - skips already processed files
266+
# Only processes new files without .done markers
267+
```
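
A minimal sketch of the marker check, assuming markers are named by appending `pseudo_exts` to the raw file name (as in `file.dat.done`) and that only the top level of `src_dir` is scanned. `find_unprocessed` is a hypothetical helper, not the uploader's own function:

```python
from pathlib import Path
from typing import List, Tuple


def find_unprocessed(
    src_dir: Path,
    raw_file_exts: Tuple[str, ...] = (".dat", ".sxm"),
    pseudo_exts: str = ".done",
) -> List[Path]:
    """Return raw files in src_dir that have no marker file next to them."""
    pending = []
    for path in sorted(src_dir.iterdir()):
        # Skip directories, marker files, and unsupported extensions.
        if not path.is_file() or path.suffix not in raw_file_exts:
            continue
        marker = path.with_name(path.name + pseudo_exts)  # e.g. file.dat.done
        if not marker.exists():
            pending.append(path)
    return pending
```

Deleting a marker file would, under this scheme, cause the corresponding raw file to be picked up again on the next run.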
268+
269+
### Batch Processing with Custom ELN Mapping
270+
271+
```python
272+
data_proc_settings.file_specific_eln = {
273+
    "sample1.dat": Path("/path/to/sample1_eln.yaml"),
274+
    "sample2.sxm": Path("/path/to/sample2_eln.yaml"),
275+
}
276+
# Default ELN will be used for files not in this mapping
277+
```
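
The fallback described in the comment above can be sketched as a dictionary lookup with a default, assuming keys are matched against the bare file name. `pick_eln` is a hypothetical helper for illustration only:

```python
from pathlib import Path
from typing import Dict, Optional


def pick_eln(
    raw_file: Path,
    default_eln: Path,
    file_specific_eln: Optional[Dict[str, Path]] = None,
) -> Path:
    """Return the file-specific ELN if one is mapped, else the default ELN."""
    mapping = file_specific_eln or {}
    # Match on the file name only, not the full path (an assumption).
    return mapping.get(raw_file.name, default_eln)


mapping = {"sample1.dat": Path("/path/to/sample1_eln.yaml")}
default = Path("/path/to/sts_eln.yaml")
print(pick_eln(Path("/data/sample1.dat"), default, mapping))
print(pick_eln(Path("/data/other.dat"), default, mapping))
```

Here `sample1.dat` resolves to its own ELN file, while `other.dat` falls back to the default.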
278+
279+
### Publishing to NOMAD
280+
281+
```python
282+
nomad_settings.publish_to_nomad = True
283+
nomad_settings.modify_upload_metadata = True
284+
285+
# Configure metadata for publishing
286+
data_proc_settings.upload_metadata = {
287+
    "metadata": {
288+
        "embargo_length": 0,  # Make public immediately
289+
        "upload_name": "Published Dataset",
290+
    }
291+
}
292+
```
293+
294+
## Performance Tuning
295+
296+
### For Large Batches
297+
298+
```python
299+
data_proc_settings.number_of_uploads = 50 # Process more files per batch
300+
data_proc_settings.single_batch_processing_time = 300 # Longer timeout
301+
nomad_settings.max_upload_attempt = 30  # More status-check attempts before timing out
302+
```
303+
304+
### For Large Files
305+
306+
```python
307+
data_proc_settings.single_batch_processing_time = 600 # 10 minutes
308+
nomad_settings.nomad_processing_time = 5 # Longer wait between checks
309+
```
310+
311+
## NOMAD REST API Reference
312+
313+
### Key Endpoints Used
314+
315+
- **Authentication**: `POST /auth/token`
316+
- **Upload Creation**: `POST /uploads?file_name=...&upload_name=...`
317+
- **Upload Status**: `GET /uploads/{upload_id}`
318+
- **Metadata Edit**: `POST /uploads/{upload_id}/edit`
319+
- **Publishing**: `POST /uploads/{upload_id}/action/publish`
320+
- **Dataset Creation**: `POST /datasets/`
321+
322+
For complete NOMAD API documentation, visit: https://nomad-lab.eu/docs
323+
324+
## Examples
325+
326+
See `example_upload_script.py` for a complete working example with real configuration.
327+
328+
## Contributing
329+
330+
When modifying the uploader:
331+
1. Update type hints (`Optional[Literal[...]]` for restricted values)
332+
2. Maintain consistent logger usage (pass `upload_logger` to all functions)
333+
3. Add comprehensive docstrings
334+
4. Update this README with new features
335+
336+
337+
## Support
338+
339+
For issues or questions:
340+
- Check the logs in `logger_dir` for detailed error messages
341+
- Review the NOMAD documentation at https://nomad-lab.eu
342+
- Open an issue on the project repository
