| 1 | +# NOMAD Uploader |
| 2 | + |
| 3 | +A Python tool for automated conversion and upload of SPM (Scanning Probe Microscopy) experimental data to NOMAD, a FAIR data management platform. This module converts raw SPM data files (STS, STM, AFM) to NeXus format and uploads them to NOMAD with metadata management. |
| 4 | + |
| 5 | +## Features |
| 6 | + |
| 7 | +- **Automated SPM Data Conversion**: Converts raw SPM files (`.dat`, `.sxm`) to NeXus format (`NXsts`, `NXstm`, `NXafm`) |
| 8 | +- **Batch Processing**: Process multiple files in parallel using multiprocessing |
| 9 | +- **NOMAD Integration**: Direct upload to NOMAD with OAuth2 authentication |
| 10 | +- **Metadata Management**: Modify and manage upload metadata before publishing |
| 11 | +- **Status Tracking**: Real-time monitoring of upload and processing status |
| 12 | +- **Automatic Decompression**: Handles compressed files automatically |
| 13 | +- **Logging**: Comprehensive logging for debugging and tracking progress |
| 14 | +- **Error Handling**: Robust error handling with retry mechanisms |
| 15 | + |
| 16 | +## Directory Structure |
| 17 | + |
| 18 | +``` |
| 19 | +nomad_uploader/ |
| 20 | +├── README.md # This file |
| 21 | +├── uploader.py # Main uploader orchestration |
| 22 | +├── nomad_upload_api.py # NOMAD API client |
| 23 | +├── reader_config_setup.py # SPM conversion configuration |
| 24 | +├── example_upload_script.py # Example usage script |
| 25 | +├── helper.py # Utility functions |
| 26 | +└── files_movers.py # File management utilities |
| 27 | +``` |
| 28 | + |
| 29 | +## Module Overview |
| 30 | + |
| 31 | +### `uploader.py` |
| 32 | +Main orchestration module containing: |
| 33 | +- `NOMADSettings`: Configuration for NOMAD connection and authentication |
| 34 | +- `DataProcessingSettings`: Configuration for SPM data processing |
| 35 | +- `run_uploader_with()`: Main entry point for uploading data |
| 36 | + |
| 37 | +### `nomad_upload_api.py` |
| 38 | +NOMAD REST API client with functions: |
| 39 | +- `get_authentication_token()`: OAuth2 authentication |
| 40 | +- `upload_to_NOMAD()`: Upload files to NOMAD |
| 41 | +- `check_upload_status()`: Monitor upload/processing status |
| 42 | +- `publish_upload()`: Publish uploads to NOMAD |
| 43 | +- `edit_upload_metadata()`: Modify upload metadata |
| 44 | +- `delete_upload()`: Delete failed uploads |
| 45 | +- `create_dataset()`: Group uploads into datasets |
| 46 | +- `trigger_reprocess_upload()`: Trigger NOMAD reprocessing |
| 47 | + |
| 48 | +### `reader_config_setup.py` |
| 49 | +SPM data conversion module: |
| 50 | +- `SPMConvertInputParameters`: Configuration for SPM conversion |
| 51 | +- `convert_spm_experiments()`: Convert raw SPM data to NeXus format |
| 52 | + |
| 53 | +## Quick Start |
| 54 | + |
| 55 | +### Basic Usage |
| 56 | + |
| 57 | +```python |
| 58 | +from pynxtools_spm.nomad_uploader.uploader import ( |
| 59 | +    run_uploader_with, |
| 60 | +    NOMADSettings, |
| 61 | +    DataProcessingSettings, |
| 62 | +) |
| 63 | +from pathlib import Path |
| 64 | + |
| 65 | +# Configure NOMAD connection |
| 66 | +nomad_settings = NOMADSettings( |
| 67 | +    url_protocol="https", |
| 68 | +    url_domain="nomad-lab.eu", |
| 69 | +    url_version="prod/v1/develop/api/v1/", |
| 70 | +    username="your_username", |
| 71 | +    password="your_password", |
| 72 | +    token="",  # Will be auto-generated |
| 73 | +    modify_upload_metadata=True, |
| 74 | +    publish_to_nomad=False, |
| 75 | +) |
| 76 | + |
| 77 | +# Configure data processing |
| 78 | +data_proc_settings = DataProcessingSettings( |
| 79 | +    raw_file_exts=(".dat", ".sxm"), |
| 80 | +    single_batch_processing_time=90,  # seconds |
| 81 | +    logger_dir=Path("./logs"), |
| 82 | +    src_dir=Path("/path/to/spm/data"), |
| 83 | +    sts_eln=Path("/path/to/sts_eln.yaml"), |
| 84 | +    stm_eln=Path("/path/to/stm_eln.yaml"), |
| 85 | +    afm_eln=Path("/path/to/afm_eln.yaml"), |
| 86 | +    number_of_uploads=10, |
| 87 | +    create_pseudo_file=True, |
| 88 | +    pseudo_exts=".done", |
| 89 | +) |
| 90 | + |
| 91 | +# Run uploader |
| 92 | +if __name__ == "__main__": |
| 93 | +    run_uploader_with( |
| 94 | +        nomad_settings=nomad_settings, |
| 95 | +        data_proc_settings=data_proc_settings, |
| 96 | +    ) |
| 97 | +``` |
| 98 | + |
| 99 | +## Configuration |
| 100 | + |
| 101 | +### NOMADSettings |
| 102 | + |
| 103 | +| Parameter | Type | Required | Default | Description | |
| 104 | +|-----------|------|----------|---------|-------------| |
| 105 | +| `url_protocol` | str | Yes | - | Protocol (`https` or `http`) | |
| 106 | +| `url_domain` | str | Yes | - | NOMAD domain (e.g., `nomad-lab.eu`) | |
| 107 | +| `url_version` | str | Yes | - | API version path (e.g., `prod/v1/develop/api/v1/`) | |
| 108 | +| `username` | str | Yes | - | NOMAD username | |
| 109 | +| `password` | str | Yes | - | NOMAD password | |
| 110 | +| `token` | str | Yes | - | OAuth2 token (pass an empty string to have it generated on first run) | |
| 111 | +| `url` | str | No | Auto | Full API URL (constructed from the three `url_*` parts if not provided) | |
| 112 | +| `modify_upload_metadata` | bool | No | False | Whether to modify metadata before publishing | |
| 113 | +| `publish_to_nomad` | bool | No | False | Automatically publish uploads to NOMAD | |
| 114 | +| `max_upload_attempt` | int | No | 20 | Max retry attempts for upload status check | |
| 115 | +| `nomad_processing_time` | int | No | 3 | Wait time (seconds) between status checks | |
| 116 | + |
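The `url` parameter is described as auto-constructed from `url_protocol`, `url_domain`, and `url_version`. A minimal sketch of what that joining might look like is shown below; `build_api_url` is a hypothetical helper written for illustration, not a function of this package, and the actual construction in `uploader.py` may differ.

```python
def build_api_url(protocol: str, domain: str, version: str) -> str:
    """Join protocol, domain, and version path into one base URL.

    Hypothetical helper: stray slashes are stripped so the parts join
    cleanly, and the result always ends with a single trailing slash.
    """
    path = version.strip("/")
    return f"{protocol}://{domain.strip('/')}/{path}/"


api_url = build_api_url("https", "nomad-lab.eu", "prod/v1/develop/api/v1/")
print(api_url)
```

With the Quick Start values, this yields `https://nomad-lab.eu/prod/v1/develop/api/v1/`, which is the base that endpoint paths such as `auth/token` would be appended to.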
| 117 | +### DataProcessingSettings |
| 118 | + |
| 119 | +| Parameter | Type | Required | Default | Description | |
| 120 | +|-----------|------|----------|---------|-------------| |
| 121 | +| `raw_file_exts` | tuple | Yes | - | Supported file extensions (e.g., `.dat`, `.sxm`) | |
| 122 | +| `single_batch_processing_time` | int | Yes | - | Processing timeout per batch (seconds) | |
| 123 | +| `logger_dir` | Path | Yes | - | Directory for log files | |
| 124 | +| `src_dir` | Path | Yes | - | Source directory with raw SPM files | |
| 125 | +| `sts_eln` | Path | Yes | - | Path to STS ELN (Electronic Lab Notebook) file | |
| 126 | +| `stm_eln` | Path | Yes | - | Path to STM ELN file | |
| 127 | +| `afm_eln` | Path | Yes | - | Path to AFM ELN file | |
| 128 | +| `spm_params_obj_l` | List | No | [] | List of conversion parameters (auto-populated) | |
| 129 | +| `dst_dir` | Path | No | None | Destination for processed files | |
| 130 | +| `create_pseudo_file` | bool | No | True | Create marker file after successful upload | |
| 131 | +| `pseudo_exts` | str | No | `.done` | Extension for marker file | |
| 132 | +| `sts_config` | Path | No | None | Optional STS-specific config | |
| 133 | +| `stm_config` | Path | No | None | Optional STM-specific config | |
| 134 | +| `afm_config` | Path | No | None | Optional AFM-specific config | |
| 135 | +| `number_of_uploads` | int | No | 10 | Max files to process per batch | |
| 136 | +| `delete_failed_uploads` | bool | No | False | Delete uploads on timeout | |
| 137 | +| `upload_metadata` | dict | No | None | Metadata to apply to all uploads | |
| 138 | +| `file_specific_eln` | dict | No | None | Map filenames to specific ELN files | |
| 139 | + |
| 140 | +## Metadata Management |
| 141 | + |
| 142 | +### Modifying Upload Metadata |
| 143 | + |
| 144 | +```python |
| 145 | +# Example metadata structure |
| 146 | +metadata = { |
| 147 | +    "metadata": { |
| 148 | +        "upload_name": "My SPM Experiment", |
| 149 | +        "coauthors": ["colleague@institution.edu"], |
| 150 | +        "references": ["https://doi.org/10.xxxx/xxxxx"], |
| 151 | +        "datasets": "dataset_id", |
| 152 | +        "embargo_length": 0,  # 0 = public, >0 = embargo length in months |
| 153 | +        "comment": "Description of the experiment" |
| 154 | +    } |
| 155 | +} |
| 156 | + |
| 157 | +# Apply to settings |
| 158 | +data_proc_settings.upload_metadata = metadata |
| 159 | +``` |
| 160 | + |
| 161 | +## Logging |
| 162 | + |
| 163 | +The uploader generates detailed logs in the specified `logger_dir`: |
| 164 | + |
| 165 | +- **`upload.log`**: Upload operations, status checks, API interactions |
| 166 | +- **`converter.log`**: SPM data conversion progress and NeXus generation |
| 167 | + |
| 168 | +### Log Levels |
| 169 | + |
| 170 | +- `INFO`: Standard operation messages |
| 171 | +- `ERROR`: Failed operations and errors |
| 172 | +- `DEBUG`: Detailed debugging information |
| 173 | + |
| 174 | +### Example Log Entry |
| 175 | + |
| 176 | +``` |
| 177 | +2024-02-06 10:45:23,456 - uploader - INFO - Upload request with Upload ID (7BWDvsn7TmeNyOBHTcgpwA) corresponding to (...) |
| 178 | +2024-02-06 10:45:25,789 - uploader - INFO - Upload status for 7BWDvsn7TmeNyOBHTcgpwA: Process process_upload completed successfully |
| 179 | +``` |
| 180 | + |
| 181 | +## File Processing Workflow |
| 182 | + |
| 183 | +``` |
| 184 | +Raw SPM File (.dat/.sxm) |
| 185 | + ↓ |
| 186 | +[Automatic Detection] (STS/STM/AFM) |
| 187 | + ↓ |
| 188 | +[NeXus Conversion] (pynxtools reader) |
| 189 | + ↓ |
| 190 | +Intermediate Files |
| 191 | + ├── NeXus file (.nxs) |
| 192 | + └── Metadata file |
| 193 | + ↓ |
| 194 | +[Zip Creation] |
| 195 | + ↓ |
| 196 | +ZIP Archive |
| 197 | + ↓ |
| 198 | +[NOMAD Upload] (OAuth2) |
| 199 | + ↓ |
| 200 | +Upload ID assigned |
| 201 | + ↓ |
| 202 | +[Status Monitoring] (polling) |
| 203 | + ↓ |
| 204 | +[Processing Complete] |
| 205 | + ↓ |
| 206 | +[Optional Metadata Edit] |
| 207 | + ↓ |
| 208 | +[Optional Publishing] |
| 209 | + ↓ |
| 210 | +Marker File Created (.done) |
| 211 | +``` |
| 212 | + |
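The zip-creation step in the workflow above can be sketched with the standard library. This is an illustration only: `bundle_for_upload` and the file names are hypothetical, and the uploader's own archive layout may differ.

```python
import tempfile
import zipfile
from pathlib import Path


def bundle_for_upload(files: list[Path], archive: Path) -> Path:
    """Write the given files into a flat ZIP archive and return its path."""
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for file in files:
            # arcname drops the directory part so the archive stays flat.
            zf.write(file, arcname=file.name)
    return archive


with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp)
    nxs = work / "sample1.nxs"        # placeholder for the converted NeXus file
    meta = work / "sample1_eln.yaml"  # placeholder for the metadata file
    nxs.write_bytes(b"placeholder")
    meta.write_text("placeholder: true\n")
    archive = bundle_for_upload([nxs, meta], work / "sample1.zip")
    with zipfile.ZipFile(archive) as zf:
        bundled_names = sorted(zf.namelist())

print(bundled_names)
```

Keeping the archive flat is one possible design choice; it avoids leaking local directory names into the upload.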
| 213 | +## API Workflow |
| 214 | + |
| 215 | +### Authentication Flow |
| 216 | + |
| 217 | +``` |
| 218 | +NOMADSettings |
| 219 | + ↓ |
| 220 | +[OAuth2 Password Grant] |
| 221 | + ↓ |
| 222 | +Access Token |
| 223 | + ↓ |
| 224 | +API Requests |
| 225 | +``` |
| 226 | + |
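As a rough illustration of the password-grant step, the sketch below builds the token request with the standard library but does not send it. It assumes the `POST /auth/token` endpoint listed in this README, a form-encoded body, and a JSON response carrying an `access_token` field (the usual OAuth2 convention). `build_token_request` and `bearer_header` are hypothetical names, not the signature of `get_authentication_token()`.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(base_url: str, username: str, password: str) -> Request:
    """Prepare (but do not send) an OAuth2 password-grant token request."""
    form = urlencode(
        {"grant_type": "password", "username": username, "password": password}
    )
    return Request(
        url=base_url.rstrip("/") + "/auth/token",
        data=form.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def bearer_header(token_response: dict) -> dict:
    """Turn a decoded token response into the header used by later API calls."""
    # Assumes the response JSON exposes the token under "access_token".
    return {"Authorization": f"Bearer {token_response['access_token']}"}


request = build_token_request(
    "https://nomad-lab.eu/prod/v1/develop/api/v1/", "your_username", "your_password"
)
print(request.get_method(), request.full_url)
```

Sending the request (for example with `urllib.request.urlopen`) and decoding the JSON body would yield the dictionary that `bearer_header` expects.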
| 227 | +### Upload Status States |
| 228 | + |
| 229 | +1. **Adding files**: Files being uploaded to NOMAD |
| 230 | +2. **Process process_upload completed successfully**: Data converted to standardized format |
| 231 | +3. **Process process_publish_upload completed successfully**: Published to NOMAD |
| 232 | + |
| 233 | +## Error Handling |
| 234 | + |
| 235 | +### Common Issues and Solutions |
| 236 | + |
| 237 | +#### Authentication Failed |
| 238 | +``` |
| 239 | +Error: Authentication token not found in response |
| 240 | +Solution: Verify username, password, and NOMAD API endpoint |
| 241 | +``` |
| 242 | + |
| 243 | +#### Upload Status Message Not Found |
| 244 | +``` |
| 245 | +Error: Upload status message not found in response |
| 246 | +Solution: Check response structure from NOMAD API |
| 247 | +``` |
| 248 | + |
| 249 | +#### Timeout During Processing |
| 250 | +``` |
| 251 | +Error: Upload is time out for upload ID: XXX |
| 252 | +Solution: Increase `max_upload_attempt` or `nomad_processing_time` in NOMADSettings |
| 253 | +``` |
| 254 | + |
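The two timeout settings interact: the uploader checks the status up to `max_upload_attempt` times and waits `nomad_processing_time` seconds between checks, so the total wait is roughly their product (about 60 seconds with the defaults of 20 and 3). The loop below is a simplified sketch of that polling pattern, not the code in `check_upload_status()`; `wait_for_processing` and the stand-in status source are hypothetical.

```python
import time
from typing import Callable

SUCCESS_MESSAGE = "Process process_upload completed successfully"


def wait_for_processing(
    get_status: Callable[[], str],
    max_upload_attempt: int = 20,
    nomad_processing_time: float = 3,
) -> bool:
    """Poll until the success message appears or the attempts run out."""
    for attempt in range(max_upload_attempt):
        if get_status() == SUCCESS_MESSAGE:
            return True
        # No need to sleep after the final attempt.
        if attempt < max_upload_attempt - 1:
            time.sleep(nomad_processing_time)
    return False


# Stand-in for a real status request: succeeds on the third check.
responses = iter(["Adding files", "Adding files", SUCCESS_MESSAGE])
finished = wait_for_processing(lambda: next(responses), nomad_processing_time=0)
print(finished)
```

Raising either setting extends the budget; raising `nomad_processing_time` also reduces the number of requests sent to the server for the same total wait.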
| 255 | +## Advanced Usage |
| 256 | + |
| 257 | +### Processing Only Unprocessed Files |
| 258 | + |
| 259 | +When `create_pseudo_file` is enabled (the default), the uploader creates a marker file for each successful upload, using the extension set by `pseudo_exts` (`.done` by default). On subsequent runs, it processes only files that have no corresponding marker. |
| 260 | + |
| 261 | +```python |
| 262 | +# First run - processes all files |
| 263 | +# Creates: file.dat.done for each successful upload |
| 264 | + |
| 265 | +# Second run - skips already processed files |
| 266 | +# Only processes new files without .done markers |
| 267 | +``` |
| 268 | + |
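The skip logic can be pictured as a directory scan that ignores any raw file with a marker next to it. The sketch below assumes the `file.dat.done` naming shown above and a non-recursive scan of `src_dir`; `marker_for` and `unprocessed_files` are hypothetical helpers, and the real implementation may select files differently.

```python
import tempfile
from pathlib import Path


def marker_for(raw_file: Path, pseudo_ext: str = ".done") -> Path:
    """Return the marker path for a raw file, e.g. file.dat -> file.dat.done."""
    return raw_file.with_name(raw_file.name + pseudo_ext)


def unprocessed_files(
    src_dir: Path, raw_file_exts: tuple, pseudo_ext: str = ".done"
) -> list[Path]:
    """List raw files in src_dir that do not yet have a marker file."""
    return sorted(
        path
        for path in src_dir.iterdir()
        if path.is_file()
        and path.suffix in raw_file_exts
        and not marker_for(path, pseudo_ext).exists()
    )


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp)
    (src / "a.dat").touch()
    (src / "a.dat.done").touch()  # a.dat was uploaded in an earlier run
    (src / "b.sxm").touch()       # b.sxm is new
    pending = [p.name for p in unprocessed_files(src, (".dat", ".sxm"))]

print(pending)
```

Deleting a marker file is then enough to make the corresponding raw file eligible for processing again.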
| 269 | +### Batch Processing with Custom ELN Mapping |
| 270 | + |
| 271 | +```python |
| 272 | +data_proc_settings.file_specific_eln = { |
| 273 | +    "sample1.dat": Path("/path/to/sample1_eln.yaml"), |
| 274 | +    "sample2.sxm": Path("/path/to/sample2_eln.yaml"), |
| 275 | +} |
| 276 | +# Default ELN will be used for files not in this mapping |
| 277 | +``` |
| 278 | + |
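The fallback behaviour amounts to a dictionary lookup keyed by file name. A minimal sketch, assuming the keys are bare file names as in the example above; `resolve_eln` is a hypothetical helper, and choosing the default ELN by experiment type (STS, STM, or AFM) is left out.

```python
from pathlib import Path
from typing import Optional


def resolve_eln(
    raw_file: Path,
    default_eln: Path,
    file_specific_eln: Optional[dict[str, Path]] = None,
) -> Path:
    """Pick the file-specific ELN if one is mapped, otherwise the default."""
    mapping = file_specific_eln or {}
    return mapping.get(raw_file.name, default_eln)


mapping = {"sample1.dat": Path("/path/to/sample1_eln.yaml")}
default = Path("/path/to/sts_eln.yaml")

mapped = resolve_eln(Path("/data/sample1.dat"), default, mapping)
fallback = resolve_eln(Path("/data/other.dat"), default, mapping)
print(mapped, fallback)
```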
| 279 | +### Publishing to NOMAD |
| 280 | + |
| 281 | +```python |
| 282 | +nomad_settings.publish_to_nomad = True |
| 283 | +nomad_settings.modify_upload_metadata = True |
| 284 | + |
| 285 | +# Configure metadata for publishing |
| 286 | +data_proc_settings.upload_metadata = { |
| 287 | +    "metadata": { |
| 288 | +        "embargo_length": 0,  # Make public immediately |
| 289 | +        "upload_name": "Published Dataset", |
| 290 | +    } |
| 291 | +} |
| 292 | +``` |
| 293 | + |
| 294 | +## Performance Tuning |
| 295 | + |
| 296 | +### For Large Batches |
| 297 | + |
| 298 | +```python |
| 299 | +data_proc_settings.number_of_uploads = 50 # Process more files per batch |
| 300 | +data_proc_settings.single_batch_processing_time = 300 # Longer timeout |
| 301 | +nomad_settings.max_upload_attempt = 30  # More status-check attempts before timing out |
| 302 | +``` |
| 303 | + |
| 304 | +### For Large Files |
| 305 | + |
| 306 | +```python |
| 307 | +data_proc_settings.single_batch_processing_time = 600 # 10 minutes |
| 308 | +nomad_settings.nomad_processing_time = 5 # Longer wait between checks |
| 309 | +``` |
| 310 | + |
| 311 | +## NOMAD REST API Reference |
| 312 | + |
| 313 | +### Key Endpoints Used |
| 314 | + |
| 315 | +- **Authentication**: `POST /auth/token` |
| 316 | +- **Upload Creation**: `POST /uploads?file_name=...&upload_name=...` |
| 317 | +- **Upload Status**: `GET /uploads/{upload_id}` |
| 318 | +- **Metadata Edit**: `POST /uploads/{upload_id}/edit` |
| 319 | +- **Publishing**: `POST /uploads/{upload_id}/action/publish` |
| 320 | +- **Dataset Creation**: `POST /datasets/` |
| 321 | + |
| 322 | +For complete NOMAD API documentation, visit: https://nomad-lab.eu/docs |
| 323 | + |
| 324 | +## Examples |
| 325 | + |
| 326 | +See `example_upload_script.py` for a complete working example with real configuration. |
| 327 | + |
| 328 | +## Contributing |
| 329 | + |
| 330 | +When modifying the uploader: |
| 331 | +1. Update type hints (`Optional[Literal[...]]` for restricted values) |
| 332 | +2. Maintain consistent logger usage (pass `upload_logger` to all functions) |
| 333 | +3. Add comprehensive docstrings |
| 334 | +4. Update this README with new features |
| 335 | + |
| 336 | + |
| 337 | +## Support |
| 338 | + |
| 339 | +For issues or questions: |
| 340 | +- Check the logs in `logger_dir` for detailed error messages |
| 341 | +- Review the NOMAD documentation at https://nomad-lab.eu |
| 342 | +- Open an issue on the project repository |