Commit ade088f

Update README.md

1 parent e65c41c commit ade088f

File tree

4 files changed: +190 −35 lines changed

src/pynxtools_spm/nomad_uploader/README.md

Lines changed: 82 additions & 8 deletions
@@ -135,7 +135,8 @@ if __name__ == "__main__":
 | `number_of_uploads` | int | No | 10 | Max files to process per batch |
 | `delete_failed_uploads` | bool | No | False | Delete uploads on timeout |
 | `upload_metadata` | dict | No | None | Metadata to apply to all uploads |
-| `file_specific_eln` | dict | No | None | Map filenames to specific ELN files |
+| `file_to_convert_data` | dict | No | None | Map file paths to ELN files and technique types |
+| `single_file_pynx_convert_time` | int | No | 5 | Timeout for single file conversion (seconds) |
 
 ## Metadata Management
 
@@ -254,6 +255,32 @@ Solution: Increase `max_upload_attempt` or `nomad_processing_time` in NOMADSettings
 
 ## Advanced Usage
 
+### Custom File-Specific Configuration
+
+Map specific files to custom ELN files and technique types using `file_to_convert_data`:
+
+```python
+data_proc_settings = DataProcessingSettings(
+    # ... other settings ...
+    file_to_convert_data={
+        "/path/to/data/stm/sample1.sxm": {
+            "eln": "/path/to/custom_eln1.yaml",  # Optional: use specific ELN
+            "technique": "stm",  # Required: specify technique (stm/sts/afm)
+        },
+        "/path/to/data/sts/measurement.dat": {
+            "eln": "",  # Empty: use default ELN from sts_eln setting
+            "technique": "sts",
+        },
+        "/path/to/data/afm/surface.sxm": {
+            "eln": "/path/to/custom_eln2.yaml",
+            "technique": "afm",
+        },
+    }
+)
+```
+
+**Important**: Use full file paths as keys to avoid collisions when files in different directories have the same name. The `__post_init__` method automatically creates both full-path and filename lookups.
+
 ### Processing Only Unprocessed Files
 
 The uploader automatically creates `.done` marker files for successful uploads. On subsequent runs, it only processes files without corresponding `.done` markers.
@@ -266,14 +293,52 @@ The uploader automatically creates `.done` marker files for successful uploads.
 # Only processes new files without .done markers
 ```
 
+### Automatic Technique Detection
+
+The uploader uses a hybrid approach to determine the SPM technique:
+
+1. **Explicit Configuration**: If `file_to_convert_data` specifies a `technique` for a file, that value is used
+2. **File Extension Fallback**: If no explicit technique is configured:
+   - `.dat` files → STS (Scanning Tunneling Spectroscopy)
+   - `.sxm` files → STM (Scanning Tunneling Microscopy) by default
+
+**Best Practice**: Use `file_to_convert_data` to explicitly specify techniques, especially when:
+- `.sxm` files should be processed as AFM instead of STM
+- You have multiple files with the same name in different directories
+- You want to override default technique detection
+
+### Zip File Creation
+
+Each successful conversion creates a zip file containing:
+- **NeXus output file** (`.nxs`)
+- **Original raw data file** (`.dat` or `.sxm`)
+- **ELN metadata file** (`.yaml`)
+- **Config file** (if specified)
+
+The zip file uses basename extraction to ensure clean file names in the archive.
+
 ### Batch Processing with Custom ELN Mapping
 
+**Deprecated**: The old `file_specific_eln` parameter has been replaced with `file_to_convert_data`.
+
 ```python
-data_proc_settings.file_specific_eln = {
-    "sample1.dat": Path("/path/to/sample1_eln.yaml"),
-    "sample2.sxm": Path("/path/to/sample2_eln.yaml"),
+# Old approach (deprecated):
+# data_proc_settings.file_specific_eln = {
+#     "sample1.dat": Path("/path/to/sample1_eln.yaml"),
+# }
+
+# New approach (recommended):
+data_proc_settings.file_to_convert_data = {
+    "/full/path/to/sample1.dat": {
+        "eln": "/path/to/sample1_eln.yaml",
+        "technique": "sts",
+    },
+    "/full/path/to/sample2.sxm": {
+        "eln": "/path/to/sample2_eln.yaml",
+        "technique": "afm",  # Explicitly specify AFM for .sxm file
+    },
 }
-# Default ELN will be used for files not in this mapping
+# Default ELN (sts_eln, stm_eln, afm_eln) used for files not in this mapping
 ```
 
 ### Publishing to NOMAD
@@ -329,9 +394,18 @@ See `example_upload_script.py` for a complete working example with real configuration
 
 When modifying the uploader:
 1. Update type hints (`Optional[Literal[...]]` for restricted values)
-2. Maintain consistent logger usage (pass `upload_logger` to all functions)
-3. Add comprehensive docstrings
-4. Update this README with new features
+2. Maintain consistent logger usage (pass `upload_logger` and `converter_logger` to functions)
+3. Use `Path.name` for extracting filenames instead of string splitting
+4. Add comprehensive docstrings
+5. Update this README with new features
+6. Test with various file types (.dat, .sxm) and techniques (STS, STM, AFM)
+
+### Recent Changes
+
+- **v2.0**: Replaced `file_specific_eln` with `file_to_convert_data` for better technique specification
+- **Improved**: File path handling using `Path.name` instead of string manipulation
+- **Enhanced**: Automatic technique detection with explicit configuration support
+- **Fixed**: Collision handling when multiple files have the same name in different directories
 
 
 ## Support
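The hybrid technique detection documented in this README diff can be sketched as a small standalone helper. The function and constant names below are hypothetical (in the package the equivalent logic lives in `set_and_store_prepared_parameters`); the extension fallback follows the README's stated rules:

```python
from pathlib import Path
from typing import Optional

# Extension fallback used when no explicit technique is configured
# (per the README: .dat -> sts, .sxm -> stm by default).
EXT_FALLBACK = {".dat": "sts", ".sxm": "stm"}


def resolve_technique(file: Path, file_to_convert_data: Optional[dict]) -> Optional[str]:
    """Return the explicitly configured technique if present, else the
    extension-based default, else None for unknown extensions."""
    if file_to_convert_data:
        explicit = file_to_convert_data.get(file.name, {}).get("technique")
        if explicit:
            return explicit
    return EXT_FALLBACK.get(file.suffix)


# An .sxm file explicitly mapped to AFM overrides the STM extension default:
cfg = {"surface.sxm": {"eln": "", "technique": "afm"}}
```

Checking the explicit configuration before the extension fallback is what lets an `.sxm` file be routed to AFM despite the STM default.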

src/pynxtools_spm/nomad_uploader/example_upload_script.py

Lines changed: 76 additions & 10 deletions
@@ -6,6 +6,7 @@
 from pathlib import Path
 
 current_dir = Path(__file__).resolve().parent
+project_dir = current_dir.parent.parent.parent
 
 nomad_settings = NOMADSettings(
     url_protocol="https",
@@ -14,35 +15,100 @@
     url="https://nomad-lab.eu/prod/v1/develop/api/v1/",
     username="Mozumder",
     password="*#R516660a*#",
-    token="eyJhbGciOiJSUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJmb1hmZnM5QlFQWHduLU54Yk5PYlExOFhnZnlKU1FNRkl6ZFVnWjhrZzdVIn0.eyJleHAiOjE3NzAzODExMzMsImlhdCI6MTc3MDI5NDczMywianRpIjoiYjk2YjI1NTUtYmU4My00Mjk0LWFlMDMtOTFjMTNmN2RlNmMwIiwiaXNzIjoiaHR0cHM6Ly9ub21hZC1sYWIuZXUvZmFpcmRpL2tleWNsb2FrL2F1dGgvcmVhbG1zL2ZhaXJkaV9ub21hZF9wcm9kIiwic3ViIjoiOGVjOGMwOWUtOWZiMC00YTQ3LTllNTEtNDRkNWFjNjg5YWQwIiwidHlwIjoiQmVhcmVyIiwiYXpwIjoibm9tYWRfcHVibGljIiwic2lkIjoiYjZlODVlZmUtMzViOS00M2UzLTgxMWItOWU3ZWZlMTc4MTVkIiwic2NvcGUiOiJvcGVuaWQgcHJvZmlsZSBlbWFpbCIsImVtYWlsX3ZlcmlmaWVkIjp0cnVlLCJuYW1lIjoiUnViZWwgTW96dW1kZXIiLCJwcmVmZXJyZWRfdXNlcm5hbWUiOiJtb3p1bWRlciIsInNlc3Npb25fc3RhdGUiOiJiNmU4NWVmZS0zNWI5LTQzZTMtODExYi05ZTdlZmUxNzgxNWQiLCJnaXZlbl9uYW1lIjoiUnViZWwiLCJmYW1pbHlfbmFtZSI6Ik1venVtZGVyIiwiZW1haWwiOiJtb3p1bWRlckBwaHlzaWsuaHUtYmVybGluLmRlIn0.ekOAgc5AIqm_uriTRj8yxt-Kmtuz5IHf6Bs3lQIaW0WK_ds7UD21itBtKFKTZlUHq2Ti6gcfsU1WqkdRJ3_5d1xSblJoKWL974o8YuAhDMNLkcv4HqthjQ_pnp3XSb_y0JXsPB0tOImjf-86sLqYqsgn9FHcln-OZb73guLlrKiwyXt4cK5pB-vO8JFKdZvChnVHhb7wyBYkJuLYtvhjQgASLDqDSAPHm6mk9ZGtqcr1KtzlopMzK2YovfgNiJSWrKyH6yI2O4wcSJzc6374N3SAQeIClYGExs0f39jKS0dfhdROI753SUvDAO-niDiyxT8LChwHtuM1IlnFSw-5ng",
+    token="",
     modify_upload_metadata=True,
     publish_to_nomad=False,
 )
-local_src_dir = Path(
-    "/home/rubel/NOMAD-FAIRmat/nomad-distro-dev-RM/packages/pynxtools-spm/tests/data/nanonis"
-)
-total_upload = 1
+nanonis = project_dir / "tests/data/nanonis"
+
+total_upload = 3
+
 data_proc_settings = DataProcessingSettings(
     raw_file_exts=(
         ".dat",
         ".sxm",
     ),
     single_batch_processing_time=total_upload * 90,  # seconds
-    src_dir=local_src_dir,  # Path("/home/rubel/NOMAD-FAIRmat/SPMfolder/DataFilesForUpload"),
+    src_dir=nanonis,
     # copy_file_elsewhere=False,
     dst_dir="",
     create_pseudo_file=True,
     pseudo_exts=".done",
     spm_params_obj_l=[],
-    sts_eln=local_src_dir / "sts/version_gen_5e_with_described_nxdata/eln_data.yaml",
+    sts_eln=nanonis / "sts/version_gen_5e_with_described_nxdata/eln_data.yaml",
     sts_config="",
-    stm_eln=local_src_dir / "stm/version_gen_5_with_described_nxdata/eln_data.yaml",
+    stm_eln=nanonis / "stm/version_gen_5_with_described_nxdata/eln_data.yaml",
     stm_config="",
-    afm_eln=local_src_dir / "afm/version_gen_4_with_described_nxdata/eln_data.yaml",
+    afm_eln=nanonis / "afm/version_gen_4_with_described_nxdata/eln_data.yaml",
     afm_config="",
     logger_dir=current_dir,
     number_of_uploads=total_upload,
-    file_specific_eln={"raw_file_name": "eln_file_path"},
+    file_to_convert_data={
+        # Duplicate dict keys collapse silently in Python, so each file
+        # is listed once.
+        f"{nanonis / 'stm/version_gen_5_with_described_nxdata/Au_mica_2023_Y_A_diPAMY_195.sxm'}": {
+            "eln": "",
+            "technique": "stm",
+        },
+        f"{nanonis / 'sts/version_gen_5e_with_described_nxdata/STS_nanonis_generic_5e_1.dat'}": {
+            "eln": "",
+            "technique": "sts",
+        },
+    },
     # metadata = {
     #     "metadata": {
     #         "upload_name": upload_name,
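Because Python dict literals silently keep only the last value for a repeated key, a hand-written `file_to_convert_data` with repeated entries collapses to one entry per unique key. A sketch (not part of the commit; `build_convert_map` and the directory-name convention are assumptions) that derives the mapping from the directory layout instead of listing files by hand:

```python
from pathlib import Path

RAW_EXTS = (".dat", ".sxm")


def build_convert_map(root: Path) -> dict:
    """Map every raw file under root to converter data, inferring the
    technique from the first directory component under root
    (assumption: files live under stm/, sts/, afm/ subdirectories)."""
    mapping = {}
    for f in root.glob("**/*"):
        if f.is_file() and f.suffix in RAW_EXTS:
            technique = f.relative_to(root).parts[0]  # e.g. "stm", "sts", "afm"
            mapping[str(f)] = {"eln": "", "technique": technique}
    return mapping
```

Each full path appears once as a key, so the mapping stays free of accidental duplication however many files the glob finds.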

src/pynxtools_spm/nomad_uploader/reader_config_setup.py

Lines changed: 6 additions & 9 deletions
@@ -55,15 +55,14 @@ def convert_spm_experiments(
     if not input_params.expriement_type:
         converter_logger.error("Experiment type is required to run an SPM experiment")
 
-    input_params.input_file = (*input_params.input_file, input_params.eln)
     input_params.input_file = tuple(
-        Path(file) if isinstance(file, str) else file
-        for file in input_params.input_file
+        map(Path, (*input_params.input_file, input_params.eln))
     )
+
     if input_params.config:
         input_params.input_file = (
             *input_params.input_file,
-            input_params.config,
+            Path(input_params.config),
         )
 
     zip_file = None
@@ -88,24 +87,22 @@
     converter_logger.addHandler(converter_handeler)
     try:
         kwargs = asdict(input_params)
-        print("#### kwargs:", kwargs)
         kwargs["input_file"] = tuple(map(str, input_params.input_file))
         kwargs["output"] = str(input_params.output)
         # with converter_logger:
         convert(**kwargs)
-        print("#### kwargs after conversion:", kwargs)
         if input_params.create_zip:
             with zipfile.ZipFile(zip_file, "w") as zipf:
                 zipf.write(
                     str(input_params.output),
-                    arcname=str(input_params.output).split("/")[-1],
+                    arcname=Path(input_params.output).name,
                 )
                 for file in map(str, input_params.input_file):
-                    zipf.write(file, arcname=file.split("/")[-1])
+                    zipf.write(file, arcname=Path(file).name)
             input_params.zip_file_path = Path(zip_file)
 
     except Exception as e:
-        print("NeXusConverterError:", e)
+        converter_logger.error(f"Error: {e}")
     finally:
         # Prevent propagatting other logs through this handler
         converter_logger.removeHandler(converter_handeler)
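The switch from `str(path).split("/")[-1]` to `Path(path).name` matters because the string split only works with forward-slash separators, while `Path.name` is platform independent. A minimal self-contained sketch of the same arcname pattern (the helper name is hypothetical):

```python
import zipfile
from pathlib import Path


def zip_with_clean_names(zip_path: Path, files: list) -> None:
    """Write files into a zip using only their basenames as member
    names, so the archive carries no directory structure."""
    with zipfile.ZipFile(zip_path, "w") as zipf:
        for f in files:
            # Path.name yields the final component regardless of separator style
            zipf.write(f, arcname=Path(f).name)
```

This mirrors the "basename extraction" behavior the README describes for the conversion zip files.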

src/pynxtools_spm/nomad_uploader/uploader.py

Lines changed: 26 additions & 8 deletions
@@ -87,11 +87,19 @@ class DataProcessingSettings:
     number_of_uploads: int = 10
     delete_failed_uploads: bool = False
     upload_metadata: Optional[dict] = None
-    # Raw file to ELN file mapping.
-    file_specific_eln: Optional[dict] = None
+    # File to converter data
+    # file_to_convert_data = {'file_name': {'eln': 'eln_file_path',
+    #                                       'technique': 'stm/sts/afm'}}
+    file_to_convert_data: Optional[dict] = None
     # Time for individual file conversion
     single_file_pynx_convert_time: int = 5  # seconds
 
+    def __post_init__(self):
+        file_obj = {}
+        for file, obj in self.file_to_convert_data.items():
+            file_obj[Path(file).name] = obj
+        self.file_to_convert_data = file_obj
+
 
 def create_preseudo_file(
     params_obj: SPMConvertInputParameters,
@@ -132,9 +140,11 @@ def get_unprocessed_files(src_dir: Path, data_proc_settings) -> list:
     file is not present.
     """
     process_status_map = {}
+    # Collect all raw files
     for file in src_dir.glob("**/*.*"):
         if file.is_file() and file.suffix in data_proc_settings.raw_file_exts:
             process_status_map[file] = False
+    # Mark processed files as True
     for file in src_dir.glob("**/*.*"):
         if file.is_file() and file.suffix == data_proc_settings.pseudo_exts:
             # Remove extra pseudo extension
@@ -156,10 +166,17 @@ def set_and_store_prepared_parameters(
     spm_tech: Optional[Literal["stm", "sts", "afm"]] = None,
 ) -> None:
     params_obj = None
-    spec_eln_file = data_proc_settings.file_specific_eln.get(
-        file.name, data_proc_settings.sts_eln
+    spec_eln_file = (
+        data_proc_settings.file_to_convert_data.get(file.name, {}).get("eln")
+        if data_proc_settings.file_to_convert_data
+        else None
     )
-    if file.suffix == ".dat":
+    technique = (
+        data_proc_settings.file_to_convert_data.get(file.name, {}).get("technique")
+        if data_proc_settings.file_to_convert_data
+        else None
+    )
+    if technique == "sts" or file.suffix == ".dat":
         params_obj = SPMConvertInputParameters(
             input_file=(file,),
             eln=spec_eln_file if spec_eln_file else data_proc_settings.sts_eln,
@@ -169,7 +186,7 @@
             raw_extension="dat",
             create_zip=True,
         )
-    elif file.suffix == ".sxm":
+    elif technique == "stm" or file.suffix == ".sxm":
         params_obj = SPMConvertInputParameters(
             input_file=(file,),
             eln=spec_eln_file if spec_eln_file else data_proc_settings.stm_eln,
@@ -179,7 +196,7 @@
             raw_extension="sxm",
             create_zip=True,
         )
-    elif file.suffix == ".sxm":
+    elif technique == "afm" or file.suffix == ".sxm":
         params_obj = SPMConvertInputParameters(
             input_file=(file,),
             eln=spec_eln_file if spec_eln_file else data_proc_settings.afm_eln,
@@ -351,8 +368,9 @@ def queue_results(input_params, lock, results_q):
             upload_id = upload_to_NOMAD(
                 nomad_settings.url, nomad_settings.token, zip_to_upload
             )
+
             upload_logger.info(
-                f"Upload request with Upload ID ({upload_id}) corresponding to {complete_param_obj.input_file}."
+                f"Upload request with Upload ID ({upload_id}) corresponding to files \n{'\n'.join(map(str, complete_param_obj.input_file))}."
            )
             # trigger_reprocess_upload(
             #     nomad_settings.url, nomad_settings.token, upload_id
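One caveat in the `__post_init__` added above: it iterates `self.file_to_convert_data.items()` even when the field keeps its default `None`, and collapsing keys to basenames means same-named files in different directories still collide after normalization. A guarded sketch of the same normalization (the class name is hypothetical, a minimal stand-in for `DataProcessingSettings`):

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class ConvertDataSettings:
    """Minimal stand-in for DataProcessingSettings (illustration only)."""

    file_to_convert_data: Optional[dict] = None

    def __post_init__(self):
        # Normalize keys to basenames so lookups by file.name succeed,
        # but skip normalization when no mapping was supplied.
        if self.file_to_convert_data:
            self.file_to_convert_data = {
                Path(k).name: v for k, v in self.file_to_convert_data.items()
            }


s = ConvertDataSettings(
    file_to_convert_data={"/data/stm/a.sxm": {"eln": "", "technique": "stm"}}
)
```

With the guard, constructing the settings without any mapping no longer raises on `None`.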
