added logic to create file references and added logging #4
f16db24 to d3c5ad1
d3c5ad1 to b3f995e
```python
    file_path: str, study_uuid: str, dataset_uuid: str, crate_path: pathlib.Path
) -> list[APIModels.FileReference]:

    relative_path = pathlib.Path(file_path).relative_to(crate_path).as_posix()
```
Using as_posix because I'm assuming our system will be POSIX, and we don't want this path (and therefore the UUID) to vary depending on whether a Windows system was used to ingest the objects.
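A minimal sketch of why `as_posix()` keeps the relative path stable across operating systems. It uses `pathlib.PureWindowsPath` and `pathlib.PurePosixPath` so both flavours can be shown on any machine. How the PR actually derives the FileReference UUID is not shown here, so the `uuid.uuid5` call with `uuid.NAMESPACE_URL` is a hypothetical stand-in, and `relative_key` is an illustrative name.

```python
import pathlib
import uuid

# Hypothetical stand-in: the real UUID derivation is not shown in this PR.
HYPOTHETICAL_NAMESPACE = uuid.NAMESPACE_URL


def relative_key(file_path: pathlib.PurePath, crate_path: pathlib.PurePath) -> str:
    """Path relative to the crate root, always written with forward slashes."""
    return file_path.relative_to(crate_path).as_posix()


# The same file, ingested once from a Windows machine and once from a POSIX one.
win_key = relative_key(
    pathlib.PureWindowsPath(r"C:\crates\S-BIAD1494\data\img1.tif"),
    pathlib.PureWindowsPath(r"C:\crates\S-BIAD1494"),
)
posix_key = relative_key(
    pathlib.PurePosixPath("/crates/S-BIAD1494/data/img1.tif"),
    pathlib.PurePosixPath("/crates/S-BIAD1494"),
)
print(win_key, posix_key)  # data/img1.tif data/img1.tif

# Using str() instead of as_posix() would give data\img1.tif on the Windows
# side, and the derived UUIDs would then differ.
same_uuid = uuid.uuid5(HYPOTHETICAL_NAMESPACE, win_key) == uuid.uuid5(
    HYPOTHETICAL_NAMESPACE, posix_key
)
print(same_uuid)  # True
```

Note that `as_posix()` normalises the separator even when the code itself runs on Windows, so the key does not depend on the ingesting host.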
bia_ro_crate/model/example/S-BIAD1494/ro-crate-version/ro-crate-metadata.json
Outdated
kbab left a comment
Left some comments. Also, I'm not sure whether you are currently writing to an API. I think API versions start from 0, not 1. If so, the version in the functions that convert RO-Crate models to API models needs to change.
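If API versions do start from 0, one way to keep every RO-Crate-to-API converter consistent is to define the starting version once and pass it through. This is a sketch under that assumption: `INITIAL_API_VERSION` and `file_reference_fields` are hypothetical names, and the returned dict only mirrors the fields in this PR's diff, not the real `APIModels.FileReference` constructor.

```python
import pathlib
import tempfile

# Assumption from the review: API objects start at version 0, not 1.
INITIAL_API_VERSION = 0  # hypothetical shared constant


def file_reference_fields(
    file_path: str, crate_path: pathlib.Path, version: int = INITIAL_API_VERSION
) -> dict:
    """Build the FileReference fields seen in the diff (hypothetical helper)."""
    path = pathlib.Path(file_path)
    return {
        "file_path": path.relative_to(crate_path).as_posix(),
        "version": version,
        "size_in_bytes": path.stat().st_size,
        "format": path.suffix,
    }


# Usage: a throwaway crate directory containing one 16-byte file.
with tempfile.TemporaryDirectory() as tmp:
    crate = pathlib.Path(tmp)
    (crate / "data").mkdir()
    img = crate / "data" / "img1.tif"
    img.write_bytes(b"\x00" * 16)
    fields = file_reference_fields(str(img), crate)

print(fields)
# {'file_path': 'data/img1.tif', 'version': 0, 'size_in_bytes': 16, 'format': '.tif'}
```

With a single constant, changing the starting version later touches one line instead of every converter function.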
```python
"file_path": str(relative_path),
"version": 1,
"size_in_bytes": pathlib.Path(file_path).stat().st_size,
"format": pathlib.Path(file_path).suffix,
```
This is interesting. I think in the original bia_shared_models this property came from BioStudies, which distinguished between file and directory (and I think we also used it in the past to flag files in zip archives).
During BioStudies ingest we take this from the BioStudiesAPIFile object, and its value is usually 'file'. However, EMPIAR ingest populates it with the suffix of the file path (compare the value of format in the example from BioStudies ingest with the one from EMPIAR ingest).
If we decide to go this way, it would be useful to decide whether to identify special suffixes (e.g. ome.zarr, nii.gz, ome.zarr.zip, zarr.zip, ome.tiff) and whether to standardise suffixes (e.g. TIF vs tif vs tiff vs TIFF). We already have a function for this when creating images/image_representations.
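A sketch of what identifying special suffixes and standardising spellings could look like. `pathlib`'s `.suffix` only returns the last component, so `cells.ome.zarr` would be recorded as `.zarr`. The special-suffix list comes from the comment above; the `.ome.tif` entry, the alias table, lower-casing, and the name `file_format` are all assumptions, and the existing function used for images/image_representations may behave differently.

```python
import pathlib

# Multi-part suffixes from the review comment, checked longest-first so that
# ".ome.zarr.zip" wins over ".zarr.zip". ".ome.tif" is an extra assumption.
SPECIAL_SUFFIXES = sorted(
    (".ome.zarr.zip", ".ome.zarr", ".zarr.zip", ".ome.tiff", ".ome.tif", ".nii.gz"),
    key=len,
    reverse=True,
)

# Hypothetical normalisation table: spellings on the left map to the right.
SUFFIX_ALIASES = {".tif": ".tiff", ".ome.tif": ".ome.tiff"}


def file_format(file_path: str) -> str:
    """Lower-cased, normalised suffix that keeps known multi-part suffixes."""
    name = pathlib.PurePath(file_path).name.lower()
    suffix = next(
        (special for special in SPECIAL_SUFFIXES if name.endswith(special)),
        pathlib.PurePath(name).suffix,  # fall back to the last suffix only
    )
    return SUFFIX_ALIASES.get(suffix, suffix)


print(pathlib.PurePath("cells.ome.zarr").suffix)    # .zarr  (plain .suffix)
print(file_format("cells.ome.zarr"))                # .ome.zarr
print(file_format("scan.NII.GZ"))                   # .nii.gz
print(file_format("a.TIF"), file_format("b.tiff"))  # .tiff .tiff
```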
bia_ro_crate/ro_crate_to_bia/entity_conversion/ImageAcquisitionProtocol.py
test/test_ro_crate_to_bia.py
Outdated
```python
assert cli_out == expected_out
# Account for different ordering of JSON objects due to file reference order being somewhat arbitrary.
assert len(cli_out) == len(expected_out)
for json_obj in cli_out:
```
Could there be a scenario where two identical objects are created instead of two distinct ones? For example, FileReference1 twice instead of FileReference1 and FileReference2. In that scenario the check above would still pass. However, if the for loop and the assertion that follows explicitly iterate over all expected objects, the test will fail.
Oh, interesting edge case. Will update as you suggest.
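A sketch of the edge case, assuming the loop body (not shown above) asserts `json_obj in expected_out`. The objects are hypothetical stand-ins for the CLI output. Iterating over the expected objects catches a missing object; comparing sorted, canonicalised JSON strings goes one step further and compares the two lists as multisets, so duplicates on either side are caught regardless of order. `assert_same_objects_any_order` is an illustrative name.

```python
import json


def _canonical(objs: list[dict]) -> list[str]:
    # sort_keys makes key order irrelevant; sorting makes list order irrelevant.
    return sorted(json.dumps(obj, sort_keys=True) for obj in objs)


def assert_same_objects_any_order(actual: list[dict], expected: list[dict]) -> None:
    """Order-insensitive comparison that still counts duplicates."""
    assert _canonical(actual) == _canonical(expected)


expected = [{"uuid": "ref-1"}, {"uuid": "ref-2"}]
duplicated = [{"uuid": "ref-1"}, {"uuid": "ref-1"}]

# Length check plus "every output object is expected" passes on duplicated output.
print(len(duplicated) == len(expected) and all(o in expected for o in duplicated))  # True
# Iterating over the expected objects instead exposes the missing ref-2.
print(all(o in duplicated for o in expected))  # False

# Reordered but otherwise identical output passes the multiset comparison.
assert_same_objects_any_order([{"uuid": "ref-2"}, {"uuid": "ref-1"}], expected)
```

The objects must be JSON-serialisable for this to work, which holds for output parsed from the CLI's JSON.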
…e testing more resistant to edge cases
ticket: https://app.clickup.com/t/8698qrqq9