@@ -16,17 +16,17 @@ kept in sync:
 
    - ``{core_schema}.json``: core schema files, e.g.,
      ``acquisition.json``, ``subject.json``.
-   - ``metadata.nd.json``: top-level metadata file, containing
-     all core schema fields.
+   - ``metadata.nd.json``: top-level metadata file, containing all core schema
+     fields. This file is for reference only and may be removed soon.
    - ``original_metadata/{core_schema}.json``: a copy of each
      core schema file as it was originally uploaded to S3.
 2. A **document database (DocDB)** contains unstructured JSON
-   documents describing the ``metadata.nd.json`` for a data asset.
+   documents describing the top-level metadata (containing all core schema fields) for a data asset.
 3. **Code Ocean**: data assets are mounted as Code Ocean data assets.
    Processed results are also stored in an internal Code Ocean bucket.
 
-Once the data is initially uploaded, the DocDB is assumed to be the
-source of truth for metadata. All updates to existing metadata should
+Once the data is initially uploaded and "registered" (added to DocDB and Code Ocean),
+the DocDB is assumed to be the source of truth for metadata. All updates to existing metadata should
 be made in the DocDB.
 
 We have automated jobs to keep changes in DocDB, S3, and Code Ocean in sync.
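The S3 layout described in this hunk can be sketched as a small helper that lists the object keys expected under one data asset prefix. This is only an illustration: `expected_metadata_keys`, its parameters, and the grouping labels are hypothetical names and are not taken from the indexer code.

```python
from typing import Dict, List


def expected_metadata_keys(prefix: str, core_schemas: List[str]) -> Dict[str, List[str]]:
    """Return the S3 keys expected under one data asset prefix.

    Hypothetical helper for illustration only; the grouping labels
    ("core", "top_level", "original") are assumptions, not indexer names.
    """
    # Normalize so "asset/" and "asset" produce the same keys.
    prefix = prefix.rstrip("/")
    return {
        # {core_schema}.json files, e.g. acquisition.json, subject.json
        "core": [f"{prefix}/{name}.json" for name in core_schemas],
        # Top-level file (reference only, may be removed soon)
        "top_level": [f"{prefix}/metadata.nd.json"],
        # Copies of the core files as originally uploaded
        "original": [
            f"{prefix}/original_metadata/{name}.json" for name in core_schemas
        ],
    }
```

Calling it with a prefix such as `"asset_1/"` and `["subject", "acquisition"]` gives the three groups of keys that a sync job would compare against the DocDB record.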
@@ -57,17 +57,12 @@ The workflow is generally as follows:
      sync, update them.
    - If the metadata.nd.json file is outdated, update it.
 3. Paginate S3 to get all prefixes for a particular bucket.
-4. For each prefix, process by checking if it is a new data asset
+4. For each prefix, process by checking if it is a new derived data asset
    and adding it to DocDB if necessary.
 
-   - If the metadata record exists in S3 but not in DocDB, copy it
-     to DocDB.
-   - If the metadata record for a derived asset does not exist in S3,
-     create it and save it to S3. Assume a Lambda function will move it
-     over to DocDB. Metadata records for raw assets are created during
+   - If the metadata record for a derived asset does not exist in DocDB,
+     register the asset. Raw assets are registered during
      the upload process, **not** by this job.
-   - In both cases above, ensure the original metadata folder and core
-     files are in sync with the metadata.nd.json file.
 
 Please refer to the job's docstrings for more details on the implementation.
 
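The revised steps 3 and 4 can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the job's implementation: the function name, the `is_derived` predicate, and the `register` callback are hypothetical, and S3 pagination and DocDB lookups are replaced by plain Python collections.

```python
from typing import Callable, Iterable, List, Set


def register_missing_derived_assets(
    s3_prefixes: Iterable[str],
    docdb_locations: Set[str],
    is_derived: Callable[[str], bool],
    register: Callable[[str], None],
) -> List[str]:
    """Register derived assets that are in S3 but have no DocDB record.

    All names here are illustrative assumptions. How an asset is classified
    as derived, and what registration involves, is left to the callbacks.
    """
    registered: List[str] = []
    for prefix in s3_prefixes:
        if prefix in docdb_locations:
            # A DocDB record already exists; nothing to register.
            continue
        if not is_derived(prefix):
            # Raw assets are registered during upload, not by this job.
            continue
        register(prefix)
        registered.append(prefix)
    return registered
```

For example, given prefixes `["raw_a/", "raw_b/", "derived_a/", "derived_b/"]`, known DocDB locations `{"raw_a/", "derived_a/"}`, and a predicate that treats names starting with `derived_` as derived, only `derived_b/` is registered. The unregistered raw prefix `raw_b/` is skipped, matching the note that raw assets are handled at upload time.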