@@ -20,35 +20,34 @@ kept in sync:
      all core schema fields.
    - ``original_metadata/{core_schema}.json``: a copy of each
      core schema file as it was originally uploaded to S3.
-2. A **document database (DocDB)** contains unstructured json
+2. A **document database (DocDB)** contains unstructured JSON
    documents describing the ``metadata.nd.json`` for a data asset.
-3. **Code Ocean**: data assets are mounted as CodeOcean data asssets.
+3. **Code Ocean**: data assets are mounted as Code Ocean data assets.
    Processed results are also stored in an internal Code Ocean bucket.
 
 Once the data is initially uploaded, the DocDB is assumed to be the
 source of truth for metadata. All updates to existing metadata should
 be made in the DocDB.
 
-We have automated jobs to keep changes in DocDB and S3 in sync.
+We have automated jobs to keep changes in DocDB, S3, and Code Ocean in sync.
 This repository contains the code for these index jobs:
 
 1. `AindIndexBucketJob <#aindindexbucketjob>`__: Syncs changes in S3 and DocDB.
-2. `CodeOceanIndexBucketJob <#codeoceanindexbucketjob>`__: Syncs changes in CodeOcean and DocDB.
+2. `CodeOceanIndexBucketJob <#codeoceanindexbucketjob>`__: Syncs changes in Code Ocean and DocDB.
 
 
 AindIndexBucketJob
 ------------------
 
 The `AindIndexBucketJob` handles syncing changes from DocDB to S3 for a
-particular S3 bucket. There is a `IndexAindBucketsJob` wrapper job that
+particular S3 bucket. There is an `IndexAindBucketsJob` wrapper job that
 runs the `AindIndexBucketJob` for a list of buckets.
 
-
 The workflow is generally as follows:
 
 1. Paginate DocDB to get records for a particular bucket.
 
-   - Typically we filter for records that have been updated in the last
+   - Typically, we filter for records that have been updated in the last
      14 days.
 2. For each DocDB record, process by syncing any changes in DocDB to S3.
 
@@ -66,7 +65,7 @@ The workflow is generally as follows:
    - If the metadata record exists in S3 but not in DocDB, copy it
      to DocDB.
    - If the metadata record does not exist in S3, create it and save
-     it to S3. Assume a lambda function will move it over to DocDB.
+     it to S3. Assume a Lambda function will move it over to DocDB.
    - In both cases above, ensure the original metadata folder and core
      files are in sync with the metadata.nd.json file.
 
@@ -77,16 +76,16 @@ CodeOceanIndexBucketJob
 -----------------------
 
 The `CodeOceanIndexBucketJob` updates the external links for DocDB records
-with their CO data asset ids and indexes Code Ocean (CO) processed results.
+with their Code Ocean (CO) data asset IDs and indexes CO processed results.
 
 The workflow is generally as follows:
 
 1. For records in AIND buckets, update the external links with CO data
-   asset ids if needed.
+   asset IDs if needed.
 
-   - Retrieve a list of CO data asset ids and locations
-   - Paginate through docdb records where the location does not match
-     internal CO bucket
+   - Retrieve a list of CO data asset IDs and locations.
+   - Paginate through DocDB records where the location does not match
+     the internal CO bucket.
    - Add or remove the external links from the DocDB record as needed.
 2. Index CO processed results from the CO internal bucket.
 
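
For readers of this diff, the 14-day filter and the S3/DocDB rules described in the README text can be sketched in Python as below. This is only an illustration under stated assumptions, not the repository's implementation: the field name `last_modified`, the MongoDB-style filter shape, and the names `SyncAction`, `recently_updated_filter`, and `choose_action` are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class SyncAction(Enum):
    """Hypothetical labels for the outcomes described in the README."""

    COPY_S3_TO_DOCDB = "copy_s3_to_docdb"
    CREATE_IN_S3 = "create_in_s3"
    NONE = "none"


def recently_updated_filter(now: datetime, days: int = 14) -> dict:
    """Build a MongoDB-style filter for records updated in the last `days` days.

    Assumption: records carry an ISO 8601 timestamp in a field named
    `last_modified`. The real field name may differ.
    """
    cutoff = now - timedelta(days=days)
    return {"last_modified": {"$gte": cutoff.isoformat()}}


def choose_action(in_s3: bool, in_docdb: bool) -> SyncAction:
    """Pick a sync action for one metadata record.

    - Missing from S3: create it and save it to S3 (a Lambda function is
      assumed to move it over to DocDB afterwards).
    - In S3 but not in DocDB: copy it to DocDB.
    - In both: nothing to do here (the DocDB-to-S3 pass handles updates).
    """
    if not in_s3:
        return SyncAction.CREATE_IN_S3
    if not in_docdb:
        return SyncAction.COPY_S3_TO_DOCDB
    return SyncAction.NONE


if __name__ == "__main__":
    print(recently_updated_filter(datetime.now(timezone.utc)))
    print(choose_action(in_s3=True, in_docdb=False))
```

Keeping the decision in a pure function like `choose_action` makes each rule easy to unit test without touching S3 or DocDB.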