
Commit 073faa6

docs: readthedocs (#99)
* docs: Contributor Guidelines
* fix: docstrings format for sphinx build
* docs: add User Guide
* docs: update readme
1 parent 7febec5 commit 073faa6

18 files changed: +950 −42 lines changed

.readthedocs.yaml

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
version: 2

build:
  os: ubuntu-22.04
  tools:
    python: "3.10"

python:
  install:
    - method: pip
      path: .
      extra_requirements:
        - dev
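
For reference, this config has Read the Docs install the package with its `dev` extras and then build the Sphinx project. A rough local equivalent, assuming the `dev` extra pulls in Sphinx, is:

```bash
pip install .[dev]
sphinx-build -b html docs/source/ docs/build/html
```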

README.md

Lines changed: 12 additions & 14 deletions
@@ -4,20 +4,18 @@
 ![Code Style](https://img.shields.io/badge/code%20style-black-black)
 [![semantic-release: angular](https://img.shields.io/badge/semantic--release-angular-e10079?logo=semantic-release)](https://github.com/semantic-release/semantic-release)
 
-Script to create metadata analytics table and write to redshift table.
-This script will parse through a list of s3 buckets and document whether data asset records in each of those buckets does or does not contain `metadata.nd.json`
+Index jobs for AIND metadata in AWS DocumentDB and S3.
 
+AIND metadata for data assets is stored in various places and must be
+kept in sync:
 
-## Usage
-- Define the environment variables in the `.env.template`
-  - REDSHIFT_SECRETS_NAME: defining secrets name for Amazon Redshift
-  - BUCKETS: list of buckets. comma separated format (ex: "bucket_name1, bucket_name2")
-  - TABLE_NAME: name of table in redshift
-  - FOLDERS_FILEPATH: Intended filepath for txt file
-  - METADATA_DIRECTORY: Intended path for directory containing copies of metadata records
-  - AWS_DEFAULT_REGION: Default AWS region.
-- Records containing metadata.nd.json file will be copies to `METADATA_DIRECTORY` and compared against list of all records in `FOLDERS_FILEPATH`
-- An analytics table containing columns `s3_prefix`, `bucket_name`, and `metadata_bool` will be written to `TABLE_NAME` in Redshift
+1. **S3 buckets** store raw metadata files, including the ``metadata.nd.json``.
+2. A **document database (DocDB)** contains unstructured JSON
+   documents describing the ``metadata.nd.json`` for a data asset.
+3. **Code Ocean**: data assets are mounted as Code Ocean data assets.
+   Processed results are also stored in an internal Code Ocean bucket.
 
-## Development
-- It's a bit tedious, but the dependencies listed in the `pyproject.toml` file needs to be manually updated
+We have automated jobs to keep changes in DocDB and S3 in sync.
+This repository contains the code for these index jobs.
+
+More information including a user guide and contributor guidelines can be found at [readthedocs](https://aind-data-asset-indexer.readthedocs.io).

docs/Makefile

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS    ?=
SPHINXBUILD   ?= sphinx-build
SOURCEDIR     = source
BUILDDIR      = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

docs/make.bat

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
	set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
	echo.
	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
	echo.installed, then set the SPHINXBUILD environment variable to point
	echo.to the full path of the 'sphinx-build' executable. Alternatively you
	echo.may add the Sphinx directory to PATH.
	echo.
	echo.If you don't have Sphinx installed, grab it from
	echo.https://www.sphinx-doc.org/
	exit /b 1
)

if "%1" == "" goto help

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd
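
Both wrappers route targets to `sphinx-build -M`, so a local HTML build should be as simple as, for example:

```bash
cd docs
make html       # Linux/macOS; on Windows: make.bat html
```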

docs/source/Contributing.rst

Lines changed: 273 additions & 0 deletions
@@ -0,0 +1,273 @@
Contributor Guidelines
======================

This document will go through best practices for contributing to this
project. We welcome and appreciate contributions or ideas for
improvement.

- `Bug Reports and Feature
  Requests <#bug-reports-and-feature-requests>`__
- `Local Installation for
  Development <#local-installation-for-development>`__
- `Branches and Pull Requests <#branches-and-pull-requests>`__
- `Release Cycles <#release-cycles>`__

Bug Reports and Feature Requests
--------------------------------

Before creating a pull request, we ask contributors to please open a bug
report or feature request first:
`issues <https://github.com/AllenNeuralDynamics/aind-data-asset-indexer/issues/new/choose>`__

We will do our best to monitor and maintain the backlog of issues.

Local Installation for Development
----------------------------------

For development:

- For new features or non-urgent bug fixes, create a branch off of
  ``dev``
- For an urgent hotfix to our production environment, create a branch
  off of ``main``

Consult the `Branches and Pull Requests <#branches-and-pull-requests>`__
and `Release Cycles <#release-cycles>`__ sections for more details.

From the root directory, run:

.. code:: bash

   pip install -e .[dev]

to install the relevant code for development.

.. _running-indexer-jobs-locally:

Running indexer jobs locally
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The jobs are intended to be run as scheduled AWS ECS tasks in the same VPC
as the DocDB instance. The job settings are stored in AWS Parameter Store.
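
If you would like to inspect the stored settings first, a standard AWS CLI
call such as the following should work (the parameter name below is a
placeholder; use the name configured for your deployment):

.. code:: bash

   aws ssm get-parameter --name "<indexer_param_store_name>" --with-decryption
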
If you wish to run the jobs locally, follow these steps:

1. In a new terminal, start an ssh session. Credentials can be found in AWS
   Secrets Manager.

   .. code:: bash

      ssh -L 27017:{docdb_host}:27017 {ssh_username}@{ssh_host} -N -v

2. For the ``IndexAindBucketsJob``, you will need to set the ``INDEXER_PARAM_NAME``.
   Then, run the following:

   .. code:: python

      from aind_data_asset_indexer.index_aind_buckets import IndexAindBucketsJob
      from aind_data_asset_indexer.models import AindIndexBucketsJobSettings

      # Placeholder: replace with the job's AWS Parameter Store name
      INDEXER_PARAM_NAME = "<indexer_param_store_name>"

      if __name__ == "__main__":
          main_job_settings = AindIndexBucketsJobSettings.from_param_store(
              param_store_name=INDEXER_PARAM_NAME
          )
          main_job_settings.doc_db_host = "localhost"
          main_job = IndexAindBucketsJob(job_settings=main_job_settings)
          main_job.run_job()

3. For the ``CodeOceanIndexBucketJob``, you will need to set the ``CO_INDEXER_PARAM_NAME``
   and ``DEVELOPER_CODEOCEAN_ENDPOINT``. Then, run the following:

   .. code:: python

      from aind_data_asset_indexer.codeocean_bucket_indexer import CodeOceanIndexBucketJob
      from aind_data_asset_indexer.models import CodeOceanIndexBucketJobSettings

      # Placeholders: replace with the job's Parameter Store name and a
      # developer Code Ocean endpoint
      CO_INDEXER_PARAM_NAME = "<co_indexer_param_store_name>"
      DEVELOPER_CODEOCEAN_ENDPOINT = "<developer_codeocean_endpoint>"

      if __name__ == "__main__":
          main_job_settings = CodeOceanIndexBucketJobSettings.from_param_store(
              param_store_name=CO_INDEXER_PARAM_NAME
          )
          main_job_settings.doc_db_host = "localhost"
          main_job_settings.temp_codeocean_endpoint = DEVELOPER_CODEOCEAN_ENDPOINT
          main_job = CodeOceanIndexBucketJob(job_settings=main_job_settings)
          main_job.run_job()

4. Close the ssh session when you are done.

Branches and Pull Requests
--------------------------

Branch naming conventions
~~~~~~~~~~~~~~~~~~~~~~~~~

Name your branch using the following format:
``<type>-<issue_number>-<short_summary>``

where:

- ``<type>`` is one of:

  - **build**: Changes that affect the build system
    or external dependencies (e.g., pyproject.toml, setup.py)
  - **ci**: Changes to our CI configuration files and scripts
    (examples: .github/workflows/ci.yml)
  - **docs**: Changes to our documentation
  - **feat**: A new feature
  - **fix**: A bug fix
  - **perf**: A code change that improves performance
  - **refactor**: A code change that neither fixes a bug nor adds
    a feature, but will make the codebase easier to maintain
  - **test**: Adding missing tests or correcting existing tests
  - **hotfix**: An urgent bug fix to our production code

- ``<issue_number>`` references the GitHub issue this branch will close
- ``<short_summary>`` is a brief description that shouldn't be more than 3
  words.

Some examples:

- ``feat-12-adds-email-field``
- ``fix-27-corrects-endpoint``
- ``test-43-updates-server-test``

We ask that a separate issue and branch are created if code is added
outside the scope of the referenced issue.

Commit messages
~~~~~~~~~~~~~~~

Please format your commit messages as ``<type>: <short summary>``, where
``<type>`` is from the list above and the short summary is one or two
sentences.
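
For example, a hypothetical documentation commit might look like:

.. code:: bash

   git commit -m "docs: updates contributor guidelines"
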
Testing and docstrings
~~~~~~~~~~~~~~~~~~~~~~

We strive for complete code coverage and docstrings, and we also run
code format checks.

- To run the code format check:

  .. code:: bash

     flake8 .

- There are some helpful libraries that will automatically format the
  code and import statements:

  .. code:: bash

     black .

  and

  .. code:: bash

     isort .

  Strings that exceed the maximum line length may still need to be
  formatted manually.

- To run the docstring coverage check and report:

  .. code:: bash

     interrogate -v .

  This project uses NumPy's docstring format: `NumPy docstring
  standards <https://numpydoc.readthedocs.io/en/latest/format.html>`__

  Many IDEs can be configured to automatically format docstrings in the
  NumPy convention.

- To run the unit test coverage check and report:

  .. code:: bash

     coverage run -m unittest discover && coverage report

- To view a more detailed HTML version of the report, run:

  .. code:: bash

     coverage run -m unittest discover && coverage report
     coverage html

  and then open ``htmlcov/index.html`` in a browser.

Pull requests
~~~~~~~~~~~~~

Pull requests and reviews are required before merging code into this
project. You may open a ``Draft`` pull request and ask for a preliminary
review on code that is currently a work-in-progress.

Before requesting a review on a finalized pull request, please verify
that the automated checks have passed first.

Release Cycles
--------------

For this project, we have adopted the `Git
Flow <https://www.gitkraken.com/learn/git/git-flow>`__ system. We will
strive to release new features and bug fixes on a two-week cycle. The
rough workflow is:

Hotfixes
~~~~~~~~

- A ``hotfix`` branch is created off of ``main`` (see the sketch after
  this list)
- A Pull Request into ``main`` is opened, reviewed, and merged into
  ``main``
- A new ``tag`` with a patch bump is created, and a new ``release`` is
  deployed
- The ``main`` branch is merged into all other branches
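
As a sketch, starting a hotfix might look like the following (the issue
number and summary are hypothetical):

.. code:: bash

   # branch off of main, following the branch naming convention above
   git checkout main
   git pull
   git checkout -b hotfix-86-fixes-endpoint
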
Feature branches and bug fixes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- A new branch is created off of ``dev``
- A Pull Request into ``dev`` is opened, reviewed, and merged

Release branch
~~~~~~~~~~~~~~

- A new branch ``release-v{new_tag}`` is created
- Documentation updates and bug fixes are created off of the
  ``release-v{new_tag}`` branch.
- Commits added to the ``release-v{new_tag}`` branch are also merged into
  ``dev``
- Once ready for release, a Pull Request from ``release-v{new_tag}``
  into ``main`` is opened for final review
- A new tag will automatically be generated
- Once merged, a new GitHub Release is created manually

Pre-release checklist
~~~~~~~~~~~~~~~~~~~~~

- ☐ Increment ``__version__`` in the
  ``aind_data_asset_indexer/__init__.py`` file
- ☐ Run linters, unit tests, and integration tests
- ☐ Verify code is deployed and tested in the test environment
- ☐ Update examples
- ☐ Update documentation

  - Run:

    .. code:: bash

       sphinx-apidoc -o docs/source/ src
       sphinx-build -b html docs/source/ docs/build/html

- ☐ Update and build UML diagrams

  - To build UML diagrams locally using a docker container:

    .. code:: bash

       docker pull plantuml/plantuml-server
       docker run -d -p 8080:8080 plantuml/plantuml-server:jetty

Post-release checklist
~~~~~~~~~~~~~~~~~~~~~~

- ☐ Merge ``main`` into ``dev`` and feature branches
- ☐ Edit release notes if needed
- ☐ Post announcement
