
Commit 073faa6

docs: readthedocs (#99)
* docs: Contributor Guidelines
* fix: docstrings format for sphinx build
* docs: add User Guide
* docs: update readme
1 parent 7febec5 commit 073faa6

18 files changed: +950 −42 lines changed

.readthedocs.yaml

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
version: 2

build:
  os: ubuntu-22.04
  tools:
    python: "3.10"

python:
  install:
    - method: pip
      path: .
      extra_requirements:
        - dev
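
For reference, this config has Read the Docs install the package with its `dev` extras and then build the Sphinx project. A rough local equivalent, assuming the `dev` extra pulls in Sphinx, is:

```bash
pip install .[dev]
sphinx-build -b html docs/source/ docs/build/html
```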

README.md

Lines changed: 12 additions & 14 deletions
@@ -4,20 +4,18 @@
 ![Code Style](https://img.shields.io/badge/code%20style-black-black)
 [![semantic-release: angular](https://img.shields.io/badge/semantic--release-angular-e10079?logo=semantic-release)](https://github.com/semantic-release/semantic-release)
 
-Script to create metadata analytics table and write to redshift table.
-This script will parse through a list of s3 buckets and document whether data asset records in each of those buckets does or does not contain `metadata.nd.json`
+Index jobs for AIND metadata in AWS DocumentDB and S3.
 
+AIND metadata for data assets is stored in various places and must be
+kept in sync:
 
-## Usage
-- Define the environment variables in the `.env.template`
-  - REDSHIFT_SECRETS_NAME: defining secrets name for Amazon Redshift
-  - BUCKETS: list of buckets. comma separated format (ex: "bucket_name1, bucket_name2")
-  - TABLE_NAME: name of table in redshift
-  - FOLDERS_FILEPATH: Intended filepath for txt file
-  - METADATA_DIRECTORY: Intended path for directory containing copies of metadata records
-  - AWS_DEFAULT_REGION: Default AWS region.
-- Records containing metadata.nd.json file will be copies to `METADATA_DIRECTORY` and compared against list of all records in `FOLDERS_FILEPATH`
-- An analytics table containing columns `s3_prefix`, `bucket_name`, and `metadata_bool` will be written to `TABLE_NAME` in Redshift
+1. **S3 buckets** store raw metadata files, including the ``metadata.nd.json``.
+2. A **document database (DocDB)** contains unstructured JSON
+   documents describing the ``metadata.nd.json`` for a data asset.
+3. **Code Ocean**: data assets are mounted as Code Ocean data assets.
+   Processed results are also stored in an internal Code Ocean bucket.
 
-## Development
-- It's a bit tedious, but the dependencies listed in the `pyproject.toml` file needs to be manually updated
+We have automated jobs to keep changes in DocDB and S3 in sync.
+This repository contains the code for these index jobs.
+
+More information including a user guide and contributor guidelines can be found at [readthedocs](https://aind-data-asset-indexer.readthedocs.io).

docs/Makefile

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS    ?=
SPHINXBUILD   ?= sphinx-build
SOURCEDIR     = source
BUILDDIR      = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

docs/make.bat

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
	set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
	echo.
	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
	echo.installed, then set the SPHINXBUILD environment variable to point
	echo.to the full path of the 'sphinx-build' executable. Alternatively you
	echo.may add the Sphinx directory to PATH.
	echo.
	echo.If you don't have Sphinx installed, grab it from
	echo.https://www.sphinx-doc.org/
	exit /b 1
)

if "%1" == "" goto help

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd
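
Both wrappers route targets to `sphinx-build -M`, so a local HTML build should be as simple as, for example:

```bash
cd docs
make html       # Linux/macOS; on Windows: make.bat html
```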

docs/source/Contributing.rst

Lines changed: 273 additions & 0 deletions
@@ -0,0 +1,273 @@
Contributor Guidelines
======================

This document will go through best practices for contributing to this
project. We welcome and appreciate contributions or ideas for
improvement.

- `Bug Reports and Feature
  Requests <#bug-reports-and-feature-requests>`__
- `Local Installation for
  Development <#local-installation-for-development>`__
- `Branches and Pull Requests <#branches-and-pull-requests>`__
- `Release Cycles <#release-cycles>`__

Bug Reports and Feature Requests
--------------------------------

Before creating a pull request, we ask contributors to please open a bug
report or feature request first:
`issues <https://github.com/AllenNeuralDynamics/aind-data-asset-indexer/issues/new/choose>`__

We will do our best to monitor and maintain the backlog of issues.

Local Installation for Development
----------------------------------

For development:

- For new features or non-urgent bug fixes, create a branch off of
  ``dev``
- For an urgent hotfix to our production environment, create a branch
  off of ``main``

Consult the `Branches and Pull Requests <#branches-and-pull-requests>`__
and `Release Cycles <#release-cycles>`__ sections for more details.

From the root directory, run:

.. code:: bash

   pip install -e .[dev]

to install the relevant code for development.

.. _running-indexer-jobs-locally:

Running indexer jobs locally
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The jobs are intended to be run as scheduled AWS ECS tasks in the same VPC
as the DocDB instance. The job settings are stored in AWS Parameter Store.
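
If you would like to inspect the stored settings first, a standard AWS CLI
call such as the following should work (the parameter name below is a
placeholder; use the name configured for your deployment):

.. code:: bash

   aws ssm get-parameter --name "<indexer_param_store_name>" --with-decryption
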
If you wish to run the jobs locally, follow these steps:

1. In a new terminal, start an ssh session. Credentials can be found in AWS
   Secrets Manager.

   .. code:: bash

      ssh -L 27017:{docdb_host}:27017 {ssh_username}@{ssh_host} -N -v

2. For the ``IndexAindBucketsJob``, you will need to set the ``INDEXER_PARAM_NAME``.
   Then, run the following:

   .. code:: python

      from aind_data_asset_indexer.index_aind_buckets import IndexAindBucketsJob
      from aind_data_asset_indexer.models import AindIndexBucketsJobSettings

      # Placeholder: replace with the job's AWS Parameter Store name
      INDEXER_PARAM_NAME = "<indexer_param_store_name>"

      if __name__ == "__main__":
          main_job_settings = AindIndexBucketsJobSettings.from_param_store(
              param_store_name=INDEXER_PARAM_NAME
          )
          main_job_settings.doc_db_host = "localhost"
          main_job = IndexAindBucketsJob(job_settings=main_job_settings)
          main_job.run_job()

3. For the ``CodeOceanIndexBucketJob``, you will need to set the ``CO_INDEXER_PARAM_NAME``
   and ``DEVELOPER_CODEOCEAN_ENDPOINT``. Then, run the following:

   .. code:: python

      from aind_data_asset_indexer.codeocean_bucket_indexer import CodeOceanIndexBucketJob
      from aind_data_asset_indexer.models import CodeOceanIndexBucketJobSettings

      # Placeholders: replace with the job's Parameter Store name and a
      # developer Code Ocean endpoint
      CO_INDEXER_PARAM_NAME = "<co_indexer_param_store_name>"
      DEVELOPER_CODEOCEAN_ENDPOINT = "<developer_codeocean_endpoint>"

      if __name__ == "__main__":
          main_job_settings = CodeOceanIndexBucketJobSettings.from_param_store(
              param_store_name=CO_INDEXER_PARAM_NAME
          )
          main_job_settings.doc_db_host = "localhost"
          main_job_settings.temp_codeocean_endpoint = DEVELOPER_CODEOCEAN_ENDPOINT
          main_job = CodeOceanIndexBucketJob(job_settings=main_job_settings)
          main_job.run_job()

4. Close the ssh session when you are done.

Branches and Pull Requests
--------------------------

Branch naming conventions
~~~~~~~~~~~~~~~~~~~~~~~~~

Name your branch using the following format:
``<type>-<issue_number>-<short_summary>``

where:

- ``<type>`` is one of:

  - **build**: Changes that affect the build system
    or external dependencies (e.g., pyproject.toml, setup.py)
  - **ci**: Changes to our CI configuration files and scripts
    (examples: .github/workflows/ci.yml)
  - **docs**: Changes to our documentation
  - **feat**: A new feature
  - **fix**: A bug fix
  - **perf**: A code change that improves performance
  - **refactor**: A code change that neither fixes a bug nor adds
    a feature, but will make the codebase easier to maintain
  - **test**: Adding missing tests or correcting existing tests
  - **hotfix**: An urgent bug fix to our production code

- ``<issue_number>`` references the GitHub issue this branch will close
- ``<short_summary>`` is a brief description that shouldn't be more than 3
  words.

Some examples:

- ``feat-12-adds-email-field``
- ``fix-27-corrects-endpoint``
- ``test-43-updates-server-test``

We ask that a separate issue and branch are created if code is added
outside the scope of the referenced issue.

Commit messages
~~~~~~~~~~~~~~~

Please format your commit messages as ``<type>: <short summary>``, where
``<type>`` is from the list above and the short summary is one or two
sentences.
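
For example, a hypothetical documentation commit might look like:

.. code:: bash

   git commit -m "docs: updates contributor guidelines"
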
Testing and docstrings
~~~~~~~~~~~~~~~~~~~~~~

We strive for complete code coverage and docstrings, and we also run
code format checks.

- To run the code format check:

  .. code:: bash

     flake8 .

- There are some helpful libraries that will automatically format the
  code and import statements:

  .. code:: bash

     black .

  and

  .. code:: bash

     isort .

  Strings that exceed the maximum line length may still need to be
  formatted manually.

- To run the docstring coverage check and report:

  .. code:: bash

     interrogate -v .

  This project uses NumPy's docstring format: `NumPy docstring
  standards <https://numpydoc.readthedocs.io/en/latest/format.html>`__

  Many IDEs can be configured to automatically format docstrings in the
  NumPy convention.

- To run the unit test coverage check and report:

  .. code:: bash

     coverage run -m unittest discover && coverage report

- To view a more detailed HTML version of the report, run:

  .. code:: bash

     coverage run -m unittest discover && coverage report
     coverage html

  and then open ``htmlcov/index.html`` in a browser.

Pull requests
~~~~~~~~~~~~~

Pull requests and reviews are required before merging code into this
project. You may open a ``Draft`` pull request and ask for a preliminary
review on code that is currently a work-in-progress.

Before requesting a review on a finalized pull request, please verify
that the automated checks have passed first.

Release Cycles
--------------

For this project, we have adopted the `Git
Flow <https://www.gitkraken.com/learn/git/git-flow>`__ system. We will
strive to release new features and bug fixes on a two-week cycle. The
rough workflow is:

Hotfixes
~~~~~~~~

- A ``hotfix`` branch is created off of ``main`` (see the sketch after
  this list)
- A Pull Request into ``main`` is opened, reviewed, and merged into
  ``main``
- A new ``tag`` with a patch bump is created, and a new ``release`` is
  deployed
- The ``main`` branch is merged into all other branches
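
As a sketch, starting a hotfix might look like the following (the issue
number and summary are hypothetical):

.. code:: bash

   # branch off of main, following the branch naming convention above
   git checkout main
   git pull
   git checkout -b hotfix-86-fixes-endpoint
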
Feature branches and bug fixes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- A new branch is created off of ``dev``
- A Pull Request into ``dev`` is opened, reviewed, and merged

Release branch
~~~~~~~~~~~~~~

- A new branch ``release-v{new_tag}`` is created
- Documentation updates and bug fixes are created off of the
  ``release-v{new_tag}`` branch.
- Commits added to the ``release-v{new_tag}`` branch are also merged into
  ``dev``
- Once ready for release, a Pull Request from ``release-v{new_tag}``
  into ``main`` is opened for final review
- A new tag will automatically be generated
- Once merged, a new GitHub Release is created manually

Pre-release checklist
~~~~~~~~~~~~~~~~~~~~~

- ☐ Increment ``__version__`` in the
  ``aind_data_asset_indexer/__init__.py`` file
- ☐ Run linters, unit tests, and integration tests
- ☐ Verify code is deployed and tested in the test environment
- ☐ Update examples
- ☐ Update documentation

  - Run:

    .. code:: bash

       sphinx-apidoc -o docs/source/ src
       sphinx-build -b html docs/source/ docs/build/html

- ☐ Update and build UML diagrams

  - To build UML diagrams locally using a docker container:

    .. code:: bash

       docker pull plantuml/plantuml-server
       docker run -d -p 8080:8080 plantuml/plantuml-server:jetty

Post-release checklist
~~~~~~~~~~~~~~~~~~~~~~

- ☐ Merge ``main`` into ``dev`` and feature branches
- ☐ Edit release notes if needed
- ☐ Post announcement
