Update docs.

guzman-raphael · guzman-raphael · commit fec90c4c96e8 · 2023-02-07T20:44:50.000Z
diff --git a/.devcontainer/devcontainer.json b/.devcontainer/devcontainer.json
@@ -15,7 +15,8 @@
 		"ghcr.io/eitsupi/devcontainer-features/jq-likes:1": {},
 		"ghcr.io/guiyomh/features/vim:0": {}
 	},
-	"onCreateCommand": "pip install -e . && MYSQL_VER=5.7 MINIO_VER=RELEASE.2022-08-11T04-37-28Z docker compose -f local-docker-compose.yml up --build -d",
+	"onCreateCommand": "pip install -e .",
+	"postStartCommand": "MYSQL_VER=5.7 MINIO_VER=RELEASE.2022-08-11T04-37-28Z docker compose -f local-docker-compose.yml up --build -d",
 	"customizations": {
 		"vscode": {
 			"extensions": [
diff --git a/README.md b/README.md
@@ -10,27 +10,19 @@
 DataJoint for Python is a framework for scientific workflow management based on relational principles. DataJoint is built on the foundation of the relational data model and prescribes a consistent method for organizing, populating, computing, and querying data.
 
 DataJoint was initially developed in 2009 by Dimitri Yatsenko in Andreas Tolias' Lab at Baylor College of Medicine for the distributed processing and management of large volumes of data streaming from regular experiments. Starting in 2011, DataJoint has been available as an open-source project adopted by other labs and improved through contributions from several developers.
-Presently, the primary developer of DataJoint open-source software is the company DataJoint (https://datajoint.com). Related resources are listed at https://datajoint.org.
+Presently, the primary developer of DataJoint open-source software is the company DataJoint (https://datajoint.com).
 
-## Installation
+- [Getting Started](https://datajoint.com/docs/core/datajoint-python/latest/getting-started/)
+- [DataJoint Elements](https://datajoint.com/docs/elements/) - Catalog of example pipelines
+- [DataJoint CodeBook](https://codebook.datajoint.io) - Interactive online tutorials
+- Contribute
 
-```
-pip3 install datajoint
-```
+  - [Development Environment](https://datajoint.com/docs/core/datajoint-python/latest/develop/)
+  - [Guidelines](https://datajoint.com/docs/community/contribute/)
 
-If you already have an older version of DataJoint installed using `pip`, upgrade with
-
-```bash
-pip3 install --upgrade datajoint
-```
-
-## Documentation and Tutorials
-
-- https://datajoint.org -- start page
-- https://docs.datajoint.org -- up-to-date documentation
-- https://tutorials.datajoint.io -- step-by-step tutorials
-- https://elements.datajoint.org -- catalog of example pipelines
-- https://codebook.datajoint.io -- interactive online tutorials
+- Legacy Resources (To be replaced by above)
+  - [Documentation](https://docs.datajoint.org)
+  - [Tutorials](https://tutorials.datajoint.org)
 
 ## Citation
 
@@ -39,136 +31,3 @@ pip3 install --upgrade datajoint
 - DataJoint ([RRID:SCR_014543](https://scicrunch.org/resolver/SCR_014543)) - DataJoint for Python (version `<Enter version number>`)
 
 - Yatsenko D, Reimer J, Ecker AS, Walker EY, Sinz F, Berens P, Hoenselaar A, Cotton RJ, Siapas AS, Tolias AS. DataJoint: managing big scientific data using MATLAB or Python. bioRxiv. 2015 Jan 1:031658. doi: https://doi.org/10.1101/031658
-
-## Python Native Blobs
-
-<details>
-<summary>Click to expand details</summary>
-
-DataJoint 0.12 adds full support for all native python data types in blobs: tuples, lists, sets, dicts, strings, bytes, `None`, and all their recursive combinations.
-The new blobs are a superset of the old functionality and are fully backward compatible.
-In previous versions, only MATLAB-style numerical arrays were fully supported.
-Some Python datatypes such as dicts were coerced into numpy recarrays and then fetched as such.
-
-However, since some Python types were coerced into MATLAB types, old blobs and new blobs may now be fetched as different types of objects even if they were inserted the same way.
-For example, new `dict` objects will be returned as `dict` while the same types of objects inserted with `datajoint 0.11` will be recarrays.
-
-Since this is a big change, we chose to temporarily disable this feature by default in DataJoint for Python 0.12.x, allowing users to adjust their code if necessary.
-From 13.x, the flag will default to True (on), and will ultimately be removed when corresponding decode support for the new format is added to datajoint-matlab (see: datajoint-matlab #222, datajoint-python #765).
-
-The flag is configured by setting the `enable_python_native_blobs` flag in `dj.config`.
-
-```python
-import datajoint as dj
-dj.config["enable_python_native_blobs"] = True
-```
-
-You can safely enable this setting if both of the following are true:
-
-- The only kinds of blobs your pipeline have inserted previously were numerical arrays.
-- You do not need to share blob data between Python and MATLAB.
-
-Otherwise, read the following explanation.
-
-DataJoint v0.12 expands DataJoint's blob serialization mechanism with
-improved support for complex native python datatypes, such as dictionaries
-and lists of strings.
-
-Prior to DataJoint v0.12, certain python native datatypes such as
-dictionaries were 'squashed' into numpy structured arrays when saved into
-blob attributes. This facilitated easier data sharing between MATLAB
-and Python for certain record types. However, this created a discrepancy
-between insert and fetch datatypes which could cause problems in other
-portions of users pipelines.
-
-DataJoint v0.12, removes the squashing behavior, instead encoding native python datatypes in blobs directly.
-However, this change creates a compatibility problem for pipelines
-which previously relied on the type squashing behavior since records
-saved via the old squashing format will continue to fetch
-as structured arrays, whereas new record inserted in DataJoint 0.12 with
-`enable_python_native_blobs` would result in records returned as the
-appropriate native python type (dict, etc).  
-Furthermore, DataJoint for MATLAB does not yet support unpacking native Python datatypes.
-
-With `dj.config["enable_python_native_blobs"]` set to `False`,
-any attempt to insert any datatype other than a numpy array will result in an exception.
-This is meant to get users to read this message in order to allow proper testing
-and migration of pre-0.12 pipelines to 0.12 in a safe manner.
-
-The exact process to update a specific pipeline will vary depending on
-the situation, but generally the following strategies may apply:
-
-- Altering code to directly store numpy structured arrays or plain
-  multidimensional arrays. This strategy is likely best one for those
-  tables requiring compatibility with MATLAB.
-- Adjust code to deal with both structured array and native fetched data
-  for those tables that are populated with `dict`s in blobs in pre-0.12 version.
-  In this case, insert logic is not adjusted, but downstream consumers
-  are adjusted to handle records saved under the old and new schemes.
-- Migrate data into a fresh schema, fetching the old data, converting blobs to
-  a uniform data type and re-inserting.
-- Drop/Recompute imported/computed tables to ensure they are in the new
-  format.
-
-As always, be sure that your data is safely backed up before modifying any
-important DataJoint schema or records.
-
-</details>
-
-## API docs
-
-The API documentation can be built with mkdocs using the docker compose file in
-`docs/` with the following command:
-
-```bash
-MODE="LIVE" PACKAGE=datajoint UPSTREAM_REPO=https://github.com/datajoint/datajoint-python.git HOST_UID=$(id -u) docker compose -f docs/docker-compose.yaml up --build
-```
-
-The site will then be available at `http://localhost/`. When finished, be sure to run
-the same command as above, but replace `up --build` with `down`.
-
-## Running Tests Locally
-
-<details>
-<summary>Click to expand details</summary>
-
-- Create an `.env` with desired development environment values e.g.
-
-```sh
-PY_VER=3.9
-MYSQL_VER=5.7
-DISTRO=alpine
-MINIO_VER=RELEASE.2022-01-03T18-22-58Z
-HOST_UID=1000
-```
-
-- `cp local-docker-compose.yml docker-compose.yml`
-- `docker-compose up -d` (Note configured `JUPYTER_PASSWORD`)
-- Select a means of running Tests e.g. Docker Terminal, or Local Terminal (see bottom)
-- Add entry in `/etc/hosts` for `127.0.0.1 fakeservices.datajoint.io`
-- Run desired tests. Some examples are as follows:
-
-| Use Case                    | Shell Code                                                                      |
-| --------------------------- | ------------------------------------------------------------------------------- |
-| Run all tests               | `nosetests -vsw tests --with-coverage --cover-package=datajoint`                |
-| Run one specific class test | `nosetests -vs --tests=tests.test_fetch:TestFetch.test_getattribute_for_fetch1` |
-| Run one specific basic test | `nosetests -vs --tests=tests.test_external_class:test_insert_and_fetch`         |
-
-### Launch Docker Terminal
-
-- Shell into `datajoint-python_app_1` i.e. `docker exec -it datajoint-python_app_1 sh`
-
-### Launch Local Terminal
-
-- See `datajoint-python_app` environment variables in `local-docker-compose.yml`
-- Launch local terminal
-- `export` environment variables in shell
-- Add entry in `/etc/hosts` for `127.0.0.1 fakeservices.datajoint.io`
-
-### Launch Jupyter Notebook for Interactive Use
-
-- Navigate to `localhost:8888`
-- Input Jupyter password
-- Launch a notebook i.e. `New > Python 3`
-
-</details>
diff --git a/docs/docker-compose.yaml b/docs/docker-compose.yaml
@@ -1,6 +1,4 @@
 # MODE="LIVE|QA|BUILD" PACKAGE=datajoint UPSTREAM_REPO=https://github.com/datajoint/datajoint-python.git HOST_UID=$(id -u) docker compose -f docs/docker-compose.yaml up --build
-#
-# navigate to http://localhost/
 version: "2.4"
 services:
   docs:
@@ -18,7 +16,7 @@ services:
       - ..:/main
     user: ${HOST_UID}:anaconda
     ports:
-      - 80:80
+      - 8080:80
     command:
       - sh
       - -c
diff --git a/docs/mkdocs.yaml b/docs/mkdocs.yaml
@@ -8,14 +8,15 @@ nav:
   - Getting Started: getting-started/index.md
   - Existing Pipelines: concepts/existing-pipelines.md
   - Query Language:
-    - Common Commands:  query-lang/common-commands.md
-    - Operators: query-lang/operators.md
-    - Iteration: query-lang/iteration.md
-    - Query Caching: query-lang/query-caching.md
+      - Common Commands: query-lang/common-commands.md
+      - Operators: query-lang/operators.md
+      - Iteration: query-lang/iteration.md
+      - Query Caching: query-lang/query-caching.md
   - Reproducibility:
-    - Table Tiers: reproduce/table-tiers.md
-    - Make Method: reproduce/make-method.md
+      - Table Tiers: reproduce/table-tiers.md
+      - Make Method: reproduce/make-method.md
   - Tutorials: tutorials.md
+  - Develop: develop.md
   - Changelog: about/changelog.md
   - API: api/ # defer to gen-files + literate-nav
 
@@ -59,12 +60,12 @@ plugins:
             filters:
               - "!^_"
             docstring_style: sphinx # Replaces google default pending docstring updates
-            members_order: source      
+            members_order: source
             group_by_category: false
             line_length: 88
   - gen-files:
       scripts:
-      - ./src/api/make_pages.py
+        - ./src/api/make_pages.py
   - literate-nav:
       nav_file: navigation.md
   - exclude-search:
diff --git a/docs/src/develop.md b/docs/src/develop.md
@@ -0,0 +1,115 @@
+# Develop
+
+Included with the codebase is the recommended development environment configured using [DevContainer](https://containers.dev/).
+
+## Launch Development Environment
+
+The environment can be launched in either of the following ways:
+
+- **Cloud-based IDE**: (*recommended*) Launch the [GitHub Codespaces](https://github.com/features/codespaces) environment named `Development`.
+- **Local IDE**:
+  - Ensure you have [Git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)
+  - Ensure you have [Docker](https://docs.docker.com/get-docker/)
+  - Ensure you have [VSCode](https://code.visualstudio.com/)
+  - Install the [Dev Containers extension](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers)
+  - `git clone` the codebase repository and open it in VSCode
+  - Use the `Dev Containers extension` to `Reopen in Container` (More info in the `Getting started` included with the extension)
+  - Your environment will finish loading once the file tree is populated and the terminal becomes active
+
+## Features
+
+Once you've successfully launched the development environment, you'll be able to take advantage of our developer tooling to help improve productivity.
+
+### Syntax Tests
+
+The following will verify that there are no syntax errors.
+
+```
+flake8 datajoint --count --select=E9,F63,F7,F82 --show-source --statistics
+```
+
+### Integration Tests
+
+The following will verify there are no regression errors by running our test suite of unit and integration tests.
+
+- Entire test suite:
+  ```
+  nosetests -vw tests
+  ```
+- A single function test:
+  ```
+  nosetests -vs --tests=tests.test_external_class:test_insert_and_fetch
+  ```
+- A single class test:
+  ```
+  nosetests -vs --tests=tests.test_fetch:TestFetch.test_getattribute_for_fetch1
+  ```
+
+### Style Tests
+
+The following will verify that there are no code styling errors.
+
+```
+flake8 --ignore=E203,E722,W503 datajoint --count --max-complexity=62 --max-line-length=127 --statistics
+```
+
+The following will ensure the codebase has been formatted with [black](https://black.readthedocs.io/en/stable/).
+
+```
+black datajoint --check -v
+```
+
+The following will ensure the test suite has been formatted with [black](https://black.readthedocs.io/en/stable/).
+
+```
+black tests --check -v
+```
+
+### Jupyter
+
+Jupyter notebooks are supported in this environment. Because the package is installed in editable mode (`pip install -e .`), `import datajoint` uses the current state of the source.
+
+Be sure to see the reference documentation if you are new to [running Jupyter notebooks with VSCode](https://code.visualstudio.com/docs/datascience/jupyter-notebooks#_create-or-open-a-jupyter-notebook).
+
+### Debugger
+
+[VSCode Debugger](https://code.visualstudio.com/docs/editor/debugging) lets you pause execution at breakpoints and inspect program state, which can speed up diagnosing and fixing bugs.
+
+Try it as follows:
+
+- Create a python script of your choice
+- `import datajoint` (This will use the current state of the source)
+- Add breakpoints by clicking in the gutter next to line numbers (a red dot marks each breakpoint)
+- Select the `Run and Debug` tab
+- Start by clicking the button `Run and Debug`
+
+### MySQL CLI
+
+It is often useful in development to connect to DataJoint's relational database backend directly using the MySQL CLI.
+
+Connect as follows to the database running within your developer environment:
+
+```
+mysql -hfakeservices.datajoint.io -uroot -psimple
+```
+
+### Documentation
+
+Our documentation is built using [MkDocs Material](https://squidfunk.github.io/mkdocs-material/). The easiest way to improve the documentation is by using the `docs/docker-compose.yaml` environment. The source can be modified in `docs/src` using markdown.
+
+The docs environment can be run in one of three modes:
+
+- **LIVE**: (*recommended*) This serves the docs locally. It supports live reloading on saves to `docs/src` files but does not support the docs version dropdown. Useful to see changes live.
+  ```
+  MODE="LIVE" PACKAGE=datajoint UPSTREAM_REPO=https://github.com/datajoint/datajoint-python.git HOST_UID=$(id -u) docker compose -f docs/docker-compose.yaml up --build
+  ```
+- **QA**: This serves the docs locally. It supports the docs version dropdown but does not support live reloading. Useful as a final check.
+  ```
+  MODE="QA" PACKAGE=datajoint UPSTREAM_REPO=https://github.com/datajoint/datajoint-python.git HOST_UID=$(id -u) docker compose -f docs/docker-compose.yaml up --build
+  ```
+- **BUILD**: This compiles the docs. Most useful for the docs deployment automation. Other modes are more useful to new contributors.
+  ```
+  MODE="BUILD" PACKAGE=datajoint UPSTREAM_REPO=https://github.com/datajoint/datajoint-python.git HOST_UID=$(id -u) docker compose -f docs/docker-compose.yaml up --build
+  ```
+
+When the docs are served locally, use the VSCode `PORTS` tab (next to `TERMINAL`) to manage access to the forwarded ports. Docs are served on port `8080`.