Merge branch 'r013-docs' of https://github.com/dimitri-yatsenko/datajoint-python into mp

dimitri-yatsenko · dimitri-yatsenko · commit 0243baf4c316 · 2021-09-25T20:19:59.000-05:00
diff --git a/OVERVIEW.md b/OVERVIEW.md
@@ -0,0 +1,24 @@
+# DataJoint Overview
+
+DataJoint is a library for interacting with scientific databases that makes computational dependencies part of the data model. It is well suited to team projects working on shared data-centric computational workflows.
+
+## Why use databases in scientific studies?
+
+Many scientists are reluctant to use databases due to their perceived unwieldiness, opting instead to use file repositories for managing their shared data ([Gray, 2005](https://arxiv.org/abs/cs/0502008)).
+
+Yet databases provide several key advantages when it comes to sharing structured dynamic data:
+ 
+1. **Data structure:** databases communicate and enforce structure reflecting the logic of the scientific study.
+2. **Concurrent access:** databases support transactions to allow multiple agents to read and write the data concurrently.
+3. **Consistency and integrity:** databases provide ways to ensure that data operations from multiple parties are combined correctly without loss, misidentification, or mismatches.
+4. **Queries:** databases simplify and accelerate data queries, which are functions on data that return precise slices without needing to send the entire dataset for analysis.
+
+## What does DataJoint bring?
+DataJoint solves several key problems for using databases effectively in scientific projects:
+
+1. **Complete relational data model:** database programming directly from a scientific computing language such as MATLAB or Python, without the need for SQL.
+2. **Data definition language:** to define tables and dependencies in simple and consistent ways.
+3. **Diagramming notation:** to visualize and navigate tables and dependencies.
+4. **Query language:** to create flexible and precise queries with only a few operators.
+5. **Serialization framework:** to store and retrieve numerical arrays and other data structures directly in the database.
+6. **Support for automated distributed computations:** for computational dependencies in the data.
diff --git a/datajoint/connection.py b/datajoint/connection.py
@@ -278,7 +278,7 @@ def query(self, query, args=(), *, as_dict=False, suppress_warnings=True, reconn
         # check cache first:
         use_query_cache = bool(self._query_cache)
         if use_query_cache and not re.match(r"\s*(SELECT|SHOW)", query):
-            raise errors.DataJointError("Only SELECT query are allowed when query caching is on.")
+            raise errors.DataJointError("Only SELECT queries are allowed when query caching is on.")
         if use_query_cache:
             if not config['query_cache']:
                 raise errors.DataJointError("Provide filepath dj.config['query_cache'] when using query caching.")
diff --git a/datajoint/expression.py b/datajoint/expression.py
@@ -36,7 +36,7 @@ class QueryExpression:
     """
     _restriction = None
     _restriction_attributes = None
-    _left = []  # True for left joins, False for inner joins
+    _left = []  # list of booleans: True for left joins, False for inner joins
     _original_heading = None  # heading before projections
 
     # subclasses or instantiators must provide values
@@ -260,7 +260,7 @@ def join(self, other, semantic_check=True, left=False):
         if semantic_check:
             assert_join_compatibility(self, other)
         join_attributes = set(n for n in self.heading.names if n in other.heading.names)
-        # needs subquery if FROM class has common attributes with the other's FROM clause
+        # needs subquery if self's FROM clause has common attributes with other's FROM clause
         need_subquery1 = need_subquery2 = bool(
             (set(self.original_heading.names) & set(other.original_heading.names))
             - join_attributes)
@@ -301,7 +301,7 @@ def proj(self, *attributes, **named_attributes):
         self.proj(...) or self.proj(Ellipsis) -- include all attributes (return self)
         self.proj() -- include only primary key
         self.proj('attr1', 'attr2')  -- include primary key and attributes attr1 and attr2
-        self.proj(..., '-attr1', '-attr2')  -- include attributes except attr1 and attr2
+        self.proj(..., '-attr1', '-attr2')  -- include all attributes except attr1 and attr2
         self.proj(name1='attr1') -- include primary key and 'attr1' renamed as name1
         self.proj('attr1', dup='(attr1)') -- include primary key and attribute attr1 twice, with the duplicate 'dup'
         self.proj(k='abs(attr1)') adds the new attribute k with the value computed as an expression (SQL syntax)