Commit 0243baf

Merge branch 'r013-docs' of https://github.com/dimitri-yatsenko/datajoint-python into mp
2 parents cc144a6 + 0409ffd commit 0243baf

3 files changed: 28 additions and 4 deletions

OVERVIEW.md

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+# DataJoint Overview
+
+DataJoint is a library for interacting with scientific databases integrating computational dependencies as part of the data model. It is an ideal tool for team projects working on shared data-centric computational workflows.
+
+## Why use databases in scientific studies?
+
+Many scientists are reluctant to use databases due to their perceived unwieldiness, opting instead to use file repositories for managing their shared data. [Gray, 2005](https://arxiv.org/abs/cs/0502008)
+
+Yet databases provide several key advantages when it comes to sharing structured dynamic data:
+
+1. **Data structure:** databases communicate and enforce structure reflecting the logic of the scientific study.
+2. **Concurrent access:** databases support transactions to allow multiple agents to read and write the data concurrently.
+3. **Consistency and integrity:** databases provide ways to ensure that data operations from multiple parties are combined correctly without loss, misidentification, or mismatches.
+4. **Queries:** databases simplify and accelerate data queries -- functions on data to obtain precise slices of the data without needing to send the entire dataset for analysis.
+
+## What does DataJoint bring?
+DataJoint solves several key problems for using databases effectively in scientific projects:
+
+1. **Complete relational data model:** database programming directly from a scientific computing language such as MATLAB or Python without the need for SQL.
+2. **Data definition language:** to define tables and dependencies in simple and consistent ways.
+3. **Diagramming notation:** to visualize and navigate tables and dependencies.
+4. **Query language:** to create flexible and precise queries with only a few operators.
+5. **Serialization framework:** to store and retrieve numerical arrays and other data structures directly in the database.
+6. **Support for automated distributed computations:** for computational dependencies in the data.

datajoint/connection.py

Lines changed: 1 addition & 1 deletion
@@ -278,7 +278,7 @@ def query(self, query, args=(), *, as_dict=False, suppress_warnings=True, reconn
         # check cache first:
         use_query_cache = bool(self._query_cache)
         if use_query_cache and not re.match(r"\s*(SELECT|SHOW)", query):
-            raise errors.DataJointError("Only SELECT query are allowed when query caching is on.")
+            raise errors.DataJointError("Only SELECT queries are allowed when query caching is on.")
         if use_query_cache:
             if not config['query_cache']:
                 raise errors.DataJointError("Provide filepath dj.config['query_cache'] when using query caching.")
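
For illustration only, the guard in the hunk above can be exercised in isolation. The sketch below reuses the regular expression from the diff; `QueryCacheError` and `check_cacheable` are hypothetical names standing in for `errors.DataJointError` and the body of `Connection.query`, and are not part of DataJoint.

```python
import re

# Same pattern as in the hunk: optional leading whitespace, then SELECT or SHOW.
# As written, the pattern is case-sensitive and also admits SHOW statements.
_CACHEABLE = re.compile(r"\s*(SELECT|SHOW)")


class QueryCacheError(Exception):
    """Hypothetical stand-in for errors.DataJointError."""


def check_cacheable(query, use_query_cache):
    """Raise if caching is on and the query does not start with SELECT or SHOW."""
    if use_query_cache and not _CACHEABLE.match(query):
        raise QueryCacheError(
            "Only SELECT queries are allowed when query caching is on."
        )
    return True


if __name__ == "__main__":
    print(check_cacheable("  SELECT 1", use_query_cache=True))
    try:
        check_cacheable("DELETE FROM t", use_query_cache=True)
    except QueryCacheError as err:
        print("rejected:", err)
```

Note that the error message mentions only SELECT, while the pattern lets SHOW statements through as well.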

datajoint/expression.py

Lines changed: 3 additions & 3 deletions
@@ -36,7 +36,7 @@ class QueryExpression:
     """
     _restriction = None
     _restriction_attributes = None
-    _left = []  # True for left joins, False for inner joins
+    _left = []  # list of booleans True for left joins, False for inner joins
     _original_heading = None  # heading before projections
 
     # subclasses or instantiators must provide values
@@ -260,7 +260,7 @@ def join(self, other, semantic_check=True, left=False):
         if semantic_check:
             assert_join_compatibility(self, other)
         join_attributes = set(n for n in self.heading.names if n in other.heading.names)
-        # needs subquery if FROM class has common attributes with the other's FROM clause
+        # needs subquery if self's FROM clause has common attributes with other's FROM clause
         need_subquery1 = need_subquery2 = bool(
             (set(self.original_heading.names) & set(other.original_heading.names))
             - join_attributes)
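
The set arithmetic in this hunk can be tried on plain lists of attribute names. The following is a minimal sketch, assuming headings are represented as lists of strings; `needs_subquery` is a hypothetical helper and not a DataJoint function.

```python
def needs_subquery(self_names, other_names, self_original, other_original):
    """Mirror the condition in the hunk using plain lists of attribute names.

    self_names / other_names: current heading names of the two operands.
    self_original / other_original: heading names before projections.
    """
    # Attributes the join will match on: names present in both current headings.
    join_attributes = {n for n in self_names if n in other_names}
    # A subquery is needed when the original headings share a name that is
    # not one of the join attributes (for example, after a rename).
    return bool((set(self_original) & set(other_original)) - join_attributes)


if __name__ == "__main__":
    # 'x' was renamed to 'x1' on the left side, so 'x' is shared by the
    # original headings but is no longer a join attribute.
    print(needs_subquery(["id", "x1"], ["id", "x"], ["id", "x"], ["id", "x"]))
    # No shared names beyond the join attribute 'id'.
    print(needs_subquery(["id", "x"], ["id", "y"], ["id", "x"], ["id", "y"]))
```

In the first call the original headings still collide on `x` even though the join itself only matches on `id`, which is the situation the updated comment describes.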
@@ -301,7 +301,7 @@ def proj(self, *attributes, **named_attributes):
         self.proj(...) or self.proj(Ellipsis) -- include all attributes (return self)
         self.proj() -- include only primary key
         self.proj('attr1', 'attr2') -- include primary key and attributes attr1 and attr2
-        self.proj(..., '-attr1', '-attr2') -- include attributes except attr1 and attr2
+        self.proj(..., '-attr1', '-attr2') -- include all attributes except attr1 and attr2
         self.proj(name1='attr1') -- include primary key and 'attr1' renamed as name1
         self.proj('attr1', dup='(attr1)') -- include primary key and attribute attr1 twice, with the duplicate 'dup'
         self.proj(k='abs(attr1)') adds the new attribute k with the value computed as an expression (SQL syntax)
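
The first four selection rules in this docstring can be mimicked on plain name lists. The sketch below is not DataJoint's implementation: `select_attributes` is a hypothetical helper, renaming and computed attributes are left out, and it assumes that primary key attributes are always kept even if listed for exclusion.

```python
def select_attributes(primary_key, all_names, *attributes):
    """Return the attribute names a proj-like call would keep.

    primary_key: names that are always retained (assumption of this sketch).
    all_names: every attribute name, in heading order.
    attributes: plain names, Ellipsis for "all", or '-name' to exclude.
    """
    include_all = Ellipsis in attributes
    excluded = {a[1:] for a in attributes if isinstance(a, str) and a.startswith("-")}
    requested = {a for a in attributes if isinstance(a, str) and not a.startswith("-")}
    if include_all:
        candidates = list(all_names)
    else:
        candidates = [n for n in all_names if n in primary_key or n in requested]
    return [n for n in candidates if n in primary_key or n not in excluded]


if __name__ == "__main__":
    pk = ["subject_id", "session"]
    names = ["subject_id", "session", "weight", "notes"]
    print(select_attributes(pk, names))                 # primary key only
    print(select_attributes(pk, names, ...))            # all attributes
    print(select_attributes(pk, names, "weight"))       # primary key and weight
    print(select_attributes(pk, names, ..., "-notes"))  # all except notes
```

The attribute names `subject_id`, `session`, `weight`, and `notes` are made up for the example.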
