Commit bb9351c

add OVERVIEW.md
1 parent eb72758 commit bb9351c

1 file changed

+28
-0
lines changed

OVERVIEW.md

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
1+
# DataJoint Overview
2+
3+
DataJoint is a library for interacting with scientific databases that integrates computational dependencies into the data model. It is well suited to team projects built around shared, data-centric computational workflows.
4+
5+
## Why use databases in scientific studies?
6+
7+
Many scientists are reluctant to use databases because they perceive them as unwieldy, opting instead to manage their shared data in file repositories ([Gray, 2005](https://arxiv.org/abs/cs/0502008)).
8+
9+
Yet databases provide several key advantages for sharing structured, dynamic data:
10+
11+
1. **Data structure:** databases communicate and enforce structure in data that reflects the logic of the scientific study.
12+
2. **Concurrent access:** databases support transactions to allow multiple agents to read and write the data concurrently.
13+
3. **Consistency and integrity:** databases provide ways to ensure that data operations from multiple parties are combined correctly without loss, misidentification, or mismatches.
14+
4. **Queries:** databases simplify and accelerate data queries, which are functions on data that return precise slices of it without sending the entire dataset for analysis.
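
The four advantages above can be illustrated with a minimal sketch that uses Python's built-in `sqlite3` module rather than DataJoint itself. The `subject` and `session` tables, their columns, and the helper names are hypothetical and chosen only for illustration. A single in-memory connection cannot show truly concurrent clients, so an atomic transaction stands in for that point here.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE subject (
            subject_id INTEGER PRIMARY KEY,
            species    TEXT NOT NULL
        );
        CREATE TABLE session (
            subject_id  INTEGER NOT NULL REFERENCES subject(subject_id),
            session_idx INTEGER NOT NULL,
            duration_s  REAL NOT NULL,
            PRIMARY KEY (subject_id, session_idx)
        );
        """
    )
    return conn


def insert_orphan_session(conn):
    """Try to record a session for a subject that does not exist.

    Returns True if the database accepted the rows, False if it refused.
    """
    try:
        # `with conn` commits on success and rolls back on any exception,
        # so either both inserts are kept or neither is.
        with conn:
            conn.execute("INSERT INTO subject VALUES (2, 'rat')")
            conn.execute("INSERT INTO session VALUES (99, 1, 600.0)")
    except sqlite3.IntegrityError:
        return False
    return True


conn = build_demo_db()

# Structure and transactions: one atomic unit of work.
with conn:
    conn.execute("INSERT INTO subject VALUES (1, 'mouse')")
    conn.execute("INSERT INTO session VALUES (1, 1, 1800.0)")
    conn.execute("INSERT INTO session VALUES (1, 2, 2400.0)")

# Integrity: the orphan session is rejected and the whole transaction undone.
orphan_accepted = insert_orphan_session(conn)
subject_count = conn.execute("SELECT COUNT(*) FROM subject").fetchone()[0]

# Queries: fetch only the slice of interest instead of the full table.
long_sessions = conn.execute(
    "SELECT session_idx, duration_s FROM session "
    "WHERE subject_id = ? AND duration_s > ?",
    (1, 2000.0),
).fetchall()

print(orphan_accepted, subject_count, long_sessions)
```

The foreign key from `session` to `subject` is what lets the database refuse a session that refers to an unknown subject; a plain file repository would accept the equivalent file without complaint.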
15+
16+
## What does DataJoint bring?
17+
DataJoint solves several key problems in using databases effectively in scientific projects by providing:
18+
19+
1. **Complete relational data model:** database programming directly from a scientific computing language such as MATLAB or Python, without the need for SQL.
20+
2. **Data definition language:** to define tables and dependencies in simple and consistent ways.
21+
3. **Diagramming notation:** to visualize and navigate tables and dependencies.
22+
4. **Query language:** to create flexible and precise queries with only a few operators.
23+
5. **Serialization framework:** to store and retrieve numerical arrays and other data structures directly in the database.
24+
6. **Automated distributed computations:** computational dependencies
25+
26+
27+
28+
