|
1 | 1 | # DataJoint Overview
|
2 | 2 |
|
3 |
| -DataJoint is a library for interacting with scientific databases integrating computational dependencies as part of the data model. It is an ideal tool for team projects working on shared data-centric computational workflows. |
| 3 | +DataJoint is a library for interacting with scientific databases that support computational dependencies as part of the data model. |
| 4 | +DataJoint serves as a principal framework for organizing data and computations in team projects. |
4 | 5 |
|
5 |
| -## Why use databases in scientific studes? |
6 |
| - |
7 |
| -Many scientists are reluctant to use databases due to their perceived unwieldiness, opting instead to use file repositories for managing their shared data. [Gray, 2005](https://arxiv.org/abs/cs/0502008) |
8 |
| - |
9 |
| -Yet databases provide several key advantages when it comes to sharing structured dynamic data: |
| 6 | +Databases provide several key advantages when it comes to sharing structured dynamic data: |
10 | 7 |
|
11 | 8 | 1. **Data structure:** databases communicate and enforce structure reflecting the logic of the scientific study.
|
12 | 9 | 2. **Concurrent access:** databases support transactions to allow multiple agents to read and write the data concurrently.
|
13 | 10 | 3. **Consistency and integrity:** databases provide ways to ensure that data operations from multiple parties are combined correctly without loss, misidentification, or mismatches.
|
14 | 11 | 4. **Queries:** databases simplify and accelerate data queries -- functions on data that return precise slices of the data without needing to send the entire dataset for analysis.
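
The four advantages listed above can be illustrated with a minimal sketch. It uses Python's standard-library `sqlite3` module and plain SQL purely as a stand-in for a generic relational database; it is not DataJoint's API. The `subject` and `recording` tables, their columns, and the sample values are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Data structure: the schema states that every recording belongs to a subject.
# (Table and column names are hypothetical.)
conn.execute(
    "CREATE TABLE subject ("
    " subject_id INTEGER PRIMARY KEY,"
    " species TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE recording ("
    " recording_id INTEGER PRIMARY KEY,"
    " subject_id INTEGER NOT NULL REFERENCES subject(subject_id),"
    " duration_s REAL NOT NULL)"
)

# Concurrent access relies on transactions: the block commits as a whole
# or rolls back as a whole if an error occurs.
with conn:
    conn.executemany(
        "INSERT INTO subject VALUES (?, ?)",
        [(1, "mouse"), (2, "rat")],
    )
    conn.executemany(
        "INSERT INTO recording VALUES (?, ?, ?)",
        [(10, 1, 120.0), (11, 1, 45.5), (12, 2, 300.0)],
    )

# Consistency and integrity: a recording that points at an unknown
# subject is rejected by the database itself.
try:
    with conn:
        conn.execute("INSERT INTO recording VALUES (13, 99, 60.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Queries: ask for a precise slice instead of fetching the whole dataset.
long_mouse_recordings = conn.execute(
    "SELECT r.recording_id"
    " FROM recording AS r"
    " JOIN subject AS s ON s.subject_id = r.subject_id"
    " WHERE s.species = ? AND r.duration_s > ?"
    " ORDER BY r.recording_id",
    ("mouse", 100.0),
).fetchall()

recording_count = conn.execute("SELECT COUNT(*) FROM recording").fetchone()[0]
```

In this sketch the rejected insert leaves the `recording` table unchanged, and the final query returns only the rows matching both conditions rather than the full table.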
|
15 | 12 |
|
16 |
| -## What does DataJoint bring? |
17 | 13 | DataJoint solves several key problems for using databases effectively in scientific projects:
|
18 | 14 |
|
19 | 15 | 1. **Complete relational data model:** database programming directly from a scientific computing language such as MATLAB and Python without the need for SQL.
|
20 | 16 | 2. **Data definition language:** to define tables and dependencies in simple and consistent ways.
|
21 | 17 | 3. **Diagramming notation:** to visualize and navigate tables and dependencies.
|
22 | 18 | 4. **Query language:** to create flexible and precise queries with only a few operators.
|
23 | 19 | 5. **Serialization framework:** to store and retrieve numerical arrays and other data structures directly in the database.
|
24 |
| -6. **Support for automated distributed computations:** for computational dependencies in the data. |
| 20 | +6. **Support for automated distributed computations:** for computational dependencies in the data. |