Commit 4b29c1d
committed
Add sections to dba page
1 parent d2c284b commit 4b29c1d

1 file changed: +289 −3 lines changed

docs/src/sysadmin/dba.md
@@ -1,4 +1,167 @@
-# User Management
+# Database Administration

## Hosting

Let's say a person, a lab, or a multi-lab consortium decides to use DataJoint as its
data pipeline platform.
What IT resources and support will be required?

DataJoint uses a MySQL-compatible database server such as MySQL, MariaDB, Percona
Server, or Amazon Aurora to store the structured data used for all relational
operations.
Large blocks of data associated with these records, such as multidimensional numeric
arrays (signals, images, scans, movies, etc.), can be stored within the database or
in separately configured [bulk storage](../client/stores.md).

The first decisions you need to make are where this server will be hosted and how it
will be administered.
The server may be hosted on your personal computer, on a dedicated machine in your lab,
or in a cloud-based database service.
### Cloud hosting

Increasingly, many teams use cloud-hosted database services, which allow great
flexibility and easy administration of the database server.
A cloud hosting option will be provided through https://works.datajoint.com.
DataJoint Works simplifies the setup for labs that wish to host their data pipelines in
the cloud and allows sharing pipelines between multiple groups and locations.
Because DataJoint is an open-source solution, other cloud services such as Amazon RDS
can also be used in this role, albeit with less DataJoint-centric customization.
### Self hosting

In the most basic configuration, the relational database software and DataJoint are
installed on a single computer used by an individual user.
To support a small group of users, a larger computer can be used instead and configured
for remote access.
As the number of users grows, individual workstations can be installed with the
DataJoint software and used to connect to a larger and more specialized, centrally
located database server machine.

For even larger groups or multi-site collaborations, multiple database servers may be
configured in a replicated fashion to support larger workloads and simultaneous
multi-site access.
The following sections provide basic guidelines for these configurations.
### General server / hardware support requirements

The following table lists some likely scenarios for DataJoint database server
deployments along with reasonable estimates of the required computer hardware.
The IT/systems support needed to ensure smooth operations in the absence of local
database expertise is also listed.

#### IT infrastructures

| Usage Scenario | DataJoint Database Computer | Required IT Support |
| -- | -- | -- |
| Single User | Personal Laptop or Workstation | Self-Supported or Ad-Hoc General IT Support |
| Small Group (e.g. 2-10 Users) | Workstation or Small Server | Ad-Hoc General or Experienced IT Support |
| Medium Group (e.g. 10-30 Users) | Small to Medium Server | Ad-Hoc/Part-Time Experienced or Specialized IT Support |
| Large Group/Department (e.g. 30-50+ Users) | Medium/Large Server or Multi-Server Replication | Part-Time/Dedicated Experienced or Specialized IT Support |
| Multi-Location Collaboration (30+ Users, Geographically Distributed) | Large Server, Advanced Replication | Dedicated Specialized IT Support |
## Configuration

### Hardware considerations

As in any computer system, CPU, RAM, disk storage, and network speed are important
components of performance.
The relational database component of DataJoint is no exception to this rule.
This section discusses the various factors to consider when selecting a server for your
DataJoint pipelines.

#### CPU

CPU speed and parallelism (number of cores/threads) will affect the speed of queries
and the number of simultaneous queries that can be efficiently supported by the system.
A good rule of thumb is to have enough cores to support the number of active users
and background tasks you expect to have running during a typical 'busy' day of usage.
For example, a team of 10 people might want 8 cores to support a few active queries
and background tasks.
#### RAM

The amount of RAM determines how much DataJoint data can be kept in memory, allowing
for faster querying since the data can be searched and returned to the user without
needing to access the slower disk drives.
It is a good idea to get enough memory to fully store the more important and frequently
accessed portions of your dataset, with room to spare, especially if in-database blob
storage is used instead of external [bulk storage](bulk-storage.md).
#### Disk

The disk storage for a DataJoint database server should have fast random access,
ideally with flash-based storage to eliminate the rotational delay of mechanical hard
drives.
#### Networking

When network connections are used, network speed and latency are important to ensure
that large query results can be quickly transferred across the network and that delays
due to data entry/query round-trips have minimal impact on program runtime.
#### General recommendations

DataJoint datasets can consist of many thousands or even millions of records.
Generally speaking, one would want to make sure that the relational database system has
sufficient CPU speed and parallelism to support a typical number of concurrent users
and to execute searches quickly.
The system should have enough RAM to store the primary key values of commonly used
tables and operating system caches.
Disk storage should be fast enough to support quick loading of, and searching through,
the data.
Lastly, network bandwidth must be sufficient to transfer user records quickly.
### Large-scale installations

Database replication may be beneficial if system downtime or precise database
responsiveness is a concern.
Replication can allow for easier coordination of maintenance activities, faster
recovery in the event of system problems, and distribution of the database workload
across server machines to increase throughput and responsiveness.

#### Multi-master replication

Multi-master replication configurations allow all replicas to be used in a read/write
fashion, with the workload distributed among all machines.
However, multi-master replication is also more complicated, requiring front-end
machines to distribute the workload, similar performance characteristics on all
replicas to prevent bottlenecks, and redundant network connections to ensure the
replicated machines are always in sync.
### Recommendations

It is usually best to go with the simplest solution that can suit the requirements of
the installation, adjusting workloads where possible and adding complexity only as
needs dictate.

Resource requirements of course depend on the data collection and processing needs of
the given pipeline, but there are general size guidelines that can inform any system
configuration decisions.
A reasonably powerful workstation or small server should support the needs of a small
group (2-10 users).
A medium or large server should support the needs of a larger user community (10-30
users).
A replicated or distributed setup of 2 or more medium or large servers may be required
in larger cases.
These requirements can be reduced through the use of external or cloud storage, which
is discussed in the subsequent section.

| Usage Scenario | DataJoint Database Computer | Hardware Recommendation |
| -- | -- | -- |
| Single User | Personal Laptop or Workstation | 4 Cores, 8-16GB or more of RAM, SSD or better storage |
| Small Group (e.g. 2-10 Users) | Workstation or Small Server | 8 or more Cores, 16GB or more of RAM, SSD or better storage |
| Medium Group (e.g. 10-30 Users) | Small to Medium Server | 8-16 or more Cores, 32GB or more of RAM, SSD/RAID or better storage |
| Large Group/Department (e.g. 30-50+ Users) | Medium/Large Server or Multi-Server Replication | 16-32 or more Cores, 64GB or more of RAM, SSD RAID storage, multiple machines |
| Multi-Location Collaboration (30+ Users, Geographically Distributed) | Large Server, Advanced Replication | 16-32 or more Cores, 64GB or more of RAM, SSD RAID storage; potentially multiple machines in multiple locations |
### Docker

A Docker image is available for a MySQL server configured to work with DataJoint: https://github.com/datajoint/mysql-docker.
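As a rough sketch, a container from that image might be started as follows. The image name `datajoint/mysql` is taken from the linked repository, and the container name, port mapping, and password are placeholders to adapt to your setup:

```shell
# Start the DataJoint-flavored MySQL image in the background
# (container name, port, and root password are placeholders).
docker run -d --name datajoint-mysql \
  -p 3306:3306 \
  -e MYSQL_ROOT_PASSWORD=change-me \
  datajoint/mysql

# Check that the server is accepting connections
# (requires a local mysql client).
mysql -h 127.0.0.1 -u root -pchange-me -e "SELECT VERSION();"
```

Clients can then connect to the mapped port on the Docker host just as they would to any other MySQL server.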
## User Management

Create user accounts on the MySQL server. For example, if your
username is alice, the SQL code for this step is:

@@ -42,7 +205,7 @@ statement.
SHOW GRANTS FOR 'alice'@'%';
```

-## Grouping with Wildcards
+### Grouping with Wildcards

Depending on the complexity of your installation, using additional
wildcards to group access rules together might make managing user

@@ -61,7 +224,7 @@ GRANT SELECT ON `user\_%\_%`.* TO 'bob'@'%';

to enable `bob` to query all other users' tables using the
`user_username_database` convention without needing to explicitly
-give him access to ``alice\_%``, ``charlie\_%``, and so on.
+give him access to `alice\_%`, `charlie\_%`, and so on.

This convention can be further expanded to create notions of groups
and protected schemas for background processing, etc. For example:

@@ -78,3 +241,126 @@ could allow both bob and alice to read/write into the
```group\_shared``` databases, but in the case of the
```group\_wonderland``` databases, read/write access is restricted
to alice.
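As a hypothetical sketch of grants implementing such a scheme (the usernames and schema prefixes here are illustrative, not the document's elided example):

```mysql
-- Shared group schemas: both users may read and write.
GRANT ALL PRIVILEGES ON `group\_shared\_%`.* TO 'alice'@'%';
GRANT ALL PRIVILEGES ON `group\_shared\_%`.* TO 'bob'@'%';

-- Protected group schemas: alice may read and write; bob may only read.
GRANT ALL PRIVILEGES ON `group\_wonderland\_%`.* TO 'alice'@'%';
GRANT SELECT ON `group\_wonderland\_%`.* TO 'bob'@'%';
```

Note the escaped underscores (`\_`): in MySQL grant patterns an unescaped `_` is a single-character wildcard, so escaping it restricts the grant to the literal prefix.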
## Backups and Recovery

Backing up your DataJoint installation is critical to ensuring that your work is safe
and can be continued in the event of system failures, and several mechanisms are
available to use.

Much like your live installation, your backup will consist of two portions:

- Backup of the relational data
- Backup of optional external bulk storage

This section primarily deals with backup of the relational data, since most of the
optional bulk storage options use "regular" flat files for storage and can be backed up
via any "normal" disk backup regime.

There are many options to back up MySQL; the subsequent sections discuss a few of them.
### Cloud-hosted backups

In the case of cloud-hosted options, many cloud vendors provide automated backup of
your data, and some facility for downloading such backups externally.
Due to the wide variety of cloud-specific options, discussion of them falls outside
the scope of this documentation.
However, since the cloud server is also a MySQL server, other options listed here may
work for your situation.
### Disk-based backup

The simplest option in many cases is to perform a disk-level backup of your MySQL
installation using standard disk backup tools.
Note that all database activity should be stopped for the duration of the backup to
prevent errors in the backed-up data.
This can be done in one of two ways:

- Stopping the MySQL server program
- Using database locks

These methods are required because MySQL data operations can be ongoing in the
background even when no user activity is occurring.
To use a database lock to perform a backup, the following commands can be used as the
MySQL administrator:

```mysql
-- Acquire a global read lock; keep this session open while the backup runs.
FLUSH TABLES WITH READ LOCK;

-- After the disk backup completes, release the lock from the same session.
UNLOCK TABLES;
```

The backup should be performed between the issuing of these two commands, ensuring the
database data is consistent on disk when it is backed up.
Note that this lock is tied to the client session that issued it and is released if
that session disconnects, so the session must remain open for the duration of the
backup.
### MySQLDump

Disk-based backups may not be feasible for every installation, or a database may
require constant activity such that stopping it for backups is not feasible.
In such cases, the simplest option is
[mysqldump](https://dev.mysql.com/doc/mysql-backup-excerpt/8.0/en/using-mysqldump.html),
a command-line tool that prints the contents of your database in SQL form.

This tool is generally acceptable for most cases and is especially well suited to
smaller installations due to its simplicity and ease of use.

For larger installations, the lower speed of mysqldump can be a limitation, since it
has to convert the database contents to and from SQL rather than dealing with the
database files directly.
Additionally, since backups are performed within a transaction, the backup will be
valid up to the time the backup began rather than to its completion, which can make
ensuring that the latest data are fully backed up more difficult as the time it takes
to run a backup grows.
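A minimal sketch of a consistent dump and restore (the hostnames, user names, and file paths are placeholders):

```shell
# Dump all databases inside a single consistent transaction
# (applies to transactional engines such as InnoDB).
mysqldump --single-transaction --all-databases \
  -h db.example.org -u backup_user -p > /backups/all-databases.sql

# Restore by replaying the SQL dump into a server.
mysql -h db.example.org -u root -p < /backups/all-databases.sql
```

The `--single-transaction` option is what gives the transaction-consistent snapshot described above without locking the tables for the duration of the dump.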
### Percona XtraBackup

The Percona `xtrabackup` tool provides near-real-time backup capability for a MySQL
installation, with extended support for replicated databases, and is a good tool for
backing up larger databases.

However, this tool requires local disk access as well as reasonably fast backup media,
since it builds an ongoing transaction log in real time to ensure that backups are
valid up to the point of their completion.
This strategy fails if it cannot keep up with the write speed of the database.
Further, the backups it generates are in binary format and include incomplete database
transactions, which require careful attention to detail when restoring.

As such, this solution is recommended only for advanced use cases or larger databases
where limitations of the other solutions may apply.
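A rough sketch of the typical two-step backup-then-prepare workflow (the credentials and target directory are placeholders):

```shell
# Copy the running server's data files while recording the transaction log.
xtrabackup --backup --user=backup_user --password=change-me \
  --target-dir=/backups/xtrabackup/base

# "Prepare" the backup: replay the recorded transaction log so the copied
# files are consistent and ready to restore.
xtrabackup --prepare --target-dir=/backups/xtrabackup/base
```

The prepare step is what resolves the incomplete transactions mentioned above; skipping it leaves the backup in a non-restorable state.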
### Locking and DDL issues

One important thing to note is that, at the time of writing, MySQL's transactional
system is not DDL (data definition language) aware, meaning that changes to table
structures occurring during some backup schemes can result in corrupted backup copies.
If schema changes will be occurring during your backup window, it is a good idea to
ensure that appropriate locking mechanisms are used to prevent these changes during
critical steps of the backup process.

However, on busy installations that cannot be stopped, the use of locks in many backup
utilities may cause issues if your programs expect to write data to the database during
the backup window.

In such cases it might make sense to review the given backup tools for locking-related
options, or to use other mechanisms such as replicas or alternate backup tools, to
avoid interfering with the live database.
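On MySQL 8.0 and later, one locking mechanism of this kind is the instance-level backup lock, which blocks DDL while still permitting ordinary reads and writes (shown here as a sketch; it requires the `BACKUP_ADMIN` privilege):

```mysql
-- Block DDL (table structure changes) for the duration of the backup,
-- while normal DML continues.
LOCK INSTANCE FOR BACKUP;

-- ... run the backup here, keeping this session open ...

UNLOCK INSTANCE;
```

Because it does not block writes, this lock avoids the application stalls that a full `FLUSH TABLES WITH READ LOCK` can cause on a busy system.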
### Replication and snapshots for backup

Larger databases consisting of many terabytes of data may take many hours or even days
to back up and restore, and so downtime resulting from system failure can create major
impacts on ongoing work.

While not backup tools per se, MySQL replication and disk snapshots can be useful in
reducing the downtime resulting from a full database outage.

Replicas can be configured so that one copy of the data is immediately online in the
event of a server crash.
When a server fails in this case, users and programs simply restart and point to the
new server before resuming work.

Replicas can also reduce the system load generated by regular backup procedures, since
they can be backed up instead of the main server.
Additionally, they can allow more flexibility in a given backup scheme, such as
allowing for disk snapshots on a busy system that could not otherwise be stopped.
A replica copy can be stopped temporarily and then resumed while a disk snapshot or
other backup operation occurs.
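Pausing a replica for a snapshot might look like the following sketch, using the replica-side statements available in MySQL 8.0.22 and later (on older versions the equivalents are `STOP SLAVE` / `START SLAVE`):

```mysql
-- On the replica: pause replication so its data files stop changing.
STOP REPLICA;

-- ... take the disk snapshot or run the backup of the replica here ...

-- Resume replication; the replica catches up from the source's binary log.
START REPLICA;
```

Because only the replica is paused, the main server continues serving reads and writes throughout the backup window.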
