Shell: |
  A command processor that we interact with through a command prompt. In our case, the shell we interact with runs on the server.
executable: |
  Software that can be run on a system. Some executables, such as `wc`, come with the operating system, but most we will need to load or install on our system.
bash: |
  Short for *Bourne* *Again* *Shell*. A command processor that typically runs in a text window and has its own programming language and syntax.
Terminal: |
  The program on our own machine that we use to interact with a shell (usually on a remote machine).
Directory: |
  A container in which we save files. Directories can be nested (that is, a directory can contain other directories).
metadata: |
  Information about a file or dataset, such as the filename or the date it was created.
"batch jobs": |
  Running the same task on a group of files. Can be done locally (on your computer) or on a cluster.
"batch processing": |
  A way to automate processing a number of files. For example, we might have a folder of `.bam` files and want to count the reads in each of them.
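
  A minimal Python sketch of the idea, assuming the per-file task can be written as a function. The hypothetical `count_lines` counts lines in text files as a stand-in for counting reads in a `.bam` file, and `batch_process` is an illustrative helper, not part of any library:

  ```python
  from pathlib import Path

  def count_lines(path):
      """Hypothetical stand-in task: count the lines in one text file."""
      with open(path) as handle:
          return sum(1 for _ in handle)

  def batch_process(folder, pattern, task):
      """Run `task` on every file in `folder` whose name matches `pattern`."""
      results = {}
      # Path.glob expands the wildcard pattern; sorting keeps the order predictable
      for path in sorted(Path(folder).glob(pattern)):
          results[path.name] = task(path)
      return results
  ```

  For real `.bam` files, `task` would be swapped for a function that counts reads, for example by running `samtools view -c` on each file.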
cluster: |
  A networked group of computers. Usually contains a driver node and compute nodes.
"Compute Job": |
  A task that we have assigned a computer to run. This computer can be a compute node in a cluster, or our own machine.
subjob: |
  A task that has a parent task. Usually, each subjob runs on a single node.
Registry: |
  A collection of repositories that you pull Docker images from. Examples of registries include DockerHub and Quay.io.
Image: |
  What you download from a registry: the "recipe" for building the software environment. Use `docker pull` or `apptainer pull` to get an image from a registry. You can also build an image from a Dockerfile.
Container: |
  The executable software environment actually running on a machine. A container is created from an image, for example with `docker run`.
"Snapshot File": |
  A single archive file (`.tar`, often compressed to `.tar.gz`) that contains a Docker image. Generate it by running `docker save` on an image. Also known as an *image file*.
Dockerfile: |
  A file that specifies how to build a Docker image, including how to install software and its dependencies. A Dockerfile is often based on an existing Docker image, named in its `FROM` instruction.
Tag: |
  A bit of metadata, such as `latest` or `1.0`, that is used to *version* a Docker image.
workflow: |
  A way to process files through a sequence of steps. Also known as a pipeline.
"workflow manager": |
  A system that works with the cluster manager to orchestrate processing data through a workflow.
WDL: |
  Short for Workflow Description Language. A standard for specifying a workflow, which includes describing inputs, saving intermediate outputs, and outputting processed files.
Docker: |
  Software that builds images and runs containers.
DockerHub: |
  A site for hosting your Docker images. Your images can be private or public.
ephemeral: |
  Temporary. Docker containers are ephemeral: when you are finished with a container, everything you created inside it disappears unless you saved it outside the container (for example, to a directory mounted from the host machine).
Apptainer: |
  Container software that is often used on HPC systems. It can run containers from Docker images.
Cromwell: |
  A workflow runner. Currently works with WDL files.
Sprocket: |
  A newer workflow runner that works with WDL files.
"HPC": |
  Short for High Performance Computing. See the HPC basics chapter for more information.
SLURM: |
  Short for "Simple Linux Utility for Resource Management". A workload manager that schedules jobs and distributes them among the compute nodes in a cluster.
glob: |
  A pattern-matching mechanism used primarily in shells (like Bash) and various programming languages to match filenames and paths. It is also known as "filename expansion" or "wildcard matching."
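
  Python's standard-library `fnmatch` module applies shell-style wildcard rules to a list of names, so it is a convenient way to check what a pattern would match. A minimal sketch; the filenames are made up for illustration:

  ```python
  import fnmatch

  # Made-up filenames standing in for the contents of a directory
  filenames = ["sample1.bam", "sample2.bam", "sample1.bai", "notes.txt"]

  # "*" matches any run of characters, so "*.bam" keeps names ending in .bam
  bam_files = fnmatch.filter(filenames, "*.bam")

  # "?" matches exactly one character, so "sample?.ba?" also matches .bai files
  sample_files = fnmatch.filter(filenames, "sample?.ba?")

  print(bam_files)     # ['sample1.bam', 'sample2.bam']
  print(sample_files)  # ['sample1.bam', 'sample2.bam', 'sample1.bai']
  ```

  In the shell, `ls *.bam` expands the same kind of pattern against the files actually present in the current directory; in Python, `glob.glob("*.bam")` matches against the filesystem in a similar way.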
"Application Program Interface": |
  Also known as an API. The public way to interact with a web service or software application.
"job array": |
  A numbered list of tasks, usually indexed from 1 to the length of the list. Used in SLURM batch processing, where each task in the array receives its own index.
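
  When a job array is submitted (for example, with `sbatch --array=1-3`), SLURM runs one task per index and stores that task's index in the environment variable `SLURM_ARRAY_TASK_ID`. A minimal Python sketch of how a task might use its index to pick one file from a list; the file list and the `file_for_task` helper are hypothetical:

  ```python
  import os

  # Hypothetical inputs: one file per array index (1, 2, 3)
  files = ["sample1.bam", "sample2.bam", "sample3.bam"]

  def file_for_task(files, task_id):
      """Map a 1-based array index to the matching file in the list."""
      return files[task_id - 1]

  # SLURM sets SLURM_ARRAY_TASK_ID inside each array task; fall back
  # to "1" so the sketch also runs outside of SLURM
  task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "1"))
  print(f"Task {task_id} processes {file_for_task(files, task_id)}")
  ```

  Subtracting 1 converts the 1-based array index to Python's 0-based list index; an array submitted as `--array=0-2` would not need that shift.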