# Bash for Bio {.unnumbered}

## Who this course is for

- Have you needed to align a folder of FASTA files but didn't know how?
- Do you want to automate an R or Python script you wrote so it runs on a bunch of files?
- Do you want to do all of this on a high performance computing cluster ({{<glossary HPC>}})?

If so, this course is for you! We will learn enough bash scripting to do useful things on the Fred Hutch computing cluster (affectionately called "gizmo") and automate the boring parts.

## Learning Objectives

- **Apply** bash scripting to run alignment tools and Python/R scripts
- **Navigate** and **process** data on the different filesystems available at Fred Hutch (FH)
- **Manage** software dependencies reproducibly using containers (Docker/Apptainer) or EasyBuild modules
- **Articulate** basic HPC architecture concepts and why they’re useful in your work
- **Leverage** bash scripting to **execute** {{<glossary "batch jobs">}} on a high performance computing cluster
- **Utilize** workflow managers such as `cromwell` to process multiple files in a multi-step {{<glossary WDL>}} workflow

:::{.callout-note}
## Wasn't there another Bash for Bioinformatics book?

I originally wrote a book called *Bash for Bioinformatics*, which was about learning enough bash to use the cloud-based DNANexus platform effectively.

I have renamed that book [Bash for DNANexus](https://laderast.github.io/bash_for_dnanexus/) and given its old name to this course, *Bash for Bioinformatics*.

This course shares its foundations with *Bash for DNANexus*, but focuses more on running tasks on high performance computing ({{<glossary "HPC">}}) systems.
:::

## Prerequisites

- You will need an account on `rhino` and know how to connect to it through the VPN. If you have taken the [Intro to Fred Hutch Cluster Computing](https://hutchdatascience.org/FH_Cluster_101/) workshop, you are ready.
- We highly recommend reviewing [Intro to Command Line](https://hutchdatascience.org/Intro_to_Command_Line/) and [Intro to Fred Hutch Cluster Computing](https://hutchdatascience.org/FH_Cluster_101/).
- Basic knowledge of the following commands:
    - `ls`
    - `cd` and basic directory navigation
    - `mv`/`cp`/`mkdir`/`rm`

We assume that you will do all of your work in your home directory on `rhino`. The course will not use much space there.
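
If you want to check that the prerequisite commands feel familiar, here is a short warm-up you could paste into a terminal on `rhino` (any bash shell works). It is only a sketch for self-checking: the `bash_warmup` folder and `example.fa` file are made-up names, not course materials.

```bash
# Warm-up for the prerequisite commands.
# NOTE: bash_warmup/ and example.fa are hypothetical practice names.
cd ~                              # go to your home directory
mkdir -p bash_warmup/data         # make a folder (and any missing parents)
cd bash_warmup                    # move into it

echo ">seq1" > data/example.fa    # create a tiny placeholder FASTA file
cp data/example.fa data/copy.fa   # copy it
mv data/copy.fa data/renamed.fa   # rename (move) the copy
ls data                           # should list example.fa and renamed.fa

rm data/renamed.fa                # delete the renamed copy
cd ..                             # go back up to your home directory
```

If every line made sense, you are in good shape. When you are done, `rm -r ~/bash_warmup` removes the practice folder; be careful with `rm`, since there is no undo.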

:::{.callout-note}
## Terminology

We know that not all of us share the same vocabulary, so we try to define terms wherever we can. Defined terms are marked with a double underline, like this:

{{<glossary "Compute Job">}}

Click and hold on a term to see its definition.
:::

## Instructors / TAs

If you need to schedule some time to talk, please schedule it with Ted.

- Ted Laderas (Main Instructor), Director of Training and Community, Office of the Chief Data Officer
- Taylor Firman (TA), Research Informatics Lead, Office of the Chief Data Officer
- Scott Chamberlain (TA), Software Developer, Office of the Chief Data Officer
- Chris Lo (TA), Data Science Trainer, Office of the Chief Data Officer

We all have experience running jobs on HPC and `gizmo`. Please reach out if you have any questions.

## Introductions

In chat, please introduce yourself:

- Your Name & Your Group
- What you want to learn in this course
- Favorite Fall activity

## Culture of the course

It is hard work teaching an online/hybrid course. If you can, please turn your camera on; it is difficult to teach to a group of blank screens.

- Learning on the job is challenging.
- I will move at the learners' pace; we are learning together.
- We teach not for mastery, but to empower you to learn effectively.

We sometimes struggle with our data science in isolation, unaware that someone two doors down from us has gone through the same struggle.

- *We learn and work better with our peers.*
- *Know that if you have a question, other people probably have it too.*
- *Asking questions is our way of taking care of others.*

We ask you to follow our [Participation Guidelines](https://hutchdatascience.org/communitystudios/guidelines/) and [Code of Conduct](https://github.com/fhdsl/coc).

Please note that this is the first time this course has been given. We have done our best to edit out mistakes, but some may remain, so please be patient and reach out if something isn't working.

If you do find a mistake, please report it to Ted. I'll add you to the acknowledgements below.

## Schedule

Class is on Thursdays, 12:00 - 1:30 PM. Office hours run for half an hour after class if you need help.

Please complete the readings before class in weeks 3 and 4, so we can hit the ground running.

|Week|Date|Topics|Reading|
|----|----|------|-------|
|Preclass||Review [Intro to Command Line](https://hutchdatascience.org/Intro_to_Command_Line/) and [Cluster 101](https://hutchdatascience.org/FH_Cluster_101/)||
|Week 1|October 9|[Filesystem Basics](01_basics.qmd)|Bite Size Bash|
|Week 2|October 16|[Writing and Running Bash Scripts](02_scripting.qmd)|Bite Size Bash|
|No Class|October 23|OCDO Retreat||
|Week 3|October 30|[Batch Processing and HPC Jobs](03_batch.qmd)|[HPC Basics](hpc-basics.qmd)|
|Week 4|November 6|[Testing Scripts/Workflow Managers](04_containers_workflows.qmd)|[Container Basics](container-basics.qmd)|
|On your own time||[Testing Scripts](testing.qmd)||
|On your own time||[Configuring your Bash Shell](configuring.qmd)||

## Reference Texts

- We will be using Julia Evans's [Bite Size Bash](https://wizardzines.com/zines/bite-size-bash/) as our reference text. Julia's explanations are incredibly clear, and it will be a valuable reference well beyond this course. The PDF is available in the Google Classroom materials. Please do not share it with others: we have a group rate, and an individual copy costs only $12.
- If you want to know the true power of the command line, I recommend [Data Science at the Command Line](https://jeroenjanssens.com/dsatcl/). This book showcases how much you can get done with just the command line.

## Badge of completion

We offer a [badge of completion](https://www.credly.com/org/fred-hutch/badge/bash-for-bioinformatics) when you finish the course!

What it is:

- A display of what you accomplished in the course, shareable in your professional networks such as LinkedIn, similar to badges from online education services such as Coursera.
- A way for you to be accountable for your learning.

What it isn't:

- Accreditation through a university or degree-granting program.

Requirements:

- Sign up on the badging spreadsheet (we will send the link out in class).
- Complete the badge-required sections of the exercises for 3 out of 4 assignments. We'll cover this in class.

## Acknowledgements

This course would not be live without the efforts of:

- Emma Bishop
- Scott Chamberlain
- Taylor Firman
- Chris Lo
- Sonu Mishra
- Sitapriya Moorthi
- Dan Tenenbaum

Thank you all for your help testing and editing this course.