Before you attend the workshop, there are a couple of things we would like you to do to get set up so you are able to participate in all sections of the workshop.
If you indicated that you did not have an account on the Dartmouth Discovery cluster, you should have received an email explaining how to set one up. Please make sure this is done and that you are able to log into your account BEFORE the workshop begins. YOU WILL NEED A DISCOVERY ACCOUNT!
We will be using a dataset downloaded from the Sequence Read Archive (SRA), a public repository of genomic data. This dataset comes from this paper, and was collected from human airway smooth muscle cells to test gene pathways affected by exposure to glucocorticoid drugs, which have historically been used for their anti-inflammatory effects to treat asthma.
All the teaching materials are located within the GitHub repository. We suggest you bookmark this page so you can easily get back to this repository each day.
Lessons are provided in Markdown format (files with the .md extension) and contain 'code chunks' that you will use to perform the analysis of this dataset. The majority of the analysis will be performed using a terminal application or emulator, with an open ssh connection to the Discovery cluster. You may copy and paste the code chunks into your terminal window to perform analyses, or type them out by hand.
If you wish to edit, modify or save the code in its own file (as you would in a real analysis), this can be achieved using a Text Editor application. Numerous free text editors exist, such as Sublime Text, BBEdit, and Atom. Experimenting with your code in a text editor is an excellent way to learn, as well as document your work on a project.
The terminal application you use will depend on the operating system you are using. If you do not already have a terminal emulator on your machine, please download one. Below are some recommendations for different operating systems.
| Operating system | Terminal emulators |
|---|---|
| Mac | Terminal (comes pre-installed), iTerm2 |
| Windows | MobaXterm, PuTTY |
| Linux | Konsole, Terminal, etc. (should be pre-installed but depends on the desktop environment you are running) |
We will be using the Integrative Genomics Viewer (IGV), a genome browser produced by researchers at the Broad Institute, to explore and visualize RNA-seq data.
You will need to download and install the IGV desktop application for your operating system before the workshop begins. The latest versions of IGV can be found at their downloads page. After installing IGV, try opening the application on your computer to confirm the installation was successful.
This is optional, but for those of you who are new to the command line, an SFTP client might be an easier way to move files between the HPC environment and your local machine. SFTP stands for Secure File Transfer Protocol and will enable you to drag and drop files between your local machine and a remote location, as you might in a Finder window, rather than using the command line.
Several free SFTP clients exist, such as FileZilla, WinSCP, and Cyberduck, among others.
Conda is an open-source package and environment manager that runs on Windows, macOS and Linux. Conda allows you to install and update software packages as well as organize them efficiently into environments that you can switch between to manage software collections and versions.
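For comparison, the command-line equivalent of a drag-and-drop transfer is `scp`. The following is a minimal sketch rather than part of the workshop materials: the file names `results/counts.txt` and `my_notes.txt` are hypothetical placeholders, and each command is printed with `echo` instead of being run, so you can check it before removing the `echo`.

```shell
# Replace 'netID' with the ID associated with your Discovery account.
REMOTE="netID@discovery7.dartmouth.edu"
REMOTE_FILE="results/counts.txt"   # hypothetical path, relative to your remote home directory
LOCAL_DIR="."                      # copy into the current local directory

# Download: remote file -> local directory (printed only; remove 'echo' to run it)
echo scp "${REMOTE}:${REMOTE_FILE}" "${LOCAL_DIR}"

# Upload: hypothetical local file -> your remote home directory
echo scp my_notes.txt "${REMOTE}:~/"
```

Run these from a terminal on your local machine, not from within an ssh session on Discovery.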
We will be using Conda to make sure everyone has the required software to perform the analyses included in the workshop. To start using Conda on Discovery, open your terminal application and start an ssh connection using your username & password:
# Establish the secure shell connection
#### REPLACE 'netID' WITH THE ID ASSOCIATED WITH YOUR DISCOVERY ACCOUNT
ssh netID@discovery7.dartmouth.edu
# Enter your password at the prompt (no characters will appear as you type, to preserve privacy)
netID@discovery7.dartmouth.edu password:
# You're in!
(base) [netID@discovery7 ~]$
Then run the following command:
source /optnfs/common/miniconda3/etc/profile.d/conda.sh
We recommend that you add the above line of code to the .bashrc file in your home directory; otherwise you will need to run this command each time you start a new session on Discovery. To do this, use the following code:
# navigate to your home directory
cd ~
# open the .bashrc file that is there
nano .bashrc
This will open the existing .bashrc file. Use the down arrow to move the cursor to the bottom of the file and paste source /optnfs/common/miniconda3/etc/profile.d/conda.sh. Then press ctrl + x to exit the nano text editor, type Y to save the changes you made, and hit return to save the file under the same name (.bashrc).
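If you would rather not edit the file by hand, the same line can be appended from the command line. The following is a minimal sketch and not part of the original instructions: `add_line_once` is a hypothetical helper name, and it only appends the line if an identical line is not already present, so running it twice does no harm.

```shell
# Hypothetical helper: append a line to a file only if that exact line is not already there.
add_line_once() {
    local line="$1" file="$2"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Add the conda setup line from above to the .bashrc in your home directory.
add_line_once 'source /optnfs/common/miniconda3/etc/profile.d/conda.sh' "$HOME/.bashrc"
```

The change takes effect the next time you log in, or immediately if you run `source ~/.bashrc`.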
Now run the following command to create a .conda/ directory in your home directory to store all of your personal conda environments. You only have to run this command once to make this directory, so it does not need to be added to your .bashrc file.
cd ~
mkdir -p .conda/pkgs/cache .conda/envs
Now create the conda environment that we will be using for the workshop. This takes about 15 minutes to complete. As you will see, many packages are being installed or updated, all managed for you by conda.
conda env create -f /dartfs-hpc/scratch/rnaseq1/environment.yml
When you are ready, activate the conda environment using the following command:
conda activate rnaseq_w
You will know the activate command has worked when the text to the left of the prompt reads (rnaseq_w) rather than (base).
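You can also confirm which environment is active from the command line. The following is a minimal sketch that relies on the `CONDA_DEFAULT_ENV` variable, which `conda activate` sets to the name of the active environment; `env_is_active` is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: succeed if the named conda environment is the active one.
# CONDA_DEFAULT_ENV is set by 'conda activate' and is empty if conda has not been set up.
env_is_active() {
    [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

if env_is_active rnaseq_w; then
    echo "rnaseq_w is active"
else
    echo "rnaseq_w is not active; run: conda activate rnaseq_w"
fi
```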
When you are finished using a conda environment, it is good practice to deactivate your session with the following command.
conda deactivate
That's it! This conda environment contains all the software you will need during the workshop. If you run into issues with the setup, please reach out to us at DataAnalyticsCore@groups.dartmouth.edu and someone will be in touch to assist you.
NOTE: Dartmouth's Research Computing team also provides instructions for getting started with Conda on Discovery, which you can find here.
We've created a folder on the scratch space for this workshop where everyone can write 'Hello' once they've completed these welcome and setup instructions. To write your own file, use the following code, replacing 'xyz' and 'your_name' with your own name. The quotation marks and spaces are important! This will create a record of how many of us have successfully logged in to Discovery and finished the welcome and setup tasks:
echo "Hello from xyz" > /dartfs-hpc/scratch/rnaseq1/welcome/your_name.txt
So, for example, Tim would write:
echo "Hello from Tim" > /dartfs-hpc/scratch/rnaseq1/welcome/tim_sullivan.txt
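To confirm that your file was written, you can print it back. The following is a minimal sketch: `check_welcome` is a hypothetical helper name, and `your_name.txt` should be replaced with the file name you used above.

```shell
# Hypothetical helper: print a file's contents if it exists and is not empty,
# otherwise report the problem on stderr and return a non-zero status.
check_welcome() {
    local file="$1"
    if [ -s "$file" ]; then
        cat "$file"
    else
        echo "missing or empty: $file" >&2
        return 1
    fi
}

# Replace 'your_name' with the name you used in the echo command above.
check_welcome /dartfs-hpc/scratch/rnaseq1/welcome/your_name.txt \
    || echo "Re-run the echo command above, then check again."
```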