
Commit 264a925

Merge branch 'master' into devcontainers-cleanup
2 parents 4ff37d3 + 011340e commit 264a925


44 files changed: +856956 -34 lines changed

docs/hello_nextflow/01_hello_world.md

Lines changed: 4 additions & 4 deletions
@@ -138,7 +138,7 @@ process sayHello {
 }
 ```

-This a very minimal process definition that just contains an `output` definition and the `script` to execute.
+This is a very minimal process definition that just contains an `output` definition and the `script` to execute.

 The `output` definition includes the `path` qualifier, which tells Nextflow this should be handled as a path (includes both directory paths and files).
 Another common qualifier is `val`.
@@ -172,7 +172,7 @@ workflow {
 }
 ```

-This a very minimal **workflow** definition.
+This is a very minimal **workflow** definition.
 In a real-world pipeline, the workflow typically contains multiple calls to **processes** connected by **channels**, and the processes expect one or more variable **input(s)**.

 You'll learn how to add variable inputs later in this training module; and you'll learn how to add more processes and connect them by channels in Part 3 of this course.
@@ -479,7 +479,7 @@ In the process block, make the following code change:

 _Before:_

-```groovy title="hello-channels.nf" linenums="6"
+```groovy title="hello-world.nf" linenums="6"
 process sayHello {

     publishDir 'results', mode: 'copy'
@@ -490,7 +490,7 @@ process sayHello {

 _After:_

-```groovy title="hello-channels.nf" linenums="6"
+```groovy title="hello-world.nf" linenums="6"
 process sayHello {

     publishDir 'results', mode: 'copy'

docs/hello_nextflow/02_hello_channels.md

Lines changed: 25 additions & 12 deletions
@@ -522,7 +522,7 @@ Here we added the operator on the next line for readability, but you can add ope

 #### 3.2.2. Add `view()` to inspect channel contents

-We could run this right away to test if it works, but while we're at it, we're also going to add a couple of [`view()`](https://www.nextflow.io/docs/latest/reference/operator.html#view) directives, which allow us to inspect the contents of a channel.
+We could run this right away to test if it works, but while we're at it, we're also going to add a couple of [`view()`](https://www.nextflow.io/docs/latest/reference/operator.html#view) operators, which allow us to inspect the contents of a channel.
 You can think of `view()` as a debugging tool, like a `print()` statement in Python, or its equivalent in other languages.

 In the workflow block, make the following code change:
@@ -540,12 +540,25 @@ _After:_
 ```groovy title="hello-channels.nf" linenums="31"
     // create a channel for inputs
     greeting_ch = Channel.of(greetings_array)
-        .view { "Before flatten: $it" }
+        .view { greeting -> "Before flatten: $greeting" }
         .flatten()
-        .view { "After flatten: $it" }
+        .view { greeting -> "After flatten: $greeting" }
 ```

-Here `$it` is an implicit variable that represents each individual item loaded in a channel.
+We are using an operator _closure_ here - the curly brackets.
+This code executes for each item in the channel.
+We define a temporary variable for the inner value, here called `greeting` (it could be anything).
+This variable is only used within the scope of that closure.
+
+In this example, `$greeting` represents each individual item loaded in a channel.
+
+!!! note "Note on `$it`"
+
+    In some pipelines you may see a special variable called `$it` used inside operator closures.
+    This is an _implicit_ variable that allows a short-hand access to the inner variable,
+    without needing to define it with a `->`.
+
+    We prefer to be explicit to aid code clarity; as such, the `$it` syntax is discouraged and will slowly be phased out of the Nextflow language.

 #### 3.2.3. Run the workflow
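
The difference between an explicit closure parameter and the implicit `it` can be tried outside Nextflow. The following is a minimal plain-Groovy sketch, not part of this commit or the training scripts: the list `greetings` and its values are hypothetical stand-ins for the items in a channel, and `collect` on a list stands in for an operator that calls the closure once per item.

```groovy
// Plain Groovy sketch (assumption: run with `groovy`, not `nextflow run`).
// `greetings` is a hypothetical stand-in for the items flowing through a channel.
def greetings = ['Hello', 'Bonjour', 'Hola']

// Explicit parameter: the name before `->` is bound to each item in turn.
def explicit = greetings.collect { greeting -> "After flatten: $greeting".toString() }

// Implicit parameter: with no `->`, Groovy binds each item to `it`.
def implicit = greetings.collect { "After flatten: $it".toString() }

// Both closures build the same strings; only the variable naming differs.
assert explicit == implicit

explicit.each { line -> println line }
```

Both lists hold the same three strings, so moving from `$it` to a named parameter changes readability rather than behaviour.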

@@ -723,9 +736,9 @@ _After:_
 ```groovy title="hello-channels.nf" linenums="31"
     // create a channel for inputs from a CSV file
     greeting_ch = Channel.fromPath(params.greeting)
-        .view { "Before splitCsv: $it" }
+        .view { csv -> "Before splitCsv: $csv" }
         .splitCsv()
-        .view { "After splitCsv: $it" }
+        .view { csv -> "After splitCsv: $csv" }
 ```

 As you can see, we also include before/after view statements while we're at it.
@@ -787,7 +800,7 @@ This is what the syntax looks like:

 This means 'for each element in the channel, take the first of any items it contains'.

-So let's apply that to our CVS parsing.
+So let's apply that to our CSV parsing.

 #### 4.3.1. Apply `map()` to the channel

@@ -798,21 +811,21 @@ _Before:_
 ```groovy title="hello-channels.nf" linenums="31"
     // create a channel for inputs from a CSV file
     greeting_ch = Channel.fromPath(params.greeting)
-        .view { "Before splitCsv: $it" }
+        .view { csv -> "Before splitCsv: $csv" }
         .splitCsv()
-        .view { "After splitCsv: $it" }
+        .view { csv -> "After splitCsv: $csv" }
 ```

 _After:_

 ```groovy title="hello-channels.nf" linenums="31"
     // create a channel for inputs from a CSV file
     greeting_ch = Channel.fromPath(params.greeting)
-        .view { "Before splitCsv: $it" }
+        .view { csv -> "Before splitCsv: $csv" }
         .splitCsv()
-        .view { "After splitCsv: $it" }
+        .view { csv -> "After splitCsv: $csv" }
         .map { item -> item[0] }
-        .view { "After map: $it" }
+        .view { csv -> "After map: $csv" }
 ```

 Once again we include another `view()` call to confirm that the operator does what we expect.
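
What `map { item -> item[0] }` does to each row can be sketched in plain Groovy. This is an illustration only, under the assumption that each row emitted after `splitCsv()` behaves like a list of column values. The `rows` data below is hypothetical, and `collect` on a list stands in for the channel `map()` operator.

```groovy
// Plain Groovy sketch (assumption: each row behaves like a list of column values).
// `rows` is hypothetical data standing in for the channel contents after splitCsv().
def rows = [['Hello', 'English'], ['Bonjour', 'French'], ['Hola', 'Spanish']]

// For each row, keep only its first element, as `.map { item -> item[0] }` does per channel item.
def firstItems = rows.collect { item -> item[0] }

assert firstItems == ['Hello', 'Bonjour', 'Hola']

firstItems.each { greeting -> println "After map: $greeting" }
```

Naming the closure parameter (`item`, `greeting`) rather than relying on `it` follows the convention this commit adopts for the `view()` calls.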

docs/hello_nextflow/03_hello_workflow.md

Lines changed: 3 additions & 5 deletions
@@ -485,8 +485,8 @@ _After:_
     collectGreetings(convertToUpper.out.collect())

     // optional view statements
-    convertToUpper.out.view { "Before collect: $it" }
-    convertToUpper.out.collect().view { "After collect: $it" }
+    convertToUpper.out.view { greeting -> "Before collect: $greeting" }
+    convertToUpper.out.collect().view { greeting -> "After collect: $greeting" }
 }
 ```

@@ -816,11 +816,9 @@ _After:_
     collectGreetings(convertToUpper.out.collect(), params.batch)

     // emit a message about the size of the batch
-    collectGreetings.out.count.view { "There were $it greetings in this batch" }
+    collectGreetings.out.count.view { num_greetings -> "There were $num_greetings greetings in this batch" }
 ```

-Here we are using `$it` in the same way we did earlier, as an implicit variable to access the contents of the channel.
-
 !!! note

     There are a few other ways we could achieve a similar result, including some more elegant ones like the `count()` operator, but this allows us to show how to handle multiple outputs, which is what we care about.

docs/hello_nextflow/05_hello_containers.md

Lines changed: 3 additions & 3 deletions
@@ -189,7 +189,7 @@ You can see that the filesystem inside the container is different from the files
 When you run a container, it is isolated from the host system by default.
 This means that the container can't access any files on the host system unless you explicitly allow it to do so.

-You will learn how to do that in a minute.
+You will learn how to do that in a minute.

 #### 1.3.2. Run the desired tool command(s)

@@ -434,7 +434,7 @@ _Before:_
     collectGreetings(convertToUpper.out.collect(), params.batch)

     // emit a message about the size of the batch
-    collectGreetings.out.count.view{ "There were $it greetings in this batch" }
+    collectGreetings.out.count.view{ num_greetings -> "There were $num_greetings greetings in this batch" }
 ```

 _After:_
@@ -444,7 +444,7 @@ _After:_
     collectGreetings(convertToUpper.out.collect(), params.batch)

     // emit a message about the size of the batch
-    collectGreetings.out.count.view{ "There were $it greetings in this batch" }
+    collectGreetings.out.count.view{ num_greetings -> "There were $num_greetings greetings in this batch" }

     // generate ASCII art of the greetings with cowpy
     cowpy(collectGreetings.out.outfile, params.character)

docs/hello_nextflow/index.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ The rise of big data has made it increasingly necessary to be able to analyze an

 During this training, you will be introduced to Nextflow in a series of complementary hands-on workshops.

-Let's get started! Click on the "Open in GitHub Codespaces" button below.
+Let's get started! Click on the "Open in GitHub Codespaces" button below to launch the training environment (preferably in a separate tab), then read on while it loads.

 [![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/nextflow-io/training?quickstart=1&ref=master)

docs/nf4_science/genomics/index.md

Lines changed: 2 additions & 9 deletions
@@ -11,10 +11,9 @@ It builds on the [Hello Nextflow](../../hello_nextflow/) beginner training and d

 Specifically, this course demonstrates how to implement a simple variant calling pipeline with [GATK](https://gatk.broadinstitute.org/) (Genome Analysis Toolkit), a widely used software package for analyzing high-throughput sequencing data.

-!!! note
+Let's get started! Click on the "Open in GitHub Codespaces" button below to launch the training environment (preferably in a separate tab), then read on while it loads.

-    Don't worry if you're not familiar with GATK specifically.
-    We'll summarize the necessary concepts as we go, and the workflow implementation principles we demonstrate here apply broadly to any command line tool that processes genomics data.
+[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/nextflow-io/training?quickstart=1&ref=master)

 ## Learning objectives

@@ -40,9 +39,3 @@ The course assumes some minimal familiarity with the following:
 - Foundational Nextflow concepts and tooling covered in the [Hello Nextflow](../../hello_nextflow/) beginner training.

 For technical requirements and environment setup, see the [Environment Setup](../../envsetup/) mini-course.
-
-## Get started
-
-To get started, open the training environment by clicking the 'Open in GitHub Codespaces' button below.
-
-[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/nextflow-io/training?quickstart=1&ref=master)
Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
+# Orientation
+
+The training environment contains all the software, code and data necessary to work through this training course, so you don't need to install anything yourself.
+However, you do need a (free) account to log in, and you should take a few minutes to familiarize yourself with the interface.
+
+If you have not yet done so, please complete the [Environment Setup](../../envsetup/) mini-course before going any further.
+
+## Materials provided
+
+Throughout this training course, we'll be working in the `nf4-science/rnaseq/` directory, which you need to move into when you open the training workspace.
+This directory contains all the code files, test data and accessory files you will need.
+
+Feel free to explore the contents of this directory; the easiest way to do so is to use the file explorer on the left-hand side of the training workspace in the VSCode interface.
+Alternatively, you can use the `tree` command.
+Throughout the course, we use the output of `tree` to represent directory structure and contents in a readable form, sometimes with minor modifications for clarity.
+
+Here we generate a table of contents to the second level down:
+
+```bash
+tree . -L 3
+```
+
+If you run this inside `nf4-science/rnaseq`, you should see the following output:
+
+```console title="Directory contents"
+.
+├── data
+│   ├── genome.fa
+│   ├── paired-end.csv
+│   ├── reads
+│   │   ├── ENCSR000COQ1_1.fastq.gz
+│   │   ├── ENCSR000COQ1_2.fastq.gz
+│   │   ├── ENCSR000COQ2_1.fastq.gz
+│   │   ├── ENCSR000COQ2_2.fastq.gz
+│   │   ├── ENCSR000COR1_1.fastq.gz
+│   │   ├── ENCSR000COR1_2.fastq.gz
+│   │   ├── ENCSR000COR2_1.fastq.gz
+│   │   ├── ENCSR000COR2_2.fastq.gz
+│   │   ├── ENCSR000CPO1_1.fastq.gz
+│   │   ├── ENCSR000CPO1_2.fastq.gz
+│   │   ├── ENCSR000CPO2_1.fastq.gz
+│   │   └── ENCSR000CPO2_2.fastq.gz
+│   └── single-end.csv
+├── nextflow.config
+├── rnaseq.nf
+└── solutions
+    ├── modules
+    │   ├── fastqc.nf
+    │   ├── fastqc_pe.nf
+    │   ├── hisat2_align.nf
+    │   ├── hisat2_align_pe.nf
+    │   ├── multiqc.nf
+    │   ├── trim_galore.nf
+    │   └── trim_galore_pe.nf
+    ├── rnaseq-2.1.nf
+    ├── rnaseq-2.2.nf
+    ├── rnaseq-2.3.nf
+    ├── rnaseq-3.1.nf
+    ├── rnaseq-3.2.nf
+    └── rnaseq_pe-3.3.nf
+
+```
+
+!!!note
+
+    Don't worry if this seems like a lot; we'll go through the relevant pieces at each step of the course.
+    This is just meant to give you an overview.
+
+**Here's a summary of what you should know to get started:**
+
+- **The `rnaseq.nf` file** is the outline of the workflow script we will work to develop.
+
+- **The file `nextflow.config`** is a configuration file that sets minimal environment properties. You can ignore it for now.
+
+- **The `data` directory** contains input data and related resources:
+
+    - _A reference genome_ called `genome.fa` consisting of a small region of the human chromosome 20 (from hg19/b37).
+    - _RNAseq data_ that has been subset to a small region to keep the file sizes down, in the `reads/` directory.
+    - _CSV files_ listing the IDs and paths of the example data files, for processing in batches.
+
+- **The `solutions` directory** contains the completed workflow scripts and modules that result from each step of the course.
+    They are intended to be used as a reference to check your work and troubleshoot any issues.
+    The number in the filename corresponds to the step of the relevant part of the course.
+
+!!!tip
+
+    If for whatever reason you move out of this directory, you can always run this command to return to it:
+
+    ```bash
+    cd /workspaces/training/nf4-science/rnaseq
+    ```
+
+Now, to begin the course, click on the arrow in the bottom right corner of this page.
