docs/hello_nextflow/01_hello_world.md
Lines changed: 4 additions & 4 deletions
@@ -138,7 +138,7 @@ process sayHello {
138
138
}
139
139
```
140
140
141
-
This a very minimal process definition that just contains an `output` definition and the `script` to execute.
141
+
This is a very minimal process definition that just contains an `output` definition and the `script` to execute.
142
142
143
143
The `output` definition includes the `path` qualifier, which tells Nextflow this should be handled as a path (includes both directory paths and files).
144
144
Another common qualifier is `val`.
@@ -172,7 +172,7 @@ workflow {
172
172
}
173
173
```
174
174
175
-
This a very minimal **workflow** definition.
175
+
This is a very minimal **workflow** definition.
176
176
In a real-world pipeline, the workflow typically contains multiple calls to **processes** connected by **channels**, and the processes expect one or more variable **input(s)**.
177
177
178
178
You'll learn how to add variable inputs later in this training module; and you'll learn how to add more processes and connect them by channels in Part 3 of this course.
@@ -479,7 +479,7 @@ In the process block, make the following code change:
docs/hello_nextflow/02_hello_channels.md
Lines changed: 25 additions & 12 deletions
@@ -522,7 +522,7 @@ Here we added the operator on the next line for readability, but you can add ope
522
522
523
523
#### 3.2.2. Add `view()` to inspect channel contents
524
524
525
-
We could run this right away to test if it works, but while we're at it, we're also going to add a couple of [`view()`](https://www.nextflow.io/docs/latest/reference/operator.html#view) directives, which allow us to inspect the contents of a channel.
525
+
We could run this right away to test if it works, but while we're at it, we're also going to add a couple of [`view()`](https://www.nextflow.io/docs/latest/reference/operator.html#view) operators, which allow us to inspect the contents of a channel.
526
526
You can think of `view()` as a debugging tool, like a `print()` statement in Python, or its equivalent in other languages.
527
527
528
528
In the workflow block, make the following code change:
@@ -540,12 +540,25 @@ _After:_
540
540
```groovy title="hello-channels.nf" linenums="31"
541
541
// create a channel for inputs
542
542
greeting_ch = Channel.of(greetings_array)
543
-
.view { "Before flatten: $it" }
543
+
.view { greeting -> "Before flatten: $greeting" }
544
544
.flatten()
545
-
.view { "After flatten: $it" }
545
+
.view { greeting -> "After flatten: $greeting" }
546
546
```
547
547
548
-
Here `$it` is an implicit variable that represents each individual item loaded in a channel.
548
+
We are using an operator _closure_ here - the curly brackets.
549
+
This code executes for each item in the channel.
550
+
We define a temporary variable for the inner value, here called `greeting` (it could be anything).
551
+
This variable is only used within the scope of that closure.
552
+
553
+
In this example, `$greeting` represents each individual item loaded in a channel.
554
+
555
+
!!! note "Note on `$it`"
556
+
557
+
In some pipelines you may see a special variable called `$it` used inside operator closures.
558
+
This is an _implicit_ variable that allows a short-hand access to the inner variable,
559
+
without needing to define it with a `->`.
560
+
561
+
We prefer to be explicit to aid code clarity. As such, the `$it` syntax is discouraged and will slowly be phased out of the Nextflow language.
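
To make the difference between the two closure styles concrete, here is a minimal sketch in plain Groovy. It runs on an ordinary list rather than a Nextflow channel, so `collect` stands in for what `view()` does per channel item; the `greetings` list and the variable names are illustrative assumptions, not part of the training script.

```groovy
// Plain Groovy sketch; `greetings` is a hypothetical list standing in for channel items
def greetings = ['Hello', 'Bonjour', 'Holà']

// Explicit parameter: the closure names its argument before the `->`
def explicit = greetings.collect { greeting -> "After flatten: $greeting".toString() }

// Implicit parameter: with no `->`, Groovy supplies the single argument as `it`
def implicit = greetings.collect { "After flatten: $it".toString() }

// Both styles build the same strings; only the readability differs
assert explicit == implicit
explicit.each { line -> println line }
```

The explicit form costs a few extra characters but documents what each item is, which is the reason given above for preferring it.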
549
562
550
563
#### 3.2.3. Run the workflow
551
564
@@ -723,9 +736,9 @@ _After:_
723
736
```groovy title="hello-channels.nf" linenums="31"
724
737
// create a channel for inputs from a CSV file
725
738
greeting_ch = Channel.fromPath(params.greeting)
726
-
.view { "Before splitCsv: $it" }
739
+
.view { csv -> "Before splitCsv: $csv" }
727
740
.splitCsv()
728
-
.view { "After splitCsv: $it" }
741
+
.view { csv -> "After splitCsv: $csv" }
729
742
```
730
743
731
744
As you can see, we also include before/after view statements while we're at it.
@@ -787,7 +800,7 @@ This is what the syntax looks like:
787
800
788
801
This means 'for each element in the channel, take the first of any items it contains'.
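
As a rough illustration of 'take the first of any items', the following plain Groovy sketch applies the same closure to a list of lists. The `rows` value is a hypothetical stand-in for what a CSV-splitting step might emit (one small list per line), and `collect` on a list plays the role that `map()` plays on a channel; neither name comes from the training script.

```groovy
// Hypothetical rows: each element is itself a list, as after splitting a one-column CSV
def rows = [['Hello'], ['Bonjour'], ['Holà']]

// Same closure shape as in the map() call: keep only the first item of each element
def firstItems = rows.collect { item -> item[0] }

// Each inner list has been replaced by its first item
assert firstItems == ['Hello', 'Bonjour', 'Holà']
println firstItems
```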
789
802
790
-
So let's apply that to our CVS parsing.
803
+
So let's apply that to our CSV parsing.
791
804
792
805
#### 4.3.1. Apply `map()` to the channel
793
806
@@ -798,21 +811,21 @@ _Before:_
798
811
```groovy title="hello-channels.nf" linenums="31"
799
812
// create a channel for inputs from a CSV file
800
813
greeting_ch = Channel.fromPath(params.greeting)
801
-
.view { "Before splitCsv: $it" }
814
+
.view { csv -> "Before splitCsv: $csv" }
802
815
.splitCsv()
803
-
.view { "After splitCsv: $it" }
816
+
.view { csv -> "After splitCsv: $csv" }
804
817
```
805
818
806
819
_After:_
807
820
808
821
```groovy title="hello-channels.nf" linenums="31"
809
822
// create a channel for inputs from a CSV file
810
823
greeting_ch = Channel.fromPath(params.greeting)
811
-
.view { "Before splitCsv: $it" }
824
+
.view { csv -> "Before splitCsv: $csv" }
812
825
.splitCsv()
813
-
.view { "After splitCsv: $it" }
826
+
.view { csv -> "After splitCsv: $csv" }
814
827
.map { item -> item[0] }
815
-
.view { "After map: $it" }
828
+
.view { csv -> "After map: $csv" }
816
829
```
817
830
818
831
Once again we include another `view()` call to confirm that the operator does what we expect.
collectGreetings.out.count.view { "There were $it greetings in this batch" }
819
+
collectGreetings.out.count.view { num_greetings -> "There were $num_greetings greetings in this batch" }
820
820
```
821
821
822
-
Here we are using `$it` in the same way we did earlier, as an implicit variable to access the contents of the channel.
823
-
824
822
!!! note
825
823
826
824
There are a few other ways we could achieve a similar result, including some more elegant ones like the `count()` operator, but this allows us to show how to handle multiple outputs, which is what we care about.
docs/hello_nextflow/index.md
Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ The rise of big data has made it increasingly necessary to be able to analyze an
12
12
13
13
During this training, you will be introduced to Nextflow in a series of complementary hands-on workshops.
14
14
15
-
Let's get started! Click on the "Open in GitHub Codespaces" button below.
15
+
Let's get started! Click on the "Open in GitHub Codespaces" button below to launch the training environment (preferably in a separate tab), then read on while it loads.
16
16
17
17
[Open in GitHub Codespaces](https://codespaces.new/nextflow-io/training?quickstart=1&ref=master)
docs/nf4_science/genomics/index.md
Lines changed: 2 additions & 9 deletions
@@ -11,10 +11,9 @@ It builds on the [Hello Nextflow](../../hello_nextflow/) beginner training and d
11
11
12
12
Specifically, this course demonstrates how to implement a simple variant calling pipeline with [GATK](https://gatk.broadinstitute.org/) (Genome Analysis Toolkit), a widely used software package for analyzing high-throughput sequencing data.
13
13
14
-
!!! note
14
+
Let's get started! Click on the "Open in GitHub Codespaces" button below to launch the training environment (preferably in a separate tab), then read on while it loads.
15
15
16
-
Don't worry if you're not familiar with GATK specifically.
17
-
We'll summarize the necessary concepts as we go, and the workflow implementation principles we demonstrate here apply broadly to any command line tool that processes genomics data.
16
+
[Open in GitHub Codespaces](https://codespaces.new/nextflow-io/training?quickstart=1&ref=master)
18
17
19
18
## Learning objectives
20
19
@@ -40,9 +39,3 @@ The course assumes some minimal familiarity with the following:
40
39
- Foundational Nextflow concepts and tooling covered in the [Hello Nextflow](../../hello_nextflow/) beginner training.
41
40
42
41
For technical requirements and environment setup, see the [Environment Setup](../../envsetup/) mini-course.
43
-
44
-
## Get started
45
-
46
-
To get started, open the training environment by clicking the 'Open in GitHub Codespaces' button below.
47
-
48
-
[Open in GitHub Codespaces](https://codespaces.new/nextflow-io/training?quickstart=1&ref=master)
The training environment contains all the software, code and data necessary to work through this training course, so you don't need to install anything yourself.
4
+
However, you do need a (free) account to log in, and you should take a few minutes to familiarize yourself with the interface.
5
+
6
+
If you have not yet done so, please complete the [Environment Setup](../../envsetup/) mini-course before going any further.
7
+
8
+
## Materials provided
9
+
10
+
Throughout this training course, we'll be working in the `nf4-science/rnaseq/` directory, which you need to move into when you open the training workspace.
11
+
This directory contains all the code files, test data and accessory files you will need.
12
+
13
+
Feel free to explore the contents of this directory; the easiest way to do so is to use the file explorer on the left-hand side of the training workspace in the VSCode interface.
14
+
Alternatively, you can use the `tree` command.
15
+
Throughout the course, we use the output of `tree` to represent directory structure and contents in a readable form, sometimes with minor modifications for clarity.
16
+
17
+
Here we generate a table of contents to the third level down:
18
+
19
+
```bash
20
+
tree . -L 3
21
+
```
22
+
23
+
If you run this inside `nf4-science/rnaseq`, you should see the following output:
24
+
25
+
```console title="Directory contents"
.
├── data
│   ├── genome.fa
│   ├── paired-end.csv
│   ├── reads
│   │   ├── ENCSR000COQ1_1.fastq.gz
│   │   ├── ENCSR000COQ1_2.fastq.gz
│   │   ├── ENCSR000COQ2_1.fastq.gz
│   │   ├── ENCSR000COQ2_2.fastq.gz
│   │   ├── ENCSR000COR1_1.fastq.gz
│   │   ├── ENCSR000COR1_2.fastq.gz
│   │   ├── ENCSR000COR2_1.fastq.gz
│   │   ├── ENCSR000COR2_2.fastq.gz
│   │   ├── ENCSR000CPO1_1.fastq.gz
│   │   ├── ENCSR000CPO1_2.fastq.gz
│   │   ├── ENCSR000CPO2_1.fastq.gz
│   │   └── ENCSR000CPO2_2.fastq.gz
│   └── single-end.csv
├── nextflow.config
├── rnaseq.nf
└── solutions
    ├── modules
    │   ├── fastqc.nf
    │   ├── fastqc_pe.nf
    │   ├── hisat2_align.nf
    │   ├── hisat2_align_pe.nf
    │   ├── multiqc.nf
    │   ├── trim_galore.nf
    │   └── trim_galore_pe.nf
    ├── rnaseq-2.1.nf
    ├── rnaseq-2.2.nf
    ├── rnaseq-2.3.nf
    ├── rnaseq-3.1.nf
    ├── rnaseq-3.2.nf
    └── rnaseq_pe-3.3.nf

```
63
+
64
+
!!!note
65
+
66
+
Don't worry if this seems like a lot; we'll go through the relevant pieces at each step of the course.
67
+
This is just meant to give you an overview.
68
+
69
+
**Here's a summary of what you should know to get started:**
70
+
71
+
- **The `rnaseq.nf` file** is the outline of the workflow script we will work to develop.
72
+
73
+
-**The file `nextflow.config`** is a configuration file that sets minimal environment properties. You can ignore it for now.
74
+
75
+
-**The `data` directory** contains input data and related resources:
76
+
77
+
-_A reference genome_ called `genome.fa` consisting of a small region of the human chromosome 20 (from hg19/b37).
78
+
-_RNAseq data_ that has been subset to a small region to keep the file sizes down, in the `reads/` directory.
79
+
-_CSV files_ listing the IDs and paths of the example data files, for processing in batches.
80
+
81
+
-**The `solutions` directory** contains the completed workflow scripts and modules that result from each step of the course.
82
+
They are intended to be used as a reference to check your work and troubleshoot any issues.
83
+
The number in the filename corresponds to the step of the relevant part of the course.
84
+
85
+
!!!tip
86
+
87
+
If for whatever reason you move out of this directory, you can always run this command to return to it:
88
+
89
+
```bash
90
+
cd /workspaces/training/nf4-science/rnaseq
91
+
```
92
+
93
+
Now, to begin the course, click on the arrow in the bottom right corner of this page.