 ## Samplesheet input

-You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row as shown in the examples below.
+You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use this parameter to specify its location.

 ```bash
 --input '[path to samplesheet file]'
 ```

-### Multiple runs of the same sample
+### Full samplesheet

-The `sample` identifiers have to be the same when you have re-sequenced the same sample more than once e.g. to increase sequencing depth. The pipeline will concatenate the raw reads before performing any downstream analysis. Below is an example for the same sample sequenced across 3 lanes:
+The following simple run directory structure...

-```csv title="samplesheet.csv"
-sample,fastq_1,fastq_2
-CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
-CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz
-CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz
+```
+run_dir
+├── sample1_lane1_group1_r1.fq.gz
+├── sample2_lane1_group1_r1.fq.gz
+├── sample3_lane2_group2_r1.fq.gz
+└── sample4_lane2_group3_r1.fq.gz
 ```

-### Full samplesheet
-
-The pipeline will auto-detect whether a sample is single- or paired-end using the information provided in the samplesheet. The samplesheet can have as many columns as you desire, however, there is a strict requirement for the first 3 columns to match those defined in the table below.
-
-A final samplesheet file consisting of both single- and paired-end data may look something like the one below. This is for 6 samples, where `TREATMENT_REP3` has been sequenced twice.
+...would be represented by the following samplesheet (shown as TSV for readability):

 ```csv title="samplesheet.csv"
-sample,fastq_1,fastq_2
-CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
-CONTROL_REP2,AEG588A2_S2_L002_R1_001.fastq.gz,AEG588A2_S2_L002_R2_001.fastq.gz
-CONTROL_REP3,AEG588A3_S3_L002_R1_001.fastq.gz,AEG588A3_S3_L002_R2_001.fastq.gz
-TREATMENT_REP1,AEG588A4_S4_L003_R1_001.fastq.gz,
-TREATMENT_REP2,AEG588A5_S5_L003_R1_001.fastq.gz,
-TREATMENT_REP3,AEG588A6_S6_L003_R1_001.fastq.gz,
-TREATMENT_REP3,AEG588A6_S6_L004_R1_001.fastq.gz,
+sample lane group fastq_1 fastq_2 rundir
+sample1 1 group1 path/to/run_dir/sample1_lane1_group1_r1.fq.gz path/to/run_dir
+sample2 1 group1 path/to/run_dir/sample2_lane1_group1_r1.fq.gz path/to/run_dir
+sample3 2 group2 path/to/run_dir/sample3_lane2_group2_r1.fq.gz path/to/run_dir
+sample4 2 group3 path/to/run_dir/sample4_lane2_group3_r1.fq.gz path/to/run_dir
+
 ```

 | Column | Description |
 | --------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `sample` | Custom sample name. This entry will be identical for multiple sequencing libraries/runs from the same sample. Spaces in sample names are automatically converted to underscores (`_`). |
+| `lane` | Lane in which the sample was sequenced on an Illumina instrument (optional). |
+| `group` | Group the sample belongs to, useful when several groups are pooled together (optional). |
 | `fastq_1` | Full path to FastQ file for Illumina short reads 1. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". |
-| `fastq_2` | Full path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". |
+| `fastq_2` | Full path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz" (optional). |
+| `rundir` | Path to the run folder containing extra information about the sequencing run (optional). |

-An [example samplesheet](../assets/samplesheet.csv) has been provided with the pipeline.
+Another [example samplesheet](../assets/samplesheet.csv) has been provided with the pipeline.

 ## Running the pipeline

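
The new `run_dir` example maps each FastQ file to one samplesheet row. As a hypothetical sketch (not part of the pipeline), the Python below rebuilds such rows from a directory listing. It assumes files follow the `<sample>_lane<N>_<group>_r1.fq.gz` naming used in the example and that the sheet is written comma-separated, as the `samplesheet.csv` title suggests. `FASTQ_PATTERN`, `rows_from_run_dir` and `to_csv` are illustrative names only.

```python
import csv
import io
import re
from pathlib import PurePosixPath

# Hypothetical pattern copied from the example listing:
# <sample>_lane<N>_<group>_r1.fq.gz (read-1 files only, so fastq_2 stays empty).
FASTQ_PATTERN = re.compile(
    r"^(?P<sample>.+?)_lane(?P<lane>\d+)_(?P<group>[^_]+)_r1\.f(?:ast)?q\.gz$"
)
COLUMNS = ["sample", "lane", "group", "fastq_1", "fastq_2", "rundir"]


def rows_from_run_dir(run_dir: str, filenames: list[str]) -> list[dict[str, str]]:
    """Turn run-directory file names into samplesheet rows, one per R1 file."""
    rows = []
    for name in sorted(filenames):
        match = FASTQ_PATTERN.match(name)
        if match is None:
            continue  # ignore files that do not follow the assumed naming scheme
        rows.append(
            {
                "sample": match["sample"],
                "lane": match["lane"],
                "group": match["group"],
                "fastq_1": str(PurePosixPath(run_dir) / name),
                "fastq_2": "",  # no R2 files in the example run_dir
                "rundir": run_dir,
            }
        )
    return rows


def to_csv(rows: list[dict[str, str]]) -> str:
    """Serialise rows comma-separated, with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    listing = [
        "sample1_lane1_group1_r1.fq.gz",
        "sample2_lane1_group1_r1.fq.gz",
        "sample3_lane2_group2_r1.fq.gz",
        "sample4_lane2_group3_r1.fq.gz",
    ]
    print(to_csv(rows_from_run_dir("path/to/run_dir", listing)), end="")
```

Running it prints a header followed by one row per file, with an empty `fastq_2` field, e.g. `sample1,1,group1,path/to/run_dir/sample1_lane1_group1_r1.fq.gz,,path/to/run_dir`. On a real directory you could pass `os.listdir(run_dir)` as `filenames`; the regular expression will likely need adjusting to match your own naming scheme.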
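
The column table can also be read as a set of checks. The sketch below is a hypothetical helper, not the pipeline's own validation. It requires `sample` and `fastq_1`, treats `lane`, `group`, `fastq_2` and `rundir` as optional, checks FastQ extensions by name only (it does not open the files to confirm they are gzipped), converts spaces in sample names to underscores, and groups rows that share a sample name, since the table says that name repeats across libraries/runs of the same sample. Pass `delimiter="\t"` for a tab-separated sheet. `load_samplesheet` and `check_fastq` are illustrative names.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rules transcribed from the column table in the diff.
REQUIRED_COLUMNS = ("sample", "fastq_1")
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz")


def check_fastq(path: str, column: str) -> None:
    """Name-based check only: the file is not opened to confirm it is gzipped."""
    if not path.endswith(FASTQ_SUFFIXES):
        raise ValueError(f"{column} must end in .fastq.gz or .fq.gz, got {path!r}")


def load_samplesheet(text: str, delimiter: str = ",") -> dict[str, list[dict[str, str]]]:
    """Validate samplesheet rows and group them by (normalised) sample name."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")

    samples: dict[str, list[dict[str, str]]] = defaultdict(list)
    for row in reader:  # DictReader skips completely blank lines
        # Drop surplus unnamed fields (key None) and turn missing values into "".
        clean = {k: (v or "").strip() for k, v in row.items() if k is not None}
        # Spaces in sample names are converted to underscores, per the table.
        clean["sample"] = clean["sample"].replace(" ", "_")
        if not clean["sample"]:
            raise ValueError("empty sample name")
        check_fastq(clean["fastq_1"], "fastq_1")
        if clean.get("fastq_2"):  # optional second read file
            check_fastq(clean["fastq_2"], "fastq_2")
        samples[clean["sample"]].append(clean)
    return dict(samples)


if __name__ == "__main__":
    sheet = "sample,fastq_1,fastq_2\nCTRL 1,a_L1.fq.gz,\nCTRL 1,a_L2.fq.gz,\n"
    for name, rows in load_samplesheet(sheet).items():
        print(name, [r["fastq_1"] for r in rows])  # CTRL_1 ['a_L1.fq.gz', 'a_L2.fq.gz']
```

What the pipeline itself does with rows that share a sample name is not described in this change; the grouping here only collects them.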