Commit 01222dd

Merge pull request #33 from GoekeLab/dev
update master branch with blow5 tutorials
2 parents d9234dd + f323877 commit 01222dd

10 files changed: 418 additions, 116 deletions

README.md

Lines changed: 16 additions & 5 deletions
@@ -32,7 +32,7 @@ https://groups.google.com/forum/#!forum/sg-nex-updates/join

 ## Data Release and Access

-**Latest Release (v0.3)**
+**Latest Release (v0.4)**

 [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5574654.svg)](https://doi.org/10.5281/zenodo.5574654)

@@ -43,6 +43,7 @@ This release includes 86 samples from 11 different cell lines.
 You can access the following data through the [AWS Open Data Registry](https://registry.opendata.aws/sgnex/):

 - raw files (fast5)
+- raw files (blow5)
 - basecalled files (fastq)
 - aligned reads (genome and transcriptome) (bam)
 - tracks for visualisation (bigwig and bigbed)
@@ -51,14 +52,24 @@ You can access the following data through the [AWS Open Data Registry](https://r
 - annotation files
 - detailed sample and experiment information

-You can browse the S3 data [here](http://sg-nex-data.s3-website-ap-southeast-1.amazonaws.com/).
+You can browse the S3 data here: 1) [fast5, fastq, and bam](http://sg-nex-data.s3-website-ap-southeast-1.amazonaws.com/) and 2) [blow5](http://sg-nex-data-blow5.s3-website-ap-southeast-1.amazonaws.com/).

 Please refer to the [data access tutorial](docs/AWS_data_access_tutorial.md) which describes the S3 data structure and how to access files with [AWS CLI](https://docs.aws.amazon.com/cli/latest/reference/s3/). The direct links to the data are listed in the [sample spreadsheet](docs/samples.tsv).

 _**Citation**_: Please cite the pre-print describing the SG-NEx data resource when using these data, and add the following details: "The SG-NEx data was accessed on [DATE] at registry.opendata.aws/sg-nex-data".

 Chen, Y. _et al._ "A systematic benchmark of Nanopore long read RNA sequencing for transcript level analysis in human cell lines." _bioRxiv_ (2021). doi: https://doi.org/10.1101/2021.04.21.440736

+**Release Note**
+
+Version Number: V0.4.0
+Date: 2023-03-06
+Update of the SG-NEx data on AWS. Includes raw signal data in blow5 format.
+
+Version Number: V0.3.0
+Date: 2022-07-28
+Initial release of the SG-NEx data on AWS. Includes Nanopore direct RNA, cDNA, direct cDNA-Seq, short read RNA-Seq and m6ACE-Seq.
+
 **Release History**

 You can find previous releases here in the [release history](https://github.com/GoekeLab/sg-nex-data/releases)
@@ -89,6 +100,8 @@ The following short tutorials are available that demonstrate how to analyse the

 - [Identification of m6A with the SG-NEx samples (using m6Anet)](./docs/SG-NEx_m6Anet_tutorial.md)

+- [Basecalling and analysing SG-NEx samples in S/BLOW5 format](./docs/SG-NEx_blow5_tutorial.md)
+
 Additional, more detailed workflows can be found here:

 - [Transcript discovery, quantification, and differential transcript expression from long read RNA-Seq data (using Bambu)](https://github.com/GoekeLab/bambu)
@@ -99,16 +112,14 @@ Additional, more detailed workflows can be found here:


 ## Contributors
-
 **GIS Sequencing Platform and Data Generation**
 Hwee Meng Low, Yao Fei, Sarah Ng, Wendy Soon, CC Khor

 **Cancer Genomics and RNA Modifications**
 Viktoriia Iakovleva, Puay Leng Lee, Lixia Xin, Hui En Vanessa Ng, Jia Min Loo, Xuewen Ong, Hui Qi Amanda Ng, Suk Yeah Polly Poon, Hoang-Dai Tran, Kok Hao Edwin Lim, Huck Hui Ng, Boon Ooi Patrick Tan, Huck-Hui Ng, N.Gopalakrishna Iyer, Wai Leong Tam, Wee Joo Chng, Leilei Chen, Ramanuj DasGupta, Yun Shen Winston Chan, Qiang Yu, Torsten Wüstefeld, Wee Siong Sho Goh

 **Statistical Modeling and Data Analytics**
-Ying Chen, Nadia M. Davidson, Harshil Patel, Yuk Kei Wan, Min Hao Ling, Yu Song Chuah, Naruemon Pratanwanich, Christopher Hendra, Laura Watten, Chelsea Sawyer, Dominik Stanojevic, Philip Andrew Ewels, Andreas Wilm, Mile Sikic, Alexandre Thiery, Michael I. Love, Alicia Oshlak, Jonathan Göke
-
+Ying Chen, Hasindu Gamaarachchi, Nadia M. Davidson, Harshil Patel, Yuk Kei Wan, Min Hao Ling, Yu Song Chuah, Naruemon Pratanwanich, Christopher Hendra, Laura Watten, Chelsea Sawyer, Dominik Stanojevic, Philip Andrew Ewels, Andreas Wilm, Mile Sikic, Alexandre Thiery, Michael I. Love, Alicia Oshlak, Jonathan Göke
 ## Citing the SG-NEx project

 The SG-NEx resource is described in:

docs/AWS_README

Lines changed: 1 addition & 1 deletion
@@ -10,5 +10,5 @@ The SG-NEx data set is documented here: https://github.com/GoekeLab/sg-nex-data

 The folder structure and data access tutorial is described here: https://github.com/GoekeLab/sg-nex-data/blob/master/docs/AWS_data_access_tutorial.md

-The data browser link is here: http://sg-nex-data.s3-website-ap-southeast-1.amazonaws.com/
+The data browser link is here: http://sg-nex-data.s3-website-ap-southeast-1.amazonaws.com/ and http://sg-nex-data-blow5.s3-website-ap-southeast-1.amazonaws.com/

docs/AWS_RELEASE_NOTE

Lines changed: 9 additions & 0 deletions
@@ -1,5 +1,14 @@
 The Singapore Nanopore Expression (SG-NEx) Project: Release updates on AWS S3 open data

+Version Number: V0.4.0
+Date: 2023-03-06
+By: Ying Chen, Genome Institute of Singapore
+Update of the SG-NEx data on AWS. Includes raw signal data in blow5 format. Please refer to https://github.com/GoekeLab/sg-nex-data for detailed documentation.
+
+Data access tutorial: https://github.com/GoekeLab/sg-nex-data/blob/master/docs/AWS_data_access_tutorial.md
+Data browser link: http://sg-nex-data-blow5.s3-website-ap-southeast-1.amazonaws.com/
+Contact and questions: https://github.com/GoekeLab/sg-nex-data/discussions
+

 Version Number: V0.3.0
 Date: 2022-07-28

docs/AWS_data_access_tutorial.md

File mode changed: 100644 → 100755
Lines changed: 20 additions & 10 deletions
@@ -4,15 +4,18 @@ SG-NEx data source contains long read (Oxford Nanopore) RNA sequencing data for

 The SG-NEx S3 bucket contains the following types of data:

-- [Raw sequencing signal (fast5)](#raw-sequencing-signal)
-- [Basecalled sequences (fastq)](#basecalled-sequences)
-- [Aligned sequences (bam)](#aligned-sequences)
-- [Data visualisation tracks (bigwig/bigbed)](#data-visualisation-tracks)
-- [Annotations](#annotations)
-- [Processed data for RNA modification detection](#processed-data)
-- [Sample and experiment information](#sample-and-experimental-data)
+- [Raw sequencing signal (fast5)](#raw-sequencing-signal)
+- [Basecalled sequences (fastq)](#basecalled-sequences)
+- [Aligned sequences (bam)](#aligned-sequences)
+- [Data visualisation tracks (bigwig/bigbed)](#data-visualisation-tracks)
+- [Annotations](#annotations)
+- [Processed data for RNA modification detection](#processed-data)
+- [Sample and experiment information](#sample-and-experimental-data)

-Below is the folder index for the open data bucket:
+The SG-NEx S3 BLOW5 bucket contains the following types of data:
+- [Raw sequencing signal (blow5)](#raw-sequencing-signal-in-blow5-format)
+
+Below is the folder index for the open data buckets:

 ![folder indexing\!](/images/folder_index.png)

@@ -24,6 +27,14 @@ aws s3 ls --no-sign-request s3://sg-nex-data/data/sequencing_data_ont/fast5/ # l
 aws s3 sync --no-sign-request s3://sg-nex-data/data/sequencing_data_ont/fast5/sample_name . # download fast5 files to your local directory
 ```

+# Raw sequencing signal in BLOW5 format
+To access raw sequencing (blow5) files:
+
+```bash
+aws s3 ls --no-sign-request s3://sg-nex-data-blow5/ # list samples
+aws s3 sync --no-sign-request s3://sg-nex-data-blow5/sample_name . # download blow5 file and the index to your local directory
+```
+
 # Basecalled sequences
 To access basecalled sequencing (fastq) files:
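
Note on the hunk above: the two blow5 commands it adds can be wrapped in a small helper when several samples need to be fetched. What follows is only a minimal sketch, assuming the `s3://sg-nex-data-blow5/<sample_name>` layout shown in the added lines. The function name `fetch_blow5_sample`, the `DRY_RUN` switch, and the choice to download each sample into its own sub-directory are illustrative assumptions, not part of the SG-NEx documentation.

```bash
#!/usr/bin/env bash
# Sketch only: the helper name, DRY_RUN switch and directory layout are
# assumptions made for illustration; only the bucket name and the
# `aws s3 sync --no-sign-request` call come from the tutorial text.

BLOW5_BUCKET="s3://sg-nex-data-blow5"

# Download one sample's blow5 file and index into <dest>/<sample>.
# With DRY_RUN=1 (the default here) the command is printed, not executed.
fetch_blow5_sample() {
    local sample="$1"
    local dest="${2:-.}"
    local src="${BLOW5_BUCKET}/${sample}"

    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "aws s3 sync --no-sign-request ${src} ${dest}/${sample}"
    else
        mkdir -p "${dest}/${sample}"
        aws s3 sync --no-sign-request "${src}" "${dest}/${sample}"
    fi
}
```

Calling `fetch_blow5_sample sample_name ./blow5` with `DRY_RUN` unset prints the command that would run. Setting `DRY_RUN=0` performs the download, which requires the AWS CLI. Replace `sample_name` with a name returned by the `aws s3 ls` command in the hunk.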

@@ -90,7 +101,6 @@ aws s3 sync --no-sign-request s3://sg-nex-data/data/annotations/gtf_file . # do

 ## RNA modification detection
 Long read direct RNA sequencing allows the detection of RNA modifications with tools such as [xPore](https://github.com/GoekeLab/xpore) and [m6Anet](https://github.com/GoekeLab/m6anet). To simplify the analysis of RNA modifications using the SG-NEx datasets, you can download the processed files to use with xPore and m6Anet.
-
 To download the processed data for differential RNA modification analysis with xPore:
 ```bash
 aws s3 ls --no-sign-request s3://sg-nex-data/data/processed_data/xpore/ # list all samples that have processed data for RNA modification detection using xPore
@@ -106,7 +116,7 @@ These files are provided for a subset of samples, please see [here](/docs/sample

 # Sample and experimental data

-Detailed information for each sequencing sample is provided [here](/docs/samples.tsv). The data also includes multiplexed samples which share the same fast5 files. The information about the multiplexed samples can be found [here](/docs/multiplexed_samples.tsv). The files can also be accessed directly on S3:
+Detailed information for each sequencing sample is provided [here](/docs/samples.tsv). The data also includes multiplexed samples which share the same fast5/blow5 files. The information about the multiplexed samples can be found [here](/docs/multiplexed_samples.tsv). The files can also be accessed directly on S3:


 ```bash

docs/SG-NEx_Bambu_tutorial.md

Lines changed: 2 additions & 2 deletions
@@ -46,7 +46,7 @@ Next, we will need to download the required data to run Bambu. The required data

 Generally, you may want to learn how to get access to these data using the [data
 access
-tutorial](https://github.com/GoekeLab/sg-nex-data/blob/updated-documentation/docs/AWS_data_access_tutorial.md). Below we only show the necessary steps to download the required data. The following command requires you to have [AWS CLI](https://aws.amazon.com/cli/) installed.
+tutorial](AWS_data_access_tutorial.md). Below we only show the necessary steps to download the required data. The following command requires you to have [AWS CLI](https://aws.amazon.com/cli/) installed.

 ``` bash
 # create a directory to store the data
@@ -68,7 +68,7 @@ aws s3 sync --no-sign-request s3://sg-nex-data/data/data_tutorial/bam ./bambu_tu
 You may also download the required data directly from the [SG-NEx AWS S3
 bucket](http://sg-nex-data.s3-website-ap-southeast-1.amazonaws.com/) if you are unfamiliar with AWS CLI command. They are stored in the `data/data_tutorial/bam` folder.

-**NOTE: We have downsampled the Hg38 genome, A549 and HepG2 samples to ensure this tutorial can be completed in 10 minutes. If you want to run Bambu on the original samples, you can find the sample name [here](https://github.com/GoekeLab/sg-nex-data/blob/updated-documentation/docs/samples.tsv) and amend it into the following code chunk:**
+**NOTE: We have downsampled the Hg38 genome, A549 and HepG2 samples to ensure this tutorial can be completed in 10 minutes. If you want to run Bambu on the original samples, you can find the sample name [here](samples.tsv) and amend it into the following code chunk:**

 ```bash
 # Note: Please make sure to replace the "sample_alias" with your sample name
