Commit 7039dfc

Merge pull request #2841 from mritchielab/pr-longbench
add LongBench dataset
2 parents 44f7d04 + c1402a6 commit 7039dfc

1 file changed: +34 -0 lines changed

datasets/longbench.yaml

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+Name: LongBench - cross-platform reference dataset profiling cancer cell lines with bulk and single-cell approaches
+Description: >
+  LongBench is a comprehensive benchmark dataset of the latest long-read transcriptomics technologies from Oxford Nanopore Technologies (ONT) and Pacific Biosciences, alongside a comparison with next-generation sequencing from Illumina. We generated bulk and single-cell libraries from lung cancer cell lines which include different cancer subtypes to capture real biological variation. To further compare and assess sequencing platform performance, Sequins and SIRVs (Set 4) synthetic spike-ins have been included.
+Documentation: https://github.com/mritchielab/LongBench.io
+
+ManagedBy: Ritchie Lab, Walter and Eliza Hall Institute of Medical Research
+UpdateFrequency: New data will be added as soon as they are available.
+Tags:
+  - benchmark
+  - long read sequencing
+  - single-cell transcriptomics
+  - short read sequencing
+  - bioinformatics
+  - fastq
+  - bam
+  - vcf
+  - cancer
+  - life sciences
+  - aws-pds
+License: CC BY-4.0
+Resources:
+  - Description: Bulk, single-cell, and single-nucleus RNA-seq data from the LongBench project, covering eight human lung cancer cell lines. Bulk sequencing (FASTQ) was performed on ONT PCR-cDNA, ONT direct RNA (including pod5 files for RNA modification analysis), PacBio Kinnex, and Illumina platforms. Single-cell and single-nucleus sequencing (FASTQ) was performed on ONT PCR-cDNA, PacBio Kinnex, and Illumina platforms. Aligned reads (BAM), variant calls (VCF), and processed gene expression data are also provided, along with reference genome annotations (GTF and FASTA).
+    ARN: arn:aws:s3:::longbench-data
+    Region: ap-southeast-2
+    Type: S3 Bucket
+
+DataAtWork:
+  Tutorials:
+    - Title: Benchmarking long-read DE gene and transcript analysis with edgeR
+      URL: https://mritchielab.github.io/LongBench.io/bulk-de-benchmarking/
+      AuthorName: Yupei You
+
+ADXCategories:
+  - Healthcare & Life Sciences Data
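
The Resources entry above identifies the data only by an S3 ARN and a region. As a rough sketch of how a reader might turn those two fields into something browsable, the Python below derives the bucket name from the ARN and builds a virtual-hosted-style S3 listing URL. The helper names `bucket_from_arn` and `listing_url` are hypothetical and not part of this commit, and whether the bucket permits anonymous listing is an assumption based on the `aws-pds` tag rather than something the diff confirms.

```python
from urllib.parse import urlencode

S3_ARN_PREFIX = "arn:aws:s3:::"


def bucket_from_arn(arn: str) -> str:
    """Return the bucket name from an S3 ARN such as arn:aws:s3:::name."""
    if not arn.startswith(S3_ARN_PREFIX):
        raise ValueError(f"not an S3 ARN: {arn!r}")
    # Anything after the first "/" would be an object key, not the bucket name.
    return arn[len(S3_ARN_PREFIX):].split("/", 1)[0]


def listing_url(arn: str, region: str, prefix: str = "", max_keys: int = 20) -> str:
    """Build a ListObjectsV2 URL for the bucket (hypothetical helper).

    Assumes the bucket allows unauthenticated listing; a private bucket
    would answer this URL with an AccessDenied error.
    """
    bucket = bucket_from_arn(arn)
    query = {"list-type": "2", "max-keys": str(max_keys)}
    if prefix:
        query["prefix"] = prefix
    return f"https://{bucket}.s3.{region}.amazonaws.com/?{urlencode(query)}"


if __name__ == "__main__":
    # Values copied from the Resources entry in datasets/longbench.yaml.
    print(listing_url("arn:aws:s3:::longbench-data", "ap-southeast-2"))
```

The sketch only constructs the URL and makes no network request, so it runs offline. Fetching the URL, or pointing an S3 client at the same bucket and region, is left to the reader.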
