5 changes: 3 additions & 2 deletions README.md
@@ -5,7 +5,7 @@ Easy access to a small collection of benchmark datasets for methods development.

<p align="center">
<a href="benchmarks.md">Why do we need data benchmarks?</a>&nbsp;&nbsp;&nbsp;
-<a href="contributing.md">Contribution guide</a>&nbsp;&nbsp;&nbsp;
+<a href="CONTRIBUTING.md">Contribution guide</a>&nbsp;&nbsp;&nbsp;
</p>

## Contents
@@ -25,7 +25,8 @@ Instructions for downloading and loading each dataset are in text files in the `
- [Tabula Muris](datasets/tabula_muris.md) - 20 different mouse organs, both full transcript (SmartSeq2) and UMI-based droplet counting (10x Genomics). [Code repo](https://github.com/czbiohub/tabula-muris) | [Vignette repo](https://github.com/czbiohub/tabula-muris-vignettes) | [Interactive website](http://tabula-muris.ds.czbiohub.org/) | [Download instructions](datasets/tabula_muris.md) #mouse #aorta #bladder #brain #diaphragm #fat #heart #kidney #large_intestine #muscle #liver #lung #mammary_gland #marrow #pancreas #skin #spleen #thymus #tongue #rnaseq #smartseq2 #10x #umi #droplet
- [Human cortex development](datasets/ucsc_human_cortex.md) - 4000 SmartSeq2 cells from different locations of the developing human fetus.
- [Conquer](datasets/conquer.md) - [38 datasets](http://imlspenticton.uzh.ch:3838/conquer/) summarized to a count level available as `R` `MultiAssayExperiment` objects.
-- [CellBench pilot data set](https://github.com/LuyiTian/CellBench_data/blob/master/cellbench.md) - mixture data set from 3 human lung adenocarcinoma cell lines (HCC827, H1975 and H2228) across different platforms.
+- [CellBench pilot data set](https://github.com/LuyiTian/CellBench_data/blob/master/cellbench.md) - mixture data set from 3 human lung adenocarcinoma cell lines (HCC827, H1975 and H2228) across different platforms. #benchmark #celseq2 #dropseq #10x #cellline
+- [HCA preview dataset gene counting matrix](datasets/HCA_preview_scPipe.md) - HCA preview data gene counting matrix. The gene counting matrix was generated from fastq by [scPipe](https://bioconductor.org/packages/release/bioc/html/scPipe.html). #hca #scPipe

### Imaging

21 changes: 21 additions & 0 deletions datasets/HCA_preview_scPipe.md
@@ -0,0 +1,21 @@
# HCA_Previewdata
This repository stores the [preview data](https://github.com/LuyiTian/HCA_Previewdata) and the initial analysis script. The data was processed with scPipe.

The data was downloaded from the [HCA data portal](https://preview.data.humancellatlas.org/).

## Metadata

Metadata is stored in the [HCA data portal](https://preview.data.humancellatlas.org/) and can be downloaded from there.

## Count files for R

A `SingleCellExperiment` object for the dataset is available in the `rdata` folder, either as raw data (`ischaemic_sensitivity_raw.RData`) or as processed data after quality control and normalization (`ischaemic_sensitivity_QC_norm.RData`).

## Data exploration analysis

The Rmd document can be found in the `script` folder.


## CSV and MTX files

The gene count matrix is in `data/<dataset_name>/gene_count.csv.zip`. Quality control metrics generated by scPipe during data preprocessing are in `data/<dataset_name>/stat`.
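
For readers working outside R, the zipped count matrix can be read with the Python standard library alone. The sketch below is illustrative and not part of the repository: the function name `read_gene_counts` is hypothetical, and it assumes the CSV has gene IDs in the first column, one column per cell, and integer counts. Check the actual file layout before relying on it.

```python
import csv
import io
import zipfile


def read_gene_counts(zip_path, member=None):
    """Read a zipped gene-by-cell count CSV into (cell_ids, counts_by_gene).

    Assumed layout (not confirmed by the repository): a header row whose
    first field labels the gene ID column, followed by one row per gene.
    """
    with zipfile.ZipFile(zip_path) as archive:
        if member is None:
            # Default to the first CSV in the archive, skipping macOS metadata.
            member = next(
                name
                for name in archive.namelist()
                if name.endswith(".csv") and not name.startswith("__MACOSX")
            )
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.reader(text)
            header = next(reader)
            cell_ids = header[1:]  # everything after the gene ID column
            counts = {
                row[0]: [int(value) for value in row[1:]]
                for row in reader
                if row  # ignore blank lines
            }
    return cell_ids, counts
```

Call it with the path to a downloaded `gene_count.csv.zip`; it returns the list of cell IDs and a dictionary mapping each gene ID to its per-cell counts. For large matrices a dedicated data-frame or sparse-matrix library would likely be a better fit than plain dictionaries.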