This repository provides utilities for managing files in preparation for submitting genomics data to the Gene Expression Omnibus (GEO). GEO submissions require you to organize raw and processed data for each data type into dedicated folders before uploading data to their server. These utilities streamline the creation of these folders. Currently, they are optimized for single-cell data that were generated on the 10X Genomics platform and preprocessed with CellRanger. Functionality for other data types will be added in the future.
Complete instructions that describe the full process of uploading data to GEO are available on the GEO website here.
- Locate raw and processed data files
- Set your working directory to the folder that contains the output from the CellRanger runs you would like to include in your GEO submission.
- Clone this repository using `git clone`. Note that you may be required to enter a personal access token. Instructions for doing this are available here.
- Move the script `create-file-links.sh` to your working directory with: `mv GEO-submission-file-management/create-file-links.sh ./`
- Create a folder named `geo-sub` with `mkdir geo-sub`
- Create subfolders:
  - `mkdir -p geo-sub/raw-data`
  - `mkdir -p geo-sub/processed-data`
- Edit the input arrays in `create-file-links.sh`:
  - `folders`: folders in your current working directory containing the target files of interest. For data analyzed with CellRanger, each folder will generally refer to data from a specific sample.
  - `sample`: sample names that you would like added to the file names.
- Run the script with `bash create-file-links.sh`.
- Extract the MD5 checksums from the file `md5sums.txt` and copy them to the GEO metadata sheet.
- Add symlinks to processed data files. For single-cell data, we generally suggest at minimum:
  - `.rds` or `.adata` object
  - `.csv` file containing cell-level metadata
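As a rough illustration of the last step, processed files could be linked into `geo-sub/processed-data` along the lines below. This is only a sketch and not part of `create-file-links.sh`: the `analysis` folder and the file names are hypothetical, and the placeholder files are created in a temporary directory so that the example runs on its own.

```shell
set -eu

# Hypothetical demo layout; in practice these files already exist
# in your working directory.
workdir="$(mktemp -d)"
cd "$workdir"
mkdir -p analysis geo-sub/raw-data geo-sub/processed-data
touch analysis/object.rds analysis/cell-metadata.csv

# Link each processed file by absolute path so that the symlink
# still resolves when it is read from inside geo-sub/.
for f in analysis/object.rds analysis/cell-metadata.csv; do
  ln -s "$PWD/$f" "geo-sub/processed-data/$(basename "$f")"
done

ls -l geo-sub/processed-data
```

Absolute targets are used here because a relative target is resolved from the directory that holds the link, not from the directory where `ln` was run.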
The `md5sums.txt` file should be in the following format:
filename md5sum
sample-1_barcodes.tsv a1b2c3d4e5f6...
sample-1_features.tsv f6e5d4c3b2a1...
sample-3_matrix.mtx 1a2b3c4d5e6f...
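If you need to build or double-check a table in this layout by hand, something like the following may work. It is a sketch under assumptions rather than the script's actual logic: it assumes GNU `md5sum` is available, that the linked files sit in `geo-sub/raw-data`, and that columns are separated by a single space as displayed above. The demo files are hypothetical placeholders.

```shell
set -eu

# Hypothetical demo files so the sketch runs on its own.
workdir="$(mktemp -d)"
cd "$workdir"
mkdir -p geo-sub/raw-data
printf 'AAAC\n' > geo-sub/raw-data/sample-1_barcodes.tsv
printf 'GENE1\n' > geo-sub/raw-data/sample-1_features.tsv

# Write the header, then one "filename md5sum" row per file.
# md5sum prints "<hash>  <path>", so the two fields are swapped here.
printf 'filename md5sum\n' > md5sums.txt
for f in geo-sub/raw-data/*; do
  hash="$(md5sum "$f" | awk '{print $1}')"
  printf '%s %s\n' "$(basename "$f")" "$hash" >> md5sums.txt
done

cat md5sums.txt
```

`md5sum` reads through symlinks, so the checksum reflects the content of the linked file rather than the link itself.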
To Do:
- add optional argument for feature barcoding file
- add template for Xenium data
- add utilities for other data types
- add functionality for linking FASTQs