
Dartmouth-Data-Analytics-Core/GEO-submission-file-management


GEO submission file management

This repository provides utilities for managing files in preparation for submitting genomics data to the Gene Expression Omnibus (GEO). GEO submissions require the raw and processed data for each data type to be organized into dedicated folders before the data are uploaded to the GEO server. These utilities streamline the creation of these folders. They are currently optimized for single-cell data generated on the 10X Genomics platform and preprocessed with CellRanger. Functionality for other data types will be added in the future.

Complete instructions describing the full process of uploading data to GEO are available on the GEO website.

Instructions

  1. Locate the raw and processed data files.
  2. Set your working directory to the folder that contains the output of the CellRanger runs you would like to include in your GEO submission.
  3. Clone this repository using git clone. Note that you may be required to enter a personal access token; instructions for creating one are available in GitHub's documentation.
  4. Move the script create-file-links.sh to your working directory with:
    • mv GEO-submission-file-management/create-file-links.sh ./
  5. Create a folder named geo-sub with mkdir geo-sub.
  6. Create subfolders:
    • mkdir -p geo-sub/raw-data
    • mkdir -p geo-sub/processed-data
  7. Edit the input arrays in create-file-links.sh:
    • folders: folders in your current working directory that contain the target files. For data analyzed with CellRanger, each folder generally corresponds to a single sample.
    • sample: the sample names to add to the file names (e.g., sample-1 in sample-1_barcodes.tsv).
  8. Run the script with bash create-file-links.sh.
  9. Extract the MD5 checksums from md5sums.txt and copy them into the GEO metadata sheet.
  10. Add symlinks to the processed data files. For single-cell data, we generally suggest at minimum:
    • an .rds object or an .h5ad (AnnData) object
    • a .csv file containing cell-level metadata
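The linking performed in steps 5 to 8 can be illustrated with a short bash sketch. This is not the contents of create-file-links.sh; it is a minimal example that assumes each entry in folders follows the standard CellRanger count layout (outs/filtered_feature_bc_matrix/ containing barcodes.tsv.gz, features.tsv.gz and matrix.mtx.gz) and that folders and sample are matched by position. The function link_raw_files, the matrix_files array, and the example folder and sample names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, NOT the repository's create-file-links.sh.
# Assumes each folder holds standard CellRanger `count` output.
set -uo pipefail

# Input arrays (example values; edit to match your runs).
# folders[i] and sample[i] are assumed to describe the same sample.
folders=("run1" "run2")
sample=("sample-1" "sample-2")

# Files expected in each filtered matrix directory (assumed layout).
matrix_files=("barcodes.tsv.gz" "features.tsv.gz" "matrix.mtx.gz")

# Symlink every matrix file into the directory given as $1,
# naming each link <sample>_<file>.
link_raw_files() {
    local dest="$1" i f src_dir
    if [[ ${#folders[@]} -ne ${#sample[@]} ]]; then
        echo "error: folders and sample arrays differ in length" >&2
        return 1
    fi
    mkdir -p "$dest"
    for i in "${!folders[@]}"; do
        src_dir="${folders[$i]}/outs/filtered_feature_bc_matrix"
        for f in "${matrix_files[@]}"; do
            if [[ ! -e "$src_dir/$f" ]]; then
                echo "warning: $src_dir/$f not found, skipping" >&2
                continue
            fi
            # Absolute target so the link resolves from any directory.
            ln -sf "$(cd "$src_dir" && pwd)/$f" "$dest/${sample[$i]}_$f"
        done
    done
}

link_raw_files geo-sub/raw-data
```

Absolute link targets are used so the links still resolve when geo-sub is accessed from another directory. Processed files (step 10) could be linked the same way, for example ln -s "$PWD/analysis/object.rds" geo-sub/processed-data/, where the path is only a placeholder.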

The md5sums.txt file should have the following format:

filename	md5sum
sample-1_barcodes.tsv	a1b2c3d4e5f6...
sample-1_features.tsv	f6e5d4c3b2a1...
sample-3_matrix.mtx	1a2b3c4d5e6f...
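One way to produce a tab-separated table in this format is sketched below. It is an illustrative example rather than the script's own implementation: it assumes the links live in geo-sub/raw-data and that GNU coreutils md5sum is available (on macOS, md5 -q prints the bare checksum instead), and the function name write_md5_table is hypothetical. Because md5sum reads through symbolic links, the checksums are those of the original CellRanger files.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: write "filename<TAB>md5sum" rows for every file
# in a directory. Assumes GNU coreutils md5sum is installed.
set -uo pipefail

write_md5_table() {
    local dir="$1" out="$2" f sum
    printf 'filename\tmd5sum\n' > "$out"
    for f in "$dir"/*; do
        # Skip dangling links and the literal pattern left by an empty glob.
        [[ -e "$f" ]] || continue
        # md5sum follows symlinks, so this hashes the linked file's contents.
        sum=$(md5sum "$f" | cut -d' ' -f1)
        printf '%s\t%s\n' "${f##*/}" "$sum" >> "$out"
    done
}

write_md5_table geo-sub/raw-data md5sums.txt
```

To copy only the checksum column into the GEO metadata sheet (step 9), tail -n +2 md5sums.txt | cut -f2 prints the checksums without the header, in the same order as the file names.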

To Do:

  • add optional argument for feature barcoding file
  • add template for xenium data
  • add utilities for other data types
    • add functionality for linking FASTQs
