When "GRCh38 [Ensembl]" is selected in the UI, the backend `run_job.sh` receives the genome ID `Homo_sapiens/Ensembl/GRCh38`, but ultimately splits it on `/` and passes `--genome GRCh38` to `nextflow run`. Since nf-core/rnaseq (3.2) defaults to the NCBI reference for that ID, we are actually using the version pulled from s3://ngi-igenomes/igenomes/Homo_sapiens/NCBI/GRCh38 rather than the Ensembl one. This is because there is no GRCh38 Ensembl release in the iGenomes collection.
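To illustrate why the provider is lost, here is a minimal Python sketch of the split. The real logic lives in `run_job.sh` and may differ in detail; `short_genome_id` is a hypothetical helper that only mimics the behaviour described above.

```python
def short_genome_id(genome_id: str) -> str:
    """Keep only the last '/'-separated component of a genome ID.

    Hypothetical stand-in for the splitting described for run_job.sh.
    """
    return genome_id.split("/")[-1]


ensembl = short_genome_id("Homo_sapiens/Ensembl/GRCh38")
ncbi = short_genome_id("Homo_sapiens/NCBI/GRCh38")

# Both IDs collapse to the same short ID, so the provider is no longer
# distinguishable by the time --genome is passed to nextflow run.
print(ensembl, ncbi, ensembl == ncbi)  # prints "GRCh38 GRCh38 True"
```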
Several possible solutions:
- Change the UI for nf-core/rnaseq to only offer the NCBI human reference (and defaults for other organisms)
- Use the local `Homo_sapiens/Ensembl/GRCh38` reference when "GRCh38 [Ensembl]" is selected
- Add configuration (in ComputeResource?) to use the local copy of iGenomes when available (via https://nf-co.re/usage/troubleshooting#using-a-local-version-of-igenomes ), and as above use non-iGenomes references when there is no iGenomes equivalent.
- Avoid using iGenomes short IDs altogether (GRCh38), instead always map `Homo_sapiens/Ensembl/GRCh38` to the locally cached reference (or pull from Ensembl on demand if missing, as per rnasik).
- Add additional genomes not on iGenomes to an additional nextflow config: https://nf-co.re/usage/reference_genomes#adding-paths-to-a-config-file
- Migrate to using Refgenie for all the references: http://refgenie.databio.org/en/latest/overview/
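
Several of the options above amount to resolving the full genome ID rather than the short one. A rough Python sketch of that idea follows, under stated assumptions: `resolve_reference`, `IGENOMES_AVAILABLE`, and the `organism/source/build` directory layout under a local root are all hypothetical and not taken from the existing backend.

```python
from pathlib import Path

# Hypothetical allow-list of (organism, source, build) triples assumed to
# exist in iGenomes. Per this issue, Ensembl GRCh38 is not among them.
IGENOMES_AVAILABLE = {("Homo_sapiens", "NCBI", "GRCh38")}


def resolve_reference(genome_id: str, local_root: Path) -> dict:
    """Resolve a full genome ID without discarding the provider.

    Prefers a locally cached reference; falls back to an iGenomes short ID
    only when that exact provider/build is known to be in iGenomes.
    """
    organism, source, build = genome_id.split("/")
    local_dir = local_root / organism / source / build
    if local_dir.is_dir():
        return {"kind": "local", "path": str(local_dir)}
    if (organism, source, build) in IGENOMES_AVAILABLE:
        return {"kind": "igenomes", "genome": build}
    # Fail loudly rather than silently substituting another provider.
    raise LookupError(f"No reference available for {genome_id}")
```

With this shape, a request for `Homo_sapiens/Ensembl/GRCh38` either uses the local copy or raises an error, instead of quietly running against the NCBI reference.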