Snakemake version frozen
- Updated the public key for GEXNT Azure storage, where the bundle reference file is stored.
- New relatedness inference workflow was added: `--flow rapid`. It requires phased data as input and should process 100K samples in an hour. It cannot reliably distinguish between parent-offspring and full-sibling relationships yet, so it should be used only for distant degrees.
- New configuration flags `--rapid-error-rate`, `--rapid-min-snp`, `--rapid-num-runs`, `--rapid-num-success`, `--rapid-seg-len` for `rapid` were included in the launcher.
- Outlier samples can now be filtered out in the `preprocess` workflow, using `--iqr-alpha` for threshold modification. It should be used when there are a few samples with a lot more genotyped or imputed SNPs than the majority of a dataset.
- `rapid` and `germline-king` relatedness workflows now require phase-preserving preprocessing, which can be initiated by adding a corresponding `--flow` flag to the `preprocess` workflow.
- The postprocessing step in the relatedness workflow can now process large datasets thanks to the `polars` Python library.
- All conda environments are now built in the Dockerfile, so Snakemake doesn't need to create them from `.yaml` files for every workflow run.
- Multiple tests were added, covering all GRAPE flows. Test cases are stored at `grape/test-cases`.
- Phased Affymetrix chip is now stored within the bundle to speed up the simulation flow. Because of this, the `intersect` rule in the `pedsim` simulation workflow was moved to the `reference` downloading workflow.
- Fixed `ibis` detecting empty IBD segments causing pipeline teardown.
- New workflow for simulation of a big relatives dataset (~500k samples) was added. It's available via the `simbig` command of the pipeline launcher.
- Support multiple cores for the preprocessing (`preprocess`) workflow.
- IBD segments weighting feature was added, see the `compute-weight-mask` workflow and the `--weight-mask` parameter of the pipeline launcher.
- Several options for better control of the samples filtering were added: `--missing-samples`, `--alt-hom-samples`, `--het-samples`.
- Random seed parameter was added for the Ped-sim simulation.
- GRAPE flows were renamed in the pipeline launcher: `ibis_king` -> `ibis-king`, `germline` -> `germline-king`.
- `readme.md` and the GRAPE scheme were brought up to date.
- Singularity support was removed in favour of conda environments.
- Code refactoring and clean up.
- Fixed `germline-king` simulation flow.
- Fixed `java command not found` during the `reference` workflow evaluation.
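
The `--iqr-alpha` outlier filter mentioned above is not specified in detail in these notes. As a rough illustration only, the sketch below assumes a conventional interquartile-range rule applied to per-sample SNP counts, flagging samples above `Q3 + iqr_alpha * IQR`. The function name `find_outlier_samples`, the input layout, and the default of 1.5 are hypothetical and are not taken from GRAPE.

```python
from statistics import quantiles


def find_outlier_samples(snp_counts, iqr_alpha=1.5):
    """Return sample IDs whose SNP count lies above Q3 + iqr_alpha * IQR.

    snp_counts: dict mapping sample ID -> number of genotyped/imputed SNPs.
    This is an assumed rule for illustration, not GRAPE's actual filter.
    """
    counts = sorted(snp_counts.values())
    # Quartiles of the observed counts (needs at least two samples).
    q1, _, q3 = quantiles(counts, n=4, method="inclusive")
    upper = q3 + iqr_alpha * (q3 - q1)
    # Only the upper tail is checked: samples with far more SNPs than the rest.
    return sorted(s for s, c in snp_counts.items() if c > upper)


example_counts = {"s1": 100, "s2": 102, "s3": 98, "s4": 101, "s5": 99, "s6": 500}
outliers = find_outlier_samples(example_counts)
```

Under this assumed rule, a smaller `iqr_alpha` lowers the threshold and flags more samples, while a larger one makes the filter more permissive.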
With `--flow ibis_king`, GRAPE now calculates IBD1 and IBD2 shares from KING data for degrees 0-3.
- Fixed a bug with parsing ERSA output for large datasets.
- Fixed a bug that set every value in some rows of relatives.tsv to 2.
- Fixed `total_seg_len` and `total_seg_len_ibd2` calculation. Now `total_seg_len` corresponds to only IBD1 segments.
- Bundle downloading hotfix.
- File verification hotfix for reference workflow.
- Removed singularity from all workflows.
- Many intermediate files are now temporary. This significantly reduces working folder size.
- Fixed removal of duplicate SNPs.
- ERSA-only workflow now correctly detects duplicates or monozygotic twins.
- Dockstore support
- Small and full bundle reference downloading from Azure.
- ERSA can handle 100k samples.
- Preprocessing preserves phasing information from VCF input.
- MAF filter is now consistent across different inputs.
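
The `total_seg_len` fix noted above (the total now covers only IBD1 segments, with IBD2 reported separately) can be illustrated with a minimal sketch. It assumes each segment is given as an `(ibd_type, length)` pair. That layout and the helper name `summarize_segments` are hypothetical and do not reflect GRAPE's internal data structures.

```python
def summarize_segments(segments):
    """Sum segment lengths separately for IBD1 and IBD2.

    segments: iterable of (ibd_type, length) pairs, where ibd_type is 1 or 2.
    Assumed layout for illustration only.
    """
    segments = list(segments)
    # total_seg_len counts IBD1 segments only, as described in the notes.
    total_seg_len = sum(length for ibd_type, length in segments if ibd_type == 1)
    # IBD2 segments are accumulated in their own field.
    total_seg_len_ibd2 = sum(length for ibd_type, length in segments if ibd_type == 2)
    return {"total_seg_len": total_seg_len, "total_seg_len_ibd2": total_seg_len_ibd2}


summary = summarize_segments([(1, 10.0), (2, 5.0), (1, 2.5)])
```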