A really fast, simple SNP pre-processor and annotator. Millions of variants per minute.
go get github.com/akotlar/bystro-snp && go install $_;
pigz -d -c in.snp.gz | bystro-snp --minGq .95 | pigz -c - > output 2> log.txt

Performs several important functions:
- Splits multiallelics
- Performs QC on variants: checks whether allele is ACTG, +ACTG, or -Int
- Filters samples based on genotype quality
- Calculates whether site is transition, transversion, or neither
- Processes all available samples
  - Calculates homozygosity, heterozygosity, missingness
  - Labels samples as homozygous, heterozygous, or missing
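
The allele QC and transition/transversion steps listed above can be sketched in Go as follows. This is a minimal illustration, not bystro-snp's actual code: the function names `validAllele` and `trTv` are hypothetical, a plain allele is assumed to be a single base, and the mapping 0 = neither, 1 = transition, 2 = transversion is an assumption about the `trTv` output values.

```go
package main

import (
	"fmt"
	"strings"
)

// allIn reports whether s is non-empty and uses only characters from set.
func allIn(s, set string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if !strings.ContainsRune(set, r) {
			return false
		}
	}
	return true
}

// validAllele (hypothetical name) accepts the three allele shapes named in
// the list above: a base (ACTG), an insertion (+ACTG), or a deletion (-Int).
// Assumption: a plain allele is exactly one base.
func validAllele(a string) bool {
	switch {
	case strings.HasPrefix(a, "+"):
		return allIn(a[1:], "ACTG")
	case strings.HasPrefix(a, "-"):
		return allIn(a[1:], "0123456789")
	default:
		return len(a) == 1 && allIn(a, "ACTG")
	}
}

// Assumed meaning of the trTv codes; the README only gives the range 0|1|2.
const (
	neither      = 0
	transition   = 1
	transversion = 2
)

// trTv (hypothetical name) classifies a single-base substitution.
// Purine<->purine or pyrimidine<->pyrimidine is a transition; a change
// between the two classes is a transversion. Anything else is neither.
func trTv(ref, alt string) int {
	if len(ref) != 1 || len(alt) != 1 || ref == alt || !allIn(ref+alt, "ACTG") {
		return neither
	}
	isPurine := func(b string) bool { return b == "A" || b == "G" }
	if isPurine(ref) == isPurine(alt) {
		return transition
	}
	return transversion
}

func main() {
	for _, alt := range []string{"G", "C", "+AT", "-3", "N"} {
		fmt.Println(alt, validAllele(alt), trTv("A", alt))
	}
}
```

Indels and invalid alleles fall through to "neither" here, which keeps the classification defined for every row.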
bystro-snp is used to pre-process SNP files for Bystro (GitHub).
If you use bystro-snp please cite https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1387-3
Processes millions of variants (rows) per minute. Throughput depends on the number of samples.
go get github.com/akotlar/bystro-snp && go install $_;

Via pipe:
pigz -d -c in.snp.gz | bystro-snp --minGq .95 | pigz -c - > out.gz

Via inPath argument:
bystro-snp --inPath in.snp --minGq .95 > out

Output columns:
chrom <String>
pos <Int>
type <String[SNP|DEL|INS|MULTIALLELIC]>
ref <String>
alt <String>
trTv <Int[0|1|2]>
heterozygotes <String>
heterozygosity <Float64>
homozygotes <String>
homozygosity <Float64>
missingGenos <String>
missingness <Float64>
sampleMaf <Float64>

--minGq <Float>: Minimum genotype quality to keep (0 - 1).
--inPath /path/to/uncompressedFile.snp: Path to an uncompressed SNP input file. Defaults to stdin.
--errPath /path/to/log.txt: Where to store log messages. Defaults to STDERR.
--emptyField "!": Which value to assign to missing data. Defaults to !
--fieldDelimiter ";": Which delimiter to use when joining multiple values. Defaults to ;
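
A downstream consumer might read one output row along these lines. This is a hedged sketch rather than part of bystro-snp: the `Row` type and the `samples` and `parseRow` helpers are hypothetical, the columns are assumed to be tab-separated (the README does not state the column separator), and the example row in `main` is made up. It uses the default `--emptyField` (`!`) and `--fieldDelimiter` (`;`) described above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Defaults taken from the --emptyField and --fieldDelimiter descriptions.
const (
	emptyField     = "!"
	fieldDelimiter = ";"
)

// Row (hypothetical type) mirrors the 13 documented output columns.
type Row struct {
	Chrom, Type, Ref, Alt                    string
	Pos, TrTv                                int
	Heterozygotes, Homozygotes, MissingGenos []string
	Heterozygosity, Homozygosity             float64
	Missingness, SampleMaf                   float64
}

// samples splits a joined sample list; the empty-field marker yields nil.
func samples(v string) []string {
	if v == emptyField || v == "" {
		return nil
	}
	return strings.Split(v, fieldDelimiter)
}

// parseRow assumes tab-separated columns (an assumption, see lead-in).
func parseRow(line string) (Row, error) {
	f := strings.Split(strings.TrimRight(line, "\r\n"), "\t")
	if len(f) != 13 {
		return Row{}, fmt.Errorf("expected 13 columns, got %d", len(f))
	}
	var r Row
	var err error
	r.Chrom, r.Type, r.Ref, r.Alt = f[0], f[2], f[3], f[4]
	if r.Pos, err = strconv.Atoi(f[1]); err != nil {
		return r, fmt.Errorf("pos: %w", err)
	}
	if r.TrTv, err = strconv.Atoi(f[5]); err != nil {
		return r, fmt.Errorf("trTv: %w", err)
	}
	r.Heterozygotes = samples(f[6])
	r.Homozygotes = samples(f[8])
	r.MissingGenos = samples(f[10])
	// The four rate columns sit at indexes 7, 9, 11 and 12.
	floats := []*float64{&r.Heterozygosity, &r.Homozygosity, &r.Missingness, &r.SampleMaf}
	for i, col := range []int{7, 9, 11, 12} {
		if *floats[i], err = strconv.ParseFloat(f[col], 64); err != nil {
			return r, fmt.Errorf("column %d: %w", col+1, err)
		}
	}
	return r, nil
}

func main() {
	// Made-up example row, for illustration only.
	line := "chr1\t100\tSNP\tA\tG\t1\ts1;s2\t0.5\ts3\t0.25\t!\t0\t0.5"
	r, err := parseRow(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(r.Chrom, r.Pos, r.Heterozygotes, len(r.MissingGenos))
}
```

If bystro-snp is run with non-default `--emptyField` or `--fieldDelimiter` values, the two constants would need to change to match.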