Commit 37c498d

readTags_Stacks accepts gzipped files
1 parent 107ead5 commit 37c498d

2 files changed

+4
-4
lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -149,7 +149,7 @@ Mrker4050,2,AGTAGGGAAAGGCCGGCAAGGCAACTAAA,
 ```
 
 ### Stacks catalog
-The program `cstacks` from the [Stacks](http://catchenlab.life.illinois.edu/stacks/) software generates three files in the format `batch_X.catalog.tags.tsv`, `batch_X.catalog.snps.tsv`, and `batch_X.catalog.alleles.tsv`. TagDigger can read all three of these files and extract tag sequences. Marker names will be numbers identical to the Catalog IDs in Stacks. There is an option to ignore all non-biallelic markers.
+The program `cstacks` from the [Stacks](http://catchenlab.life.illinois.edu/stacks/) software generates three files in the format `batch_X.catalog.tags.tsv`, `batch_X.catalog.snps.tsv`, and `batch_X.catalog.alleles.tsv`. TagDigger can read all three of these files and extract tag sequences. Marker names will be numbers identical to the Catalog IDs in Stacks. There is an option to ignore all non-biallelic markers. If the file name ends with ".gz", TagDigger will assume it is gzipped, and otherwise will assume it is not compressed.
 
 ### SAM files from TASSEL-GBSv2
 [TASSEL 5](http://www.maizegenetics.net/#!tassel/c17q9) includes as part of its pipeline a [SAM](https://samtools.github.io/hts-specs/SAMv1.pdf) file produced by [Bowtie2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) or [BWA](http://bio-bwa.sourceforge.net/). TagDigger can read tag sequences from this file and generate SNP names in the same format as the TASSEL GBS version 2 pipeline. Since TASSEL can output multiple SNPs from the same tag, TagDigger generates a different set of names for the tags (in the format `chromosome-position-strand_allele`) but can output a CSV file matching the TASSEL SNP names to the TagDigger marker names. If supplying a list of markers to retain, the user should put them in the format of TASSEL SNP names (e.g. `S01_1026`). There is also an option to ignore all non-biallelic markers.
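
A note on the README change: because the new sentence says compression is inferred from the file name alone, a file whose name and contents disagree would likely fail at read time rather than being detected up front. The sketch below illustrates that consequence with the standard library only. `open_by_extension` is a hypothetical helper written for this note, not a TagDigger function, and the file name is made up.

```python
import gzip
import os
import tempfile


def open_by_extension(path):
    """Hypothetical helper: choose the opener from the file name only."""
    if path.endswith(".gz"):
        return gzip.open(path, mode="rt")
    return open(path, mode="r")


with tempfile.TemporaryDirectory() as tmp:
    # A plain-text file that is misleadingly named with a .gz suffix.
    mislabeled = os.path.join(tmp, "batch_1.catalog.tags.tsv.gz")
    with open(mislabeled, "w") as con:
        con.write("# not actually compressed\n")

    try:
        with open_by_extension(mislabeled) as con:
            con.read()
        failed = False
    except OSError:
        # gzip raises an OSError subclass when the gzip header is missing.
        failed = True
```

The converse case (a gzipped file without the ".gz" suffix) would be opened as plain text under the same rule, so users may want to keep the suffix that their compression tool adds.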

tagdigger_fun.py

Lines changed: 3 additions & 3 deletions
@@ -622,23 +622,23 @@ def readTags_Stacks(tagsfile, snpsfile, allelesfile, toKeep = None, binaryOnly=F
     '''Read tags from the catalog format produced by Stacks.'''
     try:
         alltags = dict() # keys are locus numbers, values are sequences
-        with open(tagsfile, mode = 'r') as mycon:
+        with gzip.open(tagsfile, mode = 'rt') if tagsfile.endswith('.gz') else open(tagsfile, mode = 'r') as mycon:
             tr = csv.reader(mycon, delimiter='\t')
             for row in tr:
                 if row[0].startswith("#"):
                     continue # skip comment line
                 if toKeep == None or row[2] in toKeep:
                     alltags[row[2]] = row[9]
         alleles = list() # tuples, where first item is locus number and second is haplotype
-        with open(allelesfile, mode = 'r') as mycon:
+        with gzip.open(allelesfile, mode = 'rt') if allelesfile.endswith('.gz') else open(allelesfile, mode = 'r') as mycon:
             ar = csv.reader(mycon, delimiter='\t')
             for row in ar:
                 if row[0].startswith("#"):
                     continue
                 if toKeep == None or row[2] in toKeep:
                     alleles.append((row[2], row[3]))
         positions = dict() # keys are locus numbers, values are lists of variant positions
-        with open(snpsfile, mode = 'r') as mycon:
+        with gzip.open(snpsfile, mode = 'rt') if snpsfile.endswith('.gz') else open(snpsfile, mode = 'r') as mycon:
             sr = csv.reader(mycon, delimiter='\t')
             for row in sr:
                 if row[0].startswith("#"):
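
A note on the code change: the same conditional expression appears three times in this hunk, and the hunk does not show where `gzip` is imported, so the module presumably imports it elsewhere. One possible way to factor out the repetition is sketched below. `open_maybe_gzip` and `read_tag_column` are hypothetical names introduced for illustration, the column indices simply mirror `row[2]` and `row[9]` from the hunk, and the demo rows are invented rather than real Stacks output.

```python
import csv
import gzip
import os
import tempfile


def open_maybe_gzip(path):
    """Hypothetical helper: text-mode handle for plain or gzipped files."""
    if path.endswith(".gz"):
        return gzip.open(path, mode="rt")
    return open(path, mode="r")


def read_tag_column(path, key_col=2, seq_col=9):
    """Collect {locus: sequence} from a tab-delimited file, skipping '#' lines."""
    tags = {}
    with open_maybe_gzip(path) as con:
        for row in csv.reader(con, delimiter="\t"):
            if row[0].startswith("#"):
                continue  # skip comment line
            tags[row[key_col]] = row[seq_col]
    return tags


# Build a made-up file with one comment line and one ten-column data row.
row = ["0", "1", "7"] + ["x"] * 6 + ["ACGT"]
text = "# comment\n" + "\t".join(row) + "\n"

with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "demo.tsv")
    zipped = os.path.join(tmp, "demo.tsv.gz")
    with open(plain, "w") as con:
        con.write(text)
    with gzip.open(zipped, "wt") as con:
        con.write(text)
    from_plain = read_tag_column(plain)
    from_gzip = read_tag_column(zipped)
```

Both calls should yield the same dictionary, which is the behavior the commit aims for: the caller passes either file name and the reader does not need to know which kind it received. The mode `'rt'` matters for `gzip.open`, since its default mode is binary and `csv.reader` expects text.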
