Commit 073dab7

ENH update 06_Compare_with_other_datasets
1 parent a3a704c commit 073dab7

2 files changed: 30 additions, 1 deletion


General_Scripts/06_Compare_with_other_datasets/Readme.md

Lines changed: 2 additions & 1 deletion
@@ -2,5 +2,6 @@
 
 | **Code** | **Description** |
 | :---: | :---: |
-| 01_download.sh 02_filter_sp_dedup.py | Download archaeal and bacterial proteins from Refseq, filter sequences (<100aa) and remove redundancy |
+| 01_download.sh | Download archaeal and bacterial proteins from Refseq |
+| 02_filter_sp_dedup.py | Filter sequences (<100aa) and remove redundancy |
 | 03_align.sh | Use Diamond to align sequences to GMSC |
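
The Readme row for `02_filter_sp_dedup.py` describes a length filter followed by redundancy removal. The script itself is not shown in this commit, so the following is only a minimal sketch of that idea. It assumes the filter keeps sequences shorter than 100 aa and that "redundancy" means exact sequence identity; `filter_and_dedup` and `max_len` are hypothetical names, not taken from the repository.

```python
def filter_and_dedup(records, max_len=100):
    """Yield (header, seq) pairs with len(seq) < max_len, skipping repeated sequences.

    records: any iterable of (header, sequence) tuples.
    Both the length cutoff direction and exact-match deduplication are
    assumptions made for illustration.
    """
    seen = set()
    for header, seq in records:
        if len(seq) >= max_len:
            continue  # too long: dropped under the assumed "<100aa" rule
        if seq in seen:
            continue  # exact duplicate of a sequence already emitted
        seen.add(seq)
        yield header, seq


if __name__ == "__main__":
    # Made-up records, purely for demonstration.
    demo = [("a", "MKT"), ("b", "MKT"), ("c", "M" * 150), ("d", "MSL")]
    for header, seq in filter_and_dedup(demo):
        print(f">{header}\n{seq}")
```

Keeping the first header seen for each distinct sequence is a design choice of this sketch; the real script may pick a representative differently.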
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+def fasta_iter(fname, full_header=False):
+    header = None
+    chunks = []
+    if fname.endswith('.gz'):
+        import gzip
+        op = gzip.open
+    elif fname.endswith('.xz'):
+        import lzma
+        op = lzma.open
+    else:
+        op = open
+    with op(fname, 'rt') as f:
+        for line in f:
+            if line[0] == '>':
+                if header is not None:
+                    yield header,''.join(chunks)
+                line = line[1:].strip()
+                if not line:
+                    header = ''
+                elif full_header:
+                    header = line.strip()
+                else:
+                    header = line.split()[0]
+                chunks = []
+            else:
+                chunks.append(line.strip())
+        if header is not None:
+            yield header, ''.join(chunks)
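
The reader above picks an opener from the file extension and, unless `full_header` is set, keeps only the part of the header before the first whitespace. The sketch below isolates those two behaviours on a throwaway file so they can be checked in isolation. `open_text`, `first_header`, `write_demo` and the `seq1` record are hypothetical helpers and data written for this illustration, not part of the commit.

```python
import gzip
import lzma
import os
import tempfile


def open_text(fname):
    """Open fname in text mode, choosing the opener from its extension."""
    if fname.endswith('.gz'):
        return gzip.open(fname, 'rt')
    if fname.endswith('.xz'):
        return lzma.open(fname, 'rt')
    return open(fname, 'rt')


def first_header(fname, full_header=False):
    """Return the first FASTA header: whole, or cut at the first whitespace."""
    with open_text(fname) as f:
        for line in f:
            if line.startswith('>'):
                text = line[1:].strip()
                if full_header or not text:
                    return text
                return text.split()[0]
    return None


def write_demo(suffix):
    """Write a one-record FASTA file ('' / '.gz' / '.xz') and return its path."""
    opener = {'.gz': gzip.open, '.xz': lzma.open}.get(suffix, open)
    fd, path = tempfile.mkstemp(suffix='.faa' + suffix)
    os.close(fd)
    with opener(path, 'wt') as out:
        out.write('>seq1 example protein\nMKT\nAAL\n')
    return path


if __name__ == "__main__":
    for suffix in ('', '.gz', '.xz'):
        path = write_demo(suffix)
        print(suffix or 'plain', first_header(path), '|', first_header(path, full_header=True))
        os.remove(path)
```

Opening in `'rt'` mode matters for the compressed cases: `gzip.open` and `lzma.open` default to binary mode, in which comparisons against `'>'` would fail.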
