Commit 7d8084b

Updates for v1.0.2 (#10)

* setup initial tests
* test: updated testing actions and test structure
* feat: added in some tests for utils
* feat: added in function for reading in population panel files separately
* draft: updated some tests
* test: population panel testing
* feat: new function to add in population handling
* test: added in test for verification of samples
* feat: vcf2frequency table now uses cyvcf2
* draft: small commit - still needs some fixing on docs
* test: some refactoring and some testing
* doc: some additional docstrings
* feat: updated to include data in installation
* feat: finally got cyvcf2 features working
* feat: not supporting windows due to cyvcf2
* feat: added update to make custom lists a little bit easier
* doc: cleaned up a lot of the original documentation
* fix: initial attempt to address issue #8
* test: updated tests for utilities to be more comprehensive
* test: added in a test for gzipped frequency files
* ci: updated some of the github CI parameters
* Gzip streaming + additional tests (#9)
* Dev (#6)
* setup initial tests
* test: updated testing actions and test structure
* feat: added in some tests for utils
* feat: added in function for reading in population panel files separately
* draft: updated some tests
* test: population panel testing
* feat: new function to add in population handling
* test: added in test for verification of samples
* feat: vcf2frequency table now uses cyvcf2
* draft: small commit - still needs some fixing on docs
* test: some refactoring and some testing
* doc: some additional docstrings
* feat: updated to include data in installation
* feat: finally got cyvcf2 features working
* feat: not supporting windows due to cyvcf2
* feat: added update to make custom lists a little bit easier
* doc: cleaned up a lot of the original documentation
* fix: initial attempt to address issue #8
* test: updated tests for utilities to be more comprehensive
* test: added in a test for gzipped frequency files
* ci: updated some of the github CI parameters
* fix: updated version in setup.cfg
* ci: changes
* Revert "ci: changes"
  This reverts commit 19c22ee.
* fix: added in new commit file due to exclusion
* ci: updated precommit and flake8 compliance
* ci: removed flake8 from ci
1 parent 1703ff0 commit 7d8084b

13 files changed, +172 -92 lines changed

.github/workflows/macos.yml

Lines changed: 1 addition & 5 deletions
@@ -16,7 +16,7 @@ jobs:

      strategy:
        matrix:
-         python-version: [3.6, 3.7, 3.8, 3.9]
+         python-version: [3.7, 3.9, 3.11]

      steps:
        - uses: actions/checkout@v2
@@ -36,7 +36,3 @@ jobs:
        - name: run tests
          run: |
            python -m pytest tests/
-
-       - name: run flake8
-         run: |
-           flake8

.github/workflows/ubuntu.yml

Lines changed: 1 addition & 5 deletions
@@ -16,7 +16,7 @@ jobs:

      strategy:
        matrix:
-         python-version: [3.6, 3.7, 3.8, 3.9]
+         python-version: [3.7, 3.9, 3.11]

      steps:
        - uses: actions/checkout@v2
@@ -36,7 +36,3 @@ jobs:
        - name: run tests
          run: |
            python -m pytest tests/
-
-       - name: run flake8
-         run: |
-           flake8

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -3,6 +3,6 @@ geovar/data/*.csv
  docsrc/_build/*
  .hypothesis/
  *.egg-info/
- *.coverage
+ *.coverage*
  *.python-version
  build/
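
The commit does not explain why the pattern was widened. One plausible reason, stated here as an assumption, is that coverage.py writes suffixed data files (for example `.coverage.<host>.<pid>.<random>`) when run in parallel mode, and the old pattern does not cover them. A rough check with Python's `fnmatch`, which approximates gitignore globbing for slash-free patterns like these; the file names are hypothetical:

```python
from fnmatch import fnmatchcase

OLD_PATTERN = "*.coverage"
NEW_PATTERN = "*.coverage*"

# Hypothetical file names, chosen for illustration only.
names = [".coverage", ".coverage.myhost.1234.567890", "report.coverage"]


def ignored(name, pattern):
    """Approximate a slash-free gitignore pattern with fnmatch globbing."""
    return fnmatchcase(name, pattern)


old_hits = [n for n in names if ignored(n, OLD_PATTERN)]
new_hits = [n for n in names if ignored(n, NEW_PATTERN)]

print("old pattern ignores:", old_hits)
print("new pattern ignores:", new_hits)
```

Under this approximation only the new pattern catches the suffixed data file.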

.pre-commit-config.yaml

Lines changed: 25 additions & 20 deletions
@@ -1,23 +1,28 @@
  # See https://pre-commit.com for more information
  # See https://pre-commit.com/hooks.html for more hooks
  repos:
- - repo: https://github.com/pre-commit/pre-commit-hooks
-   rev: v2.4.0
-   hooks:
-     - id: trailing-whitespace
-     - id: end-of-file-fixer
-     - id: check-yaml
-     - id: check-added-large-files
-       args: ['--maxkb=900']
- - repo: https://github.com/psf/black
-   rev: 19.3b0
-   hooks:
-     - id: black
- - repo: https://github.com/pycqa/pydocstyle
-   rev: 4.0.0 # pick a git hash / tag to point to
-   hooks:
-     - id: pydocstyle
- - repo: https://gitlab.com/pycqa/flake8
-   rev: 3.7.9
-   hooks:
-     - id: flake8
+ - repo: https://github.com/pre-commit/pre-commit-hooks
+   rev: v2.4.0
+   hooks:
+     - id: trailing-whitespace
+       exclude: '^docs*/'
+     - id: end-of-file-fixer
+       exclude: '^docs/'
+     - id: check-yaml
+     - id: check-added-large-files
+       args: ['--maxkb=900']
+ - repo: https://github.com/psf/black
+   rev: 22.12.0
+   hooks:
+     - id: black
+       exclude: '^docs*/'
+ - repo: https://github.com/pycqa/pydocstyle
+   rev: 4.0.0 # pick a git hash / tag to point to
+   hooks:
+     - id: pydocstyle
+       exclude: '^docs*/'
+ - repo: https://github.com/pycqa/flake8
+   rev: 3.7.9
+   hooks:
+     - id: flake8
+       exclude: '^docs*/'
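
The two `exclude` patterns in this file differ slightly (`'^docs*/'` versus `'^docs/'`). pre-commit treats `exclude` as a Python regular expression searched against each repository-relative path, so the difference can be checked with `re` directly. The sketch below uses hypothetical paths; whether the repository contains a `docs/` directory is not visible in this commit.

```python
import re

# Patterns copied from the new .pre-commit-config.yaml.
STAR_PATTERN = re.compile(r"^docs*/")
PLAIN_PATTERN = re.compile(r"^docs/")

# Hypothetical repository-relative paths, for illustration only.
paths = ["docs/index.html", "doc/index.html", "docsrc/conf.py", "geovar/binning.py"]


def excluded(path, pattern):
    """Return True if a hook with this exclude pattern would skip the path."""
    return pattern.search(path) is not None


star_hits = [p for p in paths if excluded(p, STAR_PATTERN)]
plain_hits = [p for p in paths if excluded(p, PLAIN_PATTERN)]

print("'^docs*/' skips:", star_hits)
print("'^docs/'  skips:", plain_hits)
```

In `^docs*/` the `*` applies only to the letter `s`, so the pattern matches `doc/` and `docs/` but not `docsrc/`. Neither pattern skips files under `docsrc/`.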

docsrc/stubs/geovar.GeoVar.rst

Lines changed: 3 additions & 9 deletions
@@ -5,23 +5,17 @@

  .. autoclass:: GeoVar

-
+
     .. automethod:: __init__

-
+
     .. rubric:: Methods

     .. autosummary::
-
+
        ~GeoVar.__init__
        ~GeoVar.add_freq_mat
        ~GeoVar.count_geovar_codes
        ~GeoVar.generate_bins
        ~GeoVar.geovar_binning
        ~GeoVar.geovar_codes_streaming
-
-
-
-
-
-

docsrc/stubs/geovar.GeoVarPlot.rst

Lines changed: 3 additions & 9 deletions
@@ -5,14 +5,14 @@

  .. autoclass:: GeoVarPlot

-
+
     .. automethod:: __init__

-
+
     .. rubric:: Methods

     .. autosummary::
-
+
        ~GeoVarPlot.__init__
        ~GeoVarPlot.add_cmap
        ~GeoVarPlot.add_data_geovar
@@ -26,9 +26,3 @@
        ~GeoVarPlot.reorder_pops
        ~GeoVarPlot.set_colors
        ~GeoVarPlot.sort_geodist
-
-
-
-
-
-

docsrc/stubs/geovar.utils.rst

Lines changed: 6 additions & 19 deletions
@@ -3,30 +3,17 @@

  .. automodule:: geovar.utils

-
-
-

-
-
+
+
+
+
+
     .. rubric:: Functions

     .. autosummary::
-
+
        read_pop_panel
        sep_freq_mat_pops
        vcf_to_freq_table
        verify_sample_indices
-
-
-
-
-
-
-
-
-
-
-
-
-

geovar/binning.py

Lines changed: 31 additions & 22 deletions
@@ -3,6 +3,8 @@
  import numpy as np
  import pandas as pd
  from tqdm import tqdm
+ from pathlib import Path
+ import gzip
  from .utils import sep_freq_mat_pops


@@ -44,7 +46,7 @@ def add_freq_mat(self, freq_mat_file):
              (see example notebook for formatting).

          """
-         af_df = pd.read_table(freq_mat_file, sep=r"\s")
+         af_df = pd.read_table(freq_mat_file, sep=r"\s", engine="python")
          pops, freq_mat = sep_freq_mat_pops(af_df)
          self.pops = pops
          self.freq_mat = freq_mat
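
A note on the `engine="python"` change: pandas interprets a multi-character `sep` such as `r"\s"` as a regular expression, which its C parser does not support, so naming the engine explicitly avoids the fallback warning. Because `\s` matches a single whitespace character, runs of spaces can behave differently than they would with `\s+`. The sketch below uses only the standard library's `re.split` on a made-up row to illustrate that difference; it approximates, rather than reproduces, how pandas tokenizes a line.

```python
import re

# Hypothetical frequency-table row with two spaces between the first two fields.
row = "1  rs123 A G"

single_ws = re.split(r"\s", row)   # one split per whitespace character
multi_ws = re.split(r"\s+", row)   # one split per run of whitespace

print(single_ws)
print(multi_ws)
```

With single-space or tab-delimited input the two patterns give the same fields; they diverge only when consecutive whitespace characters appear.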
@@ -58,14 +60,16 @@ def generate_bins(self, bins=[(0, 0), (0, 0.05), (0.05, 1.0)]):
              bins (:obj:`list`): list of tuples specifying bins of allele frequency.

          """
-         assert np.all(np.array(bins) < 1.0)
-         b = 0.0
-         new_bins = []
-         for x in bins:
-             new_bins.append((b, x))
-             b = x
-         new_bins.append((b, 1.0))
-         self.bins = new_bins
+         assert np.all(np.array(bins) <= 1.0)
+         assert np.all(np.array(bins) >= 0.0)
+         min_val = 1.0
+         max_val = 0.0
+         for (start, end) in bins:
+             min_val = min(min_val, start)
+             max_val = max(max_val, end)
+         assert min_val >= 0
+         assert max_val <= 1
+         self.bins = bins

      def geovar_binning(self):
          """Compute the GeoVar codes for each variant across each population."""
@@ -91,25 +95,30 @@ def geovar_codes_streaming(self, freq_mat_file):
          """Version of GeoVar code generation algorithm that streams through file to avoid memory overflow.

          Args:
-             freq_mat_file (:obj:`string`): filepath to
-                 frequency table file (see example notebook for formatting).
+             freq_mat_file (:obj:`string`): filepath to a frequency table file (see example notebook for formatting).

          """
          assert self.bins is not None
+         freq_mat_fp = Path(freq_mat_file)
+         assert freq_mat_fp.is_file()
          geovar_codes = []
          # Setting up the testing bins
          test_bins = np.array([x[1] for x in self.bins])
-         with open(freq_mat_file, "r") as f:
-             header = f.readline()
-             # Take the population labels currently
-             pops = np.array(header.split()[6:])
-             self.pops = pops
-             for line in tqdm(f):
-                 # Split after the 6th column ...
-                 maf_vector = np.array(line.split()[6:]).astype(np.float64)
-                 cur_geovar = np.digitize(maf_vector, test_bins, right=True)
-                 cur_geovar_code = "".join([str(i) for i in cur_geovar])
-                 geovar_codes.append(cur_geovar_code)
+         if ".gz" in freq_mat_fp.suffixes:
+             f = gzip.open(freq_mat_fp, "rt")
+         else:
+             f = open(freq_mat_fp, "r")
+         header = f.readline()
+         # Take the population labels currently
+         pops = np.array(header.split()[6:])
+         self.pops = pops
+         for line in tqdm(f):
+             # Split after the 6th column ...
+             maf_vector = np.array(line.split()[6:]).astype(np.float64)
+             cur_geovar = np.digitize(maf_vector, test_bins, right=True)
+             cur_geovar_code = "".join([str(i) for i in cur_geovar])
+             geovar_codes.append(cur_geovar_code)
+         f.close()
          # Setting the variables here
          self.geovar_codes = np.array(geovar_codes)
          self.n_variants = self.geovar_codes.size
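
The binning and streaming changes above can be sketched together in a dependency-free form. This is a minimal illustration, not the geovar implementation: `validate_bins`, `open_text`, and `stream_codes` are hypothetical names, the `start <= end` check goes beyond what the commit asserts, and `bisect_left` stands in for `np.digitize(..., right=True)`. It also wraps the file handle in a `with` block, so the handle is closed even if a row fails to parse, which the explicit `f.close()` in the commit does not guarantee.

```python
import gzip
from bisect import bisect_left
from pathlib import Path
from typing import List, Sequence, Tuple

Bin = Tuple[float, float]
# Same default bins as generate_bins, stored as an immutable tuple.
DEFAULT_BINS: Tuple[Bin, ...] = ((0, 0), (0, 0.05), (0.05, 1.0))


def validate_bins(bins: Sequence[Bin] = DEFAULT_BINS) -> List[Bin]:
    """Hypothetical helper: require 0 <= start <= end <= 1 for every bin."""
    for start, end in bins:
        if not 0.0 <= start <= end <= 1.0:
            raise ValueError(f"invalid bin ({start}, {end})")
    return list(bins)


def open_text(path: Path):
    """Open a plain or gzip-compressed table in text mode."""
    if ".gz" in path.suffixes:
        return gzip.open(path, "rt")
    return open(path, "r")


def stream_codes(path, bins: Sequence[Bin] = DEFAULT_BINS, n_meta_cols: int = 6):
    """Return (population labels, per-variant code strings) for a frequency table."""
    # Upper edge of each bin, mirroring test_bins in the commit.
    upper_edges = [end for _, end in validate_bins(bins)]
    codes = []
    with open_text(Path(path)) as handle:
        pops = handle.readline().split()[n_meta_cols:]
        for line in handle:
            freqs = [float(x) for x in line.split()[n_meta_cols:]]
            # bisect_left returns i with edges[i-1] < x <= edges[i], the same
            # index np.digitize(x, edges, right=True) gives for increasing edges.
            codes.append("".join(str(bisect_left(upper_edges, f)) for f in freqs))
    return pops, codes
```

With the default bins, a row whose population frequencies are `0.0 0.01 0.5` would map to the code `012`, for a plain-text and a gzipped copy of the same table alike.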
Binary file not shown.

geovar/utils.py

Lines changed: 1 addition & 1 deletion
@@ -86,7 +86,7 @@ def vcf_to_freq_table(vcf_file, pop_df, outfile=None, minor_allele=True, **kwarg
      """
      vcf_filepath = Path(vcf_file)
      if not vcf_filepath.is_file():
-         raise ValueError(f"{vcf_file} is not a valid VCF file!")
+         raise FileNotFoundError(f"{vcf_file} is not a valid VCF file!")
      vcf = VCF(vcf_filepath, **kwargs)
      unique_pops, pop_idx_dict, pop_dict = verify_sample_indices(pop_df, vcf.samples)
      chrom = []
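
Switching the exception type affects callers: `FileNotFoundError` derives from `OSError`, not `ValueError`, so code that previously caught `ValueError` around `vcf_to_freq_table` would no longer catch a missing file. A minimal sketch of the same check in isolation, where `require_file` is a hypothetical helper and not part of geovar:

```python
from pathlib import Path


def require_file(path_like) -> Path:
    """Return path_like as a Path, or raise FileNotFoundError if it is not a regular file."""
    path = Path(path_like)
    if not path.is_file():
        raise FileNotFoundError(f"{path_like} is not a valid VCF file!")
    return path


def describe(path_like) -> str:
    """Show how a caller might handle the new exception type."""
    try:
        return f"found {require_file(path_like)}"
    except FileNotFoundError as err:
        return f"missing: {err}"
```

A caller that wants to handle both the old and the new behavior could catch `(ValueError, OSError)` during a transition period.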
