Commit 52695f4

Merge branch 'develop' into hashtable
2 parents: 67d3287 + b744683

10 files changed: +120 additions, -86 deletions

.github/workflows/python-publish.yml

Lines changed: 3 additions & 3 deletions
@@ -20,9 +20,9 @@ jobs:
       id-token: write
 
     steps:
-    - uses: actions/checkout@v4
+    - uses: actions/checkout@v5
     - name: Set up Python
-      uses: actions/setup-python@v5
+      uses: actions/setup-python@v6
       with:
         python-version: '3.x'
     - name: Install dependencies
@@ -32,4 +32,4 @@ jobs:
     - name: Build package
       run: python -m build
     - name: Publish package
-      uses: pypa/gh-action-pypi-publish@67339c736fd9354cd4f8cb0b744f2b82a74b5c70
+      uses: pypa/gh-action-pypi-publish@ed0c53931b1dc9bd32cbe73a98c7f6766f8a527e

.github/workflows/test_run.yml

Lines changed: 2 additions & 2 deletions
@@ -11,9 +11,9 @@ jobs:
         python-version: ["3.8", "3.9", "3.10", "3.11"]
 
     steps:
-    - uses: actions/checkout@v4
+    - uses: actions/checkout@v5
     - name: Set up Python ${{ matrix.python-version }}
-      uses: actions/setup-python@v5
+      uses: actions/setup-python@v6
       with:
         python-version: ${{ matrix.python-version }}
     - name: Install dependencies

CHANGELOG.md

Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
+
+# Change log
+
+## Version 2.1.2
+
+### Fixed
+
+- An important bugfix to the search function producing invalid results in 2.1.1: #57
+- Fixed incompatibility with python 13 (#53)
+- Fixed a crash when empty fasta if provided (#58)
+
+
+### Changed
+
+- Updated dependencies to Github actions
+
+## Version 2.1.1
+
+- Performance improvements to the mkdb command with orthoxml input
+- Added a check for non-unique protein IDs in the input fasta files. Now it gives a more informative error message
+- fixed #49
+
+## Version 2.1.0
+- Significant improvements to classification speed
+
+## Version 2.0.4
+- Fixes issue #34 (numpy2 incompatibility)
+- Experimental support to build omamer databases from orthoxml/fasta files
+- Updated github action to latest versions
+
+## Version 2.0.3
+- Fixes issue #30
+- Update github action to latest versions
+
+## Version 2.0.2
+- changed method for hiding taxa in build process. Now takes a file containing taxa to hide on separate lines.
+- checks and improved feedback for root taxon and requested taxa to hide.
+- root taxon set by default to the root level in speciestree.nwk (previously hard-coded to default to LUCA)
+
+## Version 2.0.1
+- remove dependency for filehash library
+- return better error message if build dependencies are not met, but trying to building an omamer database
+- minor fixes
+
+## Version 2.0.0
+- Major update of database format and search code to improve overall memory useage. Most standard runs with LUCA-level database will run on a machine with 16GB RAM.
+- Update to the scoring algorithm for root-level HOG / family assignments, to allow for significance testing. This estimates a binomial distribution for each family, so that we can compute the probability of matching at least as many k-mers as we have observed by chance, for each family that has a match to a given query.
+- UX improvements - more feedback during interactive search runs, whilst maintaining small log files.
+
+## Version 0.2.5
+- Fixes an issue when storing the pre-conputed statistics
+
+## Version 0.2.4
+- Improved loading time for standard search by pre-computing statistics
+- Adding new command line option "info" to show the metadata of the
+dataset used to build the omamer database.
+
+
+## Version 0.2.2
+- Automated deployment to PyPI
+- Removed PyHAM dependency
+
+## Version 0.2.0
+- Added ``--min_fam_completeness``, ``--logic``, ``--score`` and ``--reference_taxon`` options
+- New output format
+- Debugging
+
+## Version 0.1.2 - 0.1.3
+- Debugging
+
+## Version 0.1.0
+- Added hidden_taxa and threshold arguments
+
+## Version 0.0.1
+- Initial release
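The 2.0.0 entry describes the family-level significance test in words: model chance k-mer matches against a family as a binomial, then ask how likely it is to see at least the observed number of matches. A minimal sketch of that upper-tail probability, under stated assumptions: `binomial_sf`, `n_kmers` and `p_match` are hypothetical names, how OMAmer estimates the per-family match probability is not shown in this commit, and the numbers in the usage line are made up.

```python
from math import comb


def binomial_sf(observed: int, n_kmers: int, p_match: float) -> float:
    """Return P(X >= observed) for X ~ Binomial(n_kmers, p_match).

    X models how many of a query's n_kmers k-mers hit a family by chance,
    assuming each k-mer matches independently with probability p_match.
    """
    if not 0 <= observed <= n_kmers:
        raise ValueError("observed must lie in [0, n_kmers]")
    if not 0.0 <= p_match <= 1.0:
        raise ValueError("p_match must be a probability")
    # Sum the binomial probability mass function over the upper tail.
    return sum(
        comb(n_kmers, i) * p_match**i * (1.0 - p_match) ** (n_kmers - i)
        for i in range(observed, n_kmers + 1)
    )


# Hypothetical example: a query with 100 k-mers shares 12 with a family whose
# assumed chance match probability per k-mer is 0.02.
p_value = binomial_sf(12, 100, 0.02)
print(f"P(>= 12 chance matches) = {p_value:.3g}")
```

A small tail probability means the observed overlap is unlikely to be chance, which is what makes the root-level family assignment testable.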

README.md

Lines changed: 0 additions & 52 deletions
@@ -128,58 +128,6 @@ Required arguments: ``--db``, ``--oma_path``
 | [``--log_level``](#markdown-header--log_level)|info|Logging level
 
 
-# Change log
-
-#### Version 2.0.4
-- fixes issue #34 (numpy2 incompatibility)
-- experimental support to build omamer databases from orthoxml/fasta files
-- update github action to latest versions
-
-#### Version 2.0.3
-- fixes issue #30
-- update github action to latest versions
-
-#### Version 2.0.2
-- changed method for hiding taxa in build process. Now takes a file containing taxa to hide on separate lines.
-- checks and improved feedback for root taxon and requested taxa to hide.
-- root taxon set by default to the root level in speciestree.nwk (previously hard-coded to default to LUCA)
-
-#### Version 2.0.1
-- remove dependency for filehash library
-- return better error message if build dependencies are not met, but trying to building an omamer database
-- minor fixes
-
-#### Version 2.0.0
-- Major update of database format and search code to improve overall memory useage. Most standard runs with LUCA-level database will run on a machine with 16GB RAM.
-- Update to the scoring algorithm for root-level HOG / family assignments, to allow for significance testing. This estimates a binomial distribution for each family, so that we can compute the probability of matching at least as many k-mers as we have observed by chance, for each family that has a match to a given query.
-- UX improvements - more feedback during interactive search runs, whilst maintaining small log files.
-
-#### Version 0.2.5
-- Fixes an issue when storing the pre-conputed statistics
-
-#### Version 0.2.4
-- Improved loading time for standard search by pre-computing statistics
-- Adding new command line option "info" to show the metadata of the
-dataset used to build the omamer database.
-
-
-#### Version 0.2.2
-- Automated deployment to PyPI
-- Removed PyHAM dependency
-
-#### Version 0.2.0
-- Added ``--min_fam_completeness``, ``--logic``, ``--score`` and ``--reference_taxon`` options
-- New output format
-- Debugging
-
-#### Version 0.1.2 - 0.1.3
-- Debugging
-
-#### Version 0.1.0
-- Added hidden_taxa and threshold arguments
-
-#### Version 0.0.1
-- Initial release
 
 # License
 OMAmer is a free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

omamer/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@
 from datetime import date
 
 __packagename__ = "omamer"
-__version__ = "2.1.1"
+__version__ = "2.1.2"
 __copyright__ = "(C) 2019-{:d} Victor Rossier <victor.rossier@unil.ch> and Alex Warwick Vesztrocy <alex@warwickvesztrocy.co.uk> and Nikolai Romashchenko <nikolai.romashchenko@unil.ch>".format(
     date.today().year
 )

omamer/_runners.py

Lines changed: 14 additions & 3 deletions
@@ -21,6 +21,7 @@
 You should have received a copy of the GNU Lesser General Public License
 along with OMAmer. If not, see <http://www.gnu.org/licenses/>.
 """
+import os
 from ._utils import LOG, check_file_exists
 
 
@@ -143,7 +144,7 @@ def mkdb_oma(args):
 
 def search(args):
     from alive_progress import alive_bar
-    from ._utils import print_message, print_line
+    from ._utils import print_message
     import sys
 
     if args.out is None:
@@ -175,6 +176,7 @@ def search(args):
         bar.text(" [DONE]")
 
     print_run_data(args)
+    check_args(args)
 
     t0 = time()
 
@@ -244,7 +246,7 @@ def search(args):
     # write the top header
     print("!omamer-version: {}".format(__version__), file=args.out)
     print(
-        "!query-md5: {}".format(compute_file_md5(args.query.name)),
+        "!query-md5: {}".format(compute_file_md5(args.query)),
         file=args.out,
     )
     print(
@@ -372,7 +374,7 @@ def print_run_data(args):
     print_line(80)
     print_message("\nRunning OMAmer on {}, using:".format(platform.node()))
     print_message(" - database: {}".format(args.db))
-    print_message(" - query: {}".format(args.query.name))
+    print_message(" - query: {}".format(args.query))
     print_message(" - version: {}".format(__version__))
     print_message("")
     print_line(80)
@@ -410,3 +412,12 @@ def goodbye(args, time_taken, search_rate):
     )
     print_message("")
     print_line(80)
+
+
+def check_args(args):
+    # Enforce query existence check before loading DB
+    with open(args.query, "r") as _:
+        pass
+
+    if os.path.getsize(args.query) == 0:
+        raise RuntimeError(f"Input file {args.query} is empty")
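With `--query` now held as a path string, `check_args` is what rejects a missing or empty query (the empty-FASTA crash listed as #58 in the changelog). A standalone sketch of the same two checks, plus a path-based MD5 helper. `check_query` and `file_md5` are hypothetical names; `file_md5` is only an assumption about what `compute_file_md5` might do, since its body is not part of this diff.

```python
import hashlib
import os
import tempfile


def check_query(path: str) -> None:
    """Fail fast if the query file cannot be opened or is empty."""
    # open() raises FileNotFoundError / PermissionError for a bad path;
    # this is the existence check that check_args relies on.
    with open(path, "r"):
        pass
    if os.path.getsize(path) == 0:
        raise RuntimeError(f"Input file {path} is empty")


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 of a file given by path, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage: a valid query passes; missing and empty queries are rejected.
with tempfile.TemporaryDirectory() as tmp:
    query = os.path.join(tmp, "query.fa")
    with open(query, "wb") as out:
        out.write(b">q1\nMKV\n")
    check_query(query)
    query_md5 = file_md5(query)

    empty = os.path.join(tmp, "empty.fa")
    open(empty, "w").close()
    outcomes = {}
    for label, path in [("missing", os.path.join(tmp, "missing.fa")), ("empty", empty)]:
        try:
            check_query(path)
            outcomes[label] = None
        except (OSError, RuntimeError) as err:
            outcomes[label] = type(err).__name__

print(query_md5, outcomes)
```

Opening the file without reading it is enough to surface path and permission errors, and the size check catches the empty-file case separately.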

omamer/database.py

Lines changed: 7 additions & 7 deletions
@@ -501,13 +501,13 @@ def _get_child_prots(hogs, hog2protoffs, child_prots_off):
             # TODO: check what else would break. this could be used if someone wanted to build a
             # database for flat OGs.
             LOG.warning("No nesting structure in HOGs defined in OrthoXML.")
-        else:
-            self.db.create_carray(
-                "/",
-                "ChildrenHOG",
-                obj=np.array(child_hogs, dtype=np.uint32),
-                filters=self._compr,
-            )
+            child_hogs = [0] # adding sentinel in case no nested HOGs are defined.
+        self.db.create_carray(
+            "/",
+            "ChildrenHOG",
+            obj=np.array(child_hogs, dtype=np.uint32),
+            filters=self._compr,
+        )
         self.db.create_carray(
             "/",
             "ChildrenProt",
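Before this change the `ChildrenHOG` array was only written in the `else` branch, so an OrthoXML file without nested HOGs produced a database with no `/ChildrenHOG` node at all; now a single `0` is stored as a sentinel and the node is always created. A numpy-only sketch of that payload logic follows. `children_hog_payload` is a hypothetical helper, and it assumes the "no nesting structure" branch corresponds to an empty `child_hogs` list, which the diff does not show.

```python
import numpy as np


def children_hog_payload(child_hogs):
    """Return the uint32 array to store as /ChildrenHOG (hypothetical helper).

    Assumes an empty child_hogs list is the "no nesting structure" case; the
    commit then stores a single 0 as a sentinel instead of skipping the node.
    """
    if len(child_hogs) == 0:
        child_hogs = [0]  # sentinel, as in the commit
    return np.array(child_hogs, dtype=np.uint32)


flat = children_hog_payload([])        # OrthoXML without nested HOGs
nested = children_hog_payload([2, 5])  # hypothetical child HOG offsets
print(flat, nested)
```

The design choice is to keep the on-disk layout identical for flat and nested inputs, so readers of the database never need to special-case a missing node.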

omamer/main.py

Lines changed: 1 addition & 1 deletion
@@ -193,7 +193,7 @@ def get_thread_count():
         "--query",
         required=True,
         help="Path to FASTA formatted sequences",
-        type=FileType("r"),
+        type=str,
     )
 
     search_parser.add_argument(
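Switching `--query` from `FileType("r")` to `str` changes when the file is touched: `argparse.FileType` opens the file during parsing and makes argparse exit with a usage error if that fails, whereas `type=str` only stores the path. This is why the runner code now passes `args.query` directly instead of `args.query.name`. A small sketch contrasting the two; `make_parser`, the program name and the file name are hypothetical.

```python
import argparse


def make_parser(query_type) -> argparse.ArgumentParser:
    # Hypothetical stand-in for the search sub-parser; only --query is modelled.
    parser = argparse.ArgumentParser(prog="search-sketch")
    parser.add_argument(
        "--query",
        required=True,
        help="Path to FASTA formatted sequences",
        type=query_type,
    )
    return parser


missing = "no_such_query.fa"  # assumed not to exist in the working directory

# New behaviour: the path is stored as a plain string; nothing is opened yet.
args = make_parser(str).parse_args(["--query", missing])
print(type(args.query).__name__, args.query)

# Old behaviour: FileType("r") tries to open the file while parsing, and
# argparse exits with status 2 when that fails.
try:
    make_parser(argparse.FileType("r")).parse_args(["--query", missing])
except SystemExit as exc:
    print("FileType parsing exited with status", exc.code)
```

Deferring the open also means no file handle is left dangling between parsing and the point where the query is actually read.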

omamer/sequence_reader.py

Lines changed: 16 additions & 16 deletions
@@ -23,21 +23,21 @@
 """
 from Bio import SeqIO
 
-
 class SequenceReader(object):
     @staticmethod
-    def read(fp, k, format="fasta", chunksize=None, sanitiser=None):
-        ids = []
-        seqs = []
-        for rec in filter(lambda x: (len(x.seq) >= k), SeqIO.parse(fp, format)):
-            ids.append(rec.id)
-            s = str(rec.seq).upper()
-            seqs.append(sanitiser(s) if sanitiser is not None else s)
-
-            if chunksize is not None and len(ids) == chunksize:
-                yield (ids, seqs)
-                ids = []
-                seqs = []
-
-        if len(ids) > 0:
-            yield (ids, seqs)
+    def read(filename, k, format="fasta", chunksize=None, sanitiser=None):
+        with open(filename, "r") as fp:
+            ids = []
+            seqs = []
+            for rec in filter(lambda x: (len(x.seq) >= k), SeqIO.parse(fp, format)):
+                ids.append(rec.id)
+                s = str(rec.seq).upper()
+                seqs.append(sanitiser(s) if sanitiser is not None else s)
+
+                if chunksize is not None and len(ids) == chunksize:
+                    yield ids, seqs
+                    ids = []
+                    seqs = []
+
+            if len(ids) > 0:
+                yield ids, seqs
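`SequenceReader.read` now receives a filename and opens it itself inside a `with` block, so the handle is closed when the generator is exhausted or closed. The batching logic can be sketched without Biopython: `parse_fasta` below is a hypothetical, deliberately minimal stand-in for `Bio.SeqIO.parse` (it does not cover SeqIO's FASTA edge cases), and `read_chunks` mirrors `read` minus the `format` argument.

```python
import os
import tempfile
from typing import Callable, Iterator, List, Optional, Tuple


def parse_fasta(fp) -> Iterator[Tuple[str, str]]:
    """Minimal FASTA parser (hypothetical stand-in for Bio.SeqIO.parse)."""
    rec_id, parts = None, []
    for line in fp:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if rec_id is not None:
                yield rec_id, "".join(parts)
            header = line[1:].split()
            # Like SeqIO, keep only the first whitespace-delimited word as ID.
            rec_id = header[0] if header else ""
            parts = []
        else:
            parts.append(line)
    if rec_id is not None:
        yield rec_id, "".join(parts)


def read_chunks(
    filename: str,
    k: int,
    chunksize: Optional[int] = None,
    sanitiser: Optional[Callable[[str], str]] = None,
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (ids, seqs) batches of sequences that are at least k long."""
    with open(filename, "r") as fp:
        ids: List[str] = []
        seqs: List[str] = []
        for rec_id, seq in parse_fasta(fp):
            if len(seq) < k:
                continue  # too short to contain a single k-mer
            ids.append(rec_id)
            s = seq.upper()
            seqs.append(sanitiser(s) if sanitiser is not None else s)
            if chunksize is not None and len(ids) == chunksize:
                yield ids, seqs
                ids, seqs = [], []  # fresh lists so yielded batches stay intact
        if ids:
            yield ids, seqs


# Usage: record b is shorter than k=3 and is skipped; c spans two lines.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "query.fa")
    with open(path, "w") as out:
        out.write(">a first record\nacgt\n>b\nAC\n>c\nMK\nV\n>d\nPPPP\n")
    batches = list(read_chunks(path, k=3, chunksize=2))

print(batches)  # [(['a', 'c'], ['ACGT', 'MKV']), (['d'], ['PPPP'])]
```

Because the file is opened inside the generator, callers only pass a path, which matches the switch of `--query` to a plain string.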

setup.cfg

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 2.1.0
+current_version = 2.1.2
 commit = True
 tag = False
 
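Before this commit the two version strings had drifted apart: `setup.cfg` said 2.1.0 while `omamer/__init__.py` said 2.1.1. A hypothetical consistency check, for example run in CI, could catch that drift. The helper names are illustrative, and the inline strings are copies of the lines in this diff; a real check would read both files from disk.

```python
import configparser
import re


def bumpversion_version(setup_cfg_text: str) -> str:
    """Return current_version from the [bumpversion] section of setup.cfg."""
    parser = configparser.ConfigParser()
    parser.read_string(setup_cfg_text)
    return parser["bumpversion"]["current_version"]


def package_version(init_py_text: str) -> str:
    """Return the string assigned to __version__ in an __init__.py source."""
    match = re.search(
        r"""^__version__\s*=\s*["']([^"']+)["']""", init_py_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


# Inline copies of the relevant lines after this commit.
SETUP_CFG = "[bumpversion]\ncurrent_version = 2.1.2\ncommit = True\ntag = False\n"
INIT_PY = '__packagename__ = "omamer"\n__version__ = "2.1.2"\n'

cfg_version = bumpversion_version(SETUP_CFG)
pkg_version = package_version(INIT_PY)
if cfg_version != pkg_version:
    raise SystemExit(f"version mismatch: setup.cfg={cfg_version}, __version__={pkg_version}")
print("versions agree:", cfg_version)
```

Run against the pre-commit contents (2.1.0 vs 2.1.1), the same check would have exited with a mismatch message.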