Restore pfamsearch functionality and modernize API integration v0.19.0
Major Features:
- Fully restored pfamsearch database creation (-d flag) using InterPro API
- Implement individual HMM downloads via ?annotation=hmm parameter
- Add robust retry logic with exponential backoff and timeout handling
- All tests passing (60/60 across 8 test files)
API Modernization:
- Migrate from defunct Pfam API to EBI Search + InterPro APIs
- Add JSON::Tiny dependency for efficient API response parsing
- Update pfam2go URLs to current Gene Ontology location
- Switch from HTTP to HTTPS for secure connections
Improvements:
- Targeted HMM downloads are much faster than downloading the full 331 MB Pfam database
- Better error handling and user feedback with progress indicators
- Updated Docker usage examples with best practices (--rm, -w flags)
- Enhanced README with complete workflow examples
- Updated documentation and version to 0.19.0
Bug Fixes:
- Fixed test expectations to match new API behavior (4 vs 103 HMMs)
- Resolved SSL certificate issues with legacy URLs
- Updated version numbers across all modules
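
The retry behaviour described in the commit message (exponential backoff with a timeout around each individual HMM download) can be sketched in shell as follows. This is an illustration, not the code from this commit: the function names `retry_with_backoff` and `fetch_pfam_hmm` are hypothetical, the endpoint path is an assumption about the InterPro API, and only the `?annotation=hmm` parameter comes from the commit message itself.

```shell
# Hypothetical helper (not from HMMER2GO): run a command, retrying on failure
# and doubling the wait between attempts.
# Usage: retry_with_backoff MAX_ATTEMPTS BASE_DELAY_SECONDS command [args...]
retry_with_backoff() {
    local max_attempts="$1"
    local delay="$2"
    local attempt=1
    shift 2
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        delay=$((delay * 2))        # exponential backoff: 2s, 4s, 8s, ...
        attempt=$((attempt + 1))
    done
}

# Hypothetical wrapper: download the HMM for one Pfam accession.
# The base URL is an assumed InterPro endpoint; the response may be
# compressed, so inspect the file before passing it to HMMER.
fetch_pfam_hmm() {
    local acc="$1"
    local out="$2"
    retry_with_backoff 4 2 \
        curl --fail --silent --show-error --max-time 60 \
             --output "$out" \
             "https://www.ebi.ac.uk/interpro/api/entry/pfam/${acc}?annotation=hmm"
}
```

With these definitions, a call such as `fetch_pfam_hmm PF00319 PF00319.hmm` would make up to four attempts, waiting 2, 4, and 8 seconds between them, with each attempt capped at 60 seconds by `--max-time`.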
README.md (+31 -7: 31 additions & 7 deletions)
@@ -9,29 +9,43 @@ Build Status|Version
### What is HMMER2GO?
-HMMER2GO is a command line application to map DNA sequences, typically transcripts, to [Gene Ontology](http://geneontology.org/) based on the similarity of the query sequences to curated HMM models for protein families represented in [Pfam](http://pfam.xfam.org/).
+HMMER2GO is a command line application to map DNA sequences, typically transcripts, to [Gene Ontology](http://geneontology.org/) based on the similarity of the query sequences to curated HMM models for protein families represented in Pfam (now available through [InterPro](https://www.ebi.ac.uk/interpro/)).
These GO term mappings allow you to make inferences about the function of the gene products, or changes in function in the case of expression studies. The GAF mapping file that is produced can be used with Ontologizer or other tools, to visualize a graph of the term relationships along with their significance values.
**INSTALLATION**
-It is recommended to use [Docker](https://www.docker.com), as shown below:
+It is recommended to use [Docker](https://www.docker.com) for easy installation and usage. Here are examples of running HMMER2GO commands with Docker:
-docker run -it --name hmmer2go-con -v $(pwd)/db:/db:Z sestaton/hmmer2go
-That will create a container called "hmmer2go-con" and start an interactive shell. The above assumes you have a directory called db in the working directory that contains your database files (Pfam HMM file that is formatted), and the input sequences. To run the full analysis, change to the mounted directory with cd db in your container and run the commands shown below.
+The `--rm` flag automatically removes the container after execution. The `-v $(pwd):/data` mounts your current directory to `/data` inside the container, and `-w /data` sets the working directory so HMMER2GO can access your local files with their simple filenames.
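
As a sketch of how the three flags described above fit together, the wrapper below runs any HMMER2GO subcommand in a throwaway container. The function name `hmmer2go_docker` is hypothetical, the image name `sestaton/hmmer2go` is taken from the `docker run` line removed in this diff, and the assumption that the image has `hmmer2go` on its PATH is not verified here.

```shell
# Hypothetical convenience wrapper (not part of HMMER2GO).
hmmer2go_docker() {
    # --rm     : remove the container when the command exits
    # -v ...   : mount the current directory at /data inside the container
    # -w /data : start in /data so plain filenames resolve to local files
    docker run --rm \
        -v "$(pwd)":/data \
        -w /data \
        sestaton/hmmer2go \
        hmmer2go "$@"
}
```

For example, `hmmer2go_docker getorf -i genes.fasta -o genes_orfs.faa` would read `genes.fasta` from, and write `genes_orfs.faa` to, the directory the command is run from.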
-Alternatively, you can follow the steps in the [INSTALL](https://github.com/sestaton/HMMER2GO/blob/master/INSTALL.md) file and install HMMER2GO on any Mac or Linux, and likely Windows (though I have not tested yet, advice is welcome).
+**Alternative Installation**
+
+You can also follow the steps in the [INSTALL](https://github.com/sestaton/HMMER2GO/blob/master/INSTALL.md) file to install HMMER2GO directly on Mac or Linux systems.
Please see the wiki [Demonstration](https://github.com/sestaton/HMMER2GO/wiki/Demonstraton) page for full working example and demo script that will download and run HMMER2GO. This page also contains a brief description of how to begin analyzing the results.
**BRIEF USAGE**
+### Full Workflow Example
+
Starting with a file of DNA sequences, we first want to get the longest open reading frame (ORF) for each gene and translate those sequences.
hmmer2go getorf -i genes.fasta -o genes_orfs.faa
-Next, we search our ORFs for coding domains.
+Next, we search our ORFs for coding domains against the full Pfam database.
hmmer2go run -i genes_orfs.faa -d Pfam-A.hmm -o genes_orf_Pfam-A.tblout
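
The two steps above can be chained in a small script. This is a sketch, not part of HMMER2GO: the function name `run_pfam_search`, its argument order, and the output file naming are hypothetical, while the `getorf` and `run` subcommands and their `-i`, `-o`, and `-d` flags are taken from the commands shown above.

```shell
# Hypothetical wrapper (not part of HMMER2GO): translate ORFs, then search them.
# Usage: run_pfam_search DNA_FASTA HMM_DATABASE
run_pfam_search() {
    local dna="$1"
    local hmm_db="$2"
    local base="${dna%.*}"                  # strip the file extension
    local orfs="${base}_orfs.faa"
    local table="${base}_orf_hits.tblout"

    # Fail early if the input file is missing or empty.
    [ -s "$dna" ] || { echo "input not found or empty: $dna" >&2; return 1; }

    # Step 1: longest ORF per sequence, translated.
    hmmer2go getorf -i "$dna" -o "$orfs" || return 1
    # Step 2: search the translated ORFs against the HMM database.
    hmmer2go run -i "$orfs" -d "$hmm_db" -o "$table" || return 1

    # Print the name of the results table for use by later steps.
    echo "$table"
}
```

Stopping at the first failed step avoids running the search on a missing or partial ORF file.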
@@ -43,6 +57,16 @@ If we want to perform a statistical analysis on the GO mappings, it may be neces
+# Use the custom database for faster, targeted searches
+hmmer2go run -i genes_orfs.faa -d mads+mads-box_hmms/mads+mads-box.hmm -o genes_orf_mads.tblout
+
For a full explanation of these commands, see the [HMMER2GO wiki](https://github.com/sestaton/HMMER2GO/wiki). In particular, see the [tutorial](https://github.com/sestaton/HMMER2GO/wiki/Tutorial) page for a walk-through of all the commands. There is also an example script on the [demonstration](https://github.com/sestaton/HMMER2GO/wiki/Demonstraton) page to fetch data for _Arabidopsis thaliana_ and run the full analysis.
**DOCUMENTATION**
@@ -63,7 +87,7 @@ Report any issues at the HMMER2GO issue tracker: https://github.com/sestaton/HMM
**LICENSE AND COPYRIGHT**
-Copyright (C) 2014-2022 S. Evan Staton
+Copyright (C) 2014-2025 S. Evan Staton
This program is distributed under the MIT (X11) License, which should be distributed with the package.
If not, it can be found here: http://www.opensource.org/licenses/mit-license.php