Commit 6fb56d2

Initial implementation to access OrgDb annotation objects (#1)
version 0.0.1
1 parent e4a2f4a commit 6fb56d2

16 files changed: +1289 -196 lines changed

.github/workflows/run-tests.yml

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@ jobs:
         platform:
           - ubuntu-latest
           - macos-latest
-          - windows-latest
+          # - windows-latest
     runs-on: ${{ matrix.platform }}
     name: Python ${{ matrix.python }}, ${{ matrix.platform }}
     steps:

CHANGELOG.md

Lines changed: 3 additions & 4 deletions
@@ -1,7 +1,6 @@
 # Changelog
 
-## Version 0.1 (development)
+## Version 0.0.1
 
-- Feature A added
-- FIX: nasty bug #1729 fixed
-- add your changes here!
+- Initial implementation to access OrgDb objects.
+- This also fetches the AnnotationHub SQLite file and queries it for available org SQLite files, instead of relying on a static registry as in the txdb package.

README.md

Lines changed: 106 additions & 3 deletions
@@ -1,11 +1,13 @@
 [![PyPI-Server](https://img.shields.io/pypi/v/orgdb.svg)](https://pypi.org/project/orgdb/)
-![Unit tests](https://github.com/YOUR_ORG_OR_USERNAME/orgdb/actions/workflows/run-tests.yml/badge.svg)
+![Unit tests](https://github.com/BiocPy/orgdb/actions/workflows/run-tests.yml/badge.svg)
 
 # orgdb
 
-> Access OrgDB annotations
+**OrgDb** provides an interface to access and query **Organism Database (OrgDb)** SQLite files in Python. It mirrors functionality from the R/Bioconductor `AnnotationDbi` package, enabling seamless integration of organism-wide gene annotation into Python workflows.
 
-A longer description of your project goes here...
+> [!NOTE]
+>
+> If you are looking to access TxDb databases, check out the [txdb package](https://www.github.com/biocpy/txdb).
 
 ## Install
 
@@ -15,6 +17,107 @@ To get started, install the package from [PyPI](https://pypi.org/project/orgdb/)
 pip install orgdb
 ```
 
+## Usage
+
+### Using OrgDbRegistry
+
+The registry downloads AnnotationHub's metadata SQLite file and filters it for all available OrgDb databases. You can then fetch standard organism databases through the registry (backed by AnnotationHub).
+
+```py
+from orgdb import OrgDbRegistry
+
+# Initialize registry and list available organisms
+registry = OrgDbRegistry()
+available = registry.list_orgdb()
+print(available[:5])
+# ["org.'Caballeronia_concitans'.eg", "org.'Chlorella_vulgaris'_C-169.eg", ...]
+
+# Load the database for Homo sapiens (downloads and caches automatically)
+db = registry.load_db("org.Hs.eg.db")
+print(db.species)
+# 'Homo sapiens'
+```
+
+### Inspecting metadata
+
+Explore the available columns and key types in the database.
+
+```py
+# List available columns (and keytypes)
+cols = db.columns()
+print(cols[:5])
+# ['ENTREZID', 'PFAM', 'IPI', 'PROSITE', 'ACCNUM']
+
+# Check available keys for a specific keytype
+entrez_ids = db.keys("ENTREZID")
+print(entrez_ids[:5])
+# ['1', '2', '9', '10', '11']
+```
+
+### Querying Annotations (using `select`)
+
+The `select` method retrieves data as a `BiocFrame`. It automatically handles complex joins across tables.
+
+```py
+# Retrieve Gene Symbols and Gene Names for a list of Entrez IDs
+res = db.select(
+    keys=["1", "10"],
+    columns=["SYMBOL", "GENENAME"],
+    keytype="ENTREZID"
+)
+
+print(res)
+# BiocFrame with 2 rows and 3 columns
+#                   GENENAME ENTREZID SYMBOL
+#                     <list>   <list> <list>
+# [0] alpha-1-B glycoprotein        1   A1BG
+# [1]  N-acetyltransferase 2       10   NAT2
+
+```
+
+> [!NOTE]
+>
+> If you request "GO" columns, the result will automatically expand to include "EVIDENCE" and "ONTOLOGY" columns, matching Bioconductor behavior.
+
+```py
+go_res = db.select(
+    keys="1",
+    columns=["GO"],
+    keytype="ENTREZID"
+)
+# BiocFrame with 12 rows and 4 columns
+#      ONTOLOGY ENTREZID         GO EVIDENCE
+#        <list>   <list>     <list>   <list>
+#  [0]       BP        1 GO:0002764      IBA
+#  [1]       CC        1 GO:0005576      HDA
+#  [2]       CC        1 GO:0005576      IDA
+#           ...      ...        ...      ...
+#  [9]       CC        1 GO:0070062      HDA
+# [10]       CC        1 GO:0072562      HDA
+# [11]       CC        1 GO:1904813      TAS
+```
+
+### Accessing Genomic Ranges
+
+Extract gene coordinates as a `GenomicRanges` object (requires the `chromosome_locations` table in the OrgDb database).
+
+```py
+gr = db.genes()
+print(gr)
+# GenomicRanges with 52232 ranges and 1 metadata column
+#           seqnames                ranges          strand     gene_id
+#              <str>             <IRanges> <ndarray[int8]>      <list>
+#         1       19 -58345182 - -58336872               * |         1
+#         2       12   -9067707 - -9019495               * |         2
+#         2       12   -9067707 - -9019185               * |         2
+#                ...                   ...             ... |       ...
+# 116804918       11 121024101 - 121191490               * | 116804918
+# 117779438        1   20154213 - 20160568               * | 117779438
+# 118142757        6   42155405 - 42180056               * | 118142757
+# ------
+# seqinfo(369 sequences): 1 10 10_GL383545v1_alt ... X_KI270913v1_alt Y Y_KZ208924v1_fix
+```
+
 <!-- biocsetup-notes -->
 
 ## Note
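
Note on the `select` example in the README diff above: the "joins across tables" sentence can be pictured with plain `sqlite3`. The sketch below is not orgdb's implementation. It assumes a `genes` table (`_id`, `gene_id`) and a `gene_info` table (`_id`, `gene_name`, `symbol`), modeled on the usual Bioconductor OrgDb layout, and fills an in-memory database with three illustrative rows. `build_demo_orgdb` and `select_symbol_and_name` are hypothetical helper names.

```python
import sqlite3


def build_demo_orgdb():
    """Tiny in-memory stand-in for an OrgDb SQLite file (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE genes (_id INTEGER PRIMARY KEY, gene_id TEXT UNIQUE);
        CREATE TABLE gene_info (_id INTEGER, gene_name TEXT, symbol TEXT);
        INSERT INTO genes VALUES (1, '1'), (2, '10'), (3, '100');
        INSERT INTO gene_info VALUES
            (1, 'alpha-1-B glycoprotein', 'A1BG'),
            (2, 'N-acetyltransferase 2', 'NAT2'),
            (3, 'adenosine deaminase', 'ADA');
        """
    )
    return conn


def select_symbol_and_name(conn, entrez_ids):
    """Look up symbol and gene name for Entrez IDs with a single join."""
    # One "?" placeholder per key keeps the query parameterized.
    placeholders = ", ".join("?" for _ in entrez_ids)
    query = f"""
        SELECT g.gene_id, gi.symbol, gi.gene_name
        FROM genes g
        LEFT JOIN gene_info gi ON gi._id = g._id
        WHERE g.gene_id IN ({placeholders})
        ORDER BY g._id
    """
    return conn.execute(query, list(entrez_ids)).fetchall()


rows = select_symbol_and_name(build_demo_orgdb(), ["1", "10"])
print(rows)
# [('1', 'A1BG', 'alpha-1-B glycoprotein'), ('10', 'NAT2', 'N-acetyltransferase 2')]
```

The internal `_id` column links the two tables, so the Entrez ID only has to be matched once, in `genes`.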

docs/index.md

Lines changed: 6 additions & 10 deletions
@@ -1,18 +1,14 @@
 # orgdb
 
-Access OrgDB annotations
+**OrgDb** provides an interface to access and query **Organism Database (OrgDb)** SQLite files in Python. It mirrors functionality from the R/Bioconductor `AnnotationDbi` package, enabling seamless integration of organism-wide gene annotation into Python workflows.
 
+## Install
 
-## Note
-
-> This is the main page of your project's [Sphinx] documentation. It is
-> formatted in [Markdown]. Add additional pages by creating md-files in
-> `docs` or rst-files (formatted in [reStructuredText]) and adding links to
-> them in the `Contents` section below.
->
-> Please check [Sphinx] and [MyST] for more information
-> about how to document your project and how to configure your preferences.
+To get started, install the package from [PyPI](https://pypi.org/project/orgdb/)
 
+```bash
+pip install orgdb
+```
 
 ## Contents

setup.cfg

Lines changed: 6 additions & 3 deletions
@@ -12,11 +12,11 @@ license = MIT
 license_files = LICENSE.txt
 long_description = file: README.md
 long_description_content_type = text/markdown; charset=UTF-8; variant=GFM
-url = https://github.com/pyscaffold/pyscaffold/
+url = https://github.com/BiocPy/orgdb
 # Add here related links, for example:
 project_urls =
-    Documentation = https://pyscaffold.org/
-#    Source = https://github.com/pyscaffold/pyscaffold/
+    Documentation = https://github.com/BiocPy/orgdb
+    Source = https://github.com/BiocPy/orgdb
 #    Changelog = https://pyscaffold.org/en/latest/changelog.html
 #    Tracker = https://github.com/pyscaffold/pyscaffold/issues
 #    Conda-Forge = https://anaconda.org/conda-forge/pyscaffold
@@ -49,6 +49,9 @@ package_dir =
 # For more information, check out https://semver.org/.
 install_requires =
     importlib-metadata; python_version<"3.8"
+    genomicranges
+    biocframe
+    pybiocfilecache
 
 
 [options.packages.find]

src/orgdb/__init__.py

Lines changed: 6 additions & 0 deletions
@@ -14,3 +14,9 @@
     __version__ = "unknown"
 finally:
     del version, PackageNotFoundError
+
+from .orgdb import OrgDb
+from .orgdbregistry import OrgDbRegistry
+from .record import OrgDbRecord
+
+__all__ = ["OrgDb", "OrgDbRegistry", "OrgDbRecord"]

src/orgdb/_ahub.py

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+"""This list of OrgDB resources was generated from AnnotationHub.
+
+Code to generate:
+
+```bash
+wget https://annotationhub.bioconductor.org/metadata/annotationhub.sqlite3
+sqlite3 annotationhub.sqlite3
+```
+
+```sql
+SELECT
+    r.title,
+    r.rdatadateadded,
+    lp.location_prefix || rp.rdatapath AS full_rdatapath
+FROM resources r
+LEFT JOIN location_prefixes lp
+    ON r.location_prefix_id = lp.id
+LEFT JOIN rdatapaths rp
+    ON rp.resource_id = r.id
+WHERE r.title LIKE 'org%.sqlite';
+```
+
+Note: we only keep the latest version of these files.
+
+"""
+
+__author__ = "Jayaram Kancherla"
+__copyright__ = "Jayaram Kancherla"
+__license__ = "MIT"
+
+AHUB_METADATA_URL = "https://annotationhub.bioconductor.org/metadata/annotationhub.sqlite3"
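
Note on the `_ahub.py` docstring: the SQL query and the "keep the latest version" remark can be combined in a few lines of standard-library Python. This is a rough sketch, not the registry code from this commit. The query text is copied from the docstring; everything else is an assumption made for illustration: the helper names `latest_orgdb_files` and `build_demo_db`, the reduced in-memory schema, the `example.org` prefix, and the premise that `rdatadateadded` holds ISO-style dates that sort correctly as strings.

```python
import sqlite3

# Query taken from the _ahub.py docstring (trailing semicolon dropped).
ORGDB_QUERY = """
SELECT
    r.title,
    r.rdatadateadded,
    lp.location_prefix || rp.rdatapath AS full_rdatapath
FROM resources r
LEFT JOIN location_prefixes lp
    ON r.location_prefix_id = lp.id
LEFT JOIN rdatapaths rp
    ON rp.resource_id = r.id
WHERE r.title LIKE 'org%.sqlite'
"""


def latest_orgdb_files(conn):
    """Return {title: (date_added, url)}, keeping the newest row per title."""
    latest = {}
    for title, added, url in conn.execute(ORGDB_QUERY):
        added = added or ""  # treat a missing date as the oldest possible
        # Assumption: dates are ISO-style strings, so string order is date order.
        if title not in latest or added > latest[title][0]:
            latest[title] = (added, url)
    return latest


def build_demo_db():
    """Hypothetical stand-in for annotationhub.sqlite3 with only the columns the query reads."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE location_prefixes (id INTEGER PRIMARY KEY, location_prefix TEXT);
        CREATE TABLE resources (
            id INTEGER PRIMARY KEY, title TEXT,
            rdatadateadded TEXT, location_prefix_id INTEGER
        );
        CREATE TABLE rdatapaths (id INTEGER PRIMARY KEY, rdatapath TEXT, resource_id INTEGER);
        INSERT INTO location_prefixes VALUES (1, 'https://example.org/');
        INSERT INTO resources VALUES
            (1, 'org.Hs.eg.db.sqlite', '2023-04-01', 1),
            (2, 'org.Hs.eg.db.sqlite', '2024-04-01', 1),
            (3, 'org.Mm.eg.db.sqlite', '2024-04-01', 1),
            (4, 'TxDb.Hsapiens.sqlite', '2024-04-01', 1);
        INSERT INTO rdatapaths VALUES
            (1, 'old/org.Hs.eg.db.sqlite', 1),
            (2, 'new/org.Hs.eg.db.sqlite', 2),
            (3, 'new/org.Mm.eg.db.sqlite', 3),
            (4, 'new/TxDb.Hsapiens.sqlite', 4);
        """
    )
    return conn


files = latest_orgdb_files(build_demo_db())
for title, (added, url) in sorted(files.items()):
    print(title, added, url)
```

Against the real metadata file, the same function could be pointed at `sqlite3.connect("annotationhub.sqlite3")` after the `wget` step shown in the docstring, provided the date assumption holds there.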
