
Commit 8cc9060

Expand user/developer docs with guides, quickstart, and overview (#55)
* Revamp documentation with quickstart and guides
* Improve module docs and README links
1 parent cf0343d commit 8cc9060

13 files changed (+400 additions, -21 deletions)

README.md

Lines changed: 61 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -2,10 +2,67 @@
22

33
<img src="https://user-images.githubusercontent.com/112839/227489878-d253c381-75fd-4e92-b851-2b36df0fc5ed.png" width=100>
44

5-
STATUS: BETA
5+
Pandasaurus supports simple queries over ontology annotations in dataframes, powered by Ubergraph SPARQL queries. It keeps dependencies light while still offering CURIE validation, enrichment utilities, and graph exports for downstream tooling.
66

7-
A python library supporting simple queries over ontology annotations in dataframes, using UberGraph queries.
7+
## Features
88

9-
The aim for now is to keep this as a very simple independent Python lib avoiding any complex dependencies.
9+
- Validate and update seed CURIEs, catching obsoleted terms with replacement suggestions.
10+
- Enrich seed lists via simple, minimal, full, contextual, and ancestor-based strategies.
11+
- Build tabular outputs (`pandas.DataFrame`) and transitive-reduced graphs (`rdflib.Graph`) for visualization.
12+
- Run SPARQL queries in batches; the test suite includes mocking examples for deterministic, offline runs.
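The batching mentioned in the feature list can be pictured with a small stand-alone sketch. The helpers `batched` and `values_clause` and the batch size are hypothetical, not Pandasaurus's API; the sketch only shows how a long seed list might be split into several smaller SPARQL `VALUES` blocks instead of one very large query.

```python
# Hypothetical helpers (not part of Pandasaurus): split a long CURIE list
# into fixed-size batches, each rendered as a SPARQL VALUES block.
from typing import Iterator, List


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most ``size`` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def values_clause(curies: List[str]) -> str:
    """Render CURIEs as a SPARQL VALUES block (PREFIX declarations assumed)."""
    return "VALUES ?term { " + " ".join(curies) + " }"


seeds = ["CL:0000084", "CL:0000787", "CL:0000636"]
for batch in batched(seeds, size=2):
    print(values_clause(batch))
```

With a batch size of 2, the three seeds produce two `VALUES` blocks: one with two CURIEs and one with the remaining CURIE.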
1013

11-
With the basic library in place, the first planned use for this is as a base for a library that provides simple enrichement and querability to AnnData Cell X Gene matrices following the [CZ single cell curation standard](https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/3.0.0/schema.md).
14+
## Installation
15+
16+
```bash
17+
pip install pandasaurus
18+
```
19+
20+
or with Poetry:
21+
22+
```bash
23+
poetry add pandasaurus
24+
```
25+
26+
Requires Python 3.9–3.11.
27+
28+
## Quick Example
29+
30+
```python
31+
from pandasaurus.curie_validator import CurieValidator
32+
from pandasaurus.query import Query
33+
34+
seeds = ["CL:0000084", "CL:0000787", "CL:0000636"]
35+
36+
terms = CurieValidator.construct_term_list(seeds)
37+
CurieValidator.get_validation_report(terms) # raises if invalid or obsoleted
38+
39+
query = Query(seeds, force_fail=True)
40+
df = query.simple_enrichment()
41+
print(df.head())
42+
```
43+
44+
See the [Quick Start guide](docs/quickstart.rst) for a step-by-step workflow.
45+
46+
## Documentation
47+
48+
Full documentation (quick start, recipes, developer guide, and API reference) lives under `docs/` and is published from the `gh-pages` branch:
49+
50+
- [Hosted documentation](https://incatools.github.io/PandaSaurus/)
51+
- [Quick Start (source)](docs/quickstart.rst)
52+
- [Guides (source)](docs/guides/index.rst)
53+
- [API reference (source)](docs/pandasaurus/index.rst)
54+
55+
To build docs locally:
56+
57+
```bash
58+
poetry install -E docs
59+
poetry run sphinx-build -b html docs docs/_build/html
60+
```
61+
62+
## Contributing
63+
64+
Pull requests are welcome! See `docs/guides/contributing.rst` for details on environment setup, testing, linting, and the release workflow. Pandasaurus aims to remain a small, focused library; please open an issue before introducing large new features.
65+
66+
## Background
67+
68+
The first planned use case is to provide enrichment/query tooling for AnnData Cell x Gene matrices following the [CZ single cell curation standard](https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/3.0.0/schema.md).

docs/guides/contributing.rst

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
1+
Contributing & Development
2+
==========================
3+
4+
Environment Setup
5+
-----------------
6+
7+
1. Install Poetry (see https://python-poetry.org/docs/#installation).
8+
2. Clone the repository and install dependencies:
9+
10+
.. code-block:: bash
11+
12+
poetry install
13+
14+
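3. Activate the virtualenv (on Poetry 2.x, ``poetry shell`` is provided by the ``poetry-plugin-shell`` plugin; ``poetry env activate`` is the built-in alternative):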
15+
16+
.. code-block:: bash
17+
18+
poetry shell
19+
20+
Running Tests
21+
-------------
22+
23+
Use pytest with coverage:
24+
25+
.. code-block:: bash
26+
27+
poetry run pytest --cov=pandasaurus --cov-report=term-missing
28+
29+
Some tests query the live Ubergraph endpoint and therefore need network access. For deterministic, offline runs, mock ``run_sparql_query`` as shown in ``test/test_query.py``.
30+
31+
Linting & Formatting
32+
--------------------
33+
34+
Before committing, run:
35+
36+
.. code-block:: bash
37+
38+
poetry run isort pandasaurus test
39+
poetry run black pandasaurus test
40+
poetry run flake8 pandasaurus test
41+
42+
The repository also includes a pre-commit hook (``.githooks/pre-commit``) that runs ``isort`` and ``black`` automatically once you point Git at it with ``git config core.hooksPath .githooks``.
43+
44+
Documentation
45+
-------------
46+
47+
Docs live under ``docs/`` (Sphinx). Build them locally with:
48+
49+
.. code-block:: bash
50+
51+
poetry install -E docs
52+
poetry run sphinx-build -b html docs docs/_build/html
53+
54+
CI publishes documentation from ``main`` to the ``gh-pages`` branch via GitHub Actions.
55+
56+
Release Pipeline
57+
----------------
58+
59+
PyPI releases are automated: publishing a GitHub Release triggers the ``publish-pypi`` workflow, which builds the package via Poetry and uploads to PyPI using the ``PYPI_API_TOKEN`` secret.

docs/guides/index.rst

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
1+
User and Developer Guides
2+
=========================
3+
4+
.. toctree::
5+
:maxdepth: 1
6+
:caption: Guides:
7+
8+
../overview
9+
recipes
10+
contributing

docs/guides/recipes.rst

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
1+
Task-Oriented Recipes
2+
=====================
3+
4+
Validate and Update Seeds
5+
-------------------------
6+
7+
1. Construct the term list from your seed CURIEs (``seeds`` is a list of CURIE strings such as ``"CL:0000084"``):
8+
9+
.. code-block:: python
10+
11+
from pandasaurus.curie_validator import CurieValidator
12+
13+
terms = CurieValidator.construct_term_list(seeds)
14+
15+
2. Catch validation errors:
16+
17+
.. code-block:: python
18+
19+
from pandasaurus.utils.pandasaurus_exceptions import InvalidTerm, ObsoletedTerm
20+
21+
try:
22+
CurieValidator.get_validation_report(terms)
23+
except InvalidTerm as err:
24+
print(err)
25+
except ObsoletedTerm as err:
26+
print(err)
27+
28+
3. Replace obsoleted terms programmatically:
29+
30+
.. code-block:: python
31+
32+
from pandasaurus.query import Query

query = Query(seeds)
33+
query.update_obsoleted_terms()
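Conceptually, updating obsoleted terms is a substitution over the seed list. The sketch below is a hypothetical, stand-alone illustration of that step only: ``replace_obsoleted`` is not a Pandasaurus function, the ``EX:`` CURIEs and the mapping are made up, and Pandasaurus itself derives real replacements from Ubergraph. Dropping duplicates that appear after the swap is a design choice of this sketch.

```python
# Hypothetical sketch: apply an obsoleted -> replacement mapping to seeds.
# The EX: CURIEs and the mapping are invented for illustration.
from typing import Dict, List


def replace_obsoleted(seeds: List[str], replacements: Dict[str, str]) -> List[str]:
    """Swap obsoleted CURIEs for their replacements, keeping the original
    order and dropping duplicates introduced by the swap."""
    updated: List[str] = []
    for curie in seeds:
        new = replacements.get(curie, curie)
        if new not in updated:
            updated.append(new)
    return updated


seeds = ["EX:0001", "EX:0002", "EX:0003"]
mapping = {"EX:0002": "EX:0003"}  # hypothetical: EX:0002 replaced by EX:0003
print(replace_obsoleted(seeds, mapping))  # ['EX:0001', 'EX:0003']
```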
34+
35+
Contextual Enrichment
36+
---------------------
37+
38+
Gather all terms that are ``part_of`` a context and enrich them:
39+
40+
.. code-block:: python
41+
42+
q = Query(kidney_terms, force_fail=True)  # kidney_terms: your own list of seed CURIEs
43+
enriched = q.contextual_slim_enrichment(["UBERON:0000362"]) # renal medulla
44+
45+
Parent-only Enrichment
46+
----------------------
47+
48+
Use :meth:`pandasaurus.query.Query.parent_enrichment` to add only the direct parents of each seed (a one-hop graph):
49+
50+
.. code-block:: python
51+
52+
q = Query(seeds)
53+
parent_df = q.parent_enrichment()
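To see how a one-hop lookup differs from an ancestor-based strategy, here is a toy sketch over an in-memory ``is_a`` dictionary. The hierarchy and the helpers ``parents`` and ``ancestors`` are hypothetical; Pandasaurus answers these questions with SPARQL queries against Ubergraph, not a Python dict.

```python
# Toy is_a hierarchy (child -> direct parents); the terms are hypothetical.
from typing import Dict, List, Set

IS_A: Dict[str, List[str]] = {
    "T:c": ["T:b"],
    "T:b": ["T:a"],
    "T:a": [],
}


def parents(term: str) -> Set[str]:
    """One hop: direct parents only."""
    return set(IS_A.get(term, []))


def ancestors(term: str) -> Set[str]:
    """Transitive closure: every term reachable by following is_a upward."""
    seen: Set[str] = set()
    stack = list(IS_A.get(term, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(IS_A.get(node, []))
    return seen


print(sorted(parents("T:c")))    # ['T:b']
print(sorted(ancestors("T:c")))  # ['T:a', 'T:b']
```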
54+
55+
Export to Graph
56+
---------------
57+
58+
After any enrichment call:
59+
60+
.. code-block:: python
61+
62+
graph_df = q.graph_df # pandas DataFrame
63+
rdflib_graph = q.graph  # rdflib.Graph
64+
65+
Use :mod:`pandasaurus.graph.graph_generator` to further manipulate the graph or export as needed.

docs/index.rst

Lines changed: 3 additions & 0 deletions
@@ -10,6 +10,9 @@ pandasaurus's documentation!
1010
:maxdepth: 2
1111
:caption: Contents:
1212

13+
overview
14+
quickstart
15+
guides/index
1316
introduction
1417
pandasaurus/index
1518

docs/introduction.rst

Lines changed: 0 additions & 3 deletions
@@ -4,9 +4,6 @@ Pandasaurus
44
.. image:: https://user-images.githubusercontent.com/112839/227489878-d253c381-75fd-4e92-b851-2b36df0fc5ed.png
55
:width: 100
66

7-
STATUS: BETA
8-
------------
9-
107
A python library supporting simple queries over ontology annotations in dataframes, using UberGraph queries.
118

129
The aim for now is to keep this as a very simple independent Python lib avoiding any complex dependencies.

docs/overview.rst

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
1+
Overview
2+
========
3+
4+
Pandasaurus supports simple queries over ontology annotations in dataframes, powered by Ubergraph SPARQL queries. It keeps dependencies light while still offering CURIE validation, enrichment utilities, and graph exports for downstream tooling.
5+
6+
Features
7+
--------
8+
9+
- Validate and update seed CURIEs, catching obsoleted terms with replacement suggestions.
10+
- Enrich seed lists via simple, minimal, full, contextual, and ancestor-based strategies.
11+
- Build tabular outputs (:class:`pandas.DataFrame`) and transitive-reduced graphs (:class:`rdflib.Graph`) for visualization.
12+
- Run SPARQL queries in batches; the test suite includes mocking examples for deterministic, offline runs.
13+
14+
Installation
15+
------------
16+
17+
.. code-block:: bash
18+
19+
pip install pandasaurus
20+
21+
or with Poetry:
22+
23+
.. code-block:: bash
24+
25+
poetry add pandasaurus
26+
27+
Requires Python 3.9–3.11.
28+
29+
Quick Example
30+
-------------
31+
32+
.. code-block:: python
33+
34+
from pandasaurus.curie_validator import CurieValidator
35+
from pandasaurus.query import Query
36+
37+
seeds = ["CL:0000084", "CL:0000787", "CL:0000636"]
38+
39+
terms = CurieValidator.construct_term_list(seeds)
40+
CurieValidator.get_validation_report(terms) # raises if invalid or obsoleted
41+
42+
query = Query(seeds, force_fail=True)
43+
df = query.simple_enrichment()
44+
print(df.head())
45+
46+
Continue to :doc:`quickstart` for a full workflow.
47+
50+
51+
Documentation Links
52+
-------------------
53+
54+
- :doc:`quickstart`
55+
- :doc:`guides/index`
56+
- :doc:`pandasaurus/index`
57+
58+
Background
59+
----------
60+
61+
The first planned use case is to provide enrichment/query tooling for AnnData Cell x Gene matrices following the `CZ single cell curation standard <https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/3.0.0/schema.md>`_.
Lines changed: 15 additions & 3 deletions
@@ -1,10 +1,22 @@
11
Curie Validator
22
==================
33

4-
Documentation
5-
-------------
4+
Overview
5+
--------
6+
7+
``CurieValidator`` validates seed CURIEs and reports obsoleted terms in a structured way. Run it before any enrichment so that invalid or obsoleted CURIEs are caught before they reach a query.
8+
9+
Typical usage:
10+
11+
.. code-block:: python
12+
13+
from pandasaurus.curie_validator import CurieValidator

terms = CurieValidator.construct_term_list(seeds)  # seeds: list of CURIE strings
14+
CurieValidator.get_validation_report(terms) # raises on invalid/obsoleted terms
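Detecting obsoleted terms requires an ontology lookup, so a plain typo such as a missing colon is only discovered after a round-trip. A cheap local pre-check can catch purely syntactic mistakes first. ``malformed_curies`` and its deliberately simple pattern are hypothetical and not part of ``CurieValidator``; they cannot detect unknown or obsoleted terms.

```python
import re
from typing import List

# Hypothetical, simplified pattern: PREFIX:LOCAL_ID, where the prefix starts
# with a letter and the local part is non-empty with no whitespace or colons.
CURIE_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_.-]*:[^\s:]+")


def malformed_curies(seeds: List[str]) -> List[str]:
    """Return the seeds that do not even look like CURIEs."""
    return [s for s in seeds if not CURIE_PATTERN.fullmatch(s)]


print(malformed_curies(["CL:0000084", "CL0000787", "UBERON:0000362", "CL: 1"]))
# ['CL0000787', 'CL: 1']
```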
15+
16+
Class Reference
17+
---------------
618

719
.. currentmodule:: pandasaurus.curie_validator
820

921
.. autoclass:: CurieValidator
10-
:members:
22+
:members:
Lines changed: 11 additions & 4 deletions
@@ -1,10 +1,17 @@
11
Graph Generator
2-
==================
2+
===============
33

4-
Documentation
5-
-------------
4+
``GraphGenerator`` turns enrichment results into rdflib graphs and applies transitive reduction so you can visualize clean hierarchies.
5+
6+
Use it when:
7+
8+
* You need a graph representation of the enriched DataFrame (for plotting or exporting).
9+
* You want to remove redundant edges before exporting to visualization tools.
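To make "transitive reduction" concrete: an edge is redundant when its target is already reachable through a longer path. The function below is a plain-Python illustration on a small set of ``(child, parent)`` edges. It is not ``GraphGenerator``'s implementation, which works on rdflib graphs, and it assumes the hierarchy has no cycles.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def transitive_reduction(edges: Set[Edge]) -> Set[Edge]:
    """Drop edge (u, v) when v is still reachable from u via a longer path.

    Assumes a DAG (no cycles), as subclass hierarchies should be.
    """
    succ: Dict[str, Set[str]] = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)

    def reachable_without_edge(src: str, dst: str) -> bool:
        # Depth-first search from src's other successors, skipping src -> dst.
        stack = [n for n in succ.get(src, set()) if n != dst]
        seen: Set[str] = set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(succ.get(node, set()))
        return False

    return {(u, v) for u, v in edges if not reachable_without_edge(u, v)}


edges = {("B cell", "lymphocyte"), ("lymphocyte", "leukocyte"), ("B cell", "leukocyte")}
print(sorted(transitive_reduction(edges)))
# [('B cell', 'lymphocyte'), ('lymphocyte', 'leukocyte')]
```

The direct ``B cell -> leukocyte`` edge is dropped because the path through ``lymphocyte`` already implies it, which is what keeps a visualized hierarchy uncluttered.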
10+
11+
Class Reference
12+
---------------
613

714
.. currentmodule:: pandasaurus.graph.graph_generator
815

916
.. autoclass:: GraphGenerator
10-
:members:
17+
:members:

docs/pandasaurus/graph/index.rst

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,8 @@
11
Graph Module
22
=======================
33

4+
Utilities for turning enrichment results into graphs and reducing redundancy before visualization.
5+
46
.. toctree::
57
:maxdepth: 2
68
:caption: Contents:
