Expand user/developer docs with guides, quickstart, and overview (#55)

ubyndr · web-flow · commit 8cc9060020aa · 2025-12-02T13:26:16.000Z
* Revamp documentation with quickstart and guides

* Improve module docs and README links
diff --git a/README.md b/README.md
@@ -2,10 +2,67 @@
 
 <img src="https://user-images.githubusercontent.com/112839/227489878-d253c381-75fd-4e92-b851-2b36df0fc5ed.png" width=100>
 
-STATUS: BETA
+Pandasaurus supports simple queries over ontology annotations in dataframes, powered by Ubergraph SPARQL queries. It keeps dependencies light while still offering CURIE validation, enrichment utilities, and graph exports for downstream tooling.
 
-A python library supporting simple queries over ontology annotations in dataframes, using UberGraph queries.
+## Features
 
-The aim for now is to keep this as a very simple independent Python lib avoiding any complex dependencies.
+- Validate and update seed CURIEs, catching obsoleted terms with replacement suggestions.
+- Enrich seed lists via simple, minimal, full, contextual, and ancestor-based strategies.
+- Build tabular outputs (`pandas.DataFrame`) and transitive-reduced graphs (`rdflib.Graph`) for visualization.
+- Run batched SPARQL queries and write deterministic tests using the built-in mocking examples.
 
-With the basic library in place, the first planned use for this is as a base for a library that provides simple enrichement and querability to AnnData Cell X Gene matrices following the [CZ single cell curation standard](https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/3.0.0/schema.md).
+## Installation
+
+```bash
+pip install pandasaurus
+```
+
+or with Poetry:
+
+```bash
+poetry add pandasaurus
+```
+
+Requires Python 3.9–3.11.
+
+## Quick Example
+
+```python
+from pandasaurus.curie_validator import CurieValidator
+from pandasaurus.query import Query
+
+seeds = ["CL:0000084", "CL:0000787", "CL:0000636"]
+
+terms = CurieValidator.construct_term_list(seeds)
+CurieValidator.get_validation_report(terms)  # raises if invalid or obsoleted
+
+query = Query(seeds, force_fail=True)
+df = query.simple_enrichment()
+print(df.head())
+```
+
+See the [Quick Start guide](docs/quickstart.rst) for a step-by-step workflow.
+
+## Documentation
+
+Full documentation (quick start, recipes, developer guide, and API reference) lives under `docs/` and is published from the `gh-pages` branch:
+
+- [Hosted documentation](https://incatools.github.io/PandaSaurus/)
+- [Quick Start (source)](docs/quickstart.rst)
+- [Guides (source)](docs/guides/index.rst)
+- [API reference (source)](docs/pandasaurus/index.rst)
+
+To build docs locally:
+
+```bash
+poetry install -E docs
+poetry run sphinx-build -b html docs docs/_build/html
+```
+
+## Contributing
+
+Pull requests are welcome! See `docs/guides/contributing.rst` for details on environment setup, testing, linting, and the release workflow. Pandasaurus aims to remain a small, focused library; please open an issue before introducing large new features.
+
+## Background
+
+The first planned use case is to provide enrichment/query tooling for AnnData Cell x Gene matrices following the [CZ single cell curation standard](https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/3.0.0/schema.md).
diff --git a/docs/guides/contributing.rst b/docs/guides/contributing.rst
@@ -0,0 +1,59 @@
+Contributing & Development
+==========================
+
+Environment Setup
+-----------------
+
+1. Install Poetry (see https://python-poetry.org/docs/#installation).
+2. Clone the repository and install dependencies:
+
+   .. code-block:: bash
+
+      poetry install
+
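+3. Activate the virtualenv (Poetry 2.x no longer bundles ``poetry shell``; install ``poetry-plugin-shell`` or use the command printed by ``poetry env activate``):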
+
+   .. code-block:: bash
+
+      poetry shell
+
+Running Tests
+-------------
+
+Use pytest with coverage:
+
+.. code-block:: bash
+
+   poetry run pytest --cov=pandasaurus --cov-report=term-missing
+
+Network-dependent tests hit Ubergraph. If you need deterministic runs, mock ``run_sparql_query`` as shown in ``test/test_query.py``.
+
+Linting & Formatting
+--------------------
+
+Before committing, run:
+
+.. code-block:: bash
+
+   poetry run isort pandasaurus test
+   poetry run black pandasaurus test
+   poetry run flake8 pandasaurus test
+
+The repository also includes a pre-commit hook (``.githooks/pre-commit``) that runs ``isort`` and ``black`` automatically once you enable it with ``git config core.hooksPath .githooks``.
+
+Documentation
+-------------
+
+Docs live under ``docs/`` (Sphinx). Build them locally with:
+
+.. code-block:: bash
+
+   poetry install -E docs
+   poetry run sphinx-build -b html docs docs/_build/html
+
+CI publishes documentation from ``main`` to the ``gh-pages`` branch via GitHub Actions.
+
+Release Pipeline
+----------------
+
+PyPI releases are automated: publishing a GitHub Release triggers the ``publish-pypi`` workflow, which builds the package via Poetry and uploads to PyPI using the ``PYPI_API_TOKEN`` secret.
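
A minimal sketch of the mocking approach mentioned under "Running Tests" in `docs/guides/contributing.rst`. It does not import pandasaurus: `sparql_client`, `fetch_labels`, and the row layout are hypothetical stand-ins used only to show the patching pattern, and the real patch target should be taken from `test/test_query.py`.

```python
import types
from unittest import mock


def run_sparql_query(query):
    """Stand-in for a network call; fails loudly if it is ever reached."""
    raise RuntimeError("network access is not allowed in this sketch")


# Hypothetical module-like object that owns ``run_sparql_query``.
sparql_client = types.SimpleNamespace(run_sparql_query=run_sparql_query)


def fetch_labels(curies):
    """Hypothetical caller that resolves the function at call time."""
    query = "VALUES ?term { " + " ".join(curies) + " }"
    rows = sparql_client.run_sparql_query(query)
    return {row["term"]: row["label"] for row in rows}


# Canned rows replace the remote response, so the run is deterministic.
canned_rows = [{"term": "CL:0000084", "label": "T cell"}]

with mock.patch.object(
    sparql_client, "run_sparql_query", return_value=canned_rows
) as fake_query:
    labels = fetch_labels(["CL:0000084"])

# The patched function was used exactly once and no network call happened.
fake_query.assert_called_once()
```

The key design point is to patch the name where the calling code looks it up, not where the function was originally defined; otherwise the real network call still runs.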
diff --git a/docs/guides/index.rst b/docs/guides/index.rst
@@ -0,0 +1,10 @@
+User and Developer Guides
+=========================
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Guides:
+
+   ../overview
+   recipes
+   contributing
diff --git a/docs/guides/recipes.rst b/docs/guides/recipes.rst
@@ -0,0 +1,65 @@
+Task-Oriented Recipes
+=====================
+
+Validate and Update Seeds
+-------------------------
+
+1. Construct the term list:
+
+   .. code-block:: python
+
+      from pandasaurus.curie_validator import CurieValidator
+
+      seeds = ["CL:0000084", "CL:0000787", "CL:0000636"]
+      terms = CurieValidator.construct_term_list(seeds)
+
+2. Catch validation errors:
+
+   .. code-block:: python
+
+      from pandasaurus.utils.pandasaurus_exceptions import InvalidTerm, ObsoletedTerm
+
+      try:
+          CurieValidator.get_validation_report(terms)
+      except InvalidTerm as err:
+          print(err)
+      except ObsoletedTerm as err:
+          print(err)
+
+3. Replace obsoleted terms programmatically:
+
+   .. code-block:: python
+
+      from pandasaurus.query import Query
+
+      query = Query(seeds)
+      query.update_obsoleted_terms()
+
+Contextual Enrichment
+---------------------
+
+Gather all terms that are ``part_of`` a context and enrich them:
+
+.. code-block:: python
+
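+   from pandasaurus.query import Query
+   # kidney_terms is assumed to be a list of CURIE strings defined earlier.
+   q = Query(kidney_terms, force_fail=True)
+   enriched = q.contextual_slim_enrichment(["UBERON:0000362"])  # renal medulla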
+
+Parent-only Enrichment
+----------------------
+
+Use :meth:`pandasaurus.query.Query.parent_enrichment` for a one-hop graph:
+
+.. code-block:: python
+
+   q = Query(seeds)
+   parent_df = q.parent_enrichment()
+
+Export to Graph
+---------------
+
+After any enrichment call:
+
+.. code-block:: python
+
+   graph_df = q.graph_df  # pandas DataFrame
+   rdflib_graph = q.graph
+
+Use :mod:`pandasaurus.graph.graph_generator` to further manipulate the graph or export as needed.
diff --git a/docs/index.rst b/docs/index.rst
@@ -10,6 +10,9 @@ pandasaurus's documentation!
    :maxdepth: 2
    :caption: Contents:
 
+   overview
+   quickstart
+   guides/index
    introduction
    pandasaurus/index
 
diff --git a/docs/introduction.rst b/docs/introduction.rst
@@ -4,9 +4,6 @@ Pandasaurus
 .. image:: https://user-images.githubusercontent.com/112839/227489878-d253c381-75fd-4e92-b851-2b36df0fc5ed.png
     :width: 100
 
-STATUS: BETA
-------------
-
 A python library supporting simple queries over ontology annotations in dataframes, using UberGraph queries.
 
 The aim for now is to keep this as a very simple independent Python lib avoiding any complex dependencies.
diff --git a/docs/overview.rst b/docs/overview.rst
@@ -0,0 +1,61 @@
+Overview
+========
+
+Pandasaurus supports simple queries over ontology annotations in dataframes, powered by Ubergraph SPARQL queries. It keeps dependencies light while still offering CURIE validation, enrichment utilities, and graph exports for downstream tooling.
+
+Features
+--------
+
+- Validate and update seed CURIEs, catching obsoleted terms with replacement suggestions.
+- Enrich seed lists via simple, minimal, full, contextual, and ancestor-based strategies.
+- Build tabular outputs (:class:`pandas.DataFrame`) and transitive-reduced graphs (:class:`rdflib.Graph`) for visualization.
+- Run batched SPARQL queries and write deterministic tests using the built-in mocking examples.
+
+Installation
+------------
+
+.. code-block:: bash
+
+   pip install pandasaurus
+
+or with Poetry:
+
+.. code-block:: bash
+
+   poetry add pandasaurus
+
+Requires Python 3.9–3.11.
+
+Quick Example
+-------------
+
+.. code-block:: python
+
+   from pandasaurus.curie_validator import CurieValidator
+   from pandasaurus.query import Query
+
+   seeds = ["CL:0000084", "CL:0000787", "CL:0000636"]
+
+   terms = CurieValidator.construct_term_list(seeds)
+   CurieValidator.get_validation_report(terms)  # raises if invalid or obsoleted
+
+   query = Query(seeds, force_fail=True)
+   df = query.simple_enrichment()
+   print(df.head())
+
+.. seealso::
+   Continue to :doc:`quickstart` for a step-by-step walkthrough of the full workflow.
+
+Documentation Links
+-------------------
+
+- :doc:`quickstart`
+- :doc:`guides/index`
+- :doc:`pandasaurus/index`
+
+Background
+----------
+
+The first planned use case is to provide enrichment/query tooling for AnnData Cell x Gene matrices following the `CZ single cell curation standard <https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/3.0.0/schema.md>`_.
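
The "batched SPARQL queries" feature listed in `docs/overview.rst` can be illustrated with a small, library-independent sketch. The helper names (`chunks`, `values_clause`), the `?term` variable, and the batch size are assumptions made for this example; they are not taken from the pandasaurus source.

```python
from typing import Iterator, List


def chunks(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of ``items`` with at most ``size`` elements."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def values_clause(curies: List[str]) -> str:
    """Render one batch as a SPARQL VALUES clause (variable name is illustrative)."""
    return "VALUES ?term { " + " ".join(curies) + " }"


seeds = ["CL:0000084", "CL:0000787", "CL:0000636"]

# A batch size of 2 is chosen only to make the split visible in the demo.
batches = list(chunks(seeds, size=2))
clauses = [values_clause(batch) for batch in batches]
```

Splitting the seed list this way keeps each query string bounded in size, at the cost of one request per batch.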
diff --git a/docs/pandasaurus/curie_validator.rst b/docs/pandasaurus/curie_validator.rst
@@ -1,10 +1,22 @@
 Curie Validator
 ==================
 
-Documentation
--------------
+Overview
+--------
+
+``CurieValidator`` validates seed CURIEs and surfaces obsoleted terms in a structured way. Use it before running any enrichment to ensure your data is clean.
+
+Typical usage:
+
+.. code-block:: python
+
+   terms = CurieValidator.construct_term_list(seeds)
+   CurieValidator.get_validation_report(terms)  # raises on invalid/obsoleted terms
+
+Class Reference
+---------------
 
 .. currentmodule:: pandasaurus.curie_validator
 
 .. autoclass:: CurieValidator
-   :members:
+   :members:
diff --git a/docs/pandasaurus/graph/graph_generator.rst b/docs/pandasaurus/graph/graph_generator.rst
@@ -1,10 +1,17 @@
 Graph Generator
-==================
+===============
 
-Documentation
--------------
+``GraphGenerator`` turns enrichment results into rdflib graphs and applies transitive reduction so you can visualize clean hierarchies.
+
+Use it when:
+
+* You need a graph representation of the enriched DataFrame (for plotting or exporting).
+* You want to remove redundant edges before exporting to visualization tools.
+
+Class Reference
+---------------
 
 .. currentmodule:: pandasaurus.graph.graph_generator
 
 .. autoclass:: GraphGenerator
-   :members:
+   :members:
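
For readers unfamiliar with the transitive reduction mentioned in `docs/pandasaurus/graph/graph_generator.rst`, the following is a pure-Python sketch of the idea on a small directed acyclic graph. It is not the `GraphGenerator` implementation, and the node names are placeholders rather than real ontology terms.

```python
from collections import defaultdict


def transitive_reduction(edges):
    """Return the edges of a DAG that are not implied by a longer path."""
    successors = defaultdict(set)
    for source, target in edges:
        successors[source].add(target)

    def reachable(start):
        # Depth-first search collecting every node reachable from ``start``.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in successors.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    kept = []
    for source, target in edges:
        # The edge is redundant if another direct successor of ``source``
        # already leads to ``target``.
        others = successors[source] - {target}
        if not any(target in reachable(mid) for mid in others):
            kept.append((source, target))
    return kept


# A -> C is implied by A -> B -> C, so it is dropped.
edges = [("A", "B"), ("B", "C"), ("A", "C")]
reduced = transitive_reduction(edges)
```

This version recomputes reachability per edge, which is fine for small graphs; a production implementation would typically cache the results or rely on a graph library.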
diff --git a/docs/pandasaurus/graph/index.rst b/docs/pandasaurus/graph/index.rst
@@ -1,6 +1,8 @@
 Graph Module
 =======================
 
+Utilities for turning enrichment results into graphs and reducing redundancy before visualization.
+
 .. toctree::
    :maxdepth: 2
    :caption: Contents:
diff --git a/docs/pandasaurus/query.rst b/docs/pandasaurus/query.rst
@@ -1,10 +1,28 @@
 Query
 ==================
 
-Documentation
--------------
+Overview
+--------
+
+``Query`` orchestrates CURIE validation, enrichment, and graph generation on top of Ubergraph.
+
+Typical workflow:
+
+1. Construct with a seed list (list of CURIE strings).
+2. Call an enrichment method (``simple_enrichment``, ``minimal_slim_enrichment``, ``contextual_slim_enrichment``, etc.).
+3. Inspect the resulting DataFrame or export ``graph_df``/``graph``.
+
+Key Attributes
+--------------
+
+* ``enriched_df`` – latest enrichment results as a pandas DataFrame.
+* ``graph_df`` – edges suitable for plotting or exporting to external graph tooling.
+* ``graph`` – rdflib graph with transitive reduction applied.
+
+Class Reference
+---------------
 
 .. currentmodule:: pandasaurus.query
 
 .. autoclass:: Query
-   :members:
+   :members:
diff --git a/docs/pandasaurus/slim_manager.rst b/docs/pandasaurus/slim_manager.rst
diff --git a/docs/quickstart.rst b/docs/quickstart.rst