pchlenski
diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
Lines changed: 49 additions & 11 deletions
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
Lines changed: 91 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 51 additions & 33 deletions
@@ -6,6 +6,10 @@ on:
   pull_request:
     branches: [ main ]
 
+permissions:
+  contents: read
+  pull-requests: write
+
 jobs:  
   test:
     runs-on: ubuntu-latest
@@ -25,15 +29,6 @@ jobs:
           python-version: ${{ matrix.python-version }}
           cache: "pip"
 
-      # Cache HuggingFace - this saves time running tests/test_utils.py on subsequent runs
-      - name: Cache HuggingFace datasets
-        uses: actions/cache@v4
-        with:
-          path: ~/.cache/huggingface
-          key: ${{ runner.os }}-huggingface-${{ hashFiles('tests/test_*.py') }}
-          restore-keys: |
-            ${{ runner.os }}-huggingface-
-
       - name: Install dependencies
         run: |
           python -m pip install --upgrade pip
@@ -63,8 +58,8 @@ jobs:
         run: mypy manify/
 
       # Unit testing
-      - name: Run unit tests & collect coverage
-        run: pytest tests --cov --cov-report=xml
+      - name: Run unit tests & collect coverage (except dataloaders)
+        run: pytest tests --cov --cov-report=xml -k "not test_dataloaders"
 
 
       # Code coverage
@@ -75,3 +70,46 @@ jobs:
           fail_ci_if_error: false
           verbose: true
           flags: unittests
+          name: python-${{ matrix.python-version }}
+
+  # Dataloader tests run in their own job, in parallel with the main test job, for speed
+  test-dataloaders:
+    runs-on: ubuntu-latest
+    
+    steps:
+      - name: Check out code
+        uses: actions/checkout@v4
+
+      - name: Set up Python 3.11
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+          cache: "pip"
+
+      - name: Cache HuggingFace datasets
+        uses: actions/cache@v4
+        with:
+          path: ~/.cache/huggingface
+          key: ${{ runner.os }}-huggingface-dataloaders-v1
+          restore-keys: |
+            ${{ runner.os }}-huggingface-dataloaders-
+            ${{ runner.os }}-huggingface-
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install -e ".[dev]"
+
+      - name: Run dataloader tests
+        run: pytest tests/test_utils.py::test_dataloaders -v --cov=manify/dataloaders --cov-report=xml
+
+      # Upload dataloader coverage separately
+      - name: Upload dataloader coverage to Codecov
+        uses: codecov/codecov-action@v5
+        with:
+          token: ${{ secrets.CODECOV_TOKEN }}
+          fail_ci_if_error: false
+          verbose: true
+          flags: dataloaders
+          name: dataloaders
+
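
Note on the cache keys in the `test-dataloaders` job: the fallback order of `key` and `restore-keys` can be modelled with the short sketch below. It is a simplified model of prefix matching as I understand it from the `actions/cache` documentation, not the action's actual implementation; `resolve_cache` and the cache entry `Linux-huggingface-abc123` are hypothetical names used only for illustration.

```python
from typing import Optional, Sequence


def resolve_cache(
    key: str,
    restore_keys: Sequence[str],
    existing: Sequence[str],
) -> Optional[str]:
    """Return the cache entry a lookup would restore, or None on a miss.

    Simplified model: `existing` is assumed to be ordered oldest to newest.
    """
    # 1. An exact match on the primary key always wins.
    if key in existing:
        return key
    # 2. Otherwise try prefixes: the primary key first, then each restore key in order.
    for prefix in (key, *restore_keys):
        matches = [name for name in existing if name.startswith(prefix)]
        if matches:
            return matches[-1]  # newest entry sharing this prefix
    return None


restore_keys = ["Linux-huggingface-dataloaders-", "Linux-huggingface-"]
# Hypothetical entry written by the cache step this PR removes from the `test` job.
old_caches = ["Linux-huggingface-abc123"]
print(resolve_cache("Linux-huggingface-dataloaders-v1", restore_keys, old_caches))
```

Under this model, the second restore key lets the first run of the new job reuse a cache written by the removed step. Because the primary key is the static string `...-dataloaders-v1`, later runs would hit it exactly, so refreshing the cached datasets would presumably require bumping the `v1` suffix.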
@@ -0,0 +1,91 @@
+# Contributing to Manify
+
+Thank you for your interest in contributing to Manify! We welcome contributions of all kinds.
+
+## Getting Started
+
+1. **Fork and clone** the repository
+2. **Install in development mode**:
+   ```bash
+   pip install -e ".[dev]"
+   ```
+3. **Set up pre-commit hooks**:
+   ```bash
+   pre-commit install
+   ```
+
+## Code Quality Standards
+
+### Type Annotations
+- Use type annotations for all functions and methods
+- Use `jaxtyping` for tensor shape annotations:
+  ```python
+  from jaxtyping import Float
+  import torch
+  
+  def process_embeddings(x: Float[torch.Tensor, "batch dim"]) -> Float[torch.Tensor, "batch output_dim"]:
+      ...
+  ```
+
+### Testing
+- Write unit tests for all new functionality
+- **Coverage requirement**: 80%+ for new code
+- Run tests with beartype enabled (as in CI):
+  ```bash
+  pytest tests --cov
+  ```
+- Tests should cover edge cases and error conditions
+
+### Code Style
+- We use **Ruff** for linting and formatting
+- Check your code before committing:
+  ```bash
+  ruff check manify/
+  ruff format manify/
+  ```
+- Type check with MyPy:
+  ```bash
+  mypy manify/
+  ```
+
+## Documentation
+
+We especially welcome documentation contributions! Areas where help is needed:
+
+- **Mathematical details**: The [paper](https://arxiv.org/abs/2503.09576) contains rich mathematical content that could be integrated into the docs
+- **Tutorials**: More examples and tutorials are always appreciated
+- **API documentation**: Improving docstrings and examples
+- **Use case guides**: Real-world applications and workflows
+
+Documentation uses Google-style docstrings:
+```python
+def my_function(param1: int, param2: str) -> bool:
+    """Brief description of the function.
+    
+    Args:
+        param1: Description of param1
+        param2: Description of param2
+        
+    Returns:
+        Description of return value
+        
+    Raises:
+        ValueError: When something goes wrong
+    """
+```
+
+## Pull Request Process
+
+1. **Create a feature branch** from `main`
+2. **Make your changes** following the standards above
+3. **Add tests** with good coverage
+4. **Update documentation** as needed
+5. **Ensure CI passes** (tests, linting, type checking)
+6. **Submit a pull request** with a clear description
+
+## Questions?
+
+- Open an [issue](https://github.com/pchlenski/manify/issues) for bugs or feature requests
+- Start a [discussion](https://github.com/pchlenski/manify/discussions) for questions
+
+We appreciate your contributions to making non-Euclidean machine learning more accessible!
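
To make the Code Quality Standards above concrete, here is a minimal sketch of a contribution that combines type annotations, a Google-style docstring, and tests for edge cases and error conditions. `total_dimension` is a hypothetical helper invented for this illustration and is not part of `manify`; the `(curvature, dimension)` signature format is taken from the README's Quick Example. The tests use plain `assert` so the sketch runs without dependencies; in the real test suite, `pytest.raises` would be the more idiomatic choice.

```python
def total_dimension(signature: list[tuple[float, int]]) -> int:
    """Sum the component dimensions of a product-manifold signature.

    Args:
        signature: (curvature, dimension) pairs, one per component manifold.

    Returns:
        The sum of the component dimensions.

    Raises:
        ValueError: If the signature is empty or a dimension is not positive.
    """
    if not signature:
        raise ValueError("signature must contain at least one component")
    for curvature, dim in signature:
        if dim <= 0:
            raise ValueError(f"dimension must be positive, got {dim} for curvature {curvature}")
    return sum(dim for _, dim in signature)


def test_total_dimension() -> None:
    # Typical case: the S^4 x H^4 signature from the README.
    assert total_dimension([(1.0, 4), (-1.0, 4)]) == 8


def test_total_dimension_rejects_bad_input() -> None:
    # Edge cases and error conditions: empty signature, zero and negative dimensions.
    for bad in ([], [(0.0, 0)], [(1.0, -2)]):
        try:
            total_dimension(bad)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for {bad!r}")
```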
@@ -1,26 +1,23 @@
 # Manify 🪐
-> A Python Library for Learning Non-Euclidean Representations
 
 [![Python Version](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/downloads/)
-[![License](https://img.shields.io/github/license/pchlenski/manify)](https://github.com/pchlenski/manify/blob/main/LICENSE)
 [![PyPI version](https://badge.fury.io/py/manify.svg)](https://badge.fury.io/py/manify)
 [![Tests](https://github.com/pchlenski/manify/actions/workflows/test.yml/badge.svg)](https://github.com/pchlenski/manify/actions/workflows/test.yml)
 [![codecov](https://codecov.io/gh/pchlenski/manify/branch/main/graph/badge.svg)](https://codecov.io/gh/pchlenski/manify)
 [![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
+[![Documentation](https://img.shields.io/badge/docs-manify.readthedocs.io-blue)](https://manify.readthedocs.io)
+[![arXiv](https://img.shields.io/badge/arXiv-2503.09576-b31b1b.svg)](https://arxiv.org/abs/2503.09576)
+[![License](https://img.shields.io/github/license/pchlenski/manify)](https://github.com/pchlenski/manify/blob/main/LICENSE)
 
-
-Manify is a Python library for generating graph/data embeddings and performing machine learning in product spaces with mixed curvature (hyperbolic, Euclidean, and spherical spaces). It provides tools for manifold creation, curvature estimation, embedding generation, and predictive modeling that respects the underlying geometry of complex data.
-
-You can read our manuscript here: [Manify: A Python Library for Learning Non-Euclidean Representations](https://arxiv.org/abs/2503.09576)
-
-📖 **Documentation**: [manify.readthedocs.io](https://manify.readthedocs.io)
-
-## Key Features
-- Create and manipulate manifolds with different curvatures (hyperbolic, Euclidean, spherical)
-- Build product manifolds by combining multiple spaces with different geometric properties
-- Learn embeddings of data in these manifolds
-- Train machine learning models that respect the geometry of the embedding space
-- Generate synthetic data with known geometric properties for benchmarking
+Manify is a Python library for non-Euclidean representation learning. 
+It is built on top of `geoopt` and follows `scikit-learn` API conventions.
+The library supports a variety of workflows involving (products of) Riemannian manifolds, including:
+- All basic manifold operations (e.g. exponential map, logarithmic map, parallel transport, and distance computations)
+- Sampling Gaussian distributions and Gaussian mixtures
+- Learning embeddings of data on product manifolds, using features and/or distances
+- Training machine learning models on manifold-valued embeddings, including decision trees, random forests, SVMs,
+perceptrons, and neural networks
+- Clustering manifold-valued data using Riemannian fuzzy K-Means
 
 ## Installation
 
@@ -31,37 +28,33 @@ There are two ways to install `manify`:
    pip install manify
    ```
 
-2. **From GitHub**:
+2. **From GitHub** (recommended due to active development of the repo):
    ```bash
    pip install git+https://github.com/pchlenski/manify
    ```
 
 ## Quick Example
 
 ```python
-import torch
-from manify.manifolds import ProductManifold
-from manify.embedders import CoordinateLearning
-from manify.predictors.decision_tree import ProductSpaceDT
+import manify
 from manify.utils.dataloaders import load_hf
 from sklearn.model_selection import train_test_split
 
-# Load graph data
+# Load Polblogs graph from HuggingFace
 features, dists, adj, labels = load_hf("polblogs")
 
-# Create product manifold
-pm = ProductManifold(signature=[(1, 4)])  # S^4_1
+# Create an S^4 x H^4 product manifold
+pm = manify.ProductManifold(signature=[(1.0, 4), (-1.0, 4)])
 
 # Learn embeddings (Gu et al (2018) method)
-embedder = CoordinateLearning(pm=pm)
-embedder.fit(X=None, D=dists)
-X_embedded = embedder.transform()
+embedder = manify.CoordinateLearning(pm=pm)
+X_embedded = embedder.fit_transform(X=None, D=dists, burn_in_iterations=200, training_iterations=800)
 
 # Train and evaluate classifier (Chlenski et al (2025) method)
 X_train, X_test, y_train, y_test = train_test_split(X_embedded, labels)
-tree = ProductSpaceDT(pm=pm, max_depth=3, task="classification")
-tree.fit(X_train, y_train)
-print(tree.score(X_test, y_test))
+model = manify.ProductSpaceDT(pm=pm, max_depth=3, task="classification")
+model.fit(X_train, y_train)
+print(model.score(X_test, y_test))
 ```
 
 ## Modules
@@ -71,7 +64,7 @@ print(tree.score(X_test, y_test))
 
 **Curvature Estimation**
 - `manify.curvature_estimation.delta_hyperbolicity` - Compute delta-hyperbolicity of a metric space
-- `manify.curvature_estimation.greedy_method` - Greedy selection of signatures
+- `manify.curvature_estimation.greedy_method` - Greedy selection of near-optimal signatures
 - `manify.curvature_estimation.sectional_curvature` - Sectional curvature estimation using Toponogov's theorem
 
 **Embedders**
@@ -80,6 +73,7 @@ print(tree.score(X_test, y_test))
 - `manify.embedders.vae` - Product space variational autoencoder
 
 **Predictors**
+- `manify.predictors.nn` - Neural network layers
 - `manify.predictors.decision_tree` - Decision tree and random forest predictors
 - `manify.predictors.kappa_gcn` - Kappa GCN
 - `manify.predictors.perceptron` - Product space perceptron
@@ -97,10 +91,20 @@ print(tree.score(X_test, y_test))
 - `manify.utils.link_prediction` - Preprocessing graphs with link prediction
 - `manify.utils.visualization` - Tools for visualization
 
-## Research Background
-Manify implements geometric machine learning approaches described in academic papers, particularly focusing on handling data with mixed geometric properties. It's especially suited for data that naturally lives in non-Euclidean spaces, such as hierarchical data, networks, and certain types of biological data.
+## Archival branches
+This repo has several archival branches containing code from earlier stages of the library's development. These
+branches are not maintained and are provided for reference only:
+- [Dataset-Generation](https://github.com/pchlenski/manify/tree/Dataset-Generation). This branch contains code used to
+generate the datasets found in `manify.utils.dataloaders`.
+- [notebook_archive](https://github.com/pchlenski/manify/tree/notebook_archive). This branch contains dozens of Jupyter
+notebooks and datasets that were used to develop the library and carry out various benchmarks for the Mixed Curvature
+Decision Trees and Random Forests paper.
 
-## Citation
+## Contributing
+Please read our [contributing guide](https://github.com/pchlenski/manify/blob/main/CONTRIBUTING.md) for details on how
+to contribute to the project.
+
+## References
 If you use our work, please cite the `Manify` paper:
 ```bibtex
 @misc{chlenski2025manifypythonlibrarylearning,
@@ -113,3 +117,17 @@ If you use our work, please cite the `Manify` paper:
       url={https://arxiv.org/abs/2503.09576}, 
 }
 ```
+
+Additionally, if you use one of the methods implemented in `manify`, please cite the original papers:
+- `CoordinateLearning`: Gu et al. "Learning Mixed-Curvature Representations in Product Spaces." ICLR 2019. 
+[https://openreview.net/forum?id=HJxeWnCcF7](https://openreview.net/forum?id=HJxeWnCcF7)
+- `ProductSpaceVAE`: Skopek et al. "Mixed-Curvature Variational Autoencoders." ICLR 2020. 
+[https://openreview.net/forum?id=S1g6xeSKDS](https://openreview.net/forum?id=S1g6xeSKDS)
+- `SiameseNetwork`: Based on Siamese networks, as introduced in Chopra et al. "Learning a Similarity Metric Discriminatively, with Application to Face Verification." CVPR 2005. [https://ieeexplore.ieee.org/document/1467314](https://ieeexplore.ieee.org/document/1467314)
+- `ProductSpaceDT` and `ProductSpaceRF`: Chlenski et al. "Mixed Curvature Decision Trees and Random Forests." ICML 2025. [https://arxiv.org/abs/2410.13879](https://arxiv.org/abs/2410.13879)
+- `KappaGCN`: Bachmann et al. "Constant Curvature Graph Convolutional Networks." ICML 2020. [https://proceedings.mlr.press/v119/bachmann20a.html](https://proceedings.mlr.press/v119/bachmann20a.html)
+- `ProductSpacePerceptron` and `ProductSpaceSVM`: Tabaghi et al. "Linear Classifiers in Product Space Forms." arXiv 2021. [https://arxiv.org/abs/2102.10204](https://arxiv.org/abs/2102.10204)
+- `RiemannianFuzzyKMeans` and `RiemannianAdan`: Yuan et al. "Riemannian Fuzzy K-Means." OpenReview 2025. [https://openreview.net/forum?id=9VmOgMN4Ie](https://openreview.net/forum?id=9VmOgMN4Ie)
+- Delta-hyperbolicity computation: Based on Gromov's δ-hyperbolicity metric for tree-likeness of metric spaces. Gromov, M. "Hyperbolic Groups." Essays in Group Theory, 1987. [https://link.springer.com/chapter/10.1007/978-1-4613-9586-7_3](https://link.springer.com/chapter/10.1007/978-1-4613-9586-7_3)
+- Sectional curvature estimation: Gu et al. "Learning Mixed-Curvature Representations in Product Spaces." ICLR 2019. [https://openreview.net/forum?id=HJxeWnCcF7](https://openreview.net/forum?id=HJxeWnCcF7)
+- Greedy signature selection: Tabaghi et al. "Linear Classifiers in Product Space Forms." arXiv 2021. [https://arxiv.org/abs/2102.10204](https://arxiv.org/abs/2102.10204)
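
For readers new to the `signature` notation in the README's Quick Example (`[(1.0, 4), (-1.0, 4)]` for S^4 x H^4), the sketch below shows what a distance on a product of constant-curvature spaces looks like. It is pure Python written for illustration and does not use or reproduce `manify`'s API; `component_distance` and `product_distance` are hypothetical names. It assumes spherical points satisfy <x, x> = 1/K and hyperbolic points lie on the hyperboloid <x, x>_L = 1/K with the first coordinate timelike.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def component_distance(x: Vector, y: Vector, curvature: float) -> float:
    """Geodesic distance in a single space of constant curvature."""
    if curvature == 0:
        # Euclidean component: ordinary straight-line distance.
        return math.dist(x, y)
    if curvature > 0:
        # Spherical component: angle between the points, scaled by the radius.
        inner = sum(a * b for a, b in zip(x, y))
        # Clamp to guard against rounding pushing the argument outside [-1, 1].
        return math.acos(max(-1.0, min(1.0, curvature * inner))) / math.sqrt(curvature)
    # Hyperbolic component: Lorentzian inner product on the hyperboloid.
    lorentz = -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))
    # Clamp to guard against rounding pushing the argument below 1.
    return math.acosh(max(1.0, curvature * lorentz)) / math.sqrt(-curvature)


def product_distance(
    xs: Sequence[Vector], ys: Sequence[Vector], curvatures: Sequence[float]
) -> float:
    """Product-manifold distance: component distances combined like a Euclidean norm."""
    return math.sqrt(
        sum(component_distance(x, y, k) ** 2 for x, y, k in zip(xs, ys, curvatures))
    )


# One point pair on S^1 x H^1 (curvatures +1 and -1), split by component.
xs = [(1.0, 0.0), (1.0, 0.0)]
ys = [(0.0, 1.0), (math.cosh(1.0), math.sinh(1.0))]
print(product_distance(xs, ys, [1.0, -1.0]))
```

In this example the spherical component contributes a quarter-circle (pi/2) and the hyperbolic component a geodesic of length 1, so the product distance is the square root of the sum of their squares.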