Commit b8c4f75

Merged with main
2 parents 960dfeb + 7443323 commit b8c4f75

839 files changed: +2843 -1160118 lines changed

.github/workflows/test.yml

Lines changed: 49 additions & 11 deletions
@@ -6,6 +6,10 @@ on:
   pull_request:
     branches: [ main ]

+permissions:
+  contents: read
+  pull-requests: write
+
 jobs:
   test:
     runs-on: ubuntu-latest
@@ -25,15 +29,6 @@ jobs:
           python-version: ${{ matrix.python-version }}
           cache: "pip"

-      # Cache HuggingFace - this saves time running tests/test_utils.py on subsequent runs
-      - name: Cache HuggingFace datasets
-        uses: actions/cache@v4
-        with:
-          path: ~/.cache/huggingface
-          key: ${{ runner.os }}-huggingface-${{ hashFiles('tests/test_*.py') }}
-          restore-keys: |
-            ${{ runner.os }}-huggingface-
-
       - name: Install dependencies
         run: |
           python -m pip install --upgrade pip
@@ -63,8 +58,8 @@ jobs:
         run: mypy manify/

       # Unit testing
-      - name: Run unit tests & collect coverage
-        run: pytest tests --cov --cov-report=xml
+      - name: Run unit tests & collect coverage (except dataloaders)
+        run: pytest tests --cov --cov-report=xml -k "not test_dataloaders"


       # Code coverage
@@ -75,3 +70,46 @@ jobs:
           fail_ci_if_error: false
           verbose: true
           flags: unittests
+          name: python-${{ matrix.python-version }}
+
+  # Dataloaders run in parallel, for speed
+  test-dataloaders:
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Check out code
+        uses: actions/checkout@v4
+
+      - name: Set up Python 3.11
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+          cache: "pip"
+
+      - name: Cache HuggingFace datasets
+        uses: actions/cache@v4
+        with:
+          path: ~/.cache/huggingface
+          key: ${{ runner.os }}-huggingface-dataloaders-v1
+          restore-keys: |
+            ${{ runner.os }}-huggingface-dataloaders-
+            ${{ runner.os }}-huggingface-
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install -e ".[dev]"
+
+      - name: Run dataloader tests
+        run: pytest tests/test_utils.py::test_dataloaders -v --cov=manify/dataloaders --cov-report=xml
+
+      # Upload dataloader coverage separately
+      - name: Upload dataloader coverage to Codecov
+        uses: codecov/codecov-action@v5
+        with:
+          token: ${{ secrets.CODECOV_TOKEN }}
+          fail_ci_if_error: false
+          verbose: true
+          flags: dataloaders
+          name: dataloaders
+
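
Review note on the cache step: the new `test-dataloaders` job uses a fixed key (`...-huggingface-dataloaders-v1`) with two `restore-keys` prefixes, so a cache written by the old `test` job can still seed the first run. The sketch below illustrates the lookup order as the `actions/cache` documentation describes it (exact key, then prefix of the key, then each restore key in order, preferring the most recently created entry). `CacheEntry` and `resolve_cache` are hypothetical names made up for this illustration; they are not part of any GitHub API, and the real service additionally scopes caches by branch and version, which is ignored here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CacheEntry:
    """Hypothetical stand-in for a stored cache: its key and a creation timestamp."""

    key: str
    created_at: int


def _newest(entries: Sequence[CacheEntry]) -> Optional[CacheEntry]:
    """Return the most recently created entry, or None if there are none."""
    return max(entries, key=lambda e: e.created_at) if entries else None


def resolve_cache(
    key: str, restore_keys: Sequence[str], entries: Sequence[CacheEntry]
) -> Optional[CacheEntry]:
    """Simplified model of the restore lookup (assumption: no branch/version scoping)."""
    # 1. An exact match on the primary key wins outright.
    exact = _newest([e for e in entries if e.key == key])
    if exact is not None:
        return exact
    # 2. Otherwise try the primary key as a prefix, then each restore key in order.
    for prefix in [key, *restore_keys]:
        hit = _newest([e for e in entries if e.key.startswith(prefix)])
        if hit is not None:
            return hit
    # 3. Nothing matched: the job starts with a cold cache.
    return None


# Keys as they might look on a Linux runner (runner.os == "Linux").
stored = [
    CacheEntry("Linux-huggingface-abc123", created_at=1),  # written by the old step
    CacheEntry("Linux-huggingface-dataloaders-v0", created_at=2),
]
hit = resolve_cache(
    "Linux-huggingface-dataloaders-v1",
    ["Linux-huggingface-dataloaders-", "Linux-huggingface-"],
    stored,
)
print(hit.key if hit else "cache miss")
```

Because the new key no longer depends on `hashFiles(...)`, invalidating the cache under this scheme means bumping the `-v1` suffix by hand.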

CONTRIBUTING.md

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
+# Contributing to Manify
+
+Thank you for your interest in contributing to Manify! We welcome contributions of all kinds.
+
+## Getting Started
+
+1. **Fork and clone** the repository
+2. **Install in development mode**:
+   ```bash
+   pip install -e ".[dev]"
+   ```
+3. **Set up pre-commit hooks**:
+   ```bash
+   pre-commit install
+   ```
+
+## Code Quality Standards
+
+### Type Annotations
+- Use type annotations for all functions and methods
+- Use `jaxtyping` for tensor shape annotations:
+  ```python
+  from jaxtyping import Float
+  import torch
+
+  def process_embeddings(x: Float[torch.Tensor, "batch dim"]) -> Float[torch.Tensor, "batch output_dim"]:
+      ...
+  ```
+
+### Testing
+- Write unit tests for all new functionality
+- **Coverage requirement**: 80%+ for new code
+- Run tests with beartype enabled (as in CI):
+  ```bash
+  pytest tests --cov
+  ```
+- Tests should cover edge cases and error conditions
+
+### Code Style
+- We use **Ruff** for linting and formatting
+- Check your code before committing:
+  ```bash
+  ruff check manify/
+  ruff format manify/
+  ```
+- Type check with MyPy:
+  ```bash
+  mypy manify/
+  ```
+
+## Documentation
+
+We especially welcome documentation contributions! Areas where help is needed:
+
+- **Mathematical details**: The [paper](https://arxiv.org/abs/2503.09576) contains rich mathematical content that could be integrated into the docs
+- **Tutorials**: More examples and tutorials are always appreciated
+- **API documentation**: Improving docstrings and examples
+- **Use case guides**: Real-world applications and workflows
+
+Documentation uses Google-style docstrings:
+```python
+def my_function(param1: int, param2: str) -> bool:
+    """Brief description of the function.
+
+    Args:
+        param1: Description of param1
+        param2: Description of param2
+
+    Returns:
+        Description of return value
+
+    Raises:
+        ValueError: When something goes wrong
+    """
+```
+
+## Pull Request Process
+
+1. **Create a feature branch** from `main`
+2. **Make your changes** following the standards above
+3. **Add tests** with good coverage
+4. **Update documentation** as needed
+5. **Ensure CI passes** (tests, linting, type checking)
+6. **Submit a pull request** with a clear description
+
+## Questions?
+
+- Open an [issue](https://github.com/pchlenski/manify/issues) for bugs or feature requests
+- Start a [discussion](https://github.com/pchlenski/manify/discussions) for questions
+
+We appreciate your contributions to making non-Euclidean machine learning more accessible!
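
Review note on the contributing guide: the docstring template and the "edge cases and error conditions" testing rule can be hard to picture from a body-less example. Below is a minimal sketch of a filled-in function plus two tests in the style the guide asks for. `normalize_weights` is a hypothetical helper invented for this illustration (it is not part of `manify`), and the tests use plain `assert` so the block runs without any third-party packages; under `pytest`, the error case would more idiomatically use `pytest.raises`.

```python
def normalize_weights(weights: list[float]) -> list[float]:
    """Scale non-negative weights so that they sum to one.

    Args:
        weights: Non-negative weights, at least one of which is positive.

    Returns:
        A new list with the same length whose entries sum to one.

    Raises:
        ValueError: If the list is empty, contains a negative value, or sums to zero.
    """
    if not weights:
        raise ValueError("weights must not be empty")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return [w / total for w in weights]


def test_normalize_weights_single_element() -> None:
    # Edge case: a single weight always normalizes to 1.0.
    assert normalize_weights([2.0]) == [1.0]


def test_normalize_weights_rejects_empty_input() -> None:
    # Error condition: the documented ValueError must actually be raised.
    try:
        normalize_weights([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty input")


# pytest would collect the two tests above; they can also be called directly.
test_normalize_weights_single_element()
test_normalize_weights_rejects_empty_input()
```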

README.md

Lines changed: 51 additions & 33 deletions
@@ -1,26 +1,23 @@
 # Manify 🪐
-> A Python Library for Learning Non-Euclidean Representations

 [![Python Version](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/downloads/)
-[![License](https://img.shields.io/github/license/pchlenski/manify)](https://github.com/pchlenski/manify/blob/main/LICENSE)
 [![PyPI version](https://badge.fury.io/py/manify.svg)](https://badge.fury.io/py/manify)
 [![Tests](https://github.com/pchlenski/manify/actions/workflows/test.yml/badge.svg)](https://github.com/pchlenski/manify/actions/workflows/test.yml)
 [![codecov](https://codecov.io/gh/pchlenski/manify/branch/main/graph/badge.svg)](https://codecov.io/gh/pchlenski/manify)
 [![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
+[![Documentation](https://img.shields.io/badge/docs-manify.readthedocs.io-blue)](https://manify.readthedocs.io)
+[![arXiv](https://img.shields.io/badge/arXiv-2503.09576-b31b1b.svg)](https://arxiv.org/abs/2503.09576)
+[![License](https://img.shields.io/github/license/pchlenski/manify)](https://github.com/pchlenski/manify/blob/main/LICENSE)

-
-Manify is a Python library for generating graph/data embeddings and performing machine learning in product spaces with mixed curvature (hyperbolic, Euclidean, and spherical spaces). It provides tools for manifold creation, curvature estimation, embedding generation, and predictive modeling that respects the underlying geometry of complex data.
-
-You can read our manuscript here: [Manify: A Python Library for Learning Non-Euclidean Representations](https://arxiv.org/abs/2503.09576)
-
-📖 **Documentation**: [manify.readthedocs.io](https://manify.readthedocs.io)
-
-## Key Features
-- Create and manipulate manifolds with different curvatures (hyperbolic, Euclidean, spherical)
-- Build product manifolds by combining multiple spaces with different geometric properties
-- Learn embeddings of data in these manifolds
-- Train machine learning models that respect the geometry of the embedding space
-- Generate synthetic data with known geometric properties for benchmarking
+Manify is a Python library for non-Euclidean representation learning.
+It is built on top of `geoopt` and follows `scikit-learn` API conventions.
+The library supports a variety of workflows involving (products of) Riemannian manifolds, including:
+- All basic manifold operations (e.g. exponential map, logarithmic map, parallel transport, and distance computations)
+- Sampling Gaussian distributions and Gaussian mixtures
+- Learning embeddings of data on product manifolds, using features and/or distances
+- Training machine learning models on manifold-valued embeddings, including decision trees, random forests, SVMs,
+  perceptrons, and neural networks.
+- Clustering manifold-valued data using Riemannian fuzzy K-Means

 ## Installation

@@ -31,37 +28,33 @@ There are two ways to install `manify`:
    pip install manify
    ```

-2. **From GitHub**:
+2. **From GitHub** (recommended due to active development of the repo):
    ```bash
    pip install git+https://github.com/pchlenski/manify
    ```

 ## Quick Example

 ```python
-import torch
-from manify.manifolds import ProductManifold
-from manify.embedders import CoordinateLearning
-from manify.predictors.decision_tree import ProductSpaceDT
+import manify
 from manify.utils.dataloaders import load_hf
 from sklearn.model_selection import train_test_split

-# Load graph data
+# Load Polblogs graph from HuggingFace
 features, dists, adj, labels = load_hf("polblogs")

-# Create product manifold
-pm = ProductManifold(signature=[(1, 4)]) # S^4_1
+# Create an S^4 x H^4 product manifold
+pm = manify.ProductManifold(signature=[(1.0, 4), (-1.0, 4)])

 # Learn embeddings (Gu et al (2018) method)
-embedder = CoordinateLearning(pm=pm)
-embedder.fit(X=None, D=dists)
-X_embedded = embedder.transform()
+embedder = manify.CoordinateLearning(pm=pm)
+X_embedded = embedder.fit_transform(X=None, D=dists, burn_in_iterations=200, training_iterations=800)

 # Train and evaluate classifier (Chlenski et al (2025) method)
 X_train, X_test, y_train, y_test = train_test_split(X_embedded, labels)
-tree = ProductSpaceDT(pm=pm, max_depth=3, task="classification")
-tree.fit(X_train, y_train)
-print(tree.score(X_test, y_test))
+model = manify.ProductSpaceDT(pm=pm, max_depth=3, task="classification")
+model.fit(X_train, y_train)
+print(model.score(X_test, y_test))
 ```

 ## Modules
@@ -71,7 +64,7 @@ print(tree.score(X_test, y_test))

 **Curvature Estimation**
 - `manify.curvature_estimation.delta_hyperbolicity` - Compute delta-hyperbolicity of a metric space
-- `manify.curvature_estimation.greedy_method` - Greedy selection of signatures
+- `manify.curvature_estimation.greedy_method` - Greedy selection of near-optimal signatures
 - `manify.curvature_estimation.sectional_curvature` - Sectional curvature estimation using Toponogov's theorem

 **Embedders**
@@ -80,6 +73,7 @@ print(tree.score(X_test, y_test))
 - `manify.embedders.vae` - Product space variational autoencoder

 **Predictors**
+- `manify.predictors.nn` - Neural network layers
 - `manify.predictors.decision_tree` - Decision tree and random forest predictors
 - `manify.predictors.kappa_gcn` - Kappa GCN
 - `manify.predictors.perceptron` - Product space perceptron
@@ -97,10 +91,20 @@ print(tree.score(X_test, y_test))
 - `manify.utils.link_prediction` - Preprocessing graphs with link prediction
 - `manify.utils.visualization` - Tools for visualization

-## Research Background
-Manify implements geometric machine learning approaches described in academic papers, particularly focusing on handling data with mixed geometric properties. It's especially suited for data that naturally lives in non-Euclidean spaces, such as hierarchical data, networks, and certain types of biological data.
+## Archival branches
+This repo has a number of archival branches that contain code from previous versions of the library when it was under
+active development. These branches are not maintained and are provided for reference only:
+- [Dataset-Generation](https://github.com/pchlenski/manify/tree/Dataset-Generation). This branch contains code used to
+  generate the datasets found in `manify.utils.dataloaders`.
+- [notebook-archive](https://github.com/pchlenski/manify/tree/notebook_archive). This branch contains dozens of Jupyter
+  notebooks and datasets that were used to develop the library and carry out various benchmarks for the Mixed Curvature
+  Decision Trees and Random Forests paper.

-## Citation
+## Contributing
+Please read our [contributing guide](https://github.com/pchlenski/manify/blob/main/CONTRIBUTING.md) for details on how
+to contribute to the project.
+
+## References
 If you use our work, please cite the `Manify` paper:
 ```bibtex
 @misc{chlenski2025manifypythonlibrarylearning,
@@ -113,3 +117,17 @@ If you use our work, please cite the `Manify` paper:
       url={https://arxiv.org/abs/2503.09576},
 }
 ```
+
+Additionally, if you use one of the methods implemented in `manify`, please cite the original papers:
+- `CoordinateLearning`: Gu et al. "Learning Mixed-Curvature Representations in Product Spaces." ICLR 2019.
+  [https://openreview.net/forum?id=HJxeWnCcF7](https://openreview.net/forum?id=HJxeWnCcF7)
+- `ProductSpaceVAE`: Skopek et al. "Mixed-Curvature Variational Autoencoders." ICLR 2020.
+  [https://openreview.net/forum?id=S1g6xeSKDS](https://openreview.net/forum?id=S1g6xeSKDS)
+- `SiameseNetwork`: Based on Siamese networks: Chopra et al. "Learning a Similarity Metric Discriminatively, with Application to Face Verification." CVPR 2005. [https://ieeexplore.ieee.org/document/1467314](https://ieeexplore.ieee.org/document/1467314)
+- `ProductSpaceDT` and `ProductSpaceRF`: Chlenski et al. "Mixed Curvature Decision Trees and Random Forests." ICML 2025. [https://arxiv.org/abs/2410.13879](https://arxiv.org/abs/2410.13879)
+- `KappaGCN`: Bachmann et al. "Constant Curvature Graph Convolutional Networks." ICML 2020. [https://proceedings.mlr.press/v119/bachmann20a.html](https://proceedings.mlr.press/v119/bachmann20a.html)
+- `ProductSpacePerceptron` and `ProductSpaceSVM`: Tabaghi et al. "Linear Classifiers in Product Space Forms." ArXiv 2021. [https://arxiv.org/abs/2102.10204](https://arxiv.org/abs/2102.10204)
+- `RiemannianFuzzyKMeans` and `RiemannianAdan`: Yuan et al. "Riemannian Fuzzy K-Means." OpenReview 2025. [https://openreview.net/forum?id=9VmOgMN4Ie](https://openreview.net/forum?id=9VmOgMN4Ie)
+- Delta-hyperbolicity computation: Based on Gromov's δ-hyperbolicity metric for tree-likeness of metric spaces. Gromov, M. "Hyperbolic Groups." Essays in Group Theory, 1987. [https://link.springer.com/chapter/10.1007/978-1-4613-9586-7_3](https://link.springer.com/chapter/10.1007/978-1-4613-9586-7_3)
+- Sectional curvature estimation: Gu et al. "Learning Mixed-Curvature Representations in Product Spaces." ICLR 2019. [https://openreview.net/forum?id=HJxeWnCcF7](https://openreview.net/forum?id=HJxeWnCcF7)
+- Greedy signature selection: Tabaghi et al. "Linear Classifiers in Product Space Forms." ArXiv 2021. [https://arxiv.org/abs/2102.10204](https://arxiv.org/abs/2102.10204)
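
Review note on the Quick Example: the README comment labels `signature=[(1.0, 4), (-1.0, 4)]` as "S^4 x H^4", which suggests each pair is `(curvature, dimension)` with positive curvature read as spherical and negative as hyperbolic. The sketch below spells that reading out. `describe_signature` is a hypothetical helper written for this note (it is not a `manify` function), and the treatment of zero curvature as Euclidean is an assumption based on the library's stated support for hyperbolic, Euclidean, and spherical components.

```python
from typing import Sequence, Tuple


def describe_signature(signature: Sequence[Tuple[float, int]]) -> str:
    """Render a list of (curvature, dimension) pairs as a product-space label.

    Assumed convention, inferred from the README comment:
    curvature > 0 -> spherical (S), curvature < 0 -> hyperbolic (H),
    curvature == 0 -> Euclidean (E).
    """
    parts = []
    for curvature, dim in signature:
        if dim < 1:
            raise ValueError(f"dimension must be positive, got {dim}")
        if curvature > 0:
            symbol = "S"
        elif curvature < 0:
            symbol = "H"
        else:
            symbol = "E"
        parts.append(f"{symbol}^{dim}")
    return " x ".join(parts)


# The signature from the updated Quick Example, and the one it replaced.
print(describe_signature([(1.0, 4), (-1.0, 4)]))
print(describe_signature([(1, 4)]))
```

Note that this label drops the curvature magnitude; the removed README line wrote the old manifold as `S^4_1`, keeping the curvature as a subscript.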
