Update readme/contributing; fix clustering tests

pchlenski · pchlenski · commit 0b6c59da0191 · 2025-07-12T01:00:33.000-07:00
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -0,0 +1,91 @@
+# Contributing to Manify
+
+Thank you for your interest in contributing to Manify! We welcome contributions of all kinds.
+
+## Getting Started
+
+1. **Fork and clone** the repository
+2. **Install in development mode**:
+   ```bash
+   pip install -e ".[dev]"
+   ```
+3. **Set up pre-commit hooks**:
+   ```bash
+   pre-commit install
+   ```
+
+## Code Quality Standards
+
+### Type Annotations
+- Use type annotations for all functions and methods
+- Use `jaxtyping` for tensor shape annotations:
+  ```python
+  from jaxtyping import Float
+  import torch
+  
+  def process_embeddings(x: Float[torch.Tensor, "batch dim"]) -> Float[torch.Tensor, "batch output_dim"]:
+      ...
+  ```
+
+### Testing
+- Write unit tests for all new functionality
+- **Coverage requirement**: 80%+ for new code
+- Run tests with beartype enabled (as in CI):
+  ```bash
+  pytest tests --cov
+  ```
+- Tests should cover edge cases and error conditions
+
+### Code Style
+- We use **Ruff** for linting and formatting
+- Check your code before committing:
+  ```bash
+  ruff check manify/
+  ruff format manify/
+  ```
+- Type check with MyPy:
+  ```bash
+  mypy manify/
+  ```
+
+## Documentation
+
+We especially welcome documentation contributions! Areas where help is needed:
+
+- **Mathematical details**: The [paper](https://arxiv.org/abs/2503.09576) contains rich mathematical content that could be integrated into the docs
+- **Tutorials**: More examples and tutorials are always appreciated
+- **API documentation**: Improving docstrings and examples
+- **Use case guides**: Real-world applications and workflows
+
+Documentation uses Google-style docstrings:
+```python
+def my_function(param1: int, param2: str) -> bool:
+    """Brief description of the function.
+    
+    Args:
+        param1: Description of param1
+        param2: Description of param2
+        
+    Returns:
+        Description of return value
+        
+    Raises:
+        ValueError: When something goes wrong
+    """
+```
+
+## Pull Request Process
+
+1. **Create a feature branch** from `main`
+2. **Make your changes** following the standards above
+3. **Add tests** that meet the coverage requirement above (80%+ for new code)
+4. **Update documentation** as needed
+5. **Ensure CI passes** (tests, linting, type checking)
+6. **Submit a pull request** with a clear description
+
+## Questions?
+
+- Open an [issue](https://github.com/pchlenski/manify/issues) for bugs or feature requests
+- Start a [discussion](https://github.com/pchlenski/manify/discussions) for questions
+
+We appreciate your contributions to making non-Euclidean machine learning more accessible!
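
Review note on the CONTRIBUTING.md hunk above: the docstring and testing guidelines can be read together. The sketch below pairs a Google-style docstring (including a `Raises` section) with a test that covers both boundary values and the error condition. `clip_curvature` is a hypothetical helper invented for illustration; it is not part of Manify, and its name and behavior are assumptions.

```python
def clip_curvature(curvature: float, max_abs: float = 1.0) -> float:
    """Clip a curvature value to the interval [-max_abs, max_abs].

    Hypothetical helper for illustration only; not part of Manify.

    Args:
        curvature: Curvature value to clip.
        max_abs: Largest allowed absolute value; must be positive.

    Returns:
        The clipped curvature.

    Raises:
        ValueError: If max_abs is not positive.
    """
    if max_abs <= 0:
        raise ValueError(f"max_abs must be positive, got {max_abs}")
    return max(-max_abs, min(max_abs, curvature))


def test_clip_curvature() -> None:
    # Typical value and edge cases at the boundary
    assert clip_curvature(0.5) == 0.5
    assert clip_curvature(-3.0) == -1.0
    assert clip_curvature(2.0, max_abs=2.0) == 2.0

    # Error condition: an invalid max_abs should raise ValueError
    try:
        clip_curvature(1.0, max_abs=0.0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

In a real test module, `pytest.raises(ValueError)` would be the more idiomatic way to check the error path; plain `try`/`except` is used here only to keep the sketch dependency-free.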
diff --git a/README.md b/README.md
@@ -8,19 +8,19 @@
 [![codecov](https://codecov.io/gh/pchlenski/manify/branch/main/graph/badge.svg)](https://codecov.io/gh/pchlenski/manify)
 [![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
 
-
-Manify is a Python library for generating graph/data embeddings and performing machine learning in product spaces with mixed curvature (hyperbolic, Euclidean, and spherical spaces). It provides tools for manifold creation, curvature estimation, embedding generation, and predictive modeling that respects the underlying geometry of complex data.
-
-You can read our manuscript here: [Manify: A Python Library for Learning Non-Euclidean Representations](https://arxiv.org/abs/2503.09576)
+Manify is a Python library for non-Euclidean representation learning. 
+It is built on top of `geoopt` and follows `scikit-learn` API conventions.
+The library supports a variety of workflows involving (products of) Riemannian manifolds, including:
+- All basic manifold operations (e.g., exponential map, logarithmic map, parallel transport, and distance computations)
+- Sampling from Gaussian distributions and Gaussian mixtures
+- Learning embeddings of data on product manifolds, using features and/or distances
+- Training machine learning models on manifold-valued embeddings, including decision trees, random forests, SVMs,
+  perceptrons, and neural networks
+- Clustering manifold-valued data using Riemannian fuzzy K-Means
 
 📖 **Documentation**: [manify.readthedocs.io](https://manify.readthedocs.io)
-
-## Key Features
-- Create and manipulate manifolds with different curvatures (hyperbolic, Euclidean, spherical)
-- Build product manifolds by combining multiple spaces with different geometric properties
-- Learn embeddings of data in these manifolds
-- Train machine learning models that respect the geometry of the embedding space
-- Generate synthetic data with known geometric properties for benchmarking
+📝 **Manuscript**: [Manify: A Python Library for Learning Non-Euclidean Representations](https://arxiv.org/abs/2503.09576)
+🐛 **Issue Tracker**: [GitHub](https://github.com/pchlenski/manify/issues)
 
 ## Installation
 
@@ -107,6 +107,10 @@ generate the datasets found in `manify.utils.dataloaders`.
 notebooks and datasets that were used to develop the library and carry out various benchmarks for the Mixed Curvature
 Decision Trees and Random Forests paper.
 
+## Contributing
+Please read our [contributing guide](https://github.com/pchlenski/manify/blob/main/CONTRIBUTING.md) for details on how
+to contribute to the project.
+
 ## Citation
 If you use our work, please cite the `Manify` paper:
 ```bibtex
diff --git a/tests/test_clustering.py b/tests/test_clustering.py
@@ -1,3 +1,5 @@
+import torch
+
 from manify.clustering import RiemannianFuzzyKMeans
 from manify.manifolds import ProductManifold
 
@@ -7,14 +9,14 @@ def test_riemannian_fuzzy_k_means():
     X, _ = pm.gaussian_mixture(num_points=100)
 
     for optimizer in ["adam", "adan"]:
-        kmeans = RiemannianFuzzyKMeans(manifold=pm, n_clusters=5)
+        kmeans = RiemannianFuzzyKMeans(manifold=pm, n_clusters=5, random_state=42)
         kmeans.fit(X)
         preds = kmeans.predict(X)
         assert preds.shape == (100,), f"Predictions should have shape (100,) (optimizer: {optimizer})"
 
         # Also test with X as a numpy array
         X_np = X.numpy()
-        kmeans = RiemannianFuzzyKMeans(manifold=pm, n_clusters=5)
+        kmeans = RiemannianFuzzyKMeans(manifold=pm, n_clusters=5, random_state=42)
         kmeans.fit(X_np)
         preds_np = kmeans.predict(X_np)
         assert torch.tensor(preds_np).shape == (100,), f"Predictions should have shape (100,) (optimizer: {optimizer})"
@@ -28,4 +30,3 @@ def test_riemannian_fuzzy_k_means():
         kmeans.fit(X0)
         preds = kmeans.predict(X0)
         assert preds.shape == (100,), f"Predictions should have shape (100,) (optimizer: {optimizer})"
-
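
Review note on the `random_state=42` change in `tests/test_clustering.py`: fixing the seed makes the initialization, and therefore the fitted model, reproducible from run to run, which is presumably why it stabilizes the test. The toy below illustrates that idea only. `ToyKMeans` is a hypothetical 1-D k-means written with the standard library; it is not Manify's `RiemannianFuzzyKMeans` and makes no claim about how that class uses its seed.

```python
from __future__ import annotations

import random


class ToyKMeans:
    """Hypothetical 1-D k-means; illustrates seeding only, not Manify's algorithm."""

    def __init__(self, n_clusters: int, random_state: int | None = None) -> None:
        self.n_clusters = n_clusters
        self.random_state = random_state
        self.centers: list[float] = []

    def fit(self, xs: list[float], n_iter: int = 10) -> ToyKMeans:
        # A private RNG seeded from random_state keeps initialization reproducible
        rng = random.Random(self.random_state)
        self.centers = rng.sample(xs, self.n_clusters)
        for _ in range(n_iter):
            # Assign each point to its nearest center
            groups: list[list[float]] = [[] for _ in self.centers]
            for x, label in zip(xs, self.predict(xs)):
                groups[label].append(x)
            # Move each center to the mean of its points; keep it if the cluster is empty
            self.centers = [
                sum(g) / len(g) if g else c for g, c in zip(groups, self.centers)
            ]
        return self

    def predict(self, xs: list[float]) -> list[int]:
        # Index of the nearest center for each point
        return [
            min(range(len(self.centers)), key=lambda k: abs(x - self.centers[k]))
            for x in xs
        ]


# Synthetic data: three well-separated 1-D blobs, generated with its own fixed seed
data_rng = random.Random(0)
xs = [data_rng.gauss(mu, 0.5) for mu in (-5.0, 0.0, 5.0) for _ in range(30)]

# Same seed, same data: identical centers and predictions on every run
a = ToyKMeans(n_clusters=3, random_state=42).fit(xs)
b = ToyKMeans(n_clusters=3, random_state=42).fit(xs)
assert a.centers == b.centers
assert a.predict(xs) == b.predict(xs)
```

Using a private `random.Random` instance rather than the module-level functions keeps the seed local to the estimator, so other code that draws random numbers cannot perturb the result.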