πŸ”¬ LabChain

The Modern ML Experimentation Framework

Python 3.11+ Β· License: AGPL-3.0 Β· PyPI Β· Documentation

Build, experiment, and deploy ML pipelines with confidence

Documentation β€’ Quick Start β€’ Examples β€’ Contributing


🎯 What is LabChain?

LabChain is a production-ready ML experimentation framework that combines the flexibility of research with the rigor of production deployment. Stop fighting with boilerplate code and focus on what matters: your models.

✨ Why LabChain?

🧩 Modular by Design

  • Compose pipelines from reusable filters
  • Plug-and-play architecture
  • No vendor lock-in

πŸš€ Production Ready

  • Automatic caching and versioning
  • Distributed processing support
  • Cloud-native storage backends

πŸ”„ Reproducible

  • Version-controlled experiments
  • Deterministic pipelines
  • Full audit trails

⚑ Experimental Features

  • Remote code injection
  • Zero-deployment pipelines
  • Automatic dependency management

πŸš€ Quick Start

Installation

pip install framework3

Your First Pipeline (2 minutes)

from labchain import Container, F3Pipeline
from labchain.plugins.filters import StandardScalerPlugin, KnnFilter
from labchain.plugins.metrics import F1, Precission, Recall
from labchain.base import XYData
from sklearn.datasets import load_iris

# Load data
iris = load_iris()
X = XYData.mock(iris.data)
y = XYData.mock(iris.target)

# Build pipeline
pipeline = F3Pipeline(
    filters=[
        StandardScalerPlugin(),
        KnnFilter(n_neighbors=5)
    ],
    metrics=[F1("weighted"), Precission("weighted"), Recall("weighted")]
)

# Train and evaluate
pipeline.fit(X, y)
predictions = pipeline.predict(X)
results = pipeline.evaluate(X, y, predictions)

print(results)
# {'F1': 0.95, 'Precision': 0.95, 'Recall': 0.95}

That's it! πŸŽ‰ You just built, trained, and evaluated an ML pipeline.


πŸ’‘ Key Features

πŸ—οΈ Modular Architecture

# Mix and match components like LEGO blocks
from labchain.plugins.filters import (
    PCAPlugin,
    StandardScalerPlugin,
    ClassifierSVMPlugin
)

pipeline = F3Pipeline(
    filters=[
        StandardScalerPlugin(),
        PCAPlugin(n_components=2),
        ClassifierSVMPlugin(kernel='rbf')
    ]
)

πŸ”„ Smart Caching

from labchain.plugins.filters import Cached

# Cache expensive operations automatically
pipeline = F3Pipeline(
    filters=[
        Cached(
            filter=ExpensivePreprocessor(),
            cache_data=True,
            cache_filter=True
        ),
        MyModel()
    ]
)

πŸ“Š Hyperparameter Optimization

from labchain import WandbOptimizer

# Optimize with Weights & Biases
optimizer = WandbOptimizer(
    project="my-experiment",
    scorer=F1(),
    method="bayes",
    n_trials=50
)

# Define search space
pipeline = F3Pipeline(
    filters=[
        KnnFilter().grid({
            'n_neighbors': [3, 5, 7, 9]
        })
    ]
)

optimizer.optimize(pipeline)
optimizer.fit(X_train, y_train)

⚑ Remote Injection (Experimental)

Deploy pipelines without deploying code:

# On your laptop
from labchain import Container
from labchain.base import BaseFilter

@Container.bind(persist=True)
class MyCustomFilter(BaseFilter):
    def predict(self, x):
        return x * 2

Container.storage = S3Storage(bucket="my-models")
Container.ppif.push_all()

# On production server (no source code needed!)
from labchain.base import BasePlugin

# config: the serialized pipeline definition (see the Remote Injection docs)
pipeline = BasePlugin.build_from_dump(config, Container.ppif)
predictions = pipeline.predict(data)  # Just works! ✨

🌐 Distributed Processing (Experimental)

from labchain import HPCPipeline

# Automatic Spark distribution
pipeline = HPCPipeline(
    app_name="distributed-training",
    filters=[Filter1(), Filter2(), Filter3()]
)

pipeline.fit(large_dataset)

πŸ“š Examples

Classification with Cross-Validation

from labchain import F3Pipeline, KFoldSplitter
from labchain.plugins.filters import StandardScalerPlugin, ClassifierSVMPlugin
from labchain.plugins.metrics import F1, Precission, Recall

pipeline = F3Pipeline(
    filters=[
        StandardScalerPlugin(),
        ClassifierSVMPlugin(kernel='rbf', C=1.0)
    ],
    metrics=[F1(), Precission(), Recall()]
).splitter(
    KFoldSplitter(n_splits=5, shuffle=True, random_state=42)
)

pipeline.fit(X_train, y_train)
results = pipeline.evaluate(X_test, y_test, pipeline.predict(X_test))

Parallel Processing

from labchain import LocalThreadPipeline
from labchain.plugins.filters import Filter1, Filter2, Filter3

# Process filters in parallel
pipeline = LocalThreadPipeline(
    filters=[
        Filter1(),  # Runs in parallel
        Filter2(),  # Runs in parallel
        Filter3()   # Runs in parallel
    ]
)

# Results are concatenated automatically
predictions = pipeline.predict(X)

Custom Components

from labchain import Container, F3Pipeline
from labchain.base import BaseFilter, XYData

@Container.bind()
class MyCustomFilter(BaseFilter):
    def __init__(self, threshold: float = 0.5):
        super().__init__(threshold=threshold)

    def fit(self, x: XYData, y: XYData = None):
        # Your training logic
        pass

    def predict(self, x: XYData) -> XYData:
        # Your prediction logic
        return XYData.mock(x.value > self.threshold)

# Use it like any other filter

pipeline = F3Pipeline(filters=[MyCustomFilter(threshold=0.7)])

Version Control & Rollback

# Version 1
@Container.bind(persist=True)
class MyModel(BaseFilter):
    def predict(self, x):
        return x * 1

Container.ppif.push_all()
hash_v1 = Container.pcm.get_class_hash(MyModel)

# Version 2
@Container.bind(persist=True)
class MyModel(BaseFilter):
    def predict(self, x):
        return x * 2

Container.ppif.push_all()
hash_v2 = Container.pcm.get_class_hash(MyModel)

# Rollback to V1
ModelV1 = Container.ppif.get_version("MyModel", hash_v1)

πŸ“– Documentation

  • πŸ“˜ Quick Start Guide: Get up and running in 5 minutes
  • πŸŽ“ Tutorials: Step-by-step guides and examples
  • πŸ“š API Reference: Complete API documentation
  • ⚑ Remote Injection: Deploy without code (experimental)
  • πŸ—οΈ Architecture: Deep dive into design principles
  • πŸ’‘ Best Practices: Production-ready patterns

πŸ› οΈ Supported Components

Filters

  • βœ… Classification (SVM, KNN, Random Forest, etc.)
  • βœ… Clustering (KMeans, DBSCAN, etc.)
  • βœ… Transformation (PCA, StandardScaler, etc.)
  • βœ… Text Processing (TF-IDF, Embeddings, etc.)
  • βœ… Custom filters (extend BaseFilter)

Pipelines

  • βœ… F3Pipeline: Sequential execution
  • βœ… MonoPipeline: Parallel execution
  • βœ… HPCPipeline: Spark-based distribution
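
The sequential F3Pipeline and the Spark-backed HPCPipeline appear in the examples above. A minimal parallel sketch, assuming MonoPipeline accepts the same filters= argument as the other pipeline classes (check the API reference for the exact constructor):

from labchain import MonoPipeline   # import path assumed

# Each branch runs independently and the outputs are combined,
# mirroring the LocalThreadPipeline example above.
pipeline = MonoPipeline(
    filters=[
        StandardScalerPlugin(),        # branch 1
        PCAPlugin(n_components=2)      # branch 2
    ]
)

features = pipeline.predict(X)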

Optimizers

  • βœ… Optuna: Bayesian optimization
  • βœ… Weights & Biases: Experiment tracking
  • βœ… Grid Search: Exhaustive search
  • βœ… Sklearn: Scikit-learn integration
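
The Weights & Biases example above covers Bayesian search; the same .grid() search space works with the other optimizers. A minimal sketch assuming an OptunaOptimizer class that mirrors the WandbOptimizer constructor (the class name and its arguments are assumptions, so verify them against the API reference):

from labchain import F3Pipeline, OptunaOptimizer   # OptunaOptimizer assumed by analogy with WandbOptimizer
from labchain.plugins.filters import KnnFilter
from labchain.plugins.metrics import F1

# Same search-space syntax as in the W&B example
pipeline = F3Pipeline(
    filters=[
        KnnFilter().grid({'n_neighbors': [3, 5, 7, 9]})
    ]
)

optimizer = OptunaOptimizer(
    scorer=F1("weighted"),   # metric to optimize
    n_trials=50              # assumed parameter, mirroring the W&B example
)

optimizer.optimize(pipeline)
optimizer.fit(X_train, y_train)   # X_train, y_train prepared as XYData, as in the quick start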

Storage

  • βœ… Local Storage: Filesystem caching
  • βœ… S3 Storage: Cloud-native storage
  • βœ… Custom backends: Extend BaseStorage
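
Backends are swapped by assigning Container.storage, as in the remote-injection example. A short sketch, assuming a LocalStorage class and its storage_path argument (the module path and argument name are assumptions; S3Storage(bucket=...) is taken from the example above):

from labchain import Container
from labchain.plugins.storage import LocalStorage, S3Storage   # module path assumed

# Filesystem caching while developing locally
Container.storage = LocalStorage(storage_path="cache/")        # argument name assumed

# Cloud-native storage in production (same call as the remote-injection example)
Container.storage = S3Storage(bucket="my-models")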

🚦 Roadmap

  • Core pipeline functionality
  • Automatic caching system
  • Hyperparameter optimization
  • Distributed processing (Spark)
  • Remote injection (experimental)
  • Multi-cloud storage backends (GCS, Azure)
  • Real-time inference API
  • AutoML capabilities
  • Model registry integration
  • Kubernetes deployment templates

🀝 Contributing

We ❀️ contributions! Here's how you can help:

Ways to Contribute

  • πŸ› Report bugs by opening an issue
  • πŸ’‘ Suggest features in discussions
  • πŸ“ Improve documentation
  • πŸ”§ Submit pull requests
  • ⭐ Star the repo to show support

Development Setup

# Clone the repository
git clone https://github.com/manucouto1/LabChain.git
cd LabChain

# Install dependencies
pip install -r requirements.txt

# Run tests
pytest tests/

# Build and preview the documentation locally
cd docs && mkdocs serve

Guidelines

  • Follow PEP 8 style guide
  • Add tests for new features
  • Update documentation
  • Keep commits atomic and well-described

πŸ“Š Community & Support

Questions, bug reports, and feature ideas are welcome in the project's GitHub issues and discussions.


πŸ“œ License

This project is licensed under the AGPL-3.0 License - see the LICENSE file for details.

What this means:

  • βœ… Use LabChain for free in your projects
  • βœ… Modify and distribute the code
  • ⚠️ If you modify and distribute LabChain, you must release your changes under AGPL-3.0
  • ⚠️ If you use LabChain in a network service, you must make the source available

⬆ back to top

Made with β˜• and Python
