πŸ”¬ LabChain

The Modern ML Experimentation Framework

Python 3.11+ Β· License: AGPL-3.0 Β· PyPI Β· Documentation

Build, experiment, and deploy ML pipelines with confidence

Documentation β€’ Quick Start β€’ Examples β€’ Contributing


🎯 What is LabChain?

LabChain is a production-ready ML experimentation framework that combines the flexibility of research with the rigor of production deployment. Stop fighting with boilerplate code and focus on what matters: your models.

✨ Why LabChain?

🧩 Modular by Design

  • Compose pipelines from reusable filters
  • Plug-and-play architecture
  • No vendor lock-in

πŸš€ Production Ready

  • Automatic caching and versioning
  • Distributed processing support
  • Cloud-native storage backends

πŸ”„ Reproducible

  • Version-controlled experiments
  • Deterministic pipelines
  • Full audit trails

⚑ Experimental Features

  • Remote code injection
  • Zero-deployment pipelines
  • Automatic dependency management

πŸš€ Quick Start

Installation

pip install framework3

Your First Pipeline (2 minutes)

from labchain import Container, F3Pipeline
from labchain.plugins.filters import StandardScalerPlugin, KnnFilter
from labchain.plugins.metrics import F1, Precission, Recall
from labchain.base import XYData
from sklearn.datasets import load_iris

# Load data
iris = load_iris()
X = XYData.mock(iris.data)
y = XYData.mock(iris.target)

# Build pipeline
pipeline = F3Pipeline(
    filters=[
        StandardScalerPlugin(),
        KnnFilter(n_neighbors=5)
    ],
    metrics=[F1("weighted"), Precission("weighted"), Recall("weighted")]
)

# Train and evaluate
pipeline.fit(X, y)
predictions = pipeline.predict(X)
results = pipeline.evaluate(X, y, predictions)

print(results)
# {'F1': 0.95, 'Precision': 0.95, 'Recall': 0.95}

That's it! πŸŽ‰ You just built, trained, and evaluated an ML pipeline.


πŸ’‘ Key Features

πŸ—οΈ Modular Architecture

# Mix and match components like LEGO blocks
from labchain.plugins.filters import (
    PCAPlugin,
    StandardScalerPlugin,
    ClassifierSVMPlugin
)

pipeline = F3Pipeline(
    filters=[
        StandardScalerPlugin(),
        PCAPlugin(n_components=2),
        ClassifierSVMPlugin(kernel='rbf')
    ]
)

πŸ”„ Smart Caching

from labchain.plugins.filters import Cached

# Cache expensive operations automatically
pipeline = F3Pipeline(
    filters=[
        Cached(
            filter=ExpensivePreprocessor(),
            cache_data=True,
            cache_filter=True
        ),
        MyModel()
    ]
)

πŸ“Š Hyperparameter Optimization

from labchain import WandbOptimizer

# Optimize with Weights & Biases
optimizer = WandbOptimizer(
    project="my-experiment",
    scorer=F1(),
    method="bayes",
    n_trials=50
)

# Define search space
pipeline = F3Pipeline(
    filters=[
        KnnFilter().grid({
            'n_neighbors': [3, 5, 7, 9]
        })
    ]
)

optimizer.optimize(pipeline)
optimizer.fit(X_train, y_train)

⚑ Remote Injection (Experimental)

Deploy pipelines without deploying code:

# On your laptop
from labchain import Container
from labchain.base import BaseFilter

@Container.bind(persist=True)
class MyCustomFilter(BaseFilter):
    def predict(self, x):
        return x * 2

Container.storage = S3Storage(bucket="my-models")
Container.ppif.push_all()

# On production server (no source code needed!)
from labchain.base import BasePlugin

# config: the serialized pipeline definition (see the Remote Injection docs)
pipeline = BasePlugin.build_from_dump(config, Container.ppif)
predictions = pipeline.predict(data)  # Just works! ✨

🌐 Distributed Processing (Experimental)

from labchain import HPCPipeline

# Automatic Spark distribution
pipeline = HPCPipeline(
    app_name="distributed-training",
    filters=[Filter1(), Filter2(), Filter3()]
)

pipeline.fit(large_dataset)

πŸ“š Examples

Classification with Cross-Validation

from labchain import F3Pipeline, KFoldSplitter
from labchain.plugins.filters import StandardScalerPlugin, ClassifierSVMPlugin
from labchain.plugins.metrics import F1, Precission, Recall

pipeline = F3Pipeline(
    filters=[
        StandardScalerPlugin(),
        ClassifierSVMPlugin(kernel='rbf', C=1.0)
    ],
    metrics=[F1(), Precission(), Recall()]
).splitter(
    KFoldSplitter(n_splits=5, shuffle=True, random_state=42)
)

pipeline.fit(X_train, y_train)
results = pipeline.evaluate(X_test, y_test, pipeline.predict(X_test))

Parallel Processing

from labchain import LocalThreadPipeline
from labchain.plugins.filters import Filter1, Filter2, Filter3

# Process filters in parallel
pipeline = LocalThreadPipeline(
    filters=[
        Filter1(),  # Runs in parallel
        Filter2(),  # Runs in parallel
        Filter3()   # Runs in parallel
    ]
)

# Results are concatenated automatically
predictions = pipeline.predict(X)

Custom Components

from labchain import Container, F3Pipeline
from labchain.base import BaseFilter, XYData

@Container.bind()
class MyCustomFilter(BaseFilter):
    def __init__(self, threshold: float = 0.5):
        super().__init__(threshold=threshold)

    def fit(self, x: XYData, y: XYData = None):
        # Your training logic
        pass

    def predict(self, x: XYData) -> XYData:
        # Your prediction logic
        return XYData.mock(x.value > self.threshold)

# Use it like any other filter

pipeline = F3Pipeline(filters=[MyCustomFilter(threshold=0.7)])

Version Control & Rollback

# Version 1
@Container.bind(persist=True)
class MyModel(BaseFilter):
    def predict(self, x):
        return x * 1

Container.ppif.push_all()
hash_v1 = Container.pcm.get_class_hash(MyModel)

# Version 2
@Container.bind(persist=True)
class MyModel(BaseFilter):
    def predict(self, x):
        return x * 2

Container.ppif.push_all()
hash_v2 = Container.pcm.get_class_hash(MyModel)

# Rollback to V1
ModelV1 = Container.ppif.get_version("MyModel", hash_v1)

πŸ“– Documentation

  • πŸ“˜ Quick Start Guide: Get up and running in 5 minutes
  • πŸŽ“ Tutorials: Step-by-step guides and examples
  • πŸ“š API Reference: Complete API documentation
  • ⚑ Remote Injection: Deploy without code (experimental)
  • πŸ—οΈ Architecture: Deep dive into design principles
  • πŸ’‘ Best Practices: Production-ready patterns

πŸ› οΈ Supported Components

Filters

  • βœ… Classification (SVM, KNN, Random Forest, etc.)
  • βœ… Clustering (KMeans, DBSCAN, etc.)
  • βœ… Transformation (PCA, StandardScaler, etc.)
  • βœ… Text Processing (TF-IDF, Embeddings, etc.)
  • βœ… Custom filters (extend BaseFilter)

Pipelines

  • βœ… F3Pipeline: Sequential execution
  • βœ… MonoPipeline: Parallel execution
  • βœ… HPCPipeline: Spark-based distribution
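
The sequential F3Pipeline and the Spark-backed HPCPipeline appear in the examples above. A minimal parallel sketch, assuming MonoPipeline accepts the same filters= argument as the other pipeline classes (check the API reference for the exact constructor):

from labchain import MonoPipeline   # import path assumed

# Each branch runs independently and the outputs are combined,
# mirroring the LocalThreadPipeline example above.
pipeline = MonoPipeline(
    filters=[
        StandardScalerPlugin(),        # branch 1
        PCAPlugin(n_components=2)      # branch 2
    ]
)

features = pipeline.predict(X)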

Optimizers

  • βœ… Optuna: Bayesian optimization
  • βœ… Weights & Biases: Experiment tracking
  • βœ… Grid Search: Exhaustive search
  • βœ… Sklearn: Scikit-learn integration
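
The Weights & Biases example above covers Bayesian search; the same .grid() search space works with the other optimizers. A minimal sketch assuming an OptunaOptimizer class that mirrors the WandbOptimizer constructor (the class name and its arguments are assumptions, so verify them against the API reference):

from labchain import F3Pipeline, OptunaOptimizer   # OptunaOptimizer assumed by analogy with WandbOptimizer
from labchain.plugins.filters import KnnFilter
from labchain.plugins.metrics import F1

# Same search-space syntax as in the W&B example
pipeline = F3Pipeline(
    filters=[
        KnnFilter().grid({'n_neighbors': [3, 5, 7, 9]})
    ]
)

optimizer = OptunaOptimizer(
    scorer=F1("weighted"),   # metric to optimize
    n_trials=50              # assumed parameter, mirroring the W&B example
)

optimizer.optimize(pipeline)
optimizer.fit(X_train, y_train)   # X_train, y_train prepared as XYData, as in the quick start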

Storage

  • βœ… Local Storage: Filesystem caching
  • βœ… S3 Storage: Cloud-native storage
  • βœ… Custom backends: Extend BaseStorage
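
Backends are swapped by assigning Container.storage, as in the remote-injection example. A short sketch, assuming a LocalStorage class and its storage_path argument (the module path and argument name are assumptions; S3Storage(bucket=...) is taken from the example above):

from labchain import Container
from labchain.plugins.storage import LocalStorage, S3Storage   # module path assumed

# Filesystem caching while developing locally
Container.storage = LocalStorage(storage_path="cache/")        # argument name assumed

# Cloud-native storage in production (same call as the remote-injection example)
Container.storage = S3Storage(bucket="my-models")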

🚦 Roadmap

  • Core pipeline functionality
  • Automatic caching system
  • Hyperparameter optimization
  • Distributed processing (Spark)
  • Remote injection (experimental)
  • Multi-cloud storage backends (GCS, Azure)
  • Real-time inference API
  • AutoML capabilities
  • Model registry integration
  • Kubernetes deployment templates

🀝 Contributing

We ❀️ contributions! Here's how you can help:

Ways to Contribute

  • πŸ› Report bugs by opening an issue
  • πŸ’‘ Suggest features in discussions
  • πŸ“ Improve documentation
  • πŸ”§ Submit pull requests
  • ⭐ Star the repo to show support

Development Setup

# Clone the repository
git clone https://github.com/manucouto1/LabChain.git
cd LabChain

# Install dependencies
pip install -r requirements.txt

# Run tests
pytest tests/

# Build and preview the documentation locally
cd docs && mkdocs serve

Guidelines

  • Follow PEP 8 style guide
  • Add tests for new features
  • Update documentation
  • Keep commits atomic and well-described

πŸ“Š Community & Support

Questions, bug reports, and feature ideas are welcome in the project's GitHub issues and discussions.


πŸ“œ License

This project is licensed under the AGPL-3.0 License - see the LICENSE file for details.

What this means:

  • βœ… Use LabChain for free in your projects
  • βœ… Modify and distribute the code
  • ⚠️ If you modify and distribute LabChain, you must release your changes under AGPL-3.0
  • ⚠️ If you use LabChain in a network service, you must make the source available

⬆ back to top

Made with β˜• and Python
