This repository provides a minimal, reproducible example of using ClearML to build machine learning pipelines, track experiments, and manage datasets, covering both task-based and function-based pipelines.
```
AI-Studio-ClearML/
├── .github/workflows/
│   └── pipeline.yaml               # CI/CD: runs s1 on PR to main
│
├── model_artifacts/                # Example outputs or saved models
├── work_dataset/                   # Dataset samples (Iris.csv)
│
│   ─── Demo 1: Basic Pipeline (s1 → s3) ───
├── s1_dataset_artifact.py          # Step 1: Upload dataset as pickle artifact
├── s2_data_preprocessing.py        # Step 2: Preprocess (artifact API)
├── s3_train_model.py               # Step 3: Train model (hardcoded params)
├── pipeline_from_tasks.py          # 3-step pipeline orchestrator
│
│   ─── Demo 2: HPO Pipeline (s1 → final model) ───
├── hpo_s1_dataset_artifact.py      # Step 1: Upload dataset (ClearML Dataset API)
├── hpo_s2_process_dataset.py       # Step 2: Preprocess (ClearML Dataset API)
├── hpo_s3_train_model.py           # Step 3: Train model (parameterized for HPO)
├── task_hpo.py                     # Step 4: Hyperparameter optimization
├── final_model.py                  # Step 5: Train final model with best params
├── pipeline_hpo.py                 # 5-step pipeline orchestrator
│
│   ─── Shared ───
├── main.py                         # Entry point (runs pipeline_from_tasks)
├── requirements.txt                # Pinned Python dependencies
├── AI-Studio-Agent.ipynb           # Start/stop ClearML Agent daemon
├── AI-Studio-ClearML.ipynb         # End-to-end demo notebook
├── AI-Studio-ClearML_HPO_ZOE.ipynb # HPO demo notebook (Colab)
└── ClearML_Pipeline_Demo.ipynb     # Task-based pipeline demo notebook
```
- ✅ Task-based pipeline using `PipelineController.add_step(...)`
- ✅ Hyperparameter Optimization (HPO) with ClearML `HyperParameterOptimizer`
- ✅ Final model retraining with best HPO parameters
- ✅ CI/CD pipeline via GitHub Actions
- [TBD] Function-based pipeline using `PipelineController.add_function_step(...)`
- ✅ Reusable ClearML Task templates
- ✅ Dataset and model artifact management with ClearML
- ✅ End-to-end ML workflow: Dataset → Preprocessing → Training → HPO → Final Model
- ✅ Fully compatible with ClearML Hosted and ClearML Server
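As a rough illustration of the planned function-based variant: ClearML's `PipelineController.add_function_step(...)` wraps plain Python functions as pipeline steps and passes returned objects between them. The sketch below is hypothetical (the step functions, project, and pipeline names are made up), not the repository's implementation:

```python
def load_data():
    # Hypothetical first step: return something picklable for the next step
    return [1, 2, 3]

def train(data):
    # Hypothetical second step: consume the previous step's output
    return sum(data)

def build_function_pipeline():
    # Sketch only: constructing the controller registers a ClearML Task,
    # so running this requires a configured ClearML environment.
    from clearml import PipelineController

    pipe = PipelineController(name="function_pipeline_demo",
                              project="AI-Studio-ClearML", version="0.0.1")
    pipe.add_function_step(name="load_data", function=load_data,
                           function_return=["data"])
    # "${load_data.data}" wires the first step's return value into this step
    pipe.add_function_step(name="train", function=train,
                           function_kwargs=dict(data="${load_data.data}"),
                           parents=["load_data"])
    return pipe
```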
Install dependencies:

```shell
pip install -r requirements.txt
```

Set up ClearML by running:

```shell
clearml-init
```

You will be prompted to enter:

- ClearML credentials. Use https://app.clear.ml to register for a free account if needed.

Install the ClearML agent on your machine or server:

```shell
pip install clearml-agent
```

A simple pipeline demonstrating ClearML task-based pipelines with dataset artifacts.
Before running the pipeline, execute the following scripts once to create reusable ClearML Tasks.

Note: when running for the first time, comment out `task.execute_remotely()` in each `.py` file so the task template is created successfully.
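Each step script follows roughly the pattern below, including the `task.execute_remotely()` call the note refers to. This is a hedged sketch with assumed project, task, and artifact names, not the repository's exact code:

```python
def make_step_task(csv_path="work_dataset/Iris.csv"):
    # Sketch only: Task.init contacts the ClearML server, so this
    # requires a configured clearml.conf (or CLEARML_API_* env vars).
    from clearml import Task
    import pandas as pd

    task = Task.init(project_name="AI-Studio-ClearML",
                     task_name="s1_dataset_artifact")

    # Comment this out on the first run so the script finishes locally
    # and registers a reusable task template; re-enable it afterwards
    # so an agent picks the task up from the queue instead.
    # task.execute_remotely(queue_name="basic_demo")

    # Upload the dataset as a pickled artifact for downstream steps
    df = pd.read_csv(csv_path)
    task.upload_artifact(name="dataset", artifact_object=df)
    return task
```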
```shell
# Step 1: Upload dataset
python s1_dataset_artifact.py

# Step 2: Preprocess dataset
python s2_data_preprocessing.py

# Step 3: Train model
python s3_train_model.py
```

These tasks will appear in your ClearML dashboard and serve as base tasks for the pipeline.
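The orchestrator then chains those registered base tasks by name. A hedged sketch of what a `run_pipeline()` along the lines of `pipeline_from_tasks.py` could look like (step, task, and project names are assumptions):

```python
def run_pipeline():
    # Sketch only: requires a configured ClearML environment and the
    # three base tasks registered beforehand.
    from clearml import PipelineController

    pipe = PipelineController(name="basic_pipeline_demo",
                              project="AI-Studio-ClearML", version="1.0")
    pipe.set_default_execution_queue("basic_demo")

    # Each step clones its registered base task and runs it on the queue
    pipe.add_step(name="stage_data",
                  base_task_project="AI-Studio-ClearML",
                  base_task_name="s1_dataset_artifact")
    pipe.add_step(name="stage_process",
                  base_task_project="AI-Studio-ClearML",
                  base_task_name="s2_data_preprocessing",
                  parents=["stage_data"])
    pipe.add_step(name="stage_train",
                  base_task_project="AI-Studio-ClearML",
                  base_task_name="s3_train_model",
                  parents=["stage_process"])
    pipe.start()  # enqueues the pipeline controller itself
```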
Create a queue named `basic_demo` (or your own custom name), and make sure it matches the queue set in `pipeline_from_tasks.py`:

```python
pipe.set_default_execution_queue("basic_demo")
```

Run an agent as the queue worker:

```shell
clearml-agent daemon --queue "basic_demo" --detached
```

Once all base tasks are registered, run the pipeline:

```shell
python main.py  # Executes run_pipeline() from pipeline_from_tasks.py
```

An advanced pipeline that adds hyperparameter optimization and final model retraining. It uses the ClearML Dataset API for more robust data management.
Note: when running for the first time, comment out `task.execute_remotely()` in each `.py` file so the task template is created successfully.
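Unlike Demo 1's pickle artifacts, the first two steps here use the ClearML Dataset API. A minimal sketch of the publish/consume pattern (dataset and project names are assumptions, not necessarily what the scripts use):

```python
def publish_dataset(path="work_dataset"):
    # Sketch only: requires a configured ClearML environment.
    from clearml import Dataset

    ds = Dataset.create(dataset_name="iris_raw",
                        dataset_project="AI-Studio-ClearML")
    ds.add_files(path)   # stage local files into this dataset version
    ds.upload()          # push the files to the configured storage
    ds.finalize()        # lock the version so downstream steps can pin it
    return ds.id

def fetch_dataset():
    from clearml import Dataset
    # get_local_copy() downloads (and caches) the dataset files locally
    return Dataset.get(dataset_project="AI-Studio-ClearML",
                       dataset_name="iris_raw").get_local_copy()
```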
```shell
# Step 1: Upload dataset (ClearML Dataset API)
python hpo_s1_dataset_artifact.py

# Step 2: Preprocess dataset (ClearML Dataset API)
python hpo_s2_process_dataset.py

# Step 3: Train model (parameterized for HPO)
python hpo_s3_train_model.py

# Step 4: Hyperparameter optimization
python task_hpo.py

# Step 5: Final model with best parameters
python final_model.py
```

Create a queue named `hpo_demo` (or your own custom name), and make sure it matches the queue set in `pipeline_hpo.py`:

```python
EXECUTION_QUEUE = "hpo_demo"
```

Run an agent as the queue worker:

```shell
clearml-agent daemon --queue "hpo_demo" --detached
```

Run the pipeline:

```shell
python pipeline_hpo.py
```

This version demonstrates using `add_function_step(...)` to wrap Python logic as pipeline steps.
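Returning to Step 4 (`task_hpo.py`): an HPO step like this typically follows ClearML's `HyperParameterOptimizer` pattern. The sketch below is hedged (the hyperparameter names, ranges, and metric title/series are assumptions; the real script may differ):

```python
def run_hpo(base_task_id):
    # Sketch only: requires a configured ClearML environment and a
    # registered hpo_s3_train_model task to use as the base task.
    from clearml.automation import (DiscreteParameterRange,
                                    HyperParameterOptimizer,
                                    UniformIntegerParameterRange)

    optimizer = HyperParameterOptimizer(
        base_task_id=base_task_id,  # the registered training task to clone
        hyper_parameters=[
            UniformIntegerParameterRange("General/n_estimators",
                                         min_value=50, max_value=300,
                                         step_size=50),
            DiscreteParameterRange("General/max_depth", values=[3, 5, 10]),
        ],
        # The metric the training task reports, to be maximized
        objective_metric_title="validation",
        objective_metric_series="accuracy",
        objective_metric_sign="max",
        execution_queue="hpo_demo",
        max_number_of_concurrent_tasks=2,
        total_max_jobs=10,
    )
    optimizer.start()
    optimizer.wait()                                # block until done
    best = optimizer.get_top_experiments(top_k=1)   # best trial task(s)
    optimizer.stop()
    return best
```

`final_model.py` can then read the best trial's parameters and retrain with them.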
The repository includes a GitHub Actions workflow (`.github/workflows/pipeline.yaml`) that:

- Triggers on pull requests to `main`
- Sets up Python 3.10 and installs dependencies
- Verifies ClearML connectivity using GitHub Secrets
- Runs `s1_dataset_artifact.py` as a smoke test

Required GitHub Secrets:

- `CLEARML_API_ACCESS_KEY`
- `CLEARML_API_SECRET_KEY`
- `CLEARML_API_HOST`
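A minimal sketch of what such a workflow could look like; the actual `pipeline.yaml` may differ, and the job name, action versions, and secret-to-environment mapping here are assumptions (ClearML does read the `CLEARML_API_*` environment variables):

```yaml
name: pipeline
on:
  pull_request:
    branches: [main]

jobs:
  smoke-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install -r requirements.txt
      - name: Run s1 smoke test
        env:
          CLEARML_API_ACCESS_KEY: ${{ secrets.CLEARML_API_ACCESS_KEY }}
          CLEARML_API_SECRET_KEY: ${{ secrets.CLEARML_API_SECRET_KEY }}
          CLEARML_API_HOST: ${{ secrets.CLEARML_API_HOST }}
        run: python s1_dataset_artifact.py
```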
This project is developed and maintained by:
- Jacoo-Zhao (GitHub: @Jacoo-Zhao)
- Zoe Lin (GitHub: @Zoe Lin)
This project is licensed under the MIT License. See the LICENSE file for details.