This directory contains GitHub Actions workflows for automating the Guardian AI training and deployment pipeline.
Trigger: Pull requests to main branch
Purpose: Validates code changes by running the complete training pipeline
What it does:
- Sets up Python environment
- Installs dependencies
- Validates ClearML connection
- Runs the Guardian training pipeline
- Uploads training artifacts
- Provides pipeline summary
Trigger: Push to main branch or manual dispatch
Purpose: Trains the model and deploys it if it meets quality thresholds
What it does:
- Runs the complete training pipeline (50 trials HPO)
- Extracts best model and performance metrics
- Validates model meets minimum accuracy threshold (75%)
- Deploys model to ClearML serving if approved
- Creates deployment records
- Provides deployment summary
Add these secrets to your GitHub repository:
CLEARML_API_ACCESS_KEY=your_access_key
CLEARML_API_SECRET_KEY=your_secret_key
CLEARML_API_HOST=https://api.clear.ml
How to add secrets:
- Go to your GitHub repository
- Settings → Secrets and variables → Actions
- Click "New repository secret"
- Add each secret with the exact names above
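A quick pre-flight check can catch a missing secret before the pipeline starts. The sketch below assumes the workflow exposes the three secrets to the job as environment variables with the same names; the helper `missing_secrets` is hypothetical and not part of the Guardian code.

```python
import os

# Secret names exactly as listed above. Assumption: the workflow maps each
# secret to an environment variable of the same name.
REQUIRED_SECRETS = (
    "CLEARML_API_ACCESS_KEY",
    "CLEARML_API_SECRET_KEY",
    "CLEARML_API_HOST",
)


def missing_secrets(env=None):
    """Return the required secret names that are unset or blank in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS if not env.get(name, "").strip()]


# Example with an explicit mapping instead of the real environment.
example_env = {
    "CLEARML_API_ACCESS_KEY": "your_access_key",
    "CLEARML_API_HOST": "https://api.clear.ml",
}
print(missing_secrets(example_env))  # prints ['CLEARML_API_SECRET_KEY']
```

Calling `missing_secrets()` with no argument checks the real environment, so a workflow step could fail early when the returned list is not empty.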
Ensure your clearml.conf file is properly configured and your ClearML account has:
- Access to the Guardian_Training project
- Permissions to create and publish models
- Dataset access for Guardian_Dataset
In this context, model deployment means:
- Model Registration: The trained model is registered in ClearML's model registry
- Performance Validation: Model accuracy is checked against minimum thresholds
- Model Publishing: Model is marked as "published" and tagged for serving
- Metadata Recording: Deployment information is recorded for tracking
- Serving Ready: Model becomes available for inference through ClearML Serving
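To make the "Metadata Recording" step concrete, here is a minimal sketch of what a deployment record could look like. The `DeploymentRecord` class, its field names, and `make_record` are all illustrative assumptions; the actual workflow may record different fields or store them elsewhere.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class DeploymentRecord:
    model_id: str      # hypothetical identifier from the model registry
    accuracy: float    # accuracy as a fraction, e.g. 0.82
    environment: str   # "staging" or "production"
    deployed_at: str   # ISO 8601 timestamp (UTC)


def make_record(model_id, accuracy, environment="staging", now=None):
    """Build a record; `now` can be injected to make output reproducible."""
    now = now or datetime.now(timezone.utc)
    return DeploymentRecord(model_id, accuracy, environment, now.isoformat())


# Serialize a record to JSON so it can be attached to a run or saved as a file.
record = make_record("example-model-id", 0.82)
print(json.dumps(asdict(record), indent=2))
```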
Deployment environments:
- Staging: For testing and validation (default)
- Production: For live inference (requires manual approval)
Deployment criteria:
- Minimum accuracy threshold: 75%
- Model must complete all training phases successfully
- All hyperparameter optimization trials must complete
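The three conditions above can be sketched as a single gate function. This is an illustration only, not the code the workflow runs: `TrainingResult` and `deployment_blockers` are hypothetical names, and the sketch assumes that a model scoring exactly 75% passes.

```python
from dataclasses import dataclass

MIN_ACCURACY = 0.75  # the 75% threshold, expressed as a fraction


@dataclass
class TrainingResult:
    accuracy: float         # final accuracy as a fraction
    phases_succeeded: bool  # True if every training phase finished cleanly
    trials_completed: int   # HPO trials that ran to completion
    trials_planned: int     # HPO trials requested (50 by default)


def deployment_blockers(result, min_accuracy=MIN_ACCURACY):
    """Return the reasons a model cannot be deployed; empty means approved."""
    blockers = []
    if result.accuracy < min_accuracy:
        blockers.append(
            f"accuracy {result.accuracy:.2%} is below {min_accuracy:.0%}"
        )
    if not result.phases_succeeded:
        blockers.append("a training phase failed")
    if result.trials_completed < result.trials_planned:
        blockers.append(
            f"only {result.trials_completed} of {result.trials_planned} "
            "HPO trials completed"
        )
    return blockers


print(deployment_blockers(TrainingResult(0.81, True, 50, 50)))  # prints []
```

Returning a list of reasons rather than a bare boolean makes it easier to surface in the deployment summary why a model was blocked.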
# Create a feature branch
git checkout -b feature/model-improvements
# Make your changes
git add .
git commit -m "Improve model architecture"
git push origin feature/model-improvements
# Create PR to main → triggers training pipeline
- Go to Actions tab in GitHub
- Select "Deploy Guardian AI Model"
- Click "Run workflow"
- Choose environment (staging/production)
- Click "Run workflow"
- GitHub Actions: View real-time logs and progress
- ClearML Dashboard: Monitor training metrics and experiments
- Artifacts: Download training plots and model files
ClearML Connection Failed:
- Check if secrets are properly set
- Verify ClearML API host URL
- Ensure account has proper permissions
Pipeline Timeout:
- Training with 50 trials can take 2-4 hours
- Timeout is set to 4 hours (240 minutes)
- Consider reducing trials for testing
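The numbers above imply roughly 2.4 to 4.8 minutes per trial (120 to 240 minutes across 50 trials), so a slow run leaves no headroom under the 240-minute timeout. The hypothetical helper below estimates how many trials fit in a given budget; the per-trial duration and setup overhead are values you would measure yourself, not figures from this repository.

```python
import math


def max_trials_within(timeout_minutes, minutes_per_trial, overhead_minutes=0):
    """Estimate how many HPO trials fit inside the workflow timeout.

    overhead_minutes covers non-trial work such as setup and artifact upload.
    """
    if minutes_per_trial <= 0:
        raise ValueError("minutes_per_trial must be positive")
    budget = timeout_minutes - overhead_minutes
    return max(0, math.floor(budget / minutes_per_trial))


# With a 240-minute timeout and trials averaging 5 minutes each:
print(max_trials_within(240, 5))                       # prints 48
print(max_trials_within(240, 5, overhead_minutes=15))  # prints 45
```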
Model Deployment Blocked:
- Check if model accuracy meets 75% threshold
- Review training logs for issues
- Verify model was properly saved
Artifact Upload Failed:
- Check if training generated expected files
- Verify file paths in workflow
- Review storage permissions
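When an artifact upload fails, it helps to confirm first that training actually produced the files the workflow expects. The sketch below does this with `pathlib`; the file names in the example are placeholders, since the real artifact paths are defined in the workflow file.

```python
from pathlib import Path


def missing_artifacts(root, expected):
    """Return the expected relative paths that are not files under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]


# Placeholder names for illustration; substitute the paths from the workflow.
expected_files = ["model.pt", "plots/training_curves.png"]
print(missing_artifacts(".", expected_files))
```

An empty list means every expected file exists, which points the investigation toward the workflow paths or storage permissions instead.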
- Training Accuracy: Per-epoch training performance
- Validation Accuracy: Model generalization performance
- Test Accuracy: Final model evaluation
- Hyperparameter Performance: HPO trial results
- Training Time: Pipeline execution duration
- GitHub Actions: Workflow execution status and logs
- ClearML: Experiment tracking and model registry
- Parallel Coordinates: Hyperparameter optimization visualization
Edit Guardian_pipeline.py:
- total_max_trials: Number of HPO trials (default: 50)
- epochs: Training epochs per trial
- Hyperparameter ranges in the optimizer
Edit guardian-deploy.yml:
- MIN_ACCURACY: Minimum accuracy for deployment (default: 75%)
- timeout-minutes: Maximum workflow duration
- Environment options
You can extend the workflows to add:
- Slack notifications
- Email alerts
- Teams messages
- Custom webhooks
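As a starting point for a notification step, the sketch below builds a JSON webhook request with the standard library. It assumes a Slack-style incoming webhook that accepts a `{"text": ...}` payload; other services may expect a different payload shape. The function name and message wording are illustrative.

```python
import json
import urllib.request


def build_notification(webhook_url, status, accuracy):
    """Build (but do not send) a POST request announcing a deployment result."""
    text = f"Guardian deployment {status} (accuracy {accuracy:.1%})"
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending is one more call, left commented out so the sketch has no side effects:
# urllib.request.urlopen(build_notification(url, "succeeded", 0.82), timeout=10)
```

The webhook URL should be stored as a repository secret, like the ClearML credentials, and never committed to the workflow file.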
GitHub Push/PR
↓
GitHub Actions Runner
↓
Guardian Pipeline
↓
ClearML Experiments
↓
Model Registry
↓
Deployment Validation
↓
ClearML Serving
This setup provides a complete MLOps pipeline with automated training, validation, and deployment capabilities.