A complete end-to-end MLOps system for real-time anomaly detection using FastAPI, ONNX, Azure ML, AKS, Docker & Kubernetes. Built for scalability, performance, and production-readiness.
Whether you're a data scientist, ML engineer, or DevOps enthusiast, this repository provides a comprehensive guide to building an industrial-grade AI pipeline, from data ingestion and model training to deployment, monitoring, and optimization. It's designed to empower you with the knowledge and tools to operationalize machine learning effectively.
This repository serves as the foundational codebase for many of our popular YouTube videos, covering a wide range of topics in MLOps, Azure ML, FastAPI, Docker, Kubernetes (AKS), ONNX, and more. Each component within this project is meticulously crafted to demonstrate best practices and real-world applications.
Title | Description |
---|---|
Azure ML & AKS MLOps Pipeline Full Course \| Cloud Machine Learning Masterclass | Welcome to the ultimate Azure ML + AKS MLOps Masterclass! Whether you're a beginner or seasoned ML engineer, this video walks you through building a full production-ready pipeline using Azure tools. |
Azure ML v2 Key Components Explained: Pipelines, Compute, Assets & More | Master Azure Machine Learning v2 in just 10 minutes! This comprehensive guide breaks down core components essential for building industrial-grade MLOps pipelines. |
Why 85% of ML Projects FAIL \| Build Real Industrial MLOps with Azure ML | Discover why most ML projects never reach production and how you can avoid these pitfalls using real-world MLOps strategies on Azure ML. |
Industrial MLOps Stack Setup Guide on Windows For Beginners & Pros | Ready to build a foundation-grade MLOps environment on Windows? Learn to level up your dev setup with this full-stack guide. |
Deploy Anomaly Detection with FastAPI & Streamlit! MLOps for Beginners | Turn your ML model into a real-time API with a Streamlit dashboard. In this beginner-friendly tutorial, deploy end-to-end anomaly detection using FastAPI and Streamlit. |
Want to explore more? Check out our full playlist on YouTube: [Deep Knowledge Space]
AnomaVision AI is a comprehensive solution for detecting anomalies in images using state-of-the-art methods like PaDiM, complemented by custom enhancements for industrial applications. Key aspects of this project include:
- End-to-End ML Training & Inference: A complete workflow from data preparation to model training and real-time inference.
- High-Performance Inference Service: Built with FastAPI for low-latency, high-throughput anomaly detection.
- Containerized Deployments: Utilizes Docker with multi-stage builds for lightweight and reproducible environments.
- Scalable Cloud Deployment: Deploys seamlessly via Kubernetes (AKS) on Azure ML for robust and elastic scaling.
- Automated CI/CD Pipelines: Powered by Azure DevOps YAML for continuous integration and continuous delivery, ensuring rapid and reliable updates.
- Comprehensive Monitoring & Logging: Integrates with Application Insights, Prometheus, and includes advanced drift detection capabilities.
- Cost Optimization Strategies: Implements techniques to significantly reduce Azure ML infrastructure costs, demonstrating savings of up to 85%.
- Infrastructure as Code (IaC): Azure resources (ACR, AKS, Azure ML Workspace, Key Vault) are provisioned programmatically using Python SDK and Azure CLI.
- MLflow Integration: Tracks experiments, manages model versions, and facilitates the machine learning lifecycle.
- Git Submodules Management: Demonstrates best practices for managing complex, multi-repository projects.
This repository is meticulously organized to facilitate understanding and collaboration. Below is a high-level overview of the main directories and their contents:
.
├── azure_components/   # Scripts and presentations for provisioning Azure resources (Resource Groups, Storage Accounts, ML Workspaces)
├── data/               # Placeholder for datasets used in training and evaluation
├── deployment/         # Kubernetes and Azure ML deployment configurations and manifests
├── devops/             # Azure DevOps pipeline definitions and environment variables
├── distributions/      # Exported models (ONNX, PyTorch) and associated metadata (e.g., model_info.json)
├── doc/                # Comprehensive documentation, visual assets (like banner.png), and diagrams
├── docker/             # Dockerfiles tailored for various purposes: training, inference, and lightweight builds
├── environment/        # Conda environments and Dockerfiles for setting up development and production environments
├── integration/        # End-to-end integration tests to validate system functionality
├── jobs/               # Azure ML Jobs definitions for automated tasks such as data validation and model training
├── k8s/                # Kubernetes manifests for deploying applications and services to AKS
├── keyvault/           # Scripts for Azure Key Vault setup and secure management of secrets
├── load_testing/       # Locust scripts for performance and stress testing of FastAPI applications
├── logs/               # Directory for application logs
├── models/             # Stored machine learning models
├── model_output/       # Output directory for trained models or inference results
├── monitoring/         # Scripts for Application Insights integration, drift detection, and Prometheus configuration
├── outputs/            # Output from various pipeline stages (e.g., data validation results)
├── pipelines/          # Azure DevOps YAML pipelines for CI/CD and infrastructure management
│   ├── infra/          # Infrastructure provisioning pipelines
│   └── templates/      # Reusable pipeline templates for common tasks
├── requirements/       # Python dependency files (requirements.txt, requirements_np.txt)
├── src/                # Core application source code, including FastAPI, Streamlit, and ML logic
│   ├── AnomaVision/    # The main anomaly detection library, containing PaDiM implementation and utilities
│   └── static/         # Shared static assets for web applications
└── tests/              # Unit tests for individual components and modules
This project leverages a robust stack of technologies to deliver a high-performance and scalable MLOps solution:
Category | Tool |
---|---|
ML Frameworks | PyTorch, ONNX, Scikit-learn |
Inference | FastAPI, Streamlit |
Containerization | Docker, Minikube |
Orchestration | Kubernetes (AKS) |
Cloud Platform | Azure ML, Azure Container Registry (ACR), Azure Key Vault |
CI/CD | Azure DevOps, GitHub Actions |
Monitoring | Application Insights, Prometheus |
Data Validation | Custom validation scripts |
Utilities | Poetry, Python 3.11+, MLflow |
Follow these steps to set up and run the AnomaVision AI MLOps pipeline on your local machine and deploy it to Azure.
Start by cloning the repository, ensuring you also initialize and update the Git submodules, which are crucial for the `AnomaVision` library:
git clone --recurse-submodules https://github.com/DeepKnowledge1/industrial_anodet_mlops
cd industrial_anodet_mlops
Note: Replace `https://github.com/DeepKnowledge1/industrial_anodet_mlops` with the actual URL of your repository, and change into the directory that `git clone` creates.
This project uses `poetry` for dependency management. Install the required packages by running:
poetry install
The `train.py` script trains the anomaly detection models. It requires the dataset path and optionally accepts a backbone network. Ensure your dataset is accessible (e.g., at `D:/01-DATA/bottle` or a similar path on your system).
python src/train.py --dataset_path "path/to/your/dataset" --backbone resnet18
Key Parameters for `train.py`:
- `--dataset_path`: (Required) Path to the dataset folder containing `train/good` images (e.g., `D:/01-DATA/bottle`).
- `--backbone`: (Optional) Backbone network for feature extraction. Choose between `resnet18` (default) or `wide_resnet50`.
- `--model_data_path`: (Optional) Directory to save model distributions and ONNX file (default: `./distributions/`).
- `--output_model`: (Optional) Output folder for model export (default: `model_output`).
- `--batch_size`: (Optional) Batch size for training and inference (default: `2`).
- `--layer_indices`: (Optional) List of layer indices to extract features from (default: `[0]`).
- `--feat_dim`: (Optional) Number of random feature dimensions to keep (default: `50`).
- `--mlflow_tracking_uri`: (Optional) MLflow tracking URI (default: `file:./mlruns`).
- `--mlflow_experiment_name`: (Optional) MLflow experiment name (default: `padim_anomaly_detection`).
- `--run_name`: (Optional) MLflow run name. If not provided, it will be auto-generated.
- `--registered_model_name`: (Optional) Name for the registered model in MLflow Model Registry (default: `PadimONNX`).
- `--test_dataset_path`: (Optional) Path to test dataset for evaluation. If not provided, it will use `dataset_path`.
- `--evaluate_model`: (Flag) Include this flag to evaluate the model after training.
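The options listed above map naturally onto an `argparse` parser. The following is a minimal sketch of how such a command-line interface could be declared, not the actual contents of `train.py`; the function names `build_parser` and `parse_args` are hypothetical, and the argument types are assumptions inferred from the documented defaults.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented train.py options (illustrative only)."""
    p = argparse.ArgumentParser(description="Train a PaDiM anomaly detection model")
    p.add_argument("--dataset_path", required=True,
                   help="Dataset folder containing train/good images")
    p.add_argument("--backbone", default="resnet18",
                   choices=["resnet18", "wide_resnet50"],
                   help="Backbone network for feature extraction")
    p.add_argument("--model_data_path", default="./distributions/")
    p.add_argument("--output_model", default="model_output")
    p.add_argument("--batch_size", type=int, default=2)
    p.add_argument("--layer_indices", type=int, nargs="+", default=[0])
    p.add_argument("--feat_dim", type=int, default=50)
    p.add_argument("--mlflow_tracking_uri", default="file:./mlruns")
    p.add_argument("--mlflow_experiment_name", default="padim_anomaly_detection")
    p.add_argument("--run_name", default=None)
    p.add_argument("--registered_model_name", default="PadimONNX")
    p.add_argument("--test_dataset_path", default=None)
    p.add_argument("--evaluate_model", action="store_true")
    return p


def parse_args(argv=None) -> argparse.Namespace:
    """Parse argv and apply the documented fallback for the test dataset path."""
    args = build_parser().parse_args(argv)
    # Documented behaviour: evaluate on dataset_path when no test path is given.
    if args.test_dataset_path is None:
        args.test_dataset_path = args.dataset_path
    return args


# Example: only the required option plus the evaluation flag.
args = parse_args(["--dataset_path", "path/to/your/dataset", "--evaluate_model"])
print(args.backbone, args.batch_size, args.test_dataset_path)
```

Declaring `choices` for `--backbone` makes an unsupported backbone fail at parse time rather than partway through training.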
To run the FastAPI application locally:
uvicorn src.fastapi_app:app --reload --host 0.0.0.0 --port 8080
This will start the FastAPI server, typically accessible at `http://localhost:8080`.
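Once the server is running, one quick way to confirm it is up and see which endpoints it exposes is to read the OpenAPI schema that FastAPI serves at `/openapi.json` by default. The sketch below uses only the standard library; the helper names `fetch_openapi` and `list_routes` are hypothetical, and it assumes the app has not disabled or moved the schema URL.

```python
import json
from urllib.request import urlopen

# HTTP verbs that may appear as keys of an OpenAPI path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_openapi(base_url: str = "http://localhost:8080", timeout: float = 5.0) -> dict:
    """Download the OpenAPI schema from a running FastAPI server."""
    with urlopen(f"{base_url}/openapi.json", timeout=timeout) as resp:
        return json.load(resp)


def list_routes(schema: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs declared in an OpenAPI schema."""
    routes = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            if method.lower() in HTTP_METHODS:  # skip non-operation keys such as "parameters"
                routes.append((method.upper(), path))
    return sorted(routes)


# Usage against a live server (requires the uvicorn process to be running):
#   for method, path in list_routes(fetch_openapi()):
#       print(method, path)
```

The interactive documentation at `/docs` presents the same information in a browser.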
To connect the Streamlit frontend with your running FastAPI backend:
streamlit run src/streamlit_app.py --server.port 8501 --server.enableCORS true --server.enableXsrfProtection false
Access the Streamlit application in your browser, usually at `http://localhost:8501`.
The `score.py` file is critical for model inference within Azure ML deployments. It's essential to test its functionality thoroughly before deployment.
To run the unit tests for `score.py` (assuming `test_padim.py` is an example of such a test, located in `tests/`):
pytest tests/test_padim.py
Note: Ensure all necessary dependencies are installed before running tests.
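Azure ML scoring scripts follow a fixed contract: `init()` runs once when the container starts, and `run(raw_data)` runs for each request. The sketch below illustrates that contract and a pytest-style test for it. It is not the project's `score.py`: the placeholder "model" is a plain threshold, and the `scores` request field and the `padim_model.onnx` file name under `AZUREML_MODEL_DIR` are assumptions made for illustration.

```python
import json
import os

model = None  # populated once by init()


def init():
    """Called once by Azure ML when the deployment container starts."""
    global model
    # AZUREML_MODEL_DIR is set by Azure ML; the file name is an assumption.
    model_dir = os.getenv("AZUREML_MODEL_DIR", "models")
    model_path = os.path.join(model_dir, "padim_model.onnx")
    # Placeholder: a real script would load the ONNX model from model_path here.
    model = {"path": model_path, "threshold": 0.5}


def run(raw_data: str) -> dict:
    """Called per request with the JSON request body as a string."""
    try:
        payload = json.loads(raw_data)
        scores = [float(s) for s in payload["scores"]]  # hypothetical request field
    except (ValueError, KeyError, TypeError) as exc:
        return {"error": f"invalid request: {exc}"}
    return {"is_anomaly": [s > model["threshold"] for s in scores]}


# pytest-style tests exercising the contract without any Azure dependency.
def test_run_flags_scores_above_threshold():
    init()
    assert run(json.dumps({"scores": [0.1, 0.9]})) == {"is_anomaly": [False, True]}


def test_run_reports_malformed_input():
    init()
    assert "error" in run("not json")
```

Because `init()` and `run()` are ordinary functions, they can be tested locally before the script is packaged into a deployment.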
To containerize your FastAPI application, build the Docker image:
docker build -f docker/Dockerfile.np -t fastapi-anomavision:latest .
Then, tag the image for your container registry (e.g., Docker Hub or Azure Container Registry):
docker tag fastapi-anomavision:latest deepknowledge/fastapi-anomavision:latest
For full deployment to Azure Kubernetes Service (AKS) via Azure ML, follow the instructions in the `deployment/` and `pipelines/` directories. A typical deployment command would look like:
az ml online-deployment create --name my-deployment --endpoint-name my-endpoint --file deployment/endpoint-k8s-config.yml
For detailed setup instructions, including Azure infrastructure provisioning and Azure DevOps pipeline configurations, refer to the `/doc` folder or watch our dedicated [Getting Started Video] on YouTube.
This project emphasizes rigorous testing and validation at every stage of the MLOps pipeline:
- Unit Tests: `pytest tests/`
- Integration Tests: `pytest integration/`
- Load Testing: `locust -f load_testing/locustfile.py`
- Data Validation: `python src/data_validation.py`
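As an illustration of what a data validation step might check, the sketch below verifies the `train/good` layout that `train.py` expects. It is a hypothetical example, not the logic of `src/data_validation.py`; the function name `validate_dataset` and the accepted image suffixes are assumptions.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp"}  # assumed set of accepted formats


def validate_dataset(dataset_path: str, min_images: int = 1) -> list[str]:
    """Return a list of problems found; an empty list means the layout looks usable."""
    good_dir = Path(dataset_path) / "train" / "good"
    if not good_dir.is_dir():
        return [f"missing directory: {good_dir}"]

    problems = []
    images = [p for p in good_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES]
    if len(images) < min_images:
        problems.append(
            f"expected at least {min_images} image(s) in {good_dir}, found {len(images)}"
        )
    # Zero-byte files usually indicate an interrupted copy or download.
    empty = sorted(p.name for p in images if p.stat().st_size == 0)
    if empty:
        problems.append(f"zero-byte image files: {empty}")
    return problems


# Example: report problems for a dataset folder before starting a training run.
for problem in validate_dataset("path/to/your/dataset"):
    print("validation problem:", problem)
```

Returning a list of problems instead of raising on the first one lets a pipeline stage log every issue in a single run.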
Effective monitoring and logging are crucial for maintaining healthy ML systems in production:
- Application Logs: Stored in `logs/application.log`.
- Metrics Collection: Utilizes Prometheus for metrics, with visualization capabilities via Grafana.
- Drift Detection: Automated monitoring for model and data drift using `monitor_drift.py`.
- Alerting: Configured with Azure Monitor and Application Insights for proactive notifications.
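To make the idea of drift detection concrete, the sketch below computes the Population Stability Index (PSI) between a reference sample and a recent sample of a scalar signal such as anomaly scores. PSI is one common drift statistic chosen here for illustration; the method actually used by `monitor_drift.py` is not described in this README, and the function name and binning scheme are assumptions.

```python
import math


def population_stability_index(reference, current, bins: int = 10) -> float:
    """PSI between two samples, using equal-width bins over the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 when the reference is constant

    def histogram(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # A small floor keeps the logarithm defined for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = histogram(reference), histogram(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Example: scores that shift toward the upper half of the reference range.
reference_scores = [i / 100 for i in range(100)]
recent_scores = [0.5 + i / 200 for i in range(100)]
print(population_stability_index(reference_scores, reference_scores))
print(population_stability_index(reference_scores, recent_scores))
```

Identical distributions give a PSI of zero, and the value grows as the recent sample moves away from the reference. A frequently quoted rule of thumb treats values above roughly 0.25 as a significant shift, though the right alerting threshold depends on the application.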
Automate your MLOps workflow with pre-built Azure DevOps YAML templates, covering the entire lifecycle:
- Model training and retraining
- Unit and integration testing
- Docker image build and push to ACR
- Automated deployment to AKS
- Cleanup and rollback strategies
Explore the pipeline definitions in `devops/azure-pipelines.yml` and `pipelines/*.yml`.
Key model artifacts and metadata are stored and managed as follows:
- ONNX Models: `models/padim_model.onnx`
- PyTorch Models: `models/padim_model.pt`
- Model Information: `distributions/model_info.json` (contains metadata about trained models)
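Before packaging or deploying, it can help to confirm that these artifacts are actually present. The sketch below is a hypothetical helper (the name `check_artifacts` is not part of the repository) that reports which files exist and loads the metadata file as-is, since the schema of `model_info.json` is not documented here.

```python
import json
from pathlib import Path

# Paths as listed above, relative to the repository root.
ARTIFACTS = {
    "onnx": Path("models/padim_model.onnx"),
    "pytorch": Path("models/padim_model.pt"),
    "info": Path("distributions/model_info.json"),
}


def check_artifacts(root: str = ".") -> dict:
    """Report which artifacts exist under root and load the metadata file if present."""
    base = Path(root)
    report = {name: (base / rel).is_file() for name, rel in ARTIFACTS.items()}
    info_path = base / ARTIFACTS["info"]
    # No assumptions are made about the metadata fields; the parsed JSON is returned unchanged.
    report["metadata"] = json.loads(info_path.read_text()) if report["info"] else None
    return report


# Example: run from the repository root after training.
print(check_artifacts("."))
```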
We welcome contributions from the community! Whether it's bug fixes, new features, documentation improvements, or tutorials, your input is valuable. Please feel free to open an issue or submit a pull request.
- Fork the repository.
- Create a new feature branch (`git checkout -b feature/YourFeature`).
- Implement your changes, adding tests where applicable.
- Ensure your code adheres to the project's coding standards.
- Submit a pull request with a clear description of your changes.
Stay up-to-date with the latest developments, tutorials, and insights from our team:
- Subscribe to our YouTube channel: https://www.youtube.com/@DeepKnowledgeSpace
- Follow us on X: https://x.com/KnowledgeD76945/
- Connect with us on LinkedIn: https://www.linkedin.com/in/deep-knowledge/
If you found this project useful or insightful, please consider showing your support:
- Star the repository on GitHub.
- Share it with your colleagues and network.
- Leave feedback or suggest new topics for future development.
This project is licensed under the MIT License; see the `LICENSE` file for details.
Empowering ML Engineers to Deliver Production-Ready AI Systems, One Line at a Time. Made with ❤️ by the Deep Knowledge