Fine-tuned Vision Transformer with AWS Fargate serverless deployment, Application Load Balancing, and automated CI/CD pipeline
A production-grade human pose classification API built on a fine-tuned Vision Transformer (ViT) model. It runs on AWS Fargate serverless compute behind an Application Load Balancer and is deployed through an automated CI/CD pipeline, so there are no servers to manage and capacity scales automatically.
- State-of-the-art ViT architecture (google/vit-base-patch16-224-in21k)
- Fine-tuned on human action recognition dataset
- Self-attention relates every image patch to every other patch, capturing whole-body context for pose understanding
- Transfer learning from ImageNet-21k (14M images)
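
To make the model items above concrete: with 16x16 patches on a 224x224 input, the encoder attends over 196 patch tokens plus one classification token. The sketch below works out that arithmetic and shows one way the checkpoint could be loaded for fine-tuning with Hugging Face Transformers. The helper names and `num_labels` are assumptions for illustration and are not taken from the project's notebook.

```python
def vit_sequence_length(image_size: int = 224, patch_size: int = 16) -> int:
    """Number of tokens the encoder attends over: one per patch plus [CLS]."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side + 1


def load_vit_for_finetuning(num_labels: int):
    """Load the pre-trained backbone with a new classification head.

    Assumption: num_labels is the number of action classes in your dataset.
    Transformers is imported lazily so the arithmetic helper has no dependencies.
    """
    from transformers import ViTForImageClassification

    return ViTForImageClassification.from_pretrained(
        "google/vit-base-patch16-224-in21k",
        num_labels=num_labels,
    )


print(vit_sequence_length())  # 14 * 14 patches + 1 [CLS] token = 197
```

Only the classification head is newly initialized here; the encoder weights come from the ImageNet-21k checkpoint, which is what makes this transfer learning.
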
- Zero server management - fully managed containers
- Pay-per-use pricing (vCPU-seconds + GB-seconds)
- No infrastructure provisioning or maintenance
- AWS patches the underlying host OS and applies its security updates (the container image remains yours to update)
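
The pay-per-use model above comes down to simple arithmetic: task size multiplied by running time. A minimal sketch follows, assuming per-hour rates that you look up for your region. The rates in the usage lines are illustrative placeholders, not current AWS prices, and the function name is hypothetical.

```python
def fargate_task_cost(
    vcpu: float,
    memory_gb: float,
    seconds: float,
    vcpu_hour_rate: float,
    gb_hour_rate: float,
) -> float:
    """Estimate the compute cost of one task for a given running time."""
    if min(vcpu, memory_gb, seconds, vcpu_hour_rate, gb_hour_rate) < 0:
        raise ValueError("all inputs must be non-negative")
    hours = seconds / 3600
    return hours * (vcpu * vcpu_hour_rate + memory_gb * gb_hour_rate)


# Illustrative rates only (assumption); check current pricing for your region.
EXAMPLE_VCPU_HOUR_RATE = 0.04
EXAMPLE_GB_HOUR_RATE = 0.004

# One 0.5 vCPU / 1 GB task running for a 30-day month.
monthly = fargate_task_cost(
    vcpu=0.5,
    memory_gb=1.0,
    seconds=30 * 24 * 3600,
    vcpu_hour_rate=EXAMPLE_VCPU_HOUR_RATE,
    gb_hour_rate=EXAMPLE_GB_HOUR_RATE,
)
print(f"Estimated monthly cost: ${monthly:.2f}")
```

Note that a service that keeps a task running around the clock is billed around the clock; the savings come from right-sizing tasks and scaling the task count down when load is low.
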
- Intelligent traffic distribution across Fargate tasks
- Health-check based routing to healthy containers
- Multi-AZ deployment for high availability
- Single DNS endpoint for all requests
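
Health-check based routing depends on how the target group is configured. The sketch below builds the parameters for boto3's `elbv2.create_target_group` call. Port 8000 matches the Dockerfile in this project; the `/health` path, the thresholds, and the helper names are assumptions, so adjust them to whatever route your app actually exposes.

```python
def target_group_params(
    name: str, vpc_id: str, port: int = 8000, health_path: str = "/health"
) -> dict:
    """Build keyword arguments for elbv2.create_target_group."""
    return {
        "Name": name,
        "Protocol": "HTTP",
        "Port": port,
        "VpcId": vpc_id,
        # Fargate tasks use awsvpc networking, so targets are registered by IP.
        "TargetType": "ip",
        "HealthCheckProtocol": "HTTP",
        "HealthCheckPath": health_path,  # assumed route, not confirmed by the project
        "HealthCheckIntervalSeconds": 30,
        "HealthCheckTimeoutSeconds": 5,
        "HealthyThresholdCount": 2,
        "UnhealthyThresholdCount": 3,
        "Matcher": {"HttpCode": "200"},
    }


def create_target_group(params: dict) -> str:
    """Create the target group and return its ARN (requires AWS credentials)."""
    import boto3  # imported lazily so the builder above stays dependency-free

    elbv2 = boto3.client("elbv2")
    response = elbv2.create_target_group(**params)
    return response["TargetGroups"][0]["TargetGroupArn"]
```

With these settings a task receives traffic after two consecutive successful checks and is taken out of rotation after three consecutive failures.
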
- ECS Service auto-scaling based on CPU/Memory metrics
- Multi-Availability Zone deployment for fault tolerance
- Rolling deployments with zero downtime
- Automatic task replacement on failures
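
CPU-based scaling of an ECS service is usually expressed as a target-tracking policy in Application Auto Scaling. The following is a sketch of the two requests involved, built as plain dictionaries and then sent with boto3. The capacity limits, the 60% CPU target, the cooldowns, and the policy name are assumptions, not values from this project.

```python
def scaling_requests(
    cluster: str,
    service: str,
    min_tasks: int = 1,
    max_tasks: int = 4,
    cpu_target: float = 60.0,
) -> tuple[dict, dict]:
    """Build the register_scalable_target and put_scaling_policy arguments."""
    if min_tasks > max_tasks:
        raise ValueError("min_tasks must not exceed max_tasks")
    common = {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
    }
    target = {**common, "MinCapacity": min_tasks, "MaxCapacity": max_tasks}
    policy = {
        **common,
        "PolicyName": f"{service}-cpu-target-tracking",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": cpu_target,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
            },
            "ScaleOutCooldown": 60,
            "ScaleInCooldown": 120,
        },
    }
    return target, policy


def apply_scaling(target: dict, policy: dict) -> None:
    """Send both requests to AWS (requires credentials and an existing service)."""
    import boto3  # imported lazily so the builder above stays dependency-free

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(**target)
    client.put_scaling_policy(**policy)
```

Swapping the metric type for `ECSServiceAverageMemoryUtilization` gives the memory-based variant.
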
- Machine Learning: PyTorch, Transformers, Vision Transformer (ViT)
- Backend: FastAPI, Uvicorn
- Infrastructure: Docker, AWS ECS, AWS Fargate
- Load Balancing: AWS Application Load Balancer (ALB)
- Container Registry: Amazon ECR
- DevOps & CI/CD: GitHub Actions, AWS ECS Auto-deployment
- Monitoring: CloudWatch Logs & Metrics
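
As an illustration of how the CI/CD and ECS pieces of this stack fit together, the deploy step of a pipeline can be reduced to one `update_service` call followed by a wait. This is a sketch under assumptions: the project's actual GitHub Actions workflow is not shown here, the helper names are hypothetical, and the deployment percentages are one common choice for rolling updates that keep full capacity.

```python
def rolling_update_params(cluster: str, service: str) -> dict:
    """Build keyword arguments for ecs.update_service."""
    return {
        "cluster": cluster,
        "service": service,
        # Start new tasks even if the task definition revision is unchanged,
        # e.g. after pushing a new image under the same tag.
        "forceNewDeployment": True,
        "deploymentConfiguration": {
            "minimumHealthyPercent": 100,  # never drop below desired capacity
            "maximumPercent": 200,  # allow old and new tasks to overlap
        },
    }


def deploy(params: dict) -> None:
    """Trigger the rollout and block until the service is stable."""
    import boto3  # imported lazily so the builder above stays dependency-free

    ecs = boto3.client("ecs")
    ecs.update_service(**params)
    waiter = ecs.get_waiter("services_stable")
    waiter.wait(cluster=params["cluster"], services=[params["service"]])
```

A CI job would typically run this after pushing the new image to Amazon ECR, and fail the build if the waiter times out.
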
human-pose-classification/
├── fastapi_app/
│   ├── app.py                      # FastAPI application with prediction endpoints
│   ├── requirements.txt            # Python dependencies
│   ├── ml-models/
│   │   └── vit-human-pose-classification/
│   │       ├── config.json         # Model configuration
│   │       ├── model.safetensors   # Fine-tuned ViT weights
│   │       └── preprocessor_config.json
│   ├── scripts/
│   │   ├── data_model.py           # Pydantic response models
│   │   ├── huggingface_load.py     # Model loader from Hugging Face
│   │   └── s3.py                   # S3 model downloader (optional)
│   └── templates/
│       └── index.html              # Web interface for image upload
├── ImageClassification.ipynb       # Model training notebook
├── Dockerfile                      # FastAPI container (Port 8000)
└── README.md
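
The layout above places the prediction endpoints in `app.py` and the response models in `scripts/data_model.py`. As a dependency-free sketch of the post-processing such an endpoint performs, the function below turns raw model logits into a label and a confidence score. The field names `label` and `confidence` and the example class names are assumptions; the project's actual Pydantic models may differ.

```python
import math


def classify(logits: list[float], id2label: dict[int, str]) -> dict:
    """Apply a numerically stable softmax and return the top prediction."""
    if not logits or len(logits) != len(id2label):
        raise ValueError("logits and id2label must be non-empty and the same size")
    peak = max(logits)
    # Subtracting the maximum keeps exp() from overflowing on large logits.
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    probs = [value / total for value in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "confidence": probs[best]}


# Hypothetical two-class example.
result = classify([2.0, 0.0], {0: "sitting", 1: "running"})
print(result["label"], round(result["confidence"], 3))
```

In the real service the logits would come from the fine-tuned ViT and the label mapping from the `id2label` entry in `config.json`.
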
- Vision Transformer architecture and self-attention mechanisms
- Transfer learning with pre-trained ViT models
- Fine-tuning transformers for image classification
- AWS Fargate serverless container deployment
- ECS cluster and service orchestration
- Application Load Balancer configuration and target groups
- Security groups and VPC networking in AWS
- CloudWatch monitoring and logging
- CI/CD pipeline automation with GitHub Actions
- Serverless architecture benefits (cost, scalability, zero management)
- Docker optimization for ML workloads
- Production-ready ML API design with FastAPI
- A/B Testing: Deploy multiple model versions with traffic splitting
- Batch Inference: Support batch prediction endpoints
- Prometheus Metrics: Custom metrics export for advanced monitoring
- SageMaker Integration: Continuous model retraining pipeline
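
For the A/B testing item above, one possible approach is the ALB's weighted forwarding, where a listener splits traffic between two target groups. The sketch below builds such an action for boto3's `elbv2.modify_listener`. The ARNs are placeholders you would supply and the helper names are hypothetical; this describes an option, not something the project currently implements.

```python
def weighted_forward_action(
    current_tg_arn: str, candidate_tg_arn: str, candidate_percent: int
) -> dict:
    """Build a forward action that sends a share of traffic to a candidate model."""
    if not 0 <= candidate_percent <= 100:
        raise ValueError("candidate_percent must be between 0 and 100")
    return {
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": current_tg_arn, "Weight": 100 - candidate_percent},
                {"TargetGroupArn": candidate_tg_arn, "Weight": candidate_percent},
            ]
        },
    }


def apply_split(listener_arn: str, action: dict) -> None:
    """Replace the listener's default action (requires AWS credentials)."""
    import boto3  # imported lazily so the builder above stays dependency-free

    elbv2 = boto3.client("elbv2")
    elbv2.modify_listener(ListenerArn=listener_arn, DefaultActions=[action])
```

Each target group would front a separate ECS service running one model version, so shifting the weights moves traffic without redeploying either service.
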
| Aspect | EC2 | AWS Fargate |
|---|---|---|
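| Server Management | Manual provisioning, patching, scaling | No hosts to manage; AWS operates the underlying servers |
| Scaling | Configure Auto Scaling groups and instance capacity manually | ECS Service Auto Scaling adjusts the task count; no instance capacity to manage |
| Cost Model | Pay for instances for as long as they run, even when idle | Pay per second for the vCPU and memory of running tasks |
| Deployment | Manage EC2 hosts and Docker separately | Container-native; ECS launches tasks directly from the image |
| High Availability | Configure across AZs manually | Tasks are spread across the AZs of the service's subnets |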
| Estimated Cost | $50-100/month (always running) | $20-40/month (low-moderate load) |
Harsh Patel
📧 code.by.hp@gmail.com
🔗 GitHub • LinkedIn
⭐ If you find this project helpful, please star it!



