## Table of Contents

- Project Overview
- Tech Stack
- Architecture & Workflow
- Data Ingestion
- Data Transformation
- Model Training & Evaluation
- Prediction Pipeline
- FastAPI Deployment
- MLflow & DagsHub Integration
- Dockerization
- CI/CD Pipeline
- Project Structure
- References
## Project Overview

This project predicts car prices in Pakistan using historical car data. Features include:

- Company / Brand
- Car Model
- Year of Manufacture
- Kilometers Driven
- Fuel Type (Petrol / Diesel)

The project implements a complete ML workflow covering data ingestion, preprocessing, modeling, deployment, and CI/CD.
## Tech Stack

- Backend & ML: Python, Pandas, NumPy, Scikit-learn, XGBoost
- API & Web: FastAPI, Jinja2 Templates, JavaScript, HTML/CSS
- Database: MySQL
- Experiment Tracking: MLflow, DagsHub
- Containerization: Docker
- CI/CD: GitHub Actions / GitLab CI
- Version Control: Git / DagsHub
## Architecture & Workflow

```mermaid
flowchart TD
    A[MySQL Database] --> B[Data Ingestion]
    B --> C[Data Transformation & Feature Engineering]
    C --> D[Model Training & Evaluation]
    D --> E[Prediction Pipeline]
    E --> F[FastAPI Backend]
    F --> G[Frontend UI]
    D --> H[MLflow & DagsHub Tracking]
    F --> I[Docker Container]
    I --> J[CI/CD Pipeline: GitHub Actions / GitLab CI]
```

## Data Ingestion

- Data is fetched from MySQL using Python (`pymysql`).
- The data is split into train and test sets.
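The ingestion step can be sketched as follows (the table name `car_data` and helper names are illustrative assumptions, not the project's actual code):

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def fetch_from_mysql(host: str, user: str, password: str, database: str,
                     table: str = "car_data") -> pd.DataFrame:
    """Read the raw car table into a DataFrame. The table name is an assumed default."""
    import pymysql  # imported here so the split helper stays usable without a DB driver

    connection = pymysql.connect(host=host, user=user, password=password, database=database)
    try:
        return pd.read_sql(f"SELECT * FROM {table}", connection)
    finally:
        connection.close()


def split_data(df: pd.DataFrame, test_size: float = 0.2, random_state: int = 42):
    """Split the raw data into train and test sets."""
    return train_test_split(df, test_size=test_size, random_state=random_state)
```

In the real component, the resulting frames would be written to `artifacts/train.csv` and `artifacts/test.csv`.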
## Data Transformation

Feature Engineering:

- `age = 2025 - year`
- One-hot encode categorical features: `company`, `name`, `fuel_type`
- Scale numerical features: `age`, `kms_driven`

The preprocessor object is saved for later use in prediction: `artifacts/preprocessor.pkl`
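A minimal sketch of this preprocessing using scikit-learn (function names are illustrative; the actual `data_transformation.py` may differ):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def add_age(df: pd.DataFrame) -> pd.DataFrame:
    """Derive the age feature from the manufacturing year, as described above."""
    df = df.copy()
    df["age"] = 2025 - df["year"]
    return df.drop(columns=["year"])


def build_preprocessor() -> ColumnTransformer:
    """One-hot encode the categorical columns and scale the numerical ones."""
    categorical = ["company", "name", "fuel_type"]
    numerical = ["age", "kms_driven"]
    return ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
        ("num", StandardScaler(), numerical),
    ])
```

The fitted `ColumnTransformer` is what gets pickled to `artifacts/preprocessor.pkl`.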
## Model Training & Evaluation

- Train a regression model (XGBoost / RandomForest) on the transformed dataset.
- Evaluate metrics: RMSE, MAE, R²
- Save the trained model: `artifacts/model.pkl`
- Log metrics with MLflow.
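The evaluation and persistence steps can be sketched like this (helper names are assumptions):

```python
import pickle

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def evaluate(y_true, y_pred) -> dict:
    """Compute the metrics listed above: RMSE, MAE, and R²."""
    return {
        "RMSE": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "MAE": float(mean_absolute_error(y_true, y_pred)),
        "R2": float(r2_score(y_true, y_pred)),
    }


def save_model(model, path: str = "artifacts/model.pkl") -> None:
    """Persist the trained regressor (e.g. an XGBRegressor) for the prediction pipeline."""
    with open(path, "wb") as f:
        pickle.dump(model, f)
```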
## Prediction Pipeline

`predict_pipeline.py` handles:

- Input validation using Pydantic
- Feature engineering (computing `age`)
- Transformation using the saved preprocessor
- Prediction using the saved model

Example:

```python
pipeline = PredictionPipeline(
    model_path="artifacts/model.pkl",
    preprocessor_path="artifacts/preprocessor.pkl",
)
prediction = pipeline.predict(input_df)
```
## FastAPI Deployment

Endpoints:

- `/` → Homepage with prediction form
- `/predict` → POST API for predictions
- `/company` → GET all companies
- `/name/{company_name}` → GET car models per company

Frontend Integration:

- HTML/CSS/JS form
- AJAX calls to API endpoints
- Dynamic display of the predicted price
Example request payload:

```json
{
  "name": "Civic",
  "company": "Honda",
  "year": 2018,
  "kms_driven": 45000,
  "fuel_type": "Petrol"
}
```

## MLflow & DagsHub Integration

- Track experiments, metrics, and parameters with MLflow.

Example:

```python
import mlflow

mlflow.set_tracking_uri("https://dagshub.com/<username>/<repo>.mlflow")
mlflow.log_param("model", "XGBRegressor")
mlflow.log_metric("RMSE", rmse)
mlflow.sklearn.log_model(best_model, "model")
```

Benefits:

- Model versioning
- Experiment comparison
- Collaboration via DagsHub
## Dockerization

Dockerfile:

```dockerfile
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "src.mlproject.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Commands:

```bash
docker build -t car-price-predictor .
docker run -d -p 8000:8000 car-price-predictor
```

## CI/CD Pipeline

Automate build, test, and deployment using GitHub Actions or GitLab CI.
Sample GitHub Actions workflow:

```yaml
name: CI/CD

on:
  push:
    branches: [main]

jobs:
  build-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"  # quoted so YAML does not parse it as the float 3.1
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run tests
        run: pytest
      - name: Build Docker image
        run: docker build -t username/car-price-predictor .
      - name: Push Docker image
        run: docker push username/car-price-predictor
```

## Project Structure

```
Car-Price-Predictor/
│
├─ src/
│  └─ mlproject/
│     ├─ components/
│     │  ├─ data_ingestion.py
│     │  ├─ data_transformation.py
│     │  └─ model_trainer.py
│     ├─ pipelines/
│     │  └─ prediction_pipeline.py
│     ├─ utils.py
│     ├─ logger.py
│     └─ main.py
│
├─ artifacts/
│  ├─ raw_data.csv
│  ├─ train.csv
│  ├─ test.csv
│  ├─ model.pkl
│  └─ preprocessor.pkl
│
├─ templates/
│  └─ index.html
├─ static/
│  └─ style.css
├─ requirements.txt
├─ Dockerfile
└─ README.md
```
## References

- FastAPI Docs
- MLflow Docs
- DagsHub Docs
- Docker Docs