🎥 Fine-Tuned World Model (V-JEPA2 Video Classification) in Azure ML

A major challenge for modern AI is learning to understand the world, and to act in it, largely through observation. While LLMs have advanced tremendously, robots and agents operating in the physical world still struggle with, for example, physics understanding. Meta has developed models, such as V-JEPA2, that can understand, predict, and plan in the physical world based on video and robotics data. You can find more details here.

This repo offers a production‑ready Azure Machine Learning pipeline for fine‑tuning and deploying Facebook’s V-JEPA2 (Video Joint‑Embedding Predictive Architecture) model for custom video classification tasks.


📋 Table of Contents

  1. Overview
  2. Features
  3. Prerequisites
  4. Quick Start
  5. Project Structure
  6. Credits

🎯 Overview

This repository provides an end‑to‑end solution for video classification using the state‑of‑the‑art V-JEPA2 model on Azure Machine Learning. It automates the entire ML lifecycle—from model fine‑tuning to deployment—with built‑in permission management and a user‑friendly Gradio interface.

What is V-JEPA2?

V-JEPA2 is Facebook’s advanced video understanding model that uses a joint‑embedding predictive architecture to learn powerful video representations. This project makes it easy to:

  • Fine‑tune V-JEPA2 on your custom video dataset
  • Deploy the model as a scalable Azure ML endpoint
  • Access the model through a web interface or REST API

✨ Features

  • 🚀 Automated ML Pipeline

    • End‑to‑end automation: from data ingestion to model deployment
    • MLflow integration: comprehensive experiment tracking and model versioning
  • 🎨 User Interfaces (three options)

    • Gradio Web App: interactive interface for video classification only
    • REST API: programmatic access for integration
    • Agentic AI Chat Interface: uses Azure AI Agent Service to combine video classification with a RAG chatbot
  • 📊 Production Features

    • Model versioning: track and manage multiple model versions
    • Monitoring: built‑in logging and performance tracking

📋 Prerequisites

  • Azure Subscription with appropriate permissions
  • Azure CLI installed and authenticated (az login)
  • Python 3.12 with Conda/Miniconda
  • Git for cloning the repository
  • Azure ML workspace with an attached storage account (user-based credentials) and sufficient quota for 2 NC24A100 nodes

Required Azure Permissions

Your user account needs:

  • Contributor or Owner role on the resource group
  • Ability to grant permissions programmatically

🚀 Quick Start

1. Clone & Setup

# Clone the repository
git clone <repository-url>
cd vjepa2_aml

# Create environment file from template
cp .env_sample config/.env

2. Configure Azure Resources

# Edit config/.env with your Azure configuration:
AZURE_SUBSCRIPTION_ID="your-subscription-id"
AZURE_RESOURCE_GROUP="your-resource-group"
AZURE_WORKSPACE_NAME="your-ml-workspace"

# Training parameters (optional — defaults provided)
NUM_EPOCHS=2
BATCH_SIZE=1
LEARNING_RATE=1e-5
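The loader in config/config.py reads these values at startup. As a rough illustration of what that involves, here is a minimal stdlib-only sketch of a .env parser; the real loader may differ (for example, it may use python-dotenv):

```python
import tempfile

def load_env(path):
    """Parse simple KEY=VALUE lines from a .env-style file into a dict.

    Minimal stdlib-only sketch; the real config/config.py loader may
    behave differently (e.g. use python-dotenv).
    """
    values = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            # Skip blank lines, comments, and anything without '='
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"')
    return values

# Write a sample .env to a temp file and load it
sample = 'AZURE_SUBSCRIPTION_ID="your-subscription-id"\n# comment\nNUM_EPOCHS=2\n'
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as fh:
    fh.write(sample)
    sample_path = fh.name

cfg = load_env(sample_path)
print(cfg["AZURE_SUBSCRIPTION_ID"])  # your-subscription-id
```

Note that every value comes back as a string, so numeric settings such as NUM_EPOCHS must be cast by the consumer.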

3. Create Local Environment

# Create and activate the environment
python -m venv my-env
.\my-env\Scripts\Activate.ps1    # Windows (PowerShell)
# source my-env/bin/activate     # Linux/macOS
pip install -r requirements.txt

4. Run the Pipeline

python pipelines/vjepa2_finetune_aml_job.py

This single command will:

  • Set up compute resources
  • Grant necessary permissions
  • Create environments
  • Fine‑tune the model
  • Register the model
  • Deploy to an endpoint
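Conceptually, the pipeline script chains those stages in order and stops if one fails. A simplified sketch of that orchestration pattern (the stage names and stubs below are illustrative, not the real module API):

```python
def run_pipeline(stages):
    """Run named stages in order; stop at the first failure.

    Illustrative sketch of the orchestration pattern only; the real
    pipelines/vjepa2_finetune_aml_job.py drives Azure ML jobs instead.
    """
    completed = []
    for name, step in stages:
        try:
            step()
        except Exception as exc:
            print(f"Stage '{name}' failed: {exc}")
            break
        completed.append(name)
    return completed

# Stub stages mirroring the phases listed above
stages = [
    ("setup_compute", lambda: None),
    ("grant_permissions", lambda: None),
    ("create_environments", lambda: None),
    ("finetune_model", lambda: None),
    ("register_model", lambda: None),
    ("deploy_endpoint", lambda: None),
]
completed = run_pipeline(stages)
```

Stopping at the first failed stage matters here because later stages (such as deployment) depend on artifacts produced by earlier ones (such as the registered model).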

5. Launch the Web Interface

python apps/setup_and_run_gradio.py

Alternatively, you can access the model via the REST API; see tests/test.py for an example.

6. Integrate everything into an Azure AI Agent Service

Deploy a basic setup of an AI Foundry project (see here). Download the V-JEPA2 paper from here and place it in an examples folder. Add the Azure AI Agent Service parameters to your .env and run agent/run_agent_system.py. You will get an agent that can both chat about the V-JEPA2 model and predict video classes when you upload a video. You can see your agent in Azure AI Foundry.

📁 Project Structure

vjepa2_aml/
├── agent/
│   ├── agent_backend.py           # Backend
│   ├── agent_frontend.py          # Frontend
│   └── run_agent_system.py        # Setup
├── apps/
│   ├── gradio_app.py              # Frontend
│   └── setup_and_run_gradio.py    # Frontend setup
├── config/
│   ├── .env                       # Your Azure config (create from .env_sample)
│   └── config.py                  # Configuration loader
├── environments/
│   ├── vjepa2_conda_finetune.yml
│   └── vjepa2_conda_inference.yml
├── pipelines/
│   └── vjepa2_finetune_aml_job.py # Main pipeline to run
├── src/
│   ├── deployment/
│   │   ├── deploy.py
│   │   ├── register.py
│   │   └── score.py
│   ├── training/
│   │   └── run_vjepa2_finetune.py
│   └── utils/
│       └── setup_permission.py
├── tests/
│   └── test.py                    # Quick test of deployment
├── .env_sample
├── .gitignore
├── requirements.txt               # Local Python dependencies
└── README.md

🙏 Credits

Thanks to Meta for publishing this new model (here) and to this Jupyter Notebook (here) for inspiring a great fine-tuning use case and dataset.
