🎥 Fine-Tuned World Model (V-JEPA2 Video Classification) in Azure ML

A major challenge for modern AI is learning to understand the world, and to act in it, largely through observation. While LLMs have advanced tremendously, robots and agents operating in the physical world still struggle with, for example, physics understanding. Meta has developed models, such as V-JEPA2, that can understand, predict, and plan in the physical world based on video and robotics data. You can find more details here.

This repo offers a production‑ready Azure Machine Learning pipeline for fine‑tuning and deploying Facebook’s V-JEPA2 (Video Joint‑Embedding Predictive Architecture) model for custom video classification tasks.


📋 Table of Contents

  1. Overview
  2. Features
  3. Prerequisites
  4. Quick Start
  5. Project Structure
  6. Credits

🎯 Overview

This repository provides an end‑to‑end solution for video classification using the state‑of‑the‑art V-JEPA2 model on Azure Machine Learning. It automates the entire ML lifecycle—from model fine‑tuning to deployment—with built‑in permission management and a user‑friendly Gradio interface.

What is V-JEPA2?

V-JEPA2 is Facebook’s advanced video understanding model that uses a joint‑embedding predictive architecture to learn powerful video representations. This project makes it easy to:

  • Fine‑tune V-JEPA2 on your custom video dataset
  • Deploy the model as a scalable Azure ML endpoint
  • Access the model through a web interface or REST API

✨ Features

  • 🚀 Automated ML Pipeline

    • End‑to‑end automation: from data ingestion to model deployment
    • MLflow integration: comprehensive experiment tracking and model versioning
  • 🎨 User Interfaces (three options)

    • Gradio Web App: interactive interface for video classification only
    • REST API: programmatic access for integration
    • Agentic AI Chat Interface: uses Azure AI Agent Service to combine video classification with a RAG chatbot
  • 📊 Production Features

    • Model versioning: track and manage multiple model versions
    • Monitoring: built‑in logging and performance tracking

📋 Prerequisites

  • Azure Subscription with appropriate permissions
  • Azure CLI installed and authenticated (az login)
  • Python 3.12 with Conda/Miniconda
  • Git for cloning the repository
  • Azure ML workspace with an attached storage account (user-based credentials) and sufficient quota for 2 NC24A100 nodes

Required Azure Permissions

Your user account needs:

  • Contributor or Owner role on the resource group
  • Ability to grant permissions programmatically

🚀 Quick Start

1. Clone & Setup

# Clone the repository
git clone <repository-url>
cd vjepa2_aml

# Create environment file from template
cp .env_sample config/.env

2. Configure Azure Resources

# Edit config/.env with your Azure configuration:
AZURE_SUBSCRIPTION_ID="your-subscription-id"
AZURE_RESOURCE_GROUP="your-resource-group"
AZURE_WORKSPACE_NAME="your-ml-workspace"

# Training parameters (optional — defaults provided)
NUM_EPOCHS=2
BATCH_SIZE=1
LEARNING_RATE=1e-5
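The loader in config/config.py reads these values at startup. As a rough illustration of what that involves, here is a minimal stdlib-only sketch of a .env parser; the real loader may differ (for example, it may use python-dotenv):

```python
import tempfile

def load_env(path):
    """Parse simple KEY=VALUE lines from a .env-style file into a dict.

    Minimal stdlib-only sketch; the real config/config.py loader may
    behave differently (e.g. use python-dotenv).
    """
    values = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            # Skip blank lines, comments, and anything without '='
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"')
    return values

# Write a sample .env to a temp file and load it
sample = 'AZURE_SUBSCRIPTION_ID="your-subscription-id"\n# comment\nNUM_EPOCHS=2\n'
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as fh:
    fh.write(sample)
    sample_path = fh.name

cfg = load_env(sample_path)
print(cfg["AZURE_SUBSCRIPTION_ID"])  # your-subscription-id
```

Note that every value comes back as a string, so numeric settings such as NUM_EPOCHS must be cast by the consumer.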

3. Create Local Environment

# Create and activate the environment
python -m venv my-env
.\my-env\Scripts\Activate.ps1    # Windows (PowerShell)
# source my-env/bin/activate     # Linux/macOS
pip install -r requirements.txt

4. Run the Pipeline

python pipelines/vjepa2_finetune_aml_job.py

This single command will:

  • Set up compute resources
  • Grant necessary permissions
  • Create environments
  • Fine‑tune the model
  • Register the model
  • Deploy to an endpoint
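Conceptually, the pipeline script chains those stages in order and stops if one fails. A simplified sketch of that orchestration pattern (the stage names and stubs below are illustrative, not the real module API):

```python
def run_pipeline(stages):
    """Run named stages in order; stop at the first failure.

    Illustrative sketch of the orchestration pattern only; the real
    pipelines/vjepa2_finetune_aml_job.py drives Azure ML jobs instead.
    """
    completed = []
    for name, step in stages:
        try:
            step()
        except Exception as exc:
            print(f"Stage '{name}' failed: {exc}")
            break
        completed.append(name)
    return completed

# Stub stages mirroring the phases listed above
stages = [
    ("setup_compute", lambda: None),
    ("grant_permissions", lambda: None),
    ("create_environments", lambda: None),
    ("finetune_model", lambda: None),
    ("register_model", lambda: None),
    ("deploy_endpoint", lambda: None),
]
completed = run_pipeline(stages)
```

Stopping at the first failed stage matters here because later stages (such as deployment) depend on artifacts produced by earlier ones (such as the registered model).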

5. Launch the Web Interface

python apps/setup_and_run_gradio.py

Alternatively, you can access the model via the REST API; see tests/test.py for an example.

6. Integrate everything into an Azure AI Agent Service

Deploy a basic setup of an AI Foundry project (see here). Download the V-JEPA2 paper from here and place it in an examples folder. Add the Azure AI Agent Service parameters to your .env and run agent/run_agent_system.py. You will get an agent that can both chat about the V-JEPA2 model and predict video classes when you upload a video. You can see your agent in Azure AI Foundry.

📁 Project Structure

vjepa2_aml/
├── agent/
│   ├── agent_backend.py           # Backend
│   ├── agent_frontend.py          # Frontend
│   └── run_agent_system.py        # Setup
├── apps/
│   ├── gradio_app.py              # Frontend
│   └── setup_and_run_gradio.py    # Frontend setup
├── config/
│   ├── .env                       # Your Azure config (create from .env_sample)
│   └── config.py                  # Configuration loader
├── environments/
│   ├── vjepa2_conda_finetune.yml
│   └── vjepa2_conda_inference.yml
├── pipelines/
│   └── vjepa2_finetune_aml_job.py # Main pipeline to run
├── src/
│   ├── deployment/
│   │   ├── deploy.py
│   │   ├── register.py
│   │   └── score.py
│   ├── training/
│   │   └── run_vjepa2_finetune.py
│   └── utils/
│       └── setup_permission.py
├── tests/
│   └── test.py                    # Quick test of deployment
├── .env_sample
├── .gitignore
├── requirements.txt               # Local Python dependencies
└── README.md

🙏 Credits

Thanks to Meta for publishing this new model (here) and to this Jupyter Notebook (here) for inspiring a great fine-tuning use case and dataset.
