A major challenge for modern AI is learning to understand the physical world, and to act in it, largely through observation. While LLMs have advanced tremendously, robots and agents operating in the physical world still face major challenges in, for example, physics understanding. Meta has developed models capable of understanding, predicting, and planning in the physical world, such as V-JEPA2, trained on video and robotics data. You can find more details here.
This repo offers a production‑ready Azure Machine Learning pipeline for fine‑tuning and deploying Meta's V-JEPA2 (Video Joint‑Embedding Predictive Architecture) model for custom video classification tasks.
This repository provides an end‑to‑end solution for video classification using the state‑of‑the‑art V-JEPA2 model on Azure Machine Learning. It automates the entire ML lifecycle—from model fine‑tuning to deployment—with built‑in permission management and a user‑friendly Gradio interface.
V-JEPA2 is Meta's advanced video understanding model that uses a joint‑embedding predictive architecture to learn powerful video representations. This project makes it easy to:
- Fine‑tune V-JEPA2 on your custom video dataset
- Deploy the model as a scalable Azure ML endpoint
- Access the model through a web interface or REST API
### 🚀 Automated ML Pipeline
- End‑to‑end automation: from data ingestion to model deployment
- MLflow integration: comprehensive experiment tracking and model versioning
### 🎨 User Interfaces (three options)
- Gradio Web App: interactive interface for video classification only
- REST API: programmatic access for integration
- Agentic AI Chat Interface: video classification plus a RAG chatbot, deployed via Azure AI Agent Service
### 📊 Production Features
- Model versioning: track and manage multiple model versions
- Monitoring: built‑in logging and performance tracking
- Azure subscription with appropriate permissions
- Azure CLI installed and authenticated (`az login`)
- Python 3.12 with Conda/Miniconda
- Git for cloning the repository
- Azure ML workspace with an attached storage account (user-based credentials) and sufficient quota for two NC24A100 instances
Your user account needs:
- Contributor or Owner role on the resource group
- Ability to grant permissions programmatically
```shell
# Clone the repository
git clone <repository-url>
cd vjepa2_aml

# Create environment file from template
cp .env_sample config/.env
```

Edit `config/.env` with your Azure configuration:

```shell
AZURE_SUBSCRIPTION_ID="your-subscription-id"
AZURE_RESOURCE_GROUP="your-resource-group"
AZURE_WORKSPACE_NAME="your-ml-workspace"

# Training parameters (optional; defaults provided)
NUM_EPOCHS=2
BATCH_SIZE=1
LEARNING_RATE=1e-5
```

```shell
# Create and activate the environment
python -m venv my-env
.\my-env\Scripts\Activate.ps1
pip install -r requirements.txt
```

Run the main pipeline:

```shell
python pipelines/vjepa2_finetune_aml_job.py
```

This single command will:
- Set up compute resources
- Grant necessary permissions
- Create environments
- Fine‑tune the model
- Register the model
- Deploy to an endpoint
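The optional training parameters from `config/.env` shown above fall back to defaults when unset. A minimal loader along these lines shows how that works; this is a sketch only (the actual logic lives in `config/config.py`, and the helper name `load_training_config` is hypothetical):

```python
import os

def load_training_config():
    """Read optional training parameters from the environment,
    falling back to the documented defaults. Hypothetical helper;
    see config/config.py for the real loader."""
    return {
        "num_epochs": int(os.environ.get("NUM_EPOCHS", 2)),
        "batch_size": int(os.environ.get("BATCH_SIZE", 1)),
        "learning_rate": float(os.environ.get("LEARNING_RATE", 1e-5)),
    }
```

Any variable not set in `config/.env` or the shell simply keeps its default, so the file can stay minimal.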
Launch the Gradio app:

```shell
python apps/setup_and_run_gradio.py
```

Alternatively, you can access the model via the REST API; see `tests/test.py` for an example.
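For programmatic access, the endpoint can be called over HTTPS. The sketch below is illustrative only: the endpoint URL, API key, and request schema (a base64-encoded video under a `"video"` key) are assumptions here, so check `tests/test.py` and `src/deployment/score.py` for the actual contract:

```python
import base64
import json

def build_request_payload(video_path):
    """Encode a local video file as a base64 JSON payload.
    The 'video' key is an assumption; see score.py for the real schema."""
    with open(video_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return json.dumps({"video": encoded})

# Sending the request (requires the `requests` package; URL and key are placeholders):
# import requests
# resp = requests.post(
#     "https://<endpoint-name>.<region>.inference.ml.azure.com/score",
#     headers={"Authorization": "Bearer <api-key>",
#              "Content-Type": "application/json"},
#     data=build_request_payload("example.mp4"),
# )
# print(resp.json())
```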
Deploy a basic AI Foundry project (see here). Download the V-JEPA2 paper from here and place it in an `examples` folder. Add the Azure AI Agent Service parameters to your `.env` and run `agent/run_agent_system.py`; you will get an agent that can both chat about the V-JEPA2 model and predict video classes when you upload a video. You can see your agent in Azure AI Foundry.
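Conceptually, each turn is routed either to the RAG chat tool (questions about the paper) or to the video-classification tool (when a video is attached). A toy dispatcher makes the idea concrete; it is purely illustrative, since the real routing is delegated to Azure AI Agent Service by the code under `agent/`:

```python
def route_message(text, video_path=None):
    """Toy dispatcher: classify when a video is attached, otherwise
    answer from the RAG knowledge base. Illustrative only; the real
    agent lets Azure AI Agent Service pick the tool."""
    if video_path is not None:
        return ("classify_video", video_path)
    return ("rag_chat", text)
```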
```
vjepa2_aml/
├── agent/
│   ├── agent_backend.py           # Backend
│   ├── agent_frontend.py          # Frontend
│   └── run_agent_system.py        # Setup
├── apps/
│   ├── gradio_app.py              # Frontend
│   └── setup_and_run_gradio.py    # Frontend setup
├── config/
│   ├── .env                       # Your Azure config (create from .env_sample)
│   └── config.py                  # Configuration loader
├── environments/
│   ├── vjepa2_conda_finetune.yml
│   └── vjepa2_conda_inference.yml
├── pipelines/
│   └── vjepa2_finetune_aml_job.py # Main pipeline to run
├── src/
│   ├── deployment/
│   │   ├── deploy.py
│   │   ├── register.py
│   │   └── score.py
│   ├── training/
│   │   └── run_vjepa2_finetune.py
│   └── utils/
│       └── setup_permission.py
├── tests/
│   └── test.py                    # Quick test of the deployment
├── .env_sample
├── .gitignore
├── requirements.txt               # Local Python dependencies
└── README.md
```

Thanks to Meta for publishing this new model here, and to this Jupyter Notebook here for the inspiration behind the fine-tuning use case and dataset.