PiXLLaVA: Interleaved object-centric Vision-Language Alignment for improving MLLMs

Add a .env file:

WANDB_API_KEY=YOUR_WANDB_API_KEY
HF_TOKEN=YOUR_HUGGING_FACE_TOKEN # to upload trained models to HuggingFace Hub
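
Before starting a long run, it can help to confirm that both keys are set and no longer hold the placeholder values. The following is a minimal sketch, not part of the repository: the helper names `parse_env` and `missing_keys` are hypothetical, and the parser handles only simple KEY=VALUE lines with optional `#` comments, as in the example above.

```python
from pathlib import Path

REQUIRED_KEYS = ("WANDB_API_KEY", "HF_TOKEN")  # the two keys named in this README


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments.

    Assumption: values never contain '#', so everything after '#' is a comment.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop inline comments
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip().strip("'\"")  # tolerate quoted values
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return required keys that are absent, empty, or still a YOUR_... placeholder."""
    env = parse_env(text)
    return [k for k in required if not env.get(k) or env[k].startswith("YOUR_")]


if __name__ == "__main__":
    env_path = Path(".env")  # assumed to sit in the current working directory
    text = env_path.read_text() if env_path.exists() else ""
    print("missing or placeholder keys:", missing_keys(text) or "none")
```

The check is deliberately dependency-free; a project that already uses a dotenv loader could rely on that instead.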

Steps to run:

# install packages and load in editable mode
pip install -e .

# download data (96 GB)
python download_data.py pretrain_data
python download_data.py finetune_data # takes 1-2 hours

# init base model
bash scripts/pixllava/get_base_model.sh

# pretrain
bash scripts/pixllava/pretrain.sh

# finetune
bash scripts/pixllava/finetune.sh

Alternatively, run bash run.sh to run all the scripts in one go.
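
The steps above can also be sketched as a small sequential runner that stops at the first failing command. This is an illustration only and not the contents of run.sh, which may cover a different set of steps. The names `STEPS`, `free_gb`, and `run_steps` are hypothetical, and the disk check uses the 96 GB download size quoted above as a rough lower bound (extracted data and checkpoints may need more).

```python
import shlex
import shutil
import subprocess

# Commands copied from the steps above, in the order they are listed.
STEPS = [
    "pip install -e .",
    "python download_data.py pretrain_data",
    "python download_data.py finetune_data",
    "bash scripts/pixllava/get_base_model.sh",
    "bash scripts/pixllava/pretrain.sh",
    "bash scripts/pixllava/finetune.sh",
]

DATA_SIZE_GB = 96  # download size stated in this README


def free_gb(path: str = ".") -> float:
    """Free disk space, in gigabytes, on the filesystem holding `path`."""
    return shutil.disk_usage(path).free / 1e9


def run_steps(steps=STEPS, dry_run=True) -> list[list[str]]:
    """Run each step in order; with dry_run=True only print the commands.

    Returns the parsed argument lists so the plan can be inspected.
    """
    planned = []
    for step in steps:
        argv = shlex.split(step)
        planned.append(argv)
        print("$", step)
        if not dry_run:
            # check=True raises CalledProcessError, so later steps are skipped
            subprocess.run(argv, check=True)
    return planned


if __name__ == "__main__":
    if free_gb() < DATA_SIZE_GB:
        print(f"warning: less than {DATA_SIZE_GB} GB free in the current directory")
    run_steps(dry_run=True)  # switch to dry_run=False to actually execute
```

Keeping `dry_run=True` as the default means importing or running the sketch never starts a download or a training job by accident.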

Repository contents:

docs
models
pixl
scripts
README.md
download_data.py
preprocess_data.py
pyproject.toml
run.sh

Repository: hunarbatra/PiXLLaVA
