This week, we are experimenting with multimodal transformer/LLM models.
We will try both a self-built, self-trained decoder and a fine-tuned Qwen base model, each paired with a contrastive visual transformer that produces image embedding tokens.
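As a rough sketch of what that pairing might look like (the class name, dimensions, and number of image tokens below are illustrative assumptions, not the actual code in this repo): a pooled embedding from the contrastive vision transformer is projected into the decoder's embedding space and prepended to the text token embeddings.

```python
import torch
import torch.nn as nn

class ImageTokenProjector(nn.Module):
    """Project a pooled embedding from a contrastive vision transformer
    (e.g. a CLIP-style encoder) into the decoder's embedding space,
    producing a small number of "image tokens"."""

    def __init__(self, vision_dim: int, decoder_dim: int, num_image_tokens: int = 4):
        super().__init__()
        self.num_image_tokens = num_image_tokens
        self.proj = nn.Linear(vision_dim, decoder_dim * num_image_tokens)

    def forward(self, image_embedding: torch.Tensor) -> torch.Tensor:
        # image_embedding: (batch, vision_dim) -> (batch, num_image_tokens, decoder_dim)
        batch = image_embedding.shape[0]
        return self.proj(image_embedding).view(batch, self.num_image_tokens, -1)

# Prepend the projected image tokens to the text token embeddings, then feed
# the combined sequence to the decoder as usual.
projector = ImageTokenProjector(vision_dim=512, decoder_dim=896, num_image_tokens=4)
image_embedding = torch.randn(2, 512)        # pooled output of the vision transformer
text_embeddings = torch.randn(2, 16, 896)    # output of the decoder's token embedding layer
decoder_input = torch.cat([projector(image_embedding), text_embeddings], dim=1)  # (2, 20, 896)
```

A learned projection like this is a common bridge between a frozen contrastive image encoder and a language decoder: the decoder then attends over the image tokens exactly as it does over text tokens.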
- Install the git lfs extension before cloning this repository
- Install the uv package manager
Then install dependencies with:
```
uv sync --all-packages --dev
```

Run the following, with an optional `--model "model_name"` parameter:
```
uv run -m model.start_train
```

Launch the Streamlit app with:

```
uv run streamlit run streamlit/app.py
```

- Add positional encoding (see the sketch below)
- Offset the output
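For reference, here is a minimal sketch of those two items, assuming a standard decoder-only setup: sinusoidal positional encoding added to the token embeddings, and "offset the output" read as shifting the targets by one token for next-token prediction. None of this is taken from the repo's code.

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len: int, dim: int) -> torch.Tensor:
    """Classic sinusoidal positional encoding (dim assumed even); the result
    is added to the token embeddings before the first decoder block."""
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)            # (seq_len, 1)
    div_term = torch.exp(torch.arange(0, dim, 2).float() * (-math.log(10000.0) / dim))
    pe = torch.zeros(seq_len, dim)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe

# "Offset the output": for next-token prediction the targets are the inputs
# shifted left by one position, so the logits at step t are scored against
# token t + 1.
tokens = torch.tensor([[5, 9, 2, 7, 1]])   # (batch, seq_len)
inputs = tokens[:, :-1]                    # what the decoder sees
targets = tokens[:, 1:]                    # what it is trained to predict
```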