
Optimize model loading performance #34

Merged
davidkoski merged 4 commits into ml-explore:main from DePasqualeOrg:optimize-model-loading
Jan 8, 2026

Conversation

Contributor

@DePasqualeOrg commented on Dec 28, 2025

Parallel loading of tokenizer and weights

Tokenizer loading now runs concurrently with weight loading. For vision models, preprocessor_config.json is also loaded in parallel.
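The overlap described above can be sketched with Swift structured concurrency. This is an illustrative toy, not the actual mlx-swift-examples code: the function names and the string return values are stand-ins for the real tokenizer and weight-loading work.

```swift
import Foundation

// Toy stand-ins for illustration; the real project loads a tokenizer and
// MLX weight arrays rather than strings.
func loadTokenizer() async throws -> String {
    try await Task.sleep(nanoseconds: 10_000_000)  // simulate file I/O
    return "tokenizer"
}

func loadWeights() async throws -> String {
    try await Task.sleep(nanoseconds: 10_000_000)  // simulate file I/O
    return "weights"
}

// `async let` starts both child tasks immediately, so the tokenizer load
// overlaps with the (typically much longer) weight load. Awaiting the
// tuple suspends until the slower of the two finishes.
func loadModel() async throws -> (tokenizer: String, weights: String) {
    async let tokenizer = loadTokenizer()
    async let weights = loadWeights()
    return try await (tokenizer, weights)
}
```

With sequential `try await` calls the two durations would add; with `async let` the total is roughly the maximum of the two, which is where the saving comes from.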

Single config.json read

config.json was being read from disk twice during model loading. Now it's read once and the data is reused for both the base config and model-specific config decoding.
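The single-read pattern can be sketched as follows. The config types here are hypothetical placeholders for the project's actual base and model-specific configuration types; the point is that one `Data` read from disk feeds both decodes.

```swift
import Foundation

// Illustrative config types; the real project decodes its own base
// configuration and per-model configuration types.
struct BaseConfig: Codable { let modelType: String }
struct ModelConfig: Codable { let modelType: String; let hiddenSize: Int }

// Read config.json from disk once and decode the same Data twice,
// instead of hitting the filesystem for each decode.
func loadConfigs(from url: URL) throws -> (base: BaseConfig, model: ModelConfig) {
    let data = try Data(contentsOf: url)  // single disk read
    let decoder = JSONDecoder()
    decoder.keyDecodingStrategy = .convertFromSnakeCase
    let base = try decoder.decode(BaseConfig.self, from: data)
    let model = try decoder.decode(ModelConfig.self, from: data)
    return (base, model)
}
```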

Benchmark results

I've added model loading benchmarks in a separate test target that won't run in CI.

To show the total improvement of all my optimizations, I ran the benchmark with #33 applied in this repo, with the pending swift-transformers optimizations #302, #303, and #304 included, and with the Hub client in offline mode to avoid a network call. (After swift-transformers migrates to swift-huggingface for the Hub API, you can instead specify a model repo revision to avoid the network call.)

| Model | Before | This PR | Improvement |
| --- | --- | --- | --- |
| LLM (Qwen3-0.6B-4bit) | 345 ms | 297 ms | 48 ms |
| VLM (Qwen2-VL-2B-Instruct-4bit) | 465 ms | 361 ms | 104 ms |

These are the results for this PR's changes alone, with swift-transformers in its current, unoptimized state:

| Model | Before | This PR | Improvement |
| --- | --- | --- | --- |
| LLM (Qwen3-0.6B-4bit) | 3912 ms | 3879 ms | 33 ms |
| VLM (Qwen2-VL-2B-Instruct-4bit) | 4509 ms | 4480 ms | 29 ms |

The cumulative result of all my improvements is that model loading time goes from ~3900–4500 ms to ~300–360 ms.

@davidkoski
Collaborator

Great idea! I will give this a look in the first week of Jan when I am back

@DePasqualeOrg force-pushed the optimize-model-loading branch 2 times, most recently from 49d2e79 to 3ea6266 on December 29, 2025 09:26
@DePasqualeOrg marked this pull request as draft on January 6, 2026 13:16
@DePasqualeOrg force-pushed the optimize-model-loading branch 2 times, most recently from b8b55f6 to d98fea8 on January 6, 2026 19:38
@DePasqualeOrg force-pushed the optimize-model-loading branch from d98fea8 to 2b855bb on January 6, 2026 19:42
@DePasqualeOrg force-pushed the optimize-model-loading branch from 2b855bb to 12c48ac on January 6, 2026 19:52
@DePasqualeOrg marked this pull request as ready for review on January 6, 2026 19:53
@DePasqualeOrg
Contributor Author

This is now ready for review.

Collaborator

@davidkoski left a comment


Changes look great, thank you!

@davidkoski merged commit 27a2f21 into ml-explore:main on Jan 8, 2026
1 check passed

2 participants