Optimize model loading performance #34
Merged
davidkoski merged 4 commits into ml-explore:main on Jan 8, 2026
Conversation
Collaborator
Great idea! I will give this a look in the first week of Jan when I am back.
Contributor (Author)
This is now ready for review.
davidkoski approved these changes on Jan 8, 2026

davidkoski (Collaborator) left a comment:
Changes look great, thank you!
Parallel loading of tokenizer and weights
Tokenizer loading now runs concurrently with weight loading. For vision models, preprocessor_config.json is also loaded in parallel.
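A minimal sketch of what this can look like with Swift structured concurrency, assuming hypothetical `loadTokenizer(from:)` and `loadWeights(from:)` helpers and placeholder `Tokenizer`/`Weights` types (the real loaders in this repo are named differently):

```swift
import Foundation

// Placeholder types standing in for the real tokenizer and weight containers.
struct Tokenizer {}
struct Weights {}

// Hypothetical loaders; each does file I/O and parsing.
func loadTokenizer(from directory: URL) async throws -> Tokenizer {
    // ... read and parse tokenizer.json ...
    Tokenizer()
}

func loadWeights(from directory: URL) async throws -> Weights {
    // ... read the safetensors files ...
    Weights()
}

func loadModel(from directory: URL) async throws -> (Tokenizer, Weights) {
    // `async let` starts both child tasks immediately, so the tokenizer
    // parse overlaps with the (much larger) weight read instead of
    // running after it; we only suspend when awaiting the results.
    async let tokenizer = loadTokenizer(from: directory)
    async let weights = loadWeights(from: directory)
    return try await (tokenizer, weights)
}
```

For vision models, a third `async let` for the preprocessor config slots into the same pattern.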
Single config.json read
config.json was being read from disk twice during model loading. Now it's read once and the data is reused for both the base config and model-specific config decoding.
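A sketch of the single-read pattern, using stand-in `Decodable` types rather than the repo's actual configuration structs:

```swift
import Foundation

// Stand-in configuration types; the real structs decode many more fields.
struct BaseConfiguration: Decodable {
    let modelType: String
    enum CodingKeys: String, CodingKey { case modelType = "model_type" }
}

struct LlamaConfiguration: Decodable {
    let hiddenSize: Int
    enum CodingKeys: String, CodingKey { case hiddenSize = "hidden_size" }
}

func loadConfigurations(from directory: URL) throws -> (BaseConfiguration, LlamaConfiguration) {
    // One disk read...
    let data = try Data(contentsOf: directory.appendingPathComponent("config.json"))
    let decoder = JSONDecoder()
    // ...reused for both decodes, instead of reading the file again.
    let base = try decoder.decode(BaseConfiguration.self, from: data)
    let model = try decoder.decode(LlamaConfiguration.self, from: data)
    return (base, model)
}
```

Decoding the same `Data` twice is cheap compared with a second round trip to disk, and it keeps the two decoded views guaranteed to come from the same file contents.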
Benchmark results
I've added model loading benchmarks in a separate test target that won't run in CI.
To show the total improvement of all my optimizations, I ran the benchmark with #33 in this repo, including the pending optimizations #302, #303, and #304 in swift-transformers, and with the Hub client in offline mode to avoid a network call. After swift-transformers migrates to swift-huggingface for the Hub API, a model repo revision can be specified instead to avoid the network call.
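For reference, the shape of such a benchmark loop is roughly the following. This is an illustrative sketch reusing the hypothetical `loadModel(from:)` above, not the actual test target added in this PR:

```swift
import Foundation

// Illustrative timing harness: load the model several times and report
// wall-clock duration per run using a monotonic clock.
func benchmarkModelLoading(directory: URL, iterations: Int = 5) async throws {
    let clock = ContinuousClock()
    for run in 1...iterations {
        let start = clock.now
        _ = try await loadModel(from: directory)
        let elapsed = clock.now - start
        print("run \(run): \(elapsed)")
    }
}
```

Keeping the harness in a separate test target means the repeated multi-second loads don't slow down the regular CI test suite.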
I also measured the improvement from this PR alone, with swift-transformers in its current, unoptimized state.
The cumulative result of all my improvements is that model loading time goes from ~3900–4500 ms to ~300–360 ms.