Conversation

@louis-jan (Contributor)

MLX Integration Architecture

Overview

This PR adds native MLX inference support to Jan, letting Apple Silicon Macs run MLX-optimized models (safetensors format) with Metal GPU acceleration. The implementation exposes an OpenAI-compatible API via a Swift server, which integrates seamlessly with the existing Jan frontend. Further optimizations are planned as follow-ups.

Screen.Recording.2026-02-04.at.22.55.12.mov

Architecture Diagram

┌─────────────────────────────────────────────────────────────────────────────┐
│                           JAN DESKTOP APP                                    │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                       FRONTEND (React/TypeScript)                    │    │
│  │  ┌──────────────────────┐  ┌─────────────────────────────────────┐  │    │
│  │  │  MlxModelDownload    │  │           Chat Interface            │  │    │
│  │  │     Action.tsx       │  │         (useChat, useThreads)       │  │    │
│  │  │                     │  │                                       │  │    │
│  │  │ - Browse HuggingFace │  │ - Send chat completions requests     │  │    │
│  │  │ - Download MLX model │  │ - Handle streaming responses         │  │    │
│  │  │ - Detect vision cap  │  │ - Abort controller support           │  │    │
│  │  └──────────────────────┘  └─────────────────────────────────────┘  │    │
│  │           │                           │                                │    │
│  │           ▼                           ▼                                │    │
│  │  ┌─────────────────────────────────────────────────────────────────┐ │    │
│  │  │              ModelFactory / DefaultModelService                  │ │    │
│  │  │   - Model import/export  - Capability detection (vision/tools)  │ │    │
│  │  └─────────────────────────────────────────────────────────────────┘ │    │
│  └─────────────────────────────┬───────────────────────────────────────┘    │
│                                │                                            │
│                                │ Tauri IPC (invoke)                         │
│                                ▼                                            │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                    MLX EXTENSION (TypeScript)                        │    │
│  │  ┌─────────────────────────────────────────────────────────────────┐ │    │
│  │  │                    mlx_extension.ts                              │ │    │
│  │  │                                                                 │ │    │
│  │  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │ │    │
│  │  │  │   load()    │  │   chat()    │  │      import()           │  │ │    │
│  │  │  │             │  │             │  │                         │  │ │    │
│  │  │  │ - Get model │  │ - Find session│ │ - Download from URL    │  │ │    │
│  │  │  │ - Call Tauri│  │ - HTTP POST │  │ - Write model.yml       │  │ │    │
│  │  │  │   plugin    │  │   to server │  │ - Detect capabilities   │  │ │    │
│  │  │  └─────────────┘  └─────────────┘  └─────────────────────────┘  │ │    │
│  │  └──────────────────────────────────────────────────────────────────┘ │    │
│  │                                │                                      │    │
│  │                                │ @janhq/tauri-plugin-mlx-api          │    │
│  │                                ▼                                      │    │
│  │  ┌─────────────────────────────────────────────────────────────────────┐ │
│  │  │               TAURI PLUGIN MLX (Rust)                               │ │
│  │  │   src-tauri/plugins/tauri-plugin-mlx/src/commands.rs                │ │
│  │  │                                                                     │ │
│  │  │  ┌────────────────┐  ┌────────────────┐  ┌──────────────────────┐   │ │
│  │  │  │ loadMlxModel() │  │unloadMlxModel()│  │ isMlxProcessRunning()│   │ │
│  │  │  │                │  │                │  │                      │   │ │
│  │  │  │ - Spawn Swift │  │ - Terminate    │  │ - Health check       │   │ │
│  │  │  │   binary      │  │   process      │  │ - Verify alive       │   │ │
│  │  │  │ - Parse config│  │ - Cleanup      │  │                      │   │ │
│  │  │  │ - Wait for    │  │ - Update state │  │                      │   │ │
│  │  │  │   ready signal│  │                │  │                      │   │ │
│  │  │  └────────────────┘  └────────────────┘  └──────────────────────┘   │ │
│  │  └─────────────────────────────────────────────────────────────────────┘ │
│  │                                                                             │
│  └─────────────────────────────┬─────────────────────────────────────────────┘
│                                │
│                                │ spawn child process
│                                ▼
│  ┌─────────────────────────────────────────────────────────────────────┐
│  │                  MLX SERVER (Swift)                                  │
│  │  mlx-server/Sources/MLXServer/                                       │
│  │                                                                     │
│  │  ┌─────────────────────────────────────────────────────────────────┐ │
│  │  │                      Server.swift                               │ │
│  │  │  - Hummingbird HTTP framework                                   │ │
│  │  │  - Routes: /health, /metrics, /v1/models, /v1/chat/completions │ │
│  │  └─────────────────────────────────────────────────────────────────┘ │
│  │                              │                                        │
│  │              ┌───────────────┴───────────────┐                       │
│  │              ▼                               ▼                       │
│  │  ┌────────────────────────┐    ┌────────────────────────┐           │
│  │  │    BatchScheduler      │    │   ModelRunner (actor)  │           │
│  │  │                        │    │                        │           │
│  │  │  - BatchProcessor      │    │  - Load MLX/MLXLLM     │           │
│  │  │  - Continuous batching │    │  - Load VLM models     │           │
│  │  │  - Request queuing     │    │  - generateStream()    │           │
│  │  │  - KV cache opt        │    │  - Prompt cache        │           │
│  │  │  - Prefix caching      │    │  - Warmup              │           │
│  │  └────────────────────────┘    └────────────────────────┘           │
│  │                              │                                        │
│  │                              ▼                                        │
│  │  ┌─────────────────────────────────────────────────────────────────┐ │
│  │  │                     MLX Libraries                                │ │
│  │  │         MLX (Tensor ops)  +  MLXLLM  +  MLXVLM  +  MLXLMCommon  │ │
│  │  │                                                                 │ │
│  │  │              Metal GPU Acceleration on Apple Silicon             │ │
│  │  └─────────────────────────────────────────────────────────────────┘ │
│  │                                                                     │
│  └─────────────────────────────────────────────────────────────────────┘
│                                                                             │
│                         macOS (Apple Silicon only)                          │
└─────────────────────────────────────────────────────────────────────────────┘

Data Flow

1. Model Download

Hub UI → MlxModelDownloadAction → DefaultModelService.fetchHuggingFaceRepo()
→ download-extension → Save to ~/Jan/mlx/models/{modelId}/
→ write_yaml(model.yml) → Detect capabilities (vision/tools)
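
A minimal sketch of the post-download step, assuming hypothetical helper names (detectCapabilities, writeModelManifest) and a simplified model.yml schema; the actual helpers in this PR may differ:

```typescript
import { promises as fs } from 'fs'
import path from 'path'

interface MlxCapabilities {
  vision: boolean
  tools: boolean
}

// Heuristic capability detection from the downloaded HuggingFace config.json:
// vision-language models usually advertise a "*VL*" / "*Vision*" architecture.
async function detectCapabilities(modelDir: string): Promise<MlxCapabilities> {
  const raw = await fs.readFile(path.join(modelDir, 'config.json'), 'utf8')
  const architectures: string[] = JSON.parse(raw).architectures ?? []
  return {
    vision: architectures.some((a) => /vl|vision/i.test(a)),
    tools: false, // real detection likely inspects the chat template instead
  }
}

// Write a minimal model.yml next to the safetensors weights so the MLX
// extension can enumerate the model later.
async function writeModelManifest(modelId: string, modelDir: string): Promise<void> {
  const caps = await detectCapabilities(modelDir)
  const yaml = [
    `id: ${modelId}`,
    `format: safetensors`,
    `vision: ${caps.vision}`,
    `tools: ${caps.tools}`,
  ].join('\n')
  await fs.writeFile(path.join(modelDir, 'model.yml'), yaml, 'utf8')
}
```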

2. Model Load

Chat → mlx_extension.load() → loadMlxModel() Tauri command
→ Spawn mlx-server Swift binary → Wait for "/health" ready
→ Return SessionInfo (port, pid, api_key)
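
The extension reaches the Rust plugin through @janhq/tauri-plugin-mlx-api. A hedged sketch of that call, assuming Tauri v2's invoke; the command string and argument name below follow Tauri's plugin convention and are assumptions, not the verified wire names:

```typescript
import { invoke } from '@tauri-apps/api/core'

// Field names taken from the flow above.
interface SessionInfo {
  port: number
  pid: number
  api_key: string
}

// Ask the Rust plugin to spawn the mlx-server binary for this model and block
// until its /health endpoint reports ready.
async function loadModel(modelPath: string): Promise<SessionInfo> {
  return invoke<SessionInfo>('plugin:mlx|load_mlx_model', { modelPath })
}
```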

3. Inference

Chat request → mlx_extension.chat()
→ HTTP POST localhost:{port}/v1/chat/completions
→ Server.swift routes to ModelRunner.generateStream()
→ Streaming SSE response → Frontend renders tokens
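
Because the server speaks the OpenAI chat-completions format, the request itself is plain HTTP plus SSE. A simplified sketch using the SessionInfo from the load step; chunk handling is deliberately naive and assumes each read contains whole "data:" lines:

```typescript
async function streamChat(
  session: { port: number; api_key: string },
  prompt: string
): Promise<void> {
  const res = await fetch(`http://localhost:${session.port}/v1/chat/completions`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${session.api_key}`,
    },
    body: JSON.stringify({
      model: 'mlx-local-model', // placeholder id
      stream: true,
      messages: [{ role: 'user', content: prompt }],
    }),
  })

  const reader = res.body!.getReader()
  const decoder = new TextDecoder()
  for (;;) {
    const { done, value } = await reader.read()
    if (done) break
    // Each SSE event arrives as a "data: {...}" line; "data: [DONE]" ends the stream.
    for (const line of decoder.decode(value, { stream: true }).split('\n')) {
      if (!line.startsWith('data: ') || line.includes('[DONE]')) continue
      const delta = JSON.parse(line.slice(6)).choices?.[0]?.delta?.content
      if (delta) process.stdout.write(delta)
    }
  }
}
```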

4. Model Unload

mlx_extension.unload() → unloadMlxModel() Tauri command
→ Terminate Swift process → Clean up resources
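
The unload side mirrors the load call; again the command string and argument name are assumptions about the plugin's wire format:

```typescript
import { invoke } from '@tauri-apps/api/core'

// The Rust side terminates the spawned mlx-server process for this session
// and drops its bookkeeping state.
async function unloadModel(pid: number): Promise<void> {
  await invoke('plugin:mlx|unload_mlx_model', { pid })
}
```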

Features Added

  • OpenAI-compatible API at /v1/chat/completions
  • Streaming SSE responses for real-time token generation
  • Vision model support via MLXVLM (Qwen2-VL, etc.)
  • Tool calling support with function calling (see the request sketch after this list)
  • Prompt cache for repeated system prompts
  • Metrics endpoint at /metrics for monitoring
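
For the tool-calling bullet above, a hedged request example in the standard OpenAI function-calling shape; whether the Swift server accepts exactly this schema is an assumption based on the "OpenAI-compatible API" claim:

```typescript
// Hypothetical request body; POST it to /v1/chat/completions.
const toolCallRequest = {
  model: 'mlx-local-model', // placeholder id
  stream: false,
  messages: [{ role: 'user', content: 'What is the weather in Paris right now?' }],
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Look up the current weather for a city',
        parameters: {
          type: 'object',
          properties: { city: { type: 'string' } },
          required: ['city'],
        },
      },
    },
  ],
}
// When the model decides to call the function, the response carries it in
// choices[0].message.tool_calls instead of plain text content.
```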

Copilot AI review requested due to automatic review settings February 4, 2026 15:56
Copilot AI (Contributor) left a comment

Pull request overview

This PR adds comprehensive MLX inference support for Apple Silicon Macs, enabling native GPU-accelerated model execution using the MLX framework. The implementation provides an OpenAI-compatible API via a Swift server and integrates seamlessly with Jan's existing architecture.

Changes:

  • Adds new MLX Swift server with OpenAI-compatible API, streaming support, and batch processing
  • Implements Rust Tauri plugin for process management and session handling
  • Creates TypeScript extension for frontend integration with model loading/unloading
  • Updates UI components to support MLX provider alongside existing llamacpp
  • Adds prompt caching, vision model support, and tool calling capabilities

Reviewed changes

Copilot reviewed 70 out of 75 changed files in this pull request and generated 10 comments.

File | Description
mlx-server/Sources/MLXServer/* | Swift HTTP server with OpenAI-compatible endpoints, batch processing, and model runner
src-tauri/plugins/tauri-plugin-mlx/* | Rust plugin for MLX process management, session tracking, and cleanup
extensions/mlx-extension/src/index.ts | TypeScript extension implementing AIEngine interface for MLX backend
web-app/src/services/models/* | Model service updates to support safetensors files and MLX provider
web-app/src/routes/settings/providers/$providerName.tsx | Provider settings UI extended for MLX support
web-app/src/lib/model-factory.ts | Model factory updated with MLX model creation and reasoning middleware
src-tauri/src/core/server/proxy.rs | Proxy server updated to route requests to both llamacpp and MLX sessions
Makefile | Build scripts for compiling and signing MLX server binary
package.json | Build tasks for MLX server compilation

Copilot AI review requested due to automatic review settings February 4, 2026 16:11
@louis-jan force-pushed the feat/support-mlx-backend branch from ca77995 to 53495a0 on February 4, 2026 16:11
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 70 out of 75 changed files in this pull request and generated 3 comments.

github-actions bot commented Feb 4, 2026

Barecheck - Code coverage report

Total: 25.18%

Your code coverage diff: -0.22% ▾

Uncovered files and lines (file path followed by uncovered line numbers):
web-app/src/containers/MlxModelDownloadAction.tsx1-8, 10-14, 16-19, 21, 23, 25-26, 30-31, 33-43, 45-47, 49-50, 53-59, 62-78, 80-91, 93-94, 96, 98-100, 102-104, 107-108, 110-122, 124-126, 129-134, 138-140, 142-144, 146, 150-155, 157-160, 162-171, 173-181, 183-188, 190-191, 193-198, 200-201, 203, 205, 208-210
web-app/src/containers/TokenSpeedIndicator.tsx1-4, 23-24, 26-31, 34-38, 40-46, 48, 51-53, 55-57, 60, 63, 65, 67-76, 78, 80-81, 83
web-app/src/containers/dialogs/DeleteModel.tsx1-2, 12-13, 15, 17-20, 27-35, 37, 39, 41-49, 51-69, 72-78, 81, 83-84, 86-88, 90-105, 107-120, 122
web-app/src/containers/dialogs/ImportMlxModelDialog.tsx1, 9-13, 17, 25-34, 36-50, 52-53, 56-59, 61-64, 66-70, 72-75, 79-82, 85-87, 89-94, 96, 98-99, 102-105, 107-111, 113-115, 118-131, 133-136, 138-145, 147-149, 151-157, 159-160, 162-163, 166-167, 169, 171-172, 174-182, 184-185, 188-190, 192-193, 195-196, 198-212, 214-216, 218-224, 226, 228, 231-234, 236-241, 243, 245-251, 253-257, 259-264, 266
web-app/src/hooks/use-chat.ts1, 5, 15-17, 30-43, 46-47, 49-51, 54-62, 64-68, 71, 74-78, 81-82, 84-89, 91, 93-97, 99-105, 107-111, 114-118, 121-122, 125-127, 130-145, 147-151
web-app/src/hooks/useThreads.ts46-47, 49-53, 68, 72, 77, 123, 125-129, 201-203, 221-222, 225-228, 230-238, 240-241, 244-247, 250-253, 256-264, 266-273, 278-283, 287-289, 304-333, 335, 338, 341, 343-349, 351-372, 374-391, 419-421, 424-427, 430-431, 434, 436-448, 450-452, 454-458, 460, 462-473
web-app/src/lib/custom-chat-transport.ts2, 13-16, 46-55, 57-60, 62, 64-66, 68-70, 78-85, 88-89, 97-101, 103, 106-110, 113-116, 119, 121-124, 126, 128-143, 146-148, 151, 153-167, 169-170, 175-177, 179-180, 188, 191, 194-201, 205-217, 220-222, 225-226, 229, 231-238, 240, 242-243, 245-247, 249-253, 256-257, 262-264, 267-268, 271-276, 278-291, 293-295, 299-310, 312-313, 316-323, 325, 327, 330, 333-334, 341-344, 349-351, 353-360, 362-375, 377-380
web-app/src/lib/model-factory.ts65-77, 79, 81-102, 121, 124, 151-153, 155-158, 160-169, 172-175, 177-179, 181-188, 190-195, 197-204, 211-213, 215-218, 220-229, 232-235, 237-239, 241-248, 250-254, 256-263, 301-304
web-app/src/lib/utils.ts15-16, 19-23, 26-27, 68, 72, 76, 78, 80, 82, 84, 90, 99, 103, 111, 156-158, 196-197, 230-239
web-app/src/routes/hub/index.tsx2-7, 15-19, 26-27, 32-33, 39, 41-50, 56-61, 63-66, 68-75, 77-79, 81-87, 89-99, 101-104, 106-111, 114-121, 123-125, 128, 130-135, 137-138, 140-141, 143-148, 150-162, 164-174, 177-181, 183-185, 187-193, 195-198, 200-226, 228-231, 233-236, 238, 240-243, 245-259, 261-263, 266-268, 271-274, 276, 278-281, 283-296, 299-305, 307-308, 310-320, 322-326, 328-335, 337-345, 347, 349-357, 359-362, 364-389, 391-400, 402-418, 420-434, 437-443, 446-455, 457-461, 463-472, 474-481, 483-486, 488-489, 492-501, 503-507, 509-556, 558-573, 575-584, 586-590, 592-596, 598-632, 634-650, 652-653, 655-662, 664-666, 668, 671-672, 674-679, 681, 683-686, 688, 690, 692, 694-695, 697-701, 703-707, 709
web-app/src/routes/settings/providers/$providerName.tsx2-22, 28-37, 40-42, 44-48, 50-66, 69-77, 79-82, 84, 86, 89-94, 96-98, 100-106, 111-112, 114-121, 123-124, 126-138, 140-141, 143, 145-151, 154-163, 166-170, 172-174, 177-178, 180, 182-187, 190-191, 193-196, 201-207, 209-213, 216-222, 225-228, 230, 232-236, 238-262, 264, 266-268, 270, 273-279, 281-284, 286, 288-291, 293-301, 303-305, 307-312, 314-320, 322-324, 326-327, 329-338, 340, 344, 347, 350-351, 353-355, 358-369, 371-388, 390-397, 400-401, 403-410, 412-418, 422-423, 426, 429-431, 433-444, 447-451, 453-456, 459-460, 463, 465-469, 471-480, 482, 485-491, 493, 496-505, 507-512, 514-520, 522-524, 526-538, 540-551, 553-570, 572-585, 587, 589-590, 592, 594-595, 598-611, 613-617, 619-622, 624-626, 628-635, 637-644, 646, 648-659, 661, 663-664, 667-677, 679-682, 684-691, 693-702, 704-718, 720-721, 723-728, 730-736, 738, 740, 742, 744, 746, 748, 750-757, 759-761, 764-774, 776-781, 783, 785-790, 792
web-app/src/routes/threads/$threadId.tsx1-3, 5-9, 11-20, 25, 27-28, 32-33, 39-40, 44-54, 56-59, 62-64, 66-76, 78, 81-85, 88-89, 92, 95-106, 109, 112-116, 120-123, 125-127, 130-145, 147, 150, 152, 154, 160-171, 174-177, 179-185, 188-189, 192-193, 196-197, 199-201, 203-204, 207-209, 211, 213-220, 222, 225, 227-239, 241-244, 246-260, 262-272, 275-277, 279-290, 293-295, 298-301, 304-317, 319-322, 324-329, 331-338, 341, 344-349, 351-356, 358, 361, 363-370, 372-379, 381, 384-388, 390-395, 398, 401-405, 407, 409-410, 412-413, 415, 418-422, 424, 427-436, 439-442, 445-458, 461-467, 469-471, 473, 475-481, 484, 487-492, 494-502, 504-508, 511-524, 527-528, 530, 532, 534, 536, 538-539, 542-544, 549-555, 558-566, 571-572, 575, 577-579, 581-582, 585-586, 588-594, 597, 600-606, 610-611, 614-619, 621, 623, 626-635, 638-647, 650, 653-656, 659-669, 672-674, 677-683, 686-687, 689-691, 693-696, 698, 701-703, 705-717, 719-720, 722-724, 726, 728-737, 739, 741, 743-750, 752-755, 757-771, 773-780, 782-795, 797, 799-803, 805-808, 811-820, 822
web-app/src/services/models/default.ts37-38, 84-86, 145, 147-151, 176, 178-183, 234-243, 247-249, 251-252, 254, 256, 258, 260-266, 269-289, 291-292, 295-304, 375, 387-388, 390-391, 394-395, 399-401, 403-404, 407-408, 411-414, 416-417, 420, 422-442, 445-450, 452-462, 464, 466-476, 478-492, 494-500, 503-504, 507-514, 540-541, 545-547, 550-562, 565-570, 578, 580-586, 588-593, 596-616, 619-623, 645, 647-648, 650, 659, 661, 663-665, 667, 669-683, 685-691, 693-698, 700-702, 704-713, 715-722, 725-731
