Conversation

@louis-jan (Contributor)

MLX Integration Architecture

Overview

This PR adds native MLX inference support to Jan, letting Apple Silicon Macs run MLX-optimized models (safetensors format) with Metal GPU acceleration. The implementation exposes an OpenAI-compatible API via a Swift server, which integrates seamlessly with the existing Jan frontend. Further optimizations are planned as follow-ups.

Screen.Recording.2026-02-04.at.22.55.12.mov

Architecture Diagram

┌─────────────────────────────────────────────────────────────────────────────┐
│                           JAN DESKTOP APP                                    │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                       FRONTEND (React/TypeScript)                    │    │
│  │  ┌──────────────────────┐  ┌─────────────────────────────────────┐  │    │
│  │  │  MlxModelDownload    │  │           Chat Interface            │  │    │
│  │  │     Action.tsx       │  │         (useChat, useThreads)       │  │    │
│  │  │                     │  │                                       │  │    │
│  │  │ - Browse HuggingFace │  │ - Send chat completions requests     │  │    │
│  │  │ - Download MLX model │  │ - Handle streaming responses         │  │    │
│  │  │ - Detect vision cap  │  │ - Abort controller support           │  │    │
│  │  └──────────────────────┘  └─────────────────────────────────────┘  │    │
│  │           │                           │                                │    │
│  │           ▼                           ▼                                │    │
│  │  ┌─────────────────────────────────────────────────────────────────┐ │    │
│  │  │              ModelFactory / DefaultModelService                  │ │    │
│  │  │   - Model import/export  - Capability detection (vision/tools)  │ │    │
│  │  └─────────────────────────────────────────────────────────────────┘ │    │
│  └─────────────────────────────┬───────────────────────────────────────┘    │
│                                │                                            │
│                                │ Tauri IPC (invoke)                         │
│                                ▼                                            │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                    MLX EXTENSION (TypeScript)                        │    │
│  │  ┌─────────────────────────────────────────────────────────────────┐ │    │
│  │  │                    mlx_extension.ts                              │ │    │
│  │  │                                                                 │ │    │
│  │  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │ │    │
│  │  │  │   load()    │  │   chat()    │  │      import()           │  │ │    │
│  │  │  │             │  │             │  │                         │  │ │    │
│  │  │  │ - Get model │  │ - Find session│ │ - Download from URL    │  │ │    │
│  │  │  │ - Call Tauri│  │ - HTTP POST │  │ - Write model.yml       │  │ │    │
│  │  │  │   plugin    │  │   to server │  │ - Detect capabilities   │  │ │    │
│  │  │  └─────────────┘  └─────────────┘  └─────────────────────────┘  │ │    │
│  │  └──────────────────────────────────────────────────────────────────┘ │    │
│  │                                │                                      │    │
│  │                                │ @janhq/tauri-plugin-mlx-api          │    │
│  │                                ▼                                      │    │
│  │  ┌─────────────────────────────────────────────────────────────────────┐ │
│  │  │               TAURI PLUGIN MLX (Rust)                               │ │
│  │  │   src-tauri/plugins/tauri-plugin-mlx/src/commands.rs                │ │
│  │  │                                                                     │ │
│  │  │  ┌────────────────┐  ┌────────────────┐  ┌──────────────────────┐   │ │
│  │  │  │ loadMlxModel() │  │unloadMlxModel()│  │ isMlxProcessRunning()│   │ │
│  │  │  │                │  │                │  │                      │   │ │
│  │  │  │ - Spawn Swift │  │ - Terminate    │  │ - Health check       │   │ │
│  │  │  │   binary      │  │   process      │  │ - Verify alive       │   │ │
│  │  │  │ - Parse config│  │ - Cleanup      │  │                      │   │ │
│  │  │  │ - Wait for    │  │ - Update state │  │                      │   │ │
│  │  │  │   ready signal│  │                │  │                      │   │ │
│  │  │  └────────────────┘  └────────────────┘  └──────────────────────┘   │ │
│  │  └─────────────────────────────────────────────────────────────────────┘ │
│  │                                                                             │
│  └─────────────────────────────┬─────────────────────────────────────────────┘
│                                │
│                                │ spawn child process
│                                ▼
│  ┌─────────────────────────────────────────────────────────────────────┐
│  │                  MLX SERVER (Swift)                                  │
│  │  mlx-server/Sources/MLXServer/                                       │
│  │                                                                     │
│  │  ┌─────────────────────────────────────────────────────────────────┐ │
│  │  │                      Server.swift                               │ │
│  │  │  - Hummingbird HTTP framework                                   │ │
│  │  │  - Routes: /health, /metrics, /v1/models, /v1/chat/completions │ │
│  │  └─────────────────────────────────────────────────────────────────┘ │
│  │                              │                                        │
│  │              ┌───────────────┴───────────────┐                       │
│  │              ▼                               ▼                       │
│  │  ┌────────────────────────┐    ┌────────────────────────┐           │
│  │  │    BatchScheduler      │    │   ModelRunner (actor)  │           │
│  │  │                        │    │                        │           │
│  │  │  - BatchProcessor      │    │  - Load MLX/MLXLLM     │           │
│  │  │  - Continuous batching │    │  - Load VLM models     │           │
│  │  │  - Request queuing     │    │  - generateStream()    │           │
│  │  │  - KV cache opt        │    │  - Prompt cache        │           │
│  │  │  - Prefix caching      │    │  - Warmup              │           │
│  │  └────────────────────────┘    └────────────────────────┘           │
│  │                              │                                        │
│  │                              ▼                                        │
│  │  ┌─────────────────────────────────────────────────────────────────┐ │
│  │  │                     MLX Libraries                                │ │
│  │  │         MLX (Tensor ops)  +  MLXLLM  +  MLXVLM  +  MLXLMCommon  │ │
│  │  │                                                                 │ │
│  │  │              Metal GPU Acceleration on Apple Silicon             │ │
│  │  └─────────────────────────────────────────────────────────────────┘ │
│  │                                                                     │
│  └─────────────────────────────────────────────────────────────────────┘
│                                                                             │
│                         macOS (Apple Silicon only)                          │
└─────────────────────────────────────────────────────────────────────────────┘

Data Flow

1. Model Download

Hub UI → MlxModelDownloadAction → DefaultModelService.fetchHuggingFaceRepo()
→ download-extension → Save to ~/Jan/mlx/models/{modelId}/
→ write_yaml(model.yml) → Detect capabilities (vision/tools)
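
A minimal sketch of the post-download step, assuming hypothetical helper names (detectCapabilities, writeModelManifest) and a simplified model.yml schema; the actual helpers in this PR may differ:

```typescript
import { promises as fs } from 'fs'
import path from 'path'

interface MlxCapabilities {
  vision: boolean
  tools: boolean
}

// Heuristic capability detection from the downloaded HuggingFace config.json:
// vision-language models usually advertise a "*VL*" / "*Vision*" architecture.
async function detectCapabilities(modelDir: string): Promise<MlxCapabilities> {
  const raw = await fs.readFile(path.join(modelDir, 'config.json'), 'utf8')
  const architectures: string[] = JSON.parse(raw).architectures ?? []
  return {
    vision: architectures.some((a) => /vl|vision/i.test(a)),
    tools: false, // real detection likely inspects the chat template instead
  }
}

// Write a minimal model.yml next to the safetensors weights so the MLX
// extension can enumerate the model later.
async function writeModelManifest(modelId: string, modelDir: string): Promise<void> {
  const caps = await detectCapabilities(modelDir)
  const yaml = [
    `id: ${modelId}`,
    `format: safetensors`,
    `vision: ${caps.vision}`,
    `tools: ${caps.tools}`,
  ].join('\n')
  await fs.writeFile(path.join(modelDir, 'model.yml'), yaml, 'utf8')
}
```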

2. Model Load

Chat → mlx_extension.load() → loadMlxModel() Tauri command
→ Spawn mlx-server Swift binary → Wait for "/health" ready
→ Return SessionInfo (port, pid, api_key)
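
The extension reaches the Rust plugin through @janhq/tauri-plugin-mlx-api. A hedged sketch of that call, assuming Tauri v2's invoke; the command string and argument name below follow Tauri's plugin convention and are assumptions, not the verified wire names:

```typescript
import { invoke } from '@tauri-apps/api/core'

// Field names taken from the flow above.
interface SessionInfo {
  port: number
  pid: number
  api_key: string
}

// Ask the Rust plugin to spawn the mlx-server binary for this model and block
// until its /health endpoint reports ready.
async function loadModel(modelPath: string): Promise<SessionInfo> {
  return invoke<SessionInfo>('plugin:mlx|load_mlx_model', { modelPath })
}
```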

3. Inference

Chat request → mlx_extension.chat()
→ HTTP POST localhost:{port}/v1/chat/completions
→ Server.swift routes to ModelRunner.generateStream()
→ Streaming SSE response → Frontend renders tokens
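
Because the server speaks the OpenAI chat-completions format, the request itself is plain HTTP plus SSE. A simplified sketch using the SessionInfo from the load step; chunk handling is deliberately naive and assumes each read contains whole "data:" lines:

```typescript
async function streamChat(
  session: { port: number; api_key: string },
  prompt: string
): Promise<void> {
  const res = await fetch(`http://localhost:${session.port}/v1/chat/completions`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${session.api_key}`,
    },
    body: JSON.stringify({
      model: 'mlx-local-model', // placeholder id
      stream: true,
      messages: [{ role: 'user', content: prompt }],
    }),
  })

  const reader = res.body!.getReader()
  const decoder = new TextDecoder()
  for (;;) {
    const { done, value } = await reader.read()
    if (done) break
    // Each SSE event arrives as a "data: {...}" line; "data: [DONE]" ends the stream.
    for (const line of decoder.decode(value, { stream: true }).split('\n')) {
      if (!line.startsWith('data: ') || line.includes('[DONE]')) continue
      const delta = JSON.parse(line.slice(6)).choices?.[0]?.delta?.content
      if (delta) process.stdout.write(delta)
    }
  }
}
```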

4. Model Unload

mlx_extension.unload() → unloadMlxModel() Tauri command
→ Terminate Swift process → Clean up resources
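
The unload side mirrors the load call; again the command string and argument name are assumptions about the plugin's wire format:

```typescript
import { invoke } from '@tauri-apps/api/core'

// The Rust side terminates the spawned mlx-server process for this session
// and drops its bookkeeping state.
async function unloadModel(pid: number): Promise<void> {
  await invoke('plugin:mlx|unload_mlx_model', { pid })
}
```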

Features Added

  • OpenAI-compatible API at /v1/chat/completions
  • Streaming SSE responses for real-time token generation
  • Vision model support via MLXVLM (Qwen2-VL, etc.)
  • Tool calling support with function calling (see the request sketch after this list)
  • Prompt cache for repeated system prompts
  • Metrics endpoint at /metrics for monitoring
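
For the tool-calling bullet above, a hedged request example in the standard OpenAI function-calling shape; whether the Swift server accepts exactly this schema is an assumption based on the "OpenAI-compatible API" claim:

```typescript
// Hypothetical request body; POST it to /v1/chat/completions.
const toolCallRequest = {
  model: 'mlx-local-model', // placeholder id
  stream: false,
  messages: [{ role: 'user', content: 'What is the weather in Paris right now?' }],
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Look up the current weather for a city',
        parameters: {
          type: 'object',
          properties: { city: { type: 'string' } },
          required: ['city'],
        },
      },
    },
  ],
}
// When the model decides to call the function, the response carries it in
// choices[0].message.tool_calls instead of plain text content.
```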

Copilot AI review requested due to automatic review settings February 4, 2026 15:56
Copilot AI (Contributor) left a comment

Pull request overview

This PR adds comprehensive MLX inference support for Apple Silicon Macs, enabling native GPU-accelerated model execution using the MLX framework. The implementation provides an OpenAI-compatible API via a Swift server and integrates seamlessly with Jan's existing architecture.

Changes:

  • Adds new MLX Swift server with OpenAI-compatible API, streaming support, and batch processing
  • Implements Rust Tauri plugin for process management and session handling
  • Creates TypeScript extension for frontend integration with model loading/unloading
  • Updates UI components to support MLX provider alongside existing llamacpp
  • Adds prompt caching, vision model support, and tool calling capabilities

Reviewed changes

Copilot reviewed 70 out of 75 changed files in this pull request and generated 10 comments.

File | Description
mlx-server/Sources/MLXServer/* | Swift HTTP server with OpenAI-compatible endpoints, batch processing, and model runner
src-tauri/plugins/tauri-plugin-mlx/* | Rust plugin for MLX process management, session tracking, and cleanup
extensions/mlx-extension/src/index.ts | TypeScript extension implementing AIEngine interface for MLX backend
web-app/src/services/models/* | Model service updates to support safetensors files and MLX provider
web-app/src/routes/settings/providers/$providerName.tsx | Provider settings UI extended for MLX support
web-app/src/lib/model-factory.ts | Model factory updated with MLX model creation and reasoning middleware
src-tauri/src/core/server/proxy.rs | Proxy server updated to route requests to both llamacpp and MLX sessions
Makefile | Build scripts for compiling and signing MLX server binary
package.json | Build tasks for MLX server compilation

Copilot AI review requested due to automatic review settings February 4, 2026 16:11
@louis-jan force-pushed the feat/support-mlx-backend branch from ca77995 to 53495a0 on February 4, 2026 16:11
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 70 out of 75 changed files in this pull request and generated 3 comments.

github-actions bot commented Feb 4, 2026

Barecheck - Code coverage report

Total: 25.18%

Your code coverage diff: -0.22% ▾

Uncovered files and lines (file path followed by uncovered line numbers):
web-app/src/containers/MlxModelDownloadAction.tsx1-8, 10-14, 16-19, 21, 23, 25-26, 30-31, 33-43, 45-47, 49-50, 53-59, 62-78, 80-91, 93-94, 96, 98-100, 102-104, 107-108, 110-122, 124-126, 129-134, 138-140, 142-144, 146, 150-155, 157-160, 162-171, 173-181, 183-188, 190-191, 193-198, 200-201, 203, 205, 208-210
web-app/src/containers/TokenSpeedIndicator.tsx1-4, 23-24, 26-31, 34-38, 40-46, 48, 51-53, 55-57, 60, 63, 65, 67-76, 78, 80-81, 83
web-app/src/containers/dialogs/DeleteModel.tsx1-2, 12-13, 15, 17-20, 27-35, 37, 39, 41-49, 51-69, 72-78, 81, 83-84, 86-88, 90-105, 107-120, 122
web-app/src/containers/dialogs/ImportMlxModelDialog.tsx1, 9-13, 17, 25-34, 36-50, 52-53, 56-59, 61-64, 66-70, 72-75, 79-82, 85-87, 89-94, 96, 98-99, 102-105, 107-111, 113-115, 118-131, 133-136, 138-145, 147-149, 151-157, 159-160, 162-163, 166-167, 169, 171-172, 174-182, 184-185, 188-190, 192-193, 195-196, 198-212, 214-216, 218-224, 226, 228, 231-234, 236-241, 243, 245-251, 253-257, 259-264, 266
web-app/src/hooks/use-chat.ts1, 5, 15-17, 30-43, 46-47, 49-51, 54-62, 64-68, 71, 74-78, 81-82, 84-89, 91, 93-97, 99-105, 107-111, 114-118, 121-122, 125-127, 130-145, 147-151
web-app/src/hooks/useThreads.ts46-47, 49-53, 68, 72, 77, 123, 125-129, 201-203, 221-222, 225-228, 230-238, 240-241, 244-247, 250-253, 256-264, 266-273, 278-283, 287-289, 304-333, 335, 338, 341, 343-349, 351-372, 374-391, 419-421, 424-427, 430-431, 434, 436-448, 450-452, 454-458, 460, 462-473
web-app/src/lib/custom-chat-transport.ts2, 13-16, 46-55, 57-60, 62, 64-66, 68-70, 78-85, 88-89, 97-101, 103, 106-110, 113-116, 119, 121-124, 126, 128-143, 146-148, 151, 153-167, 169-170, 175-177, 179-180, 188, 191, 194-201, 205-217, 220-222, 225-226, 229, 231-238, 240, 242-243, 245-247, 249-253, 256-257, 262-264, 267-268, 271-276, 278-291, 293-295, 299-310, 312-313, 316-323, 325, 327, 330, 333-334, 341-344, 349-351, 353-360, 362-375, 377-380
web-app/src/lib/model-factory.ts65-77, 79, 81-102, 121, 124, 151-153, 155-158, 160-169, 172-175, 177-179, 181-188, 190-195, 197-204, 211-213, 215-218, 220-229, 232-235, 237-239, 241-248, 250-254, 256-263, 301-304
web-app/src/lib/utils.ts15-16, 19-23, 26-27, 68, 72, 76, 78, 80, 82, 84, 90, 99, 103, 111, 156-158, 196-197, 230-239
web-app/src/routes/hub/index.tsx2-7, 15-19, 26-27, 32-33, 39, 41-50, 56-61, 63-66, 68-75, 77-79, 81-87, 89-99, 101-104, 106-111, 114-121, 123-125, 128, 130-135, 137-138, 140-141, 143-148, 150-162, 164-174, 177-181, 183-185, 187-193, 195-198, 200-226, 228-231, 233-236, 238, 240-243, 245-259, 261-263, 266-268, 271-274, 276, 278-281, 283-296, 299-305, 307-308, 310-320, 322-326, 328-335, 337-345, 347, 349-357, 359-362, 364-389, 391-400, 402-418, 420-434, 437-443, 446-455, 457-461, 463-472, 474-481, 483-486, 488-489, 492-501, 503-507, 509-556, 558-573, 575-584, 586-590, 592-596, 598-632, 634-650, 652-653, 655-662, 664-666, 668, 671-672, 674-679, 681, 683-686, 688, 690, 692, 694-695, 697-701, 703-707, 709
web-app/src/routes/settings/providers/$providerName.tsx2-22, 28-37, 40-42, 44-48, 50-66, 69-77, 79-82, 84, 86, 89-94, 96-98, 100-106, 111-112, 114-121, 123-124, 126-138, 140-141, 143, 145-151, 154-163, 166-170, 172-174, 177-178, 180, 182-187, 190-191, 193-196, 201-207, 209-213, 216-222, 225-228, 230, 232-236, 238-262, 264, 266-268, 270, 273-279, 281-284, 286, 288-291, 293-301, 303-305, 307-312, 314-320, 322-324, 326-327, 329-338, 340, 344, 347, 350-351, 353-355, 358-369, 371-388, 390-397, 400-401, 403-410, 412-418, 422-423, 426, 429-431, 433-444, 447-451, 453-456, 459-460, 463, 465-469, 471-480, 482, 485-491, 493, 496-505, 507-512, 514-520, 522-524, 526-538, 540-551, 553-570, 572-585, 587, 589-590, 592, 594-595, 598-611, 613-617, 619-622, 624-626, 628-635, 637-644, 646, 648-659, 661, 663-664, 667-677, 679-682, 684-691, 693-702, 704-718, 720-721, 723-728, 730-736, 738, 740, 742, 744, 746, 748, 750-757, 759-761, 764-774, 776-781, 783, 785-790, 792
web-app/src/routes/threads/$threadId.tsx1-3, 5-9, 11-20, 25, 27-28, 32-33, 39-40, 44-54, 56-59, 62-64, 66-76, 78, 81-85, 88-89, 92, 95-106, 109, 112-116, 120-123, 125-127, 130-145, 147, 150, 152, 154, 160-171, 174-177, 179-185, 188-189, 192-193, 196-197, 199-201, 203-204, 207-209, 211, 213-220, 222, 225, 227-239, 241-244, 246-260, 262-272, 275-277, 279-290, 293-295, 298-301, 304-317, 319-322, 324-329, 331-338, 341, 344-349, 351-356, 358, 361, 363-370, 372-379, 381, 384-388, 390-395, 398, 401-405, 407, 409-410, 412-413, 415, 418-422, 424, 427-436, 439-442, 445-458, 461-467, 469-471, 473, 475-481, 484, 487-492, 494-502, 504-508, 511-524, 527-528, 530, 532, 534, 536, 538-539, 542-544, 549-555, 558-566, 571-572, 575, 577-579, 581-582, 585-586, 588-594, 597, 600-606, 610-611, 614-619, 621, 623, 626-635, 638-647, 650, 653-656, 659-669, 672-674, 677-683, 686-687, 689-691, 693-696, 698, 701-703, 705-717, 719-720, 722-724, 726, 728-737, 739, 741, 743-750, 752-755, 757-771, 773-780, 782-795, 797, 799-803, 805-808, 811-820, 822
web-app/src/services/models/default.ts37-38, 84-86, 145, 147-151, 176, 178-183, 234-243, 247-249, 251-252, 254, 256, 258, 260-266, 269-289, 291-292, 295-304, 375, 387-388, 390-391, 394-395, 399-401, 403-404, 407-408, 411-414, 416-417, 420, 422-442, 445-450, 452-462, 464, 466-476, 478-492, 494-500, 503-504, 507-514, 540-541, 545-547, 550-562, 565-570, 578, 580-586, 588-593, 596-616, 619-623, 645, 647-648, 650, 659, 661, 663-665, 667, 669-683, 685-691, 693-698, 700-702, 704-713, 715-722, 725-731
