
Run the latest LLMs and VLMs across GPU, NPU, and CPU, with PC (Python/C++) and mobile (Android & iOS) support. Get started quickly with OpenAI gpt-oss, Granite 4, Qwen3-VL, Gemma 3n, and more.


Nexa AI Banner

🤝 Trusted by Partners

Qualcomm NVIDIA AMD Intel

Documentation · Vote for Next Models · X account · Join us on Discord · Join us on Slack

NexaSDK - Run any AI model on any backend

NexaSDK is an easy-to-use developer toolkit for running any AI model locally — across NPUs, GPUs, and CPUs — powered by our NexaML engine, built entirely from scratch for peak performance on every hardware stack. Unlike wrappers that depend on existing runtimes, NexaML is a unified inference engine built at the kernel level. It’s what lets NexaSDK achieve Day-0 support for new model architectures (LLMs, multimodal, audio, vision). NexaML supports 3 model formats: GGUF, MLX, and Nexa AI's own .nexa format.

⚙️ Differentiation

Features | NexaSDK | Ollama | llama.cpp | LM Studio
NPU support | ✅ NPU-first | | |
Support any model in GGUF, MLX, NEXA format | ✅ Low-level Control | ⚠️ | |
Full multimodality support | ✅ Image, Audio, Text | ⚠️ | ⚠️ | ⚠️
Cross-platform support | ✅ Desktop, Mobile, Automotive, IoT | ⚠️ | ⚠️ | ⚠️
One line of code to run | ✅ | ⚠️ | |
OpenAI-compatible API + Function calling | ✅ | | |

Legend: ✅ Supported   |   ⚠️ Partial or limited support   |   ❌ No


Quick Start

Step 1: Download Nexa CLI with one click

macOS and Windows: download and run the installer from the latest release (https://github.com/NexaAI/nexa-sdk/releases/latest).

Linux (x86_64):

curl -fsSL https://github.com/NexaAI/nexa-sdk/releases/latest/download/nexa-cli_linux_x86_64.sh -o install.sh && chmod +x install.sh && ./install.sh && rm install.sh

Linux (arm64):

curl -fsSL https://github.com/NexaAI/nexa-sdk/releases/latest/download/nexa-cli_linux_arm64.sh -o install.sh && chmod +x install.sh && ./install.sh && rm install.sh
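After installing, you can sanity-check that the CLI is on your PATH by listing its commands (see the CLI Reference below):

nexa -h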

Step 2: Run models with one line of code

You can run any compatible GGUF, MLX, or .nexa model from 🤗 Hugging Face with nexa infer <full repo name>.

GGUF models

Tip

GGUF runs on macOS, Linux, and Windows on CPU/GPU. Note that certain GGUF models are only supported by NexaSDK (e.g., Qwen3-VL-4B and 8B).

📝 Run and chat with LLMs, e.g. Qwen3:

nexa infer ggml-org/Qwen3-1.7B-GGUF

🖼️ Run and chat with multimodal models, e.g. Qwen3-VL-4B:

nexa infer NexaAI/Qwen3-VL-4B-Instruct-GGUF

MLX models

Tip

MLX is macOS-only (Apple Silicon). Many MLX models in the Hugging Face mlx-community organization have quality issues and may not run reliably. We recommend starting with models from our curated NexaAI Collection for best results.

📝 Run and chat with LLMs, e.g. Qwen3:

nexa infer NexaAI/Qwen3-4B-4bit-MLX

🖼️ Run and chat with multimodal models, e.g. Gemma 3n:

nexa infer NexaAI/gemma-3n-E4B-it-4bit-MLX

Qualcomm NPU models

Tip

You need the Windows arm64 build with Qualcomm NPU support, and a laptop with a Snapdragon® X Elite chip.

Quick Start (Windows arm64, Snapdragon X Elite)

  1. Login & Get Access Token (required for Pro Models)

    • Create an account at sdk.nexa.ai
    • Go to Deployment → Create Token
    • Run this once in your terminal (replace with your token):
      nexa config set license '<your_token_here>'
  2. Run and chat with our multimodal model, OmniNeural-4B, or other models on NPU

nexa infer NexaAI/OmniNeural-4B
nexa infer NexaAI/Granite-4-Micro-NPU
nexa infer NexaAI/Qwen3-VL-4B-Instruct-NPU

CLI Reference

Essential Command | What it does
nexa -h | Show all CLI commands
nexa pull <repo> | Interactive download & cache of a model
nexa infer <repo> | Local inference
nexa list | Show all cached models with sizes
nexa remove <repo> / nexa clean | Delete one / all cached models
nexa serve --host 127.0.0.1:8080 | Launch an OpenAI-compatible REST server
nexa run <repo> | Chat with a model via an existing server
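Once the server is running (nexa serve --host 127.0.0.1:8080), any OpenAI-compatible client can talk to it. A minimal sketch with curl, assuming the server exposes the standard /v1/chat/completions route and accepts the repo name as the model id:

curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "ggml-org/Qwen3-1.7B-GGUF", "messages": [{"role": "user", "content": "Hello!"}]}'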

👉 To interact with multimodal models, you can drag photos or audio clips directly into the CLI — you can even drop multiple images at once!

See CLI Reference for full commands.

Import model from local filesystem

# hf download <model> --local-dir /path/to/modeldir
nexa pull <model> --model-hub localfs --local-path /path/to/modeldir
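For example, to import a GGUF model already downloaded with the Hugging Face CLI (the repo and path below are only illustrative):

hf download ggml-org/Qwen3-1.7B-GGUF --local-dir ./Qwen3-1.7B-GGUF
nexa pull ggml-org/Qwen3-1.7B-GGUF --model-hub localfs --local-path ./Qwen3-1.7B-GGUF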

🎯 You Decide What Model We Support Next

Nexa Wishlist — Request and vote for the models you want to run on-device.

Drop a Hugging Face repo ID, pick your preferred backend (GGUF, MLX, or Nexa format for Qualcomm + Apple NPUs), and watch the community's top requests go live in NexaSDK.

👉 Vote now at sdk.nexa.ai/wishlist

Acknowledgements

We would like to thank the following projects:

Join Builder Bounty Program

Earn up to 1,500 USD for building with NexaSDK.


Learn more in our Participant Details.
