
Decouple from tokenizer and downloader packages#118

Open
DePasqualeOrg wants to merge 32 commits into ml-explore:main from DePasqualeOrg:swift-tokenizers

Conversation

@DePasqualeOrg
Contributor

@DePasqualeOrg DePasqualeOrg commented Feb 24, 2026

MLX Swift LM currently has two fundamental problems:

  • Model loading is tightly coupled to the Hugging Face Hub. A Hub client is required even when loading models from a local directory.
  • Model loading performance with Swift Transformers lags far behind the Python equivalent, typically taking several seconds in Swift versus a few hundred milliseconds in Python.

This PR implements the following solutions:

  • Swift Transformers is replaced with Swift Tokenizers, a streamlined and optimized fork that focuses purely on tokenizer functionality, with no Hugging Face dependency and no extraneous Core ML code. This unlocks a 10x to 15x speedup in model loading times.
  • The Downloader protocol abstracts away the model hosting provider, making it easy to use other providers such as ModelScope or define custom providers such as downloading from storage buckets.
  • Swift Hugging Face, a dedicated client for the Hub, is used in an optional module. No Hugging Face Hub code is bundled for users who don't need it.

Benchmarks

Model loading times on M3 MacBook Pro:

| Benchmark  | Before  | After  | Speedup |
|------------|---------|--------|---------|
| LLM        | 3228 ms | 319 ms | 10.1x   |
| VLM        | 3788 ms | 365 ms | 10.4x   |
| Embedding  | 1479 ms | 95 ms  | 15.6x   |

To run the benchmarks before the changes in this PR, check out commit 3752cc2.

You can run the benchmarks in a separate scheme in Xcode with RUN_BENCHMARKS=1, or from the command line:

TEST_RUNNER_RUN_BENCHMARKS=1 xcodebuild test -scheme mlx-swift-lm-Package -destination 'platform=macOS' -only-testing:Benchmarks

Usage

Loading from a local directory:

import MLXLLM
import MLXLMCommon

let modelDirectory = URL(filePath: "/path/to/model")
let container = try await loadModelContainer(from: modelDirectory)

Convenience method from MLXLMHuggingFace module (uses default Hub client):

import MLXLLM
import MLXLMHuggingFace

let container = try await loadModelContainer(id: "mlx-community/Qwen3-4B-4bit")

Using a custom Hugging Face Hub client:

import MLXLLM
import MLXLMHuggingFace

let hub = HubClient(token: "hf_...")
let container = try await loadModelContainer(
    from: hub,
    id: "mlx-community/Qwen3-4B-4bit"
)

Using a custom downloader:

import MLXLLM
import MLXLMCommon

struct S3Downloader: Downloader {
    func download(
        id: String, revision: String?, matching patterns: [String],
        useLatest: Bool, progressHandler: @Sendable @escaping (Progress) -> Void
    ) async throws -> URL {
        // Download model files and return a local directory URL
    }
}

let container = try await loadModelContainer(
    from: S3Downloader(),
    id: "my-bucket/my-model"
)

Embedding models and adapters follow the same patterns.
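To make the `Downloader` example above concrete, here is a self-contained sketch of a trivial conformance that never touches the network and simply maps model ids onto a local base directory. The protocol is restated from this PR so the snippet compiles on its own; `LocalDownloader` is a hypothetical type for illustration only.

```swift
import Foundation

// The Downloader protocol shape from this PR, restated so the sketch is
// self-contained.
protocol Downloader: Sendable {
    func download(
        id: String, revision: String?, matching patterns: [String],
        useLatest: Bool, progressHandler: @Sendable @escaping (Progress) -> Void
    ) async throws -> URL
}

// Hypothetical conformance: serves models from a fixed local folder.
// Illustrates the protocol shape only.
struct LocalDownloader: Downloader {
    let baseDirectory: URL

    func download(
        id: String, revision: String?, matching patterns: [String],
        useLatest: Bool, progressHandler: @Sendable @escaping (Progress) -> Void
    ) async throws -> URL {
        // Nothing to download; report completion and hand back the directory.
        let progress = Progress(totalUnitCount: 1)
        progress.completedUnitCount = 1
        progressHandler(progress)
        return baseDirectory.appendingPathComponent(id)
    }
}
```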

Cache strategy

The Downloader protocol includes a useLatest parameter (default false) that controls whether to check the network for updates:

  • useLatest: false: Resolves refs (e.g. "main") to commit hashes locally via the cache's refs/ directory and returns cached files immediately, with no network call. This avoids 100–200ms of latency on every model load.
  • useLatest: true: Always checks the network for the latest commit, then downloads any missing or updated files.

This improves on the Python huggingface_hub in two ways. Python always makes an api.repo_info() network call before returning cached files, even for commit hashes. Swift, by contrast:

  • skips the network entirely for commit hashes (which are immutable, so cached files are always valid), and
  • resolves branch names locally via resolveCachedSnapshot() when freshness isn't needed.

Users who want the latest files can opt in to the network call explicitly.

In Swift Hugging Face, this is implemented as a two-method design:

  • resolveCachedSnapshot() resolves refs locally using cached metadata
  • downloadSnapshot() only uses the fast path on commit hashes (which are immutable), while branch names always trigger a network call
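As a rough illustration (not the actual Swift Hugging Face implementation), local ref resolution against the standard Hub cache layout might look like the sketch below. It assumes the huggingface_hub cache convention: a `models--{org}--{name}/refs/{branch}` file containing a commit hash, and files living under `snapshots/{hash}/`.

```swift
import Foundation

// Hypothetical sketch of resolveCachedSnapshot()-style lookup: resolve a ref
// (branch name or commit hash) to a cached snapshot directory with no network
// call. Layout follows the standard Hub cache convention.
func cachedSnapshot(id: String, revision: String, cacheRoot: URL) -> URL? {
    let repoDir = cacheRoot.appendingPathComponent(
        "models--" + id.replacingOccurrences(of: "/", with: "--"))

    // A 40-character hex string is treated as an immutable commit hash.
    let isCommitHash = revision.count == 40 && revision.allSatisfy { $0.isHexDigit }

    let hash: String
    if isCommitHash {
        hash = revision
    } else {
        // Resolve a branch name like "main" via the cached refs/ file.
        let refFile = repoDir.appendingPathComponent("refs/\(revision)")
        guard let contents = try? String(contentsOf: refFile, encoding: .utf8) else {
            return nil  // No cached ref: the caller must hit the network.
        }
        hash = contents.trimmingCharacters(in: .whitespacesAndNewlines)
    }

    let snapshot = repoDir.appendingPathComponent("snapshots/\(hash)")
    return FileManager.default.fileExists(atPath: snapshot.path) ? snapshot : nil
}
```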

Breaking changes

Loading API

The hub parameter (previously HubApi) has been replaced with from (any Downloader or URL for a local directory). Functions that previously defaulted to defaultHubApi no longer have a default – callers must either pass a Downloader explicitly or use the convenience methods in MLXLMHuggingFace / MLXEmbeddersHuggingFace, which default to HubClient.default.

For most users who were using the default Hub client, adding import MLXLMHuggingFace or import MLXEmbeddersHuggingFace and using the convenience overloads is sufficient.

Users who were passing a custom HubApi instance should create a HubClient instead and pass it as the from parameter. HubClient conforms to Downloader via MLXLMHuggingFace.

ModelConfiguration

  • tokenizerId and overrideTokenizer have been replaced by tokenizerSource: TokenizerSource?, which supports .id(String) for remote sources and .directory(URL) for local paths.
  • preparePrompt has been removed. This shouldn't be used anyway, since support for chat templates is available.
  • modelDirectory(hub:) has been removed. For local directories, pass the URL directly to the loading functions. For remote models, the Downloader protocol handles resolution.

Tokenizer loading

loadTokenizer(configuration:hub:) has been removed. Tokenizer loading now uses AutoTokenizer.from(directory:) from Swift Tokenizers directly.

replacementTokenizers (the TokenizerReplacementRegistry) has been removed. Use AutoTokenizer.register(_:for:) from Swift Tokenizers instead.

defaultHubApi

The defaultHubApi global has been removed. Hugging Face Hub access is now provided by HubClient.default from the HuggingFace module.

Low-level APIs

  • downloadModel(hub:configuration:progressHandler:) → Downloader.download(id:revision:matching:useLatest:progressHandler:)
  • loadTokenizerConfig(configuration:hub:) → AutoTokenizer.from(directory:)
  • ModelFactory._load(hub:configuration:progressHandler:) → _load(configuration: ResolvedModelConfiguration)
  • ModelFactory._loadContainer: removed (base loadContainer now builds the container from _load)

Maintainership of Swift Tokenizers

I'm currently maintaining Swift Tokenizers, but I think a better home for it would be the ml-explore organization. Hugging Face's packages are tightly coupled to their platform, while Swift Tokenizers is designed for a clean separation of concerns and is more closely related to the model code in MLX Swift LM.

To do

  • Review and merge PR #41 in swift-huggingface, which improves cache hit performance by avoiding unnecessary network calls, then switch to a tagged release (currently pinned to branch in my fork)
  • Review changes in Swift Tokenizers since fork from Swift Transformers
  • Decide on maintainership of Swift Tokenizers

@davidkoski
Collaborator

I really like this idea of decoupling from the HF libs, see also #98. Having an alternate backend is the key benefit that makes this worth doing. The numbers from your measurements are impressive and compelling.

I am concerned with backward compatibility and a little bit with the default implementation. I am not disparaging your fork but I don't know how everybody feels about it (I am not an app developer) -- I guess this is along the lines of the old phrase "nobody ever got fired for buying IBM".

I wonder if this could be done like this:

  • we decouple the mlx-swift-lm code from the HF implementation to allow for your fork or if somebody wanted to make BobsCustomTokenizer they could do so
  • but leave the current API in place, but deprecated
  • that is if you do nothing you will get the current behavior
  • add the new methods that would split out some of the actions (Hub vs Tokenizers)

Then:

  • have a way to pick between HF, your fork, some custom implementation
  • this could be done by import CustomAdaptor that provides the API as in your example:
import MLXLLM
import MLXLMHuggingFace

let container = try await loadModelContainer(id: "mlx-community/Qwen3-4B-4bit")

Maybe:

  • we could use some of the "trait" features in newer SwiftPM to make it easier to pick and do this
  • I think we still need the current Package.swift, so that would "select" the current HF implementation
  • but there are ways to provide Package.swift files for newer Xcode / swift versions that would use the traits
  • the default trait might still be "HF" to keep compatibility but it would let us "unload" that

I think one tricky part is your fork probably looks identical to the standard HF API from a symbols point of view -- likely you cannot have both.

Hopefully the main point of this is clear:

  • yes, want!
  • but I think we need to find a way to keep current code building without change (or we force a major version bump -- the code change is not significant, but it won't build)
  • I do think we need a way for people to choose between your fork and standard HF if they are more comfortable with that

The exact mechanics of doing so need to be worked out. I wonder if delivering this in pieces would make it easier?

@DePasqualeOrg
Contributor Author

@davidkoski, to be clear, Swift Hugging Face is maintained by Hugging Face, and that's the part that's interchangeable in this PR. Swift Tokenizers is the pure tokenizer library that I forked from Swift Transformers, and I didn't envision that being interchangeable, although I'll investigate whether it could be.

Swift Transformers encompasses tokenization and model downloading, and in this PR that has been decomposed into Swift Hugging Face (now an interchangeable downloader, maintained by Hugging Face) and Swift Tokenizers (core tooling that probably doesn't need to be interchangeable if it is well maintained).

I understand not wanting to depend on a single individual's package for tokenization, which is why I proposed bringing this package over to ml-explore, and I'm happy to continue contributing to it there if you want to go that route. I took care to make the changes easily auditable by breaking them into focused PRs with discussion in the descriptions.

@davidkoski
Collaborator

why I proposed bringing this package over to ml-explore

For logistical reasons outside of my control I don't think we can do that.

I don't think there is a problem with having people choose to use your repo -- it has clear performance wins. But they should probably opt-in to doing so. We should make it possible/easy (and if needed provide the integration in this repo).

I will give this a closer read and see if I have any feedback or ideas about how we can achieve these goals, but thank you so much for pushing on this -- these are impressive performance gains and it would be great if people can use them!

@DePasqualeOrg
Contributor Author

Okay, thanks for clarifying. I think I have found a way to make the tokenizer package interchangeable using a protocol and traits. It would require Package.swift to use Swift 6.0 tools. From my perspective, it would be a lot easier to do this all in one go as a major version bump.

@davidkoski
Collaborator

Okay, thanks for clarifying. I think I have found a way to make the tokenizer package interchangeable using a protocol and traits. It would require Package.swift to use Swift 6.0 tools. From my perspective, it would be a lot easier to do this all in one go as a major version bump.

This makes sense. What do you think about this:

  • Package.swift -- current package file with whatever toolchain it is at for older clients
  • Package@swift-6.1.swift -- new package using traits

Then we need to decide what the old version should do. I think it should be the new API for sure -- we don't want to bifurcate there. But what about the backend(s)? I think the choices are:

  • only provides HF adaptor layer (MLXLMHuggingFace) and has static dependencies to swift-transformers
  • or provides adaptors to both HF and your optimized tokenizer and both are pulled as dependencies and we rely on SwiftPM build tech to only build the ones that are needed

I might still be confused as to what specifically this is providing, so if that didn't make sense that is probably why. Anyway, older clients that do not have Swift 6.1 toolchains can still build, but they may have fewer options and won't get the more dynamic build that traits allow.

@DePasqualeOrg
Contributor Author

I've investigated various approaches to making the tokenizer package interchangeable, and I think I've landed on a good design:

  • MLX Swift LM defines a Tokenizer protocol that matches what's currently used in Swift Transformers and Swift Tokenizers.
  • Instead of selecting the tokenizer package with traits, separate integration packages for tokenizers (Swift Transformers and Swift Tokenizers) and downloaders (currently just Swift Hugging Face, later also Swift ModelScope and others) are imported.
    • Rationale for importing tokenizer integration packages instead of using traits:
      • If I'm not mistaken, traits won't work for users who add the package to an Xcode project through the Xcode UI.
      • Explicit imports rather than pre-configured defaults allow users to make a conscious choice.
      • This matches the approach of importing a downloader integration package.
    • Rationale for separate integration packages rather than modules within this package:
      • Both Swift Transformers and Swift Tokenizers export a module called Tokenizers, which could lead to conflicts if both are used in this package, even in separate modules. I looked into ways to resolve this and couldn't find any, but please correct me if I'm wrong.

Usage with explicit configuration

The integration packages provide protocol conformance.

// Package.swift
dependencies: [
    .package(url: "https://github.com/ml-explore/mlx-swift-lm", from: "2.0.0"),
    .package(url: "https://github.com/ml-explore/mlx-swift-lm-tokenizers", from: "1.0.0"),
    .package(url: "https://github.com/ml-explore/mlx-swift-lm-huggingface", from: "1.0.0"),
]

// Consuming app
import MLXLLM
import MLXLMHuggingFace
import MLXLMTokenizers

let container = try await loadModelContainer(
    from: HubClient.default,
    using: TokenizersLoader(),
    id: "mlx-community/Qwen3-4B-4bit"
)

Usage with convenience overloads

The integration packages provide protocol conformance and convenience overloads.

import MLXLLM
import MLXLMHuggingFace
import MLXLMTokenizers

// Default downloader provided by convenience overload
let container = try await loadModelContainer(
    using: TokenizersLoader(),
    id: "mlx-community/Qwen3-4B-4bit"
)

// Default tokenizer loader provided by convenience overload
let container = try await loadModelContainer(
    from: HubClient.default,
    id: "mlx-community/Qwen3-4B-4bit"
)

Core API shape

public func loadModelContainer(
    from downloader: any Downloader,
    using tokenizerLoader: any TokenizerLoader,
    id: String,
    revision: String = "main",
    useLatest: Bool = false,
    progressHandler: @Sendable @escaping (Progress) -> Void = { _ in }
) async throws -> sending ModelContainer

TokenizerLoader protocol

public protocol TokenizerLoader: Sendable {
    func loadTokenizer(from directory: URL) async throws -> any Tokenizer
}
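To show how thin a tokenizer integration package can be, here is a self-contained toy: a stand-in `Tokenizer` protocol, the `TokenizerLoader` protocol from above, and a dummy conformance. A real `TokenizersLoader` would forward to `AutoTokenizer.from(directory:)` instead of ignoring the directory; everything else here is a hypothetical stand-in.

```swift
import Foundation

// Stand-ins so the sketch compiles on its own: a toy Tokenizer protocol and
// the TokenizerLoader protocol from above.
protocol Tokenizer: Sendable {
    func encode(text: String) -> [Int]
}

protocol TokenizerLoader: Sendable {
    func loadTokenizer(from directory: URL) async throws -> any Tokenizer
}

// Toy tokenizer standing in for a real implementation: encodes each word
// as its character count.
struct WordLengthTokenizer: Tokenizer {
    func encode(text: String) -> [Int] {
        text.split(separator: " ").map { $0.count }
    }
}

// The entire integration "package" is essentially one conformance. A real
// TokenizersLoader would call AutoTokenizer.from(directory:) here.
struct ToyLoader: TokenizerLoader {
    func loadTokenizer(from directory: URL) async throws -> any Tokenizer {
        WordLengthTokenizer()
    }
}
```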

@davidkoski
Collaborator

I think the approach is good overall but there is one problem we will have to figure out:

// Package.swift
dependencies: [
    .package(url: "https://github.com/ml-explore/mlx-swift-lm", from: "2.0.0"),
    .package(url: "https://github.com/ml-explore/mlx-swift-lm-tokenizers", from: "1.0.0"),
    .package(url: "https://github.com/ml-explore/mlx-swift-lm-huggingface", from: "1.0.0"),
]

The same "logistics" issue appears here -- we cannot easily add new repositories. All of the functionality will have to go into mlx-swift-lm.

I think this might be a place where the traits would be useful. If you can use them, they could select which backends you actually want to pull. If not, you will pull more dependencies than you need, but the build process should only build, link, and copy the ones you use.

@DePasqualeOrg
Contributor Author

As I mentioned, traits are not a viable option for anyone who adds MLX Swift LM to their Xcode project through the Xcode UI (e.g. app developers). There's no way for them to select a trait, since they're not editing a Package.swift file.

I experimented with using module aliases, and even when used in separate targets of MLX Swift LM, Swift Transformers and Swift Tokenizers collide, since both export a module called Tokenizers.

If MLX Swift LM includes only one integration target with one of those packages as a dependency, it won't be possible for consumers to import an integration package that uses the other tokenizer package, because the module names collide.

The only remaining option, which actually has advantages over the others, is to create separate integration packages for Swift Tokenizers (swift-tokenizers-mlx) and Swift Transformers (swift-transformers-mlx). Since the ml-explore organization can't host these packages, they'll need to be hosted by the maintainers of the respective tokenizer packages.

This approach is ideal for the following reasons:

  • Users make an explicit choice about their dependencies.
  • Unused dependencies aren't pulled during package resolution.

It would also make sense for the maintainers of the downloader packages (currently Swift Hugging Face, later also others) to host the respective integration packages.

The integration packages are minimal and only need to include protocol conformance for tokenizer loading or model downloading. They can optionally also include convenience overloads for the loading functions.

If this approach sounds good to you, I'll start implementing it for this PR and create integration packages for Swift Tokenizers, Swift Transformers, and Swift Hugging Face (the last two only as a proof of concept, since Hugging Face should ultimately be responsible for them).

@z-also

z-also commented Mar 3, 2026

Great work!
For the first option, I think they would need to create a wrapper package (like @_exported import AnyLanguageModel), but that still might not be a great option.

I vote for option three, but a usage demonstration would make it clearer. @DePasqualeOrg

@davidkoski
Collaborator

The only remaining option, which actually has advantages over the others, is to create separate integration packages for Swift Tokenizers (swift-tokenizers-mlx) and Swift Transformers (swift-transformers-mlx). Since the ml-explore organization can't host these packages, they'll need to be hosted by the maintainers of the respective tokenizer packages.

This approach is ideal for the following reasons:

  • Users make an explicit choice about their dependencies.
  • Unused dependencies aren't pulled during package resolution.

It would also make sense for the maintainers of the downloader packages (currently Swift Hugging Face, later also others) to host the respective integration packages.

The integration packages are minimal and only need to include protocol conformance for tokenizer loading or model downloading. They can optionally also include convenience overloads for the loading functions.

If this approach sounds good to you, I'll start implementing it for this PR and create integration packages for Swift Tokenizers, Swift Transformers, and Swift Hugging Face (the last two only as a proof of concept, since Hugging Face should ultimately be responsible for them).

Yeah, agreed about Xcode consumers. I was thinking maybe it could work but not be as optimal a build -- you can still depend on individual targets inside the swiftpm, but colliding package names sound like trouble.

It makes sense but it also seems like something is inverted. A (mlx) depending on B (hf) requires that B implement its own integration with A. B shouldn't have to do that with every library that depends on them.

People have suggested a workaround, but that looks like it probably isn't worth pursuing.

I experimented with using module aliases, and even when used in separate targets of MLX Swift LM, Swift Transformers and Swift Tokenizers collide, since both export a module called Tokenizers.

What about using non-colliding names? FastTokenizers or something? It would still leave us with downloading the HuggingFace implementation even if you weren't using it, but I think build and link would work fine since a caller would not have to depend on it.

That could leave the integration with the libraries that MLX depends on inside MLX (B does not have to make an integration with A), or in the case of your optimized library it could be completely external to mlx-swift-lm if you want (and we refer to it in the documentation).

@DePasqualeOrg
Contributor Author

DePasqualeOrg commented Mar 4, 2026

That approach would not be fair or ideal for the following reasons:

  • It would unfairly allow one company's library to exclusively use the name Tokenizers and would require others to use a different name, even in other contexts.
  • It would require people to pull libraries that they're not using during package resolution.

For those reasons, I think the integration packages should be separate. Anyone can make and host one, and they're just a few lines of code for protocol conformance.

@davidkoski
Collaborator

That approach would not be fair or ideal for the following reasons:

  • It would unfairly allow one company's library to exclusively use the name Tokenizers and would require others to use a different name, even in other contexts.
  • It would require people to pull libraries that they're not using during package resolution.

For those reasons, I think the integration packages should be separate. Anyone can make and host one, and they're just a few lines of code for protocol conformance.

I think point 1 is already true. That name is in use and Xcode/swiftpm simply won't allow it:

multiple packages ('swift-tokenizers', 'swift-transformers') declare targets with a conflicting name: 'Tokenizers’; target names need to be unique across the package graph

However https://docs.swift.org/swiftpm/documentation/packagemanagerdocs/modulealiasing/ does allow for this, but in my testing (and perhaps this is what you ran into as well) since you have a fork it has the same package name:

let package = Package(
    name: "swift-transformers",

As far as Xcode/swiftpm are concerned, these are the same packages. I could get the aliases to work in a single package but when I used both Xcode would complain (Could not compute dependency graph: unable to load ... duplicate...).

I don't think it is reasonable to have HuggingFace have a dependency on MLX to implement an integration for mlx-swift-lm (they could choose to do so of course) as MLX has a dependency on them (HF).

So would renaming the Package (not the modules) work? Maybe. It looks like that is what the aliases are meant for.

I am looking into getting a new repo in ml-explore, but no guarantees and no idea on the timeline if possible.

Point 2: agreed, it would check out some extra code but may or may not build it (if not used it shouldn't be built). I would go for "working" over "best". This would let us keep the default integration in mlx-swift-lm and not need another repository and might be what we should aim for while the extra repo is pondered.

I have a little test program set up, currently not building (per point 1), but I may try a fork of your fork and try renaming the Package and see what happens. I am happy to attach that if you are interested (but it sounded like you may have something similar).

@DePasqualeOrg
Contributor Author

I think there may be a misunderstanding, because my package already has a different package name, swift-tokenizers, and still module aliases didn't work in my testing, because of the module name collision.

Hugging Face would not be required to have a dependency on MLX. Alternatively, consumers can set up the protocol conformance themselves. But since MLX Swift LM is currently the main use case for Swift tokenizer packages, it would be in the interest of anyone who makes one to offer this trivial integration, if it's not offered here.

I think it's clear that separate integration packages are needed, and the only open question is where they should be hosted, so I'll go ahead with implementing the Tokenizer protocol to demonstrate that.

@davidkoski
Collaborator

I think there may be a misunderstanding, because my package already has a different package name, swift-tokenizers, and still module aliases didn't work in my testing, because of the module name collision.

You are correct -- I am confusing myself with the various implementations :-)

Yes, as you said it looks like the aliases are not working as expected.

Hugging Face would not be required to have a dependency on MLX. The alternative is consumers can set up the protocol conformance themselves. But since MLX Swift LM is currently the main use case for Swift tokenizer packages, it would be in the interest of anyone who makes one to offer this trivial integration, if it's not offered here.

@angeloskath asked if a macro might work -- something that would implement the trivial forwarding mechanism. I will give this a try. That might give us a way to let consumers set up the conformance without knowing they were doing so.

I think it's clear that separate integration packages are needed, and the only open question is where they should be hosted, so I'll go ahead with implementing the Tokenizer protocol to demonstrate that.

I agree this is the easiest way and am circling the idea it might be the only way. I still have hope :-)

@davidkoski
Collaborator

davidkoski commented Mar 5, 2026

OK, I have a proof of concept using macros. I have a stand-in for the real thing that looks like this:

public protocol MLXTokenizer: Sendable {
    func encode(text: String) -> [Int]
    func decode(tokens: [Int]) -> String
}

public func generate(tokenizer: MLXTokenizer) -> String {
    let tokens = tokenizer.encode(text: "testing")
    return tokenizer.decode(tokens: tokens)
}

Note: there is no hard dependency on any concrete Tokenizer

We want to call it along these lines:

let tokenizer = PreTrainedTokenizer(...) // e.g. the HuggingFace Tokenizer
print(generate(tokenizer: tokenizer))

That won't work as-is because PreTrainedTokenizer doesn't implement MLXTokenizer, at least not type-wise.

If we added:

extension PreTrainedTokenizer: @retroactive MLXTokenizer { }

Then it would work, but we are conforming a type we don't own to a protocol.

OK, so try 1 with a macro looks like this:

enum Tokenizers {
    #MLXTokenizer(PreTrainedTokenizer.self)
}

let tokenizer = try Tokenizers.MLXPreTrainedTokenizer()
print(generate(tokenizer: tokenizer))

The enum is needed because the macro can't generate a top level type (unless it has a static name). The macro ends up generating a simple wrapper for the type (assuming it looks like a HuggingFace Tokenizer API-wise) and forwards the protocol methods.
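The generated wrapper presumably looks something like this hand-written equivalent. This is a self-contained sketch using the stand-in protocol from above plus a toy `PreTrainedTokenizer` (not the real HuggingFace type); the actual macro expansion may differ.

```swift
// Stand-in protocol from the example above.
public protocol MLXTokenizer: Sendable {
    func encode(text: String) -> [Int]
    func decode(tokens: [Int]) -> String
}

// Toy stand-in for the concrete HuggingFace tokenizer type: encodes text as
// Unicode scalar values.
public struct PreTrainedTokenizer: Sendable {
    public init() {}
    public func encode(text: String) -> [Int] {
        text.unicodeScalars.map { Int($0.value) }
    }
    public func decode(tokens: [Int]) -> String {
        String(tokens.compactMap { Unicode.Scalar(UInt32($0)).map(Character.init) })
    }
}

// Roughly what #MLXTokenizer(PreTrainedTokenizer.self) could expand to:
// a wrapper that forwards each protocol requirement to the wrapped value.
public struct MLXPreTrainedTokenizer: MLXTokenizer {
    let base = PreTrainedTokenizer()
    public func encode(text: String) -> [Int] { base.encode(text: text) }
    public func decode(tokens: [Int]) -> String { base.decode(tokens: tokens) }
}
```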

Try 2 looks like this:

#TokenizerFactory(PreTrainedTokenizer.self)

let tokenizer = try makeTokenizer()
print(generate(tokenizer: tokenizer))

The factory generates a function with a fixed name so it can appear at the top level. Assuming you could build/link it would allow multiple providers of tokenizers if you did this in different files.

Try 3:

let tokenizer = try #MakeTokenizer(PreTrainedTokenizer.self)
print(generate(tokenizer: tokenizer))

No top level function, just an inline expression.

For all of these the import Tokenizers is in the application, not the library. I have handwaved a bunch of stuff about how to actually generate the wrapper knowing that the underlying type looks like the current HuggingFace Tokenizer API. I am ignoring how we get the configuration, etc. but I presume this could be worked out.

This wouldn't block nicer integrations that actually implemented loadModelContainer(), but might let us break the dependency in a different way.

@DePasqualeOrg
Contributor Author

DePasqualeOrg commented Mar 5, 2026

@davidkoski, I'll review your macro POC now. Before I do, this is what I was about to post regarding my own POC with separate integration packages, which I've pushed to this branch:

I've implemented the Tokenizer protocol, factored out the tokenizer and downloader code, and created the following integration packages as a proof of concept:

The last one is for my fork of Swift Hugging Face, which includes ergonomic and performance improvements, avoiding a network roundtrip when possible for even faster model/tokenizer loading.

I'll review everything again tomorrow and run benchmarks with the different integrations to show the performance improvement of my tokenizer and downloader packages.

@DePasqualeOrg
Contributor Author

@davidkoski, I think we would need to see a working code example of that macro approach, but I suspect that it won't be able to do everything that we need to do to make this work. Check how I've set things up in the integration packages to see what I mean.

I really think the integration packages are the happy, simple path, and they should be hosted alongside the respective tokenizer/downloader packages.

@davidkoski
Collaborator

Yeah, agreed that separate repos will be the cleanest way. Here is my POC if you want to see what I did:

LT2.zip

Look at ContentView for the integrations. It doesn't run (I think it will throw) but it does build. I think the packaging can be simplified along with coming up with a real implementation if we use this path. Think != know.

@DePasqualeOrg
Contributor Author

I think the main issue with the macro approach is that it would require all the tokenizer and downloader libraries to have the same shapes, which isn't realistic – and indeed isn't even the case with the ones we have now. Protocols allow for libraries of any shape to integrate with this one.

Even setting that aside, it would add complexity in this library and require consumers to use a less-familiar syntax.

@davidkoski
Collaborator

I think the main issue with the macro approach is that it would require all the tokenizer and downloader libraries to have the same shapes, which isn't realistic – and indeed isn't even the case with the ones we have now. Protocols allow for libraries of any shape to integrate with this one.

Not required as the manual implementation is trivial. It is true of any "automatic" integration. It is basically just a way to move the dependency to "compile" time rather than "Project: Resolve Packages" time.

Even setting that aside, it would add complexity in this library and require consumers to use a less-familiar syntax.

Agreed

@DePasqualeOrg DePasqualeOrg force-pushed the swift-tokenizers branch 2 times, most recently from 74fecfd to 4932f20 on March 6, 2026
@DePasqualeOrg
Contributor Author

@davidkoski, I've decoupled the tokenizer and downloader packages from the integration tests and benchmarks, so now the decoupling is complete. The logic for those tests still lives in this library, which exports helpers to run them in the integration packages.

I'll review this all again and add some polish over the weekend, but I think this is getting close to an optimal design. Let me know what you think whenever you get a chance to look at it.

@DePasqualeOrg DePasqualeOrg changed the title Use Swift Tokenizers and Swift Hugging Face for improved performance and provider agnosticism Decouple from tokenizer and downloader packages Mar 6, 2026
@DePasqualeOrg DePasqualeOrg force-pushed the swift-tokenizers branch 4 times, most recently from 2a4bc1f to 335b0d3 on March 8, 2026
@DePasqualeOrg
Contributor Author

I just rebased on the new changes on main and resolved conflicts.

@davidkoski
Collaborator

Sorry, not ignoring this, just busy -- I will get some time to play with the traits and file bugs on Thursday I think.

@DePasqualeOrg
Contributor Author

I suspect this is not really a bug and it's just due to how module resolution or build planning works in Swift.

I think this PR is ready to be reviewed and merged. Users just need to import their preferred adapters or copy ~100 lines of code.

Resolve modify/delete conflict for EmbedderIntegrationTests.swift by
keeping the deletion (tests live in IntegrationTestHelpers on this
branch) and porting the new Gemma 3 embedder test into EmbedderTests.
# Conflicts:
#	Libraries/MLXLMCommon/Load.swift
@DePasqualeOrg
Contributor Author

I've resolved more merge conflicts.

@davidkoski
Collaborator

davidkoski commented Mar 23, 2026

OK, I think I am getting my head wrapped around traits enough to see how they won't work for this case (some tests on the trait-test branch and more changes I have locally).

swift-log

So a case where this might work: let's say we wanted to use swift-log as an opt-in:

  • we define a SwiftLog trait
  • we have #if SwiftLog in the code to import and call functions
  • we have a conditional dependency on the MLXLMCommon target to add swift-log if the trait is enabled

This would always check out swift-log because the Package dependencies can't depend on traits (not ideal). swift-log would not build if it were not being used and it would not be linked (good).

Note the #if SwiftLog is not available in Package.swift -- it is a setting on building the products.
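A rough sketch of what that manifest might look like, based on the Swift 6.1 package-traits proposal (SE-0450). The exact traits API surface here is an assumption, not a definitive manifest:

```swift
// swift-tools-version: 6.1
// Hedged sketch of an opt-in SwiftLog trait; the exact traits API may differ.
import PackageDescription

let package = Package(
    name: "mlx-swift-lm",
    traits: [
        "SwiftLog"  // opt-in trait enabling the swift-log integration
    ],
    dependencies: [
        // Always checked out: package-level dependencies can't depend on traits.
        .package(url: "https://github.com/apple/swift-log.git", from: "1.6.0")
    ],
    targets: [
        .target(
            name: "MLXLMCommon",
            dependencies: [
                // Only built and linked when the trait is enabled.
                .product(
                    name: "Logging", package: "swift-log",
                    condition: .when(traits: ["SwiftLog"]))
            ]
        )
    ]
)
```

In the target's sources, the integration code would then sit behind `#if SwiftLog`.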

SDKTest (proxy for fancy linkage)

Another variant, a little closer to what we need with the tokenizers/hub: what if we wanted some library that required a different SDK? I made an SDKTest directory with a Package.swift with its own platforms setting. That seems to work. I can refer to this from the top level Package.swift. But:

  • same issue with traits -- I can control linkage in a target but not the package level dependency
  • the products/targets in the sub-package are not visible to code that depends on the top level Package -- in other words SDKTest is not visible
  • worse, Xcode doesn't show the contents of SDKTest -- it is like the files do not exist

We could hoist the API surface from SDKTest into e.g. MLXLMCommon and #if SDKTest around code that forwarded to the library.
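That hoisting might be sketched like this (hypothetical names throughout; `SDKTest.perform()` is assumed for illustration, not a real API):

```swift
// In MLXLMCommon: the API surface lives here, and the implementation
// forwards to the sub-package only when the SDKTest trait is enabled.
#if SDKTest
import SDKTest
#endif

public enum SDKFeatureError: Error {
    case notAvailable
}

public enum SDKFeature {
    public static func perform() throws {
        #if SDKTest
        try SDKTest.perform()  // real implementation in the sub-package
        #else
        throw SDKFeatureError.notAvailable  // built without the SDKTest trait
        #endif
    }
}
```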

The Issue at Hand

This leaves us with:

  • we can't dynamically control Package level dependencies with traits, only Target level
  • that means we will pull all the adaptors and tokenizers, etc. (not great, but not fatal yet)
  • Xcode won't build with the duplicate names, even using the swiftpm module aliasing
  • at best we could make a generic library that used traits to conditionally compile code that exposed the various integrations, but Xcode can't actually build such a thing

@davidkoski
Collaborator

So in terms of merging. I would like to see the HF integration repos either here in ml-explore or in huggingface. I can't promise the former but I am looking into it (still). I have no control over the latter.

I also want to get the various in-flight mlx-swift and mlx-swift-lm pieces landed before we do a major API bump.

What do you think about getting the protocol landed with the HF integration in place and all the model conformance merged first (without breaking API). That would get a bunch of merge conflict surface out of the way.

Potentially I can set up a branch in mlx-swift-lm so that people could easily try the new integration (I think that may be easier to consume than the fork in your org, but I am not sure). That would still leave us waiting for homes for the HF integration repos.

Thoughts?

@DePasqualeOrg
Contributor Author

I doubt Hugging Face will be willing to host a package that reduces the lock-in to their platform. But we shouldn't have to depend on Hugging Face taking action to benefit from better performance and be free from lock-in. With this PR, we don't have to: Users can either import an integration package or copy some trivial integration code.

I don't quite understand the path you have in mind, but it sounds like you want to keep the Hugging Face dependency, which would prevent anyone from using my faster Swift Tokenizers package.

It has been an enormous amount of work to get this to this point, and I would really like to get this merged so that we can move on. I think it's in an ideal state, with a clean separation of concerns and a straightforward way for users to migrate (whenever you do a major version bump) and pick what dependencies they want to use.

@davidkoski
Collaborator

I don't quite understand the path you have in mind, but it sounds like you want to keep the Hugging Face dependency, which would prevent anyone from using my faster Swift Tokenizers package.

Not exactly. I have a few things in mind:

  • can we get the non-api-breaking part of this merged right away to avoid merge conflicts
  • I am leery of making an API break that would require people to use a new repository outside of what they may currently trust (supply chain attack worries) -- maybe I am being overly cautious for nothing, but it is something I think about. It is a reason I haven't merged use ReerCodable macro to allow for default values #106
  • we don't currently have a solution that deals with that

I think people should try your improved tokenizers package and in time it may become the de-facto standard, but I don't want to force it on anyone yet.

Right now your integration with the HF implementations is around 160 lines of code -- I don't think it is reasonable to have people copy that into their projects.

It has been an enormous amount of work to get this to this point, and I would really like to get this merged so that we can move on. I think it's in an ideal state, with a clean separation of concerns and a straightforward way for users to migrate (whenever you do a major version bump) and pick what dependencies they want to use.

I agree, I like what you have done. If the integration repos were in place (above) I would be preparing to merge.

@DePasqualeOrg
Contributor Author

DePasqualeOrg commented Mar 23, 2026

  • can we get the non-api-breaking part of this merged right away to avoid merge conflicts

This is the part I don't understand. Even though I've taken great care to keep breaking changes to a minimum, the protocol is a breaking change.

  • I am leery of making an API break that would require people to use a new repository outside of what they may currently trust (supply chain attack worries)

That's a valid concern, and it's why I suggested that people can copy ~100 lines of code instead of importing my packages.

I think people should try your improved tokenizers package and in time it may become the de-facto standard, but I don't want to force it on anyone yet.

No one is forced to use my packages. They can copy ~100 lines of code and use Hugging Face's packages.

Right now your integration with the HF implementations are around 160 lines of code -- I don't think it is reasonable to have people copy that into their projects.

That includes convenience overloads that are not required. Only ~100 lines (including code comments) are needed for this to work. I included links to the relevant files above.

Anyone who doesn't want to copy this trivial code can import their preferred integration packages.

@davidkoski
Collaborator

  • can we get the non-api-breaking part of this merged right away to avoid merge conflicts

This is the part I don't understand. Even though I've taken great care to keep breaking changes to a minimum, the protocol is a breaking change.

Ah, I am not explaining myself well then and looking at it closer, I think you are correct.

On the LLM side the tokenizer is separate from the model. The fact that the type changes from Tokenizers.Tokenizer to MLXLMCommon.Tokenizer is an API linkage break but not a source break (which is generally OK for swiftpm packages since you build from source).

The VLM side is tougher because the UserInputProcessor (which all the VLMs specialize) takes a Tokenizers.Tokenizer as input. We can change all of them in MLXVLM, but if somebody had a custom model in their application / library it would be a breaking change.

My idea (which I think is incorrect now) is that we would supply the implementation of MLXLMCommon.Tokenizer to keep it building on top of HF, but the custom VLM inits would all break (outside of MLXVLM).

  • I am leery of making an API break that would require people to use a new repository outside of what they may currently trust (supply chain attack worries)

That's a valid concern, and it's why I suggested that people can copy ~100 lines of code instead of importing my packages.

I don't think copying 100 lines is a reasonable upgrade path (perhaps number of lines is not the metric as you would likely just copy a file into your repository). But perhaps here we can add an HF specific macro to build the integration?

The minimum change to keep as-is would be:

import MLXLMCommon
import Tokenizers
import HuggingFaceAdaptorMacro

// named TBD, but let's say
let container = try await #huggingFaceLoadModelContainer(
    id: "mlx-community/Qwen3-4B-4bit"
)

Plus a change in their Package.swift. The macro would let us inject the code at build-time and could ship as part of mlx-swift-lm without requiring a hugging face dependency (I think, though I have thought other things that turned out to be false). This would give a couple of lines change while breaking the hard link dependency in mlx-swift-lm.
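The app-level Package.swift change might look roughly like this. This is a hedged sketch: the repository URLs, product names (`MLXHuggingFace` in particular), and versions are assumptions for illustration:

```swift
// swift-tools-version: 6.0
// Hedged sketch of an app-level manifest; names and versions are assumptions.
import PackageDescription

let package = Package(
    name: "MyApp",
    platforms: [.macOS(.v14)],
    dependencies: [
        .package(url: "https://github.com/ml-explore/mlx-swift-lm", branch: "main"),
        // The Hugging Face packages are linked by the app, not by mlx-swift-lm:
        .package(url: "https://github.com/huggingface/swift-transformers", from: "1.0.0")
    ],
    targets: [
        .executableTarget(
            name: "MyApp",
            dependencies: [
                .product(name: "MLXLMCommon", package: "mlx-swift-lm"),
                // The macro library that injects the ~100 lines of glue:
                .product(name: "MLXHuggingFace", package: "mlx-swift-lm"),
                .product(name: "Transformers", package: "swift-transformers")
            ]
        )
    ]
)
```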

So you would have two ways of integrating:

  • use the repositories that you provide that allow a choice between Hub and Tokenizer implementations
  • use the macro and link (at the app level) to the hugging face libraries

If this works I think it would solve my concerns and let us move toward the conformance repositories.

What do you think about this?

@zallsold-lgtm

zallsold-lgtm commented Mar 24, 2026

Jumping in to add another use case, to see if it fits the path you discussed above. I have a custom package that provides the model download logic (a custom downloader implementation; I don't use the swift-transformers or swift-huggingface packages provided by Hugging Face at all, for other reasons). So my expectation is that there is a protocol-based layer that I can write my own conformance to.

I think the download, progress, and other related logic should live outside of mlx-swift-lm.

class MyModelLoader: SomeModelLoaderProtocol {
    // load the model from the file system
    func loadModel() async -> [...]
}

// The model download and existence checks should be handled completely by my
// own business logic: I will ensure the model is downloaded, then proceed to
// the loadContainer call below.

let container = try await somewhatLoadContainer(
    provider: MyModelLoader()
)

Just basic pseudocode above to demonstrate my use case. I wonder whether it's supported by the possibilities you discussed above?

Much appreciation for the hard work so far ❤

@DePasqualeOrg
Contributor Author

@zallsold-lgtm, you can try out this PR branch for your use case. Everything is already in place.

@DePasqualeOrg
Contributor Author

@davidkoski, I tried using macros to replace copying the integration code, and it doesn't work due to fundamental compiler issues related to the extensions and retroactive protocol conformances that we would need.

Given that we've now ruled out the alternatives through exhaustive testing, users have these options, which I think are acceptable:

  • Import an integration package pinned to a specific version (easy and solves the trust problem)
  • Copy ~100 lines of code (a tiny bit more work, but achieves full code ownership)

@davidkoski
Collaborator

OK, I think I have something working with macros that looks ok. It would let an app link to the hugging face packages directly:

[Screenshot: app package dependencies, showing the Hugging Face packages linked directly by the app]

mlx-swift-lm not shown as I have it as a local package.

Then somebody could write this:

import Foundation
import MLXLMCommon

// this is the library with the macros -- injecting the ~100 lines
import MLXHuggingFace

// import these as expected
import Tokenizers
import HuggingFace

func test1() async throws {
    // use the new API directly
    let m = try await loadModel(
        from: #hubDownloader(),
        using: #huggingFaceTokenizerLoader(),
        configuration: .init(id: "mlx-community/SmolLM3-3B-4bit")
    )
}

func test2() async throws {
    // two integration points for common calls
    let m = try await #huggingFaceLoadModel(configuration: .init(id: "mlx-community/SmolLM3-3B-4bit"))
}

func test3() async throws {
    // and progress
    let m = try await #huggingFaceLoadModelContainer(configuration: .init(id: "mlx-community/SmolLM3-3B-4bit")) { v in
        print(v)
    }
}

This works because the macro expands in the context of the file, which has access to the HF API.

So people could:

  • use the macro
  • use your adaptor library
  • adapt by hand -- as you noted it is easy

Here is what I came up with macro-wise:

macros.patch

I think this could work and it would resolve my concerns about the HF integration and new GitHub repos + allow use of your optimized versions.

@DePasqualeOrg
Copy link
Copy Markdown
Contributor Author

I'd just like to verify that this works with my Swift Tokenizers and Swift HF API packages. Will you make this available somewhere for me to test, or would you like to test that yourself?

@DePasqualeOrg
Contributor Author

I resolved more merge conflicts. It is very difficult to resolve these conflicts with so many changes happening upstream; I hope I've done everything correctly. The more that gets merged before this PR, the more opportunities there will be for mistakes in resolving these conflicts.
