Decouple from tokenizer and downloader packages#118
DePasqualeOrg wants to merge 32 commits into ml-explore:main.
Conversation
|
I really like this idea of decoupling from the HF libs, see also #98. Having an alternate backend available is the key piece that delivers the reason to do so. The numbers from your measurements are impressive and compelling. I am concerned with backward compatibility and a little bit with the default implementation. I am not disparaging your fork, but I don't know how everybody feels about it (I am not an app developer) -- I guess this is along the lines of the old phrase "nobody ever got fired for buying IBM". I wonder if this could be done like this:
Then:

```swift
import MLXLLM
import MLXLMHuggingFace

let container = try await loadModelContainer(id: "mlx-community/Qwen3-4B-4bit")
```

Maybe:
I think one tricky part is your fork probably looks identical to the standard HF API from a symbols point of view -- likely you cannot have both. Hopefully the main point of this is clear:
The exact mechanics of doing so need to be worked out. I wonder if delivering this in pieces would make it easier? |
|
@davidkoski, to be clear, Swift Hugging Face is maintained by Hugging Face, and that's the part that's interchangeable in this PR. Swift Tokenizers is the pure tokenizer library that I forked from Swift Transformers, and I didn't envision that being interchangeable, although I'll investigate whether it could be. Swift Transformers encompasses tokenization and model downloading, and in this PR that has been decomposed into Swift Hugging Face (now an interchangeable downloader, maintained by Hugging Face) and Swift Tokenizers (core tooling that probably doesn't need to be interchangeable if it is well maintained). I understand not wanting to depend on a single individual's package for tokenization, which is why I proposed bringing this package over to ml-explore, and I'm happy to continue contributing to it there if you want to go that route. I took care to make the changes easily auditable by breaking them into focused PRs with discussion in the descriptions. |
For logistical reasons outside of my control I don't think we can do that. I don't think there is a problem with having people choose to use your repo -- it has clear performance wins. But they should probably opt-in to doing so. We should make it possible/easy (and if needed provide the integration in this repo). I will give this a closer read and see if I have any feedback or ideas about how we can achieve these goals, but thank you so much for pushing on this -- these are impressive performance gains and it would be great if people can use them! |
|
Okay, thanks for clarifying. I think I have found a way to make the tokenizer package interchangeable using a protocol and traits. It would require |
This makes sense. What do you think about this:
Then we need to decide what the old version should do. I think it should be the new API for sure -- we don't want to bifurcate there. But what about the backend(s)? I think the choices are:
I might still be confused as to what specifically this is providing, so if that didn't make sense that is probably why. Anyway, older clients that do not have Swift 6.1 toolchains can still build, but they may not have as many options or the more dynamic build that traits enable. |
|
I've investigated various approaches to making the tokenizer package interchangeable, and I think I've landed on a good design:

Usage with explicit configuration

The integration packages provide protocol conformance.

```swift
// Package.swift
dependencies: [
    .package(url: "https://github.com/ml-explore/mlx-swift-lm", from: "2.0.0"),
    .package(url: "https://github.com/ml-explore/mlx-swift-lm-tokenizers", from: "1.0.0"),
    .package(url: "https://github.com/ml-explore/mlx-swift-lm-huggingface", from: "1.0.0"),
]
```

```swift
// Consuming app
import MLXLLM
import MLXLMHuggingFace
import MLXLMTokenizers

let container = try await loadModelContainer(
    from: HubClient.default,
    using: TokenizersLoader(),
    id: "mlx-community/Qwen3-4B-4bit"
)
```

Usage with convenience overloads

The integration packages provide protocol conformance and convenience overloads.

```swift
import MLXLLM
import MLXLMHuggingFace
import MLXLMTokenizers

// Default downloader provided by convenience overload
let container = try await loadModelContainer(
    using: TokenizersLoader(),
    id: "mlx-community/Qwen3-4B-4bit"
)
```

```swift
// Default tokenizer loader provided by convenience overload
let container = try await loadModelContainer(
    from: HubClient.default,
    id: "mlx-community/Qwen3-4B-4bit"
)
```

Core API shape

```swift
public func loadModelContainer(
    from downloader: any Downloader,
    using tokenizerLoader: any TokenizerLoader,
    id: String,
    revision: String = "main",
    useLatest: Bool = false,
    progressHandler: @Sendable @escaping (Progress) -> Void = { _ in }
) async throws -> sending ModelContainer
```

TokenizerLoader protocol

```swift
public protocol TokenizerLoader: Sendable {
    func loadTokenizer(from directory: URL) async throws -> any Tokenizer
}
```
|
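As a sketch of how little an integration package would need to contain, the following mirrors the proposed `TokenizerLoader` protocol with hypothetical stand-in types (`StubTokenizer` and `StubTokenizerLoader` are illustrations, not real library types):

```swift
import Foundation

// Stand-ins mirroring the protocol shapes proposed above.
protocol Tokenizer {
    func encode(text: String) -> [Int]
}

protocol TokenizerLoader: Sendable {
    func loadTokenizer(from directory: URL) async throws -> any Tokenizer
}

// The integration target only needs to bridge the concrete tokenizer
// library's loading entry point to the protocol.
struct StubTokenizer: Tokenizer {
    // Toy implementation: one token per UTF-8 byte.
    func encode(text: String) -> [Int] { Array(text.utf8).map(Int.init) }
}

struct StubTokenizerLoader: TokenizerLoader {
    func loadTokenizer(from directory: URL) async throws -> any Tokenizer {
        // A real loader would parse tokenizer.json and friends from `directory`.
        StubTokenizer()
    }
}
```

The point of the design is that the core library compiles against only the two protocols; everything concrete lives in the integration target.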
|
I think the approach is good overall but there is one problem we will have to figure out:
The same "logistics" issue appears here -- we cannot easily add new repositories. All of the functionality would have to go into this repository. I think this might be a place where the traits would be useful. If you can use them, they could select which back ends you actually want to pull. If not, you will pull more dependencies than you need, but the build process should only build, link, and copy the ones you use. |
|
As I mentioned, traits are not a viable option for anyone who adds MLX Swift LM to their Xcode project through the Xcode UI (e.g. app developers). There's no way for them to select a trait, since they're not editing a Package.swift.

I experimented with using module aliases, and even when used in separate targets of MLX Swift LM, Swift Transformers and Swift Tokenizers collide, since both export a module called Tokenizers. If MLX Swift LM includes only one integration target with one of those packages as a dependency, it won't be possible for consumers to import an integration package that uses the other tokenizer package, because the module names collide.

The only remaining option, which actually has advantages over the others, is to create separate integration packages for Swift Tokenizers (swift-tokenizers-mlx) and Swift Transformers (swift-transformers-mlx). Since the ml-explore organization can't host these packages, they'll need to be hosted by the maintainers of the respective tokenizer packages. This approach is ideal for the following reasons:
It would also make sense for the maintainers of the downloader packages (currently Swift Hugging Face, later also others) to host the respective integration packages. The integration packages are minimal and only need to include protocol conformance for tokenizer loading or model downloading. They can optionally also include convenience overloads for the loading functions. If this approach sounds good to you, I'll start implementing it for this PR and create integration packages for Swift Tokenizers, Swift Transformers, and Swift Hugging Face (the last two only as a proof of concept, since Hugging Face should ultimately be responsible for them). |
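To make the shape of such an integration package concrete, its manifest might look roughly like this. All names and URLs here are placeholders chosen for illustration (the repositories named do not necessarily exist):

```swift
// swift-tools-version: 5.9
// Hypothetical manifest for a tokenizer integration package. The target
// depends on both the core MLX LM package (for the protocols) and the
// concrete tokenizer library, and contains only the bridging code.
import PackageDescription

let package = Package(
    name: "swift-tokenizers-mlx",
    platforms: [.macOS(.v14), .iOS(.v16)],
    products: [
        .library(name: "MLXLMTokenizers", targets: ["MLXLMTokenizers"])
    ],
    dependencies: [
        // Core package declaring the TokenizerLoader protocol (URL assumed)
        .package(url: "https://github.com/ml-explore/mlx-swift-lm", from: "2.0.0"),
        // Concrete tokenizer implementation being adapted (URL assumed)
        .package(url: "https://github.com/example/swift-tokenizers", from: "1.0.0"),
    ],
    targets: [
        .target(
            name: "MLXLMTokenizers",
            dependencies: [
                .product(name: "MLXLMCommon", package: "mlx-swift-lm"),
                .product(name: "Tokenizers", package: "swift-tokenizers"),
            ]
        )
    ]
)
```

Because the manifest is the only place the two dependencies meet, neither upstream package needs to know about the other.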
|
Great work! I vote for option three, but a usage demonstration would make it clearer. @DePasqualeOrg
Yeah, agreed about Xcode consumers. I was thinking maybe it could work but not be as optimal a build -- you can still depend on individual targets inside the swiftpm package, but colliding package names sound like trouble. It makes sense, but it also seems like something is inverted: A (mlx) depending on B (hf) requires that B implement their own integration with A. B shouldn't have to do that with every library that depends on them. People have suggested a workaround, but it probably isn't worth pursuing.
What about using non-colliding names? That could leave the integration with the libraries that MLX depends on inside MLX (B does not have to make an integration with A), or in the case of your optimized library it could be completely external to mlx-swift-lm if you want (and we refer to it in the documentation). |
|
That approach would not be fair or ideal for the following reasons:
For those reasons, I think the integration packages should be separate. Anyone can make and host one, and they're just a few lines of code for protocol conformance. |
I think point 1 is already true. That name is in use and Xcode/swiftpm simply won't allow it. However, https://docs.swift.org/swiftpm/documentation/packagemanagerdocs/modulealiasing/ does allow for this, but in my testing (and perhaps this is what you ran into as well), since you have a fork it has the same package name:

```swift
let package = Package(
    name: "swift-transformers",
```

As far as Xcode/swiftpm are concerned, these are the same packages. I could get the aliases to work in a single package, but when I used both, Xcode would complain (Could not compute dependency graph: unable to load ... duplicate...).

I don't think it is reasonable to have HuggingFace have a dependency on MLX to implement an integration for mlx-swift-lm (they could choose to do so, of course) as MLX has a dependency on them (HF). So would renaming the package help?

I am looking into getting a new repo in ml-explore, but no guarantees and no idea on the timeline if possible.

Point 2: agreed, it would check out some extra code but may or may not build it (if not used it shouldn't be built). I would go for "working" over "best". This would let us keep the default integration in mlx-swift-lm and not need another repository, and might be what we should aim for while the extra repo is pondered.

I have a little test program set up, currently not building (per point 1), but I may try a fork of your fork and try renaming the Package and see what happens. I am happy to attach that if you are interested (but it sounded like you may have something similar). |
|
I think there may be a misunderstanding, because my package already has a different package name. Hugging Face would not be required to have a dependency on MLX; the alternative is that consumers set up the protocol conformance themselves. But since MLX Swift LM is currently the main use case for Swift tokenizer packages, it would be in the interest of anyone who makes one to offer this trivial integration, if it's not offered here. I think it's clear that separate integration packages are needed, and the only open question is where they should be hosted, so I'll go ahead with implementing the integration packages. |
You are correct -- I am confusing myself with the various implementations :-) Yes, as you said it looks like the aliases are not working as expected.
@angeloskath asked if a macro might work -- something that would implement the trivial forwarding mechanism. I will give this a try. That might give us a way to let consumers set up the conformance without knowing they were doing so.
I agree this is the easiest way and am circling the idea it might be the only way. I still have hope :-) |
|
OK, I have a proof of concept using macros. I have a stand-in for the real thing that looks like this:

```swift
public protocol MLXTokenizer: Sendable {
    func encode(text: String) -> [Int]
    func decode(tokens: [Int]) -> String
}

public func generate(tokenizer: MLXTokenizer) -> String {
    let tokens = tokenizer.encode(text: "testing")
    return tokenizer.decode(tokens: tokens)
}
```

Note: there is no hard dependency on any concrete Tokenizer. We want to call it along these lines:

```swift
let tokenizer = PreTrainedTokenizer(...) // e.g. the HuggingFace Tokenizer
print(generate(tokenizer: tokenizer))
```

That won't work as-is because PreTrainedTokenizer does not conform to MLXTokenizer. If we added:

```swift
extension PreTrainedTokenizer: @retroactive MLXTokenizer { }
```

Then it would work, but we are conforming a type we don't own to a protocol.

OK, so try 1 with a macro looks like this:

```swift
enum Tokenizers {
    #MLXTokenizer(PreTrainedTokenizer.self)
}

let tokenizer = try Tokenizers.MLXPreTrainedTokenizer()
print(generate(tokenizer: tokenizer))
```

The enum is needed because the macro can't generate a top level type (unless it has a static name). The macro ends up generating a simple wrapper for the type (assuming it looks like a HuggingFace Tokenizer API-wise) and forwards the protocol methods.

Try 2 looks like this:

```swift
#TokenizerFactory(PreTrainedTokenizer.self)

let tokenizer = try makeTokenizer()
print(generate(tokenizer: tokenizer))
```

The factory generates a function with a fixed name so it can appear at the top level. Assuming you could build/link it, this would allow multiple providers of tokenizers if you did this in different files.

Try 3:

```swift
let tokenizer = try #MakeTokenizer(PreTrainedTokenizer.self)
print(generate(tokenizer: tokenizer))
```

No top level function, just an inline expression. For all of these, the macro generates the wrapper at the use site. This wouldn't block nicer integrations that actually implemented the protocol directly. |
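For concreteness, the wrapper such a macro would generate might look roughly like this, using hypothetical stand-in types (`HFTokenizer` plays the role of `PreTrainedTokenizer`; none of these names come from the actual libraries):

```swift
protocol MLXTokenizer: Sendable {
    func encode(text: String) -> [Int]
    func decode(tokens: [Int]) -> String
}

// Stand-in for a foreign tokenizer type we don't own.
final class HFTokenizer: @unchecked Sendable {
    // Toy implementation: one token per UTF-8 byte.
    func encode(text: String) -> [Int] { Array(text.utf8).map(Int.init) }
    func decode(tokens: [Int]) -> String {
        String(decoding: tokens.map { UInt8($0) }, as: UTF8.self)
    }
}

// What the macro expansion amounts to: a wrapper that owns the foreign type
// and forwards the protocol methods, avoiding a @retroactive conformance
// on a type we don't own.
struct MLXHFTokenizer: MLXTokenizer {
    let base: HFTokenizer
    init() { self.base = HFTokenizer() }
    func encode(text: String) -> [Int] { base.encode(text: text) }
    func decode(tokens: [Int]) -> String { base.decode(tokens: tokens) }
}
```

The forwarding is mechanical, which is exactly why it is a good fit for a macro -- but it does assume the wrapped type's method shapes are known ahead of time.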
|
@davidkoski, I'll review your macro POC now. Before I do, this is what I was about to post regarding my own POC with separate integration packages, which I've pushed to this branch:

I've implemented the integration packages. The last one is for my fork of Swift Hugging Face, which includes ergonomic and performance improvements, avoiding a network roundtrip when possible for even faster model/tokenizer loading.

I'll review everything again tomorrow and run benchmarks with the different integrations to show the performance improvement of my tokenizer and downloader packages. |
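The "avoid a network roundtrip" idea mentioned above can be sketched as follows. This assumes an HF-style cache layout with `refs/` and `snapshots/` directories and is an illustration of the technique, not the actual implementation:

```swift
import Foundation

// Resolve a revision to a cached snapshot directory without touching the
// network. Commit hashes are immutable, so a cached snapshot is always
// valid; branch names are resolved via the refs/ file written at download
// time. Returns nil when nothing usable is cached.
func resolveCachedSnapshot(repoRoot: URL, revision: String) -> URL? {
    let isCommitHash = revision.count == 40 && revision.allSatisfy(\.isHexDigit)
    let hash: String
    if isCommitHash {
        hash = revision
    } else {
        // e.g. refs/main contains the commit hash "main" pointed to when cached
        let refFile = repoRoot.appendingPathComponent("refs").appendingPathComponent(revision)
        guard let contents = try? String(contentsOf: refFile, encoding: .utf8) else { return nil }
        hash = contents.trimmingCharacters(in: .whitespacesAndNewlines)
    }
    let snapshot = repoRoot.appendingPathComponent("snapshots").appendingPathComponent(hash)
    return FileManager.default.fileExists(atPath: snapshot.path) ? snapshot : nil
}
```

When this returns a directory, loading can proceed entirely from disk; only a cache miss (or an explicit freshness request) needs the network.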
|
@davidkoski, I think we would need to see a working code example of that macro approach, but I suspect that it won't be able to do everything that we need to do to make this work. Check how I've set things up in the integration packages to see what I mean. I really think the integration packages are the happy, simple path, and they should be hosted alongside the respective tokenizer/downloader packages. |
|
Yeah, agreed that separate repos will be the cleanest way. Here is my POC if you want to see what I did: look at ContentView for the integrations. It doesn't run (I think it will throw) but it does build. I think the packaging can be simplified, along with coming up with a real implementation, if we use this path. Think != know. |
|
I think the main issue with the macro approach is that it would require all the tokenizer and downloader libraries to have the same shapes, which isn't realistic – and indeed isn't even the case with the ones we have now. Protocols allow for libraries of any shape to integrate with this one. Even setting that aside, it would add complexity in this library and require consumers to use a less-familiar syntax. |
Not required as the manual implementation is trivial. It is true of any "automatic" integration. It is basically just a way to move the dependency to "compile" time rather than "Project: Resolve Packages" time.
Agreed |
Force-pushed from 74fecfd to 4932f20.
|
@davidkoski, I've decoupled the tokenizer and downloader packages from the integration tests and benchmarks, so now the decoupling is complete. The logic for those tests still lives in this library, which exports helpers to run them in the integration packages. I'll review this all again and add some polish over the weekend, but I think this is getting close to an optimal design. Let me know what you think whenever you get a chance to look at it. |
Force-pushed from 2a4bc1f to 335b0d3.
Add benchmark helpers for tokenizer loading, tokenization, and decoding
Force-pushed from a23aff5 to ada52c9.
|
I just rebased on the new changes on main and resolved conflicts. |
|
Sorry, not ignoring this, just busy -- I will get some time to play with the traits and file bugs on Thursday I think. |
|
I suspect this is not really a bug and it's just due to how module resolution or build planning works in Swift. I think this PR is ready to be reviewed and merged. Users just need to import their preferred adapters or copy ~100 lines of code. |
Resolve modify/delete conflict for EmbedderIntegrationTests.swift by keeping the deletion (tests live in IntegrationTestHelpers on this branch) and porting the new Gemma 3 embedder test into EmbedderTests.
# Conflicts:
#   Libraries/MLXLMCommon/Load.swift
|
I've resolved more merge conflicts. |
|
OK, I think I am getting my head wrapped around traits enough to see how they won't work for this case (some tests on the trait-test branch and more changes I have locally).

swift-log

So a case where this might work: let's say we wanted to use swift-log as an opt-in:

This would always check out swift-log because the Package dependencies can't depend on traits (not ideal). swift-log would not build if it were not being used and it would not be linked (good). Note the ...

SDKTest (proxy for fancy linkage)

Another variant, a little closer to what we need with the tokenizers/hub: what if we wanted some library that required a different SDK? I made an SDKTest directory with a Package.swift with its own ...

We could hoist the API surface from SDKTest into e.g. MLXLMCommon and ...

The Issue at Hand

This leaves us with:
|
|
So in terms of merging: I would like to see the HF integration repos either here in ml-explore or in huggingface. I can't promise the former, but I am looking into it (still). I have no control over the latter.

I also want to get the various in-flight mlx-swift and mlx-swift-lm pieces landed before we do a major API bump. What do you think about getting the protocol landed with the HF integration in place and all the model conformance merged first (without breaking API)? That would get a bunch of merge conflict surface out of the way.

Potentially I can set up a branch in mlx-swift-lm so that people could easily try the new integration (I think that may be easier to consume than the fork in your org, but I am not sure). That would still leave us waiting for homes for the HF integration repos. Thoughts? |
|
I doubt Hugging Face will be willing to host a package that reduces the lock-in to their platform. But we shouldn't have to depend on Hugging Face taking action to benefit from better performance and be free from lock-in. With this PR, we don't have to: Users can either import an integration package or copy some trivial integration code. I don't quite understand the path you have in mind, but it sounds like you want to keep the Hugging Face dependency, which would prevent anyone from using my faster Swift Tokenizers package. It has been an enormous amount of work to get this to this point, and I would really like to get this merged so that we can move on. I think it's in an ideal state, with a clean separation of concerns and a straightforward way for users to migrate (whenever you do a major version bump) and pick what dependencies they want to use. |
Not exactly. I have a few things in mind:
I think people should try your improved tokenizers package, and in time it may become the de facto standard, but I don't want to force it on anyone yet. Right now your integration with the HF implementations is around 160 lines of code -- I don't think it is reasonable to have people copy that into their projects.
I agree, I like what you have done. If the integration repos were in place (above) I would be preparing to merge. |
This is the part I don't understand. Even though I've taken great care to keep breaking changes to a minimum, the protocol is a breaking change.
That's a valid concern, and it's why I suggested that people can copy ~100 lines of code instead of importing my packages.
No one is forced to use my packages. They can copy ~100 lines of code and use Hugging Face's packages.
That includes convenience overloads that are not required. Only ~100 lines (including code comments) are needed for this to work. I included links to the relevant files above. Anyone who doesn't want to copy this trivial code can import their preferred integration packages. |
Ah, I am not explaining myself well then, and looking at it closer, I think you are correct. On the LLM side the tokenizer is separate from the model. The fact that the type changes from ...

The VLM side is tougher because the ...

My idea (which I think is incorrect now) was that we would supply the implementation of ...
I don't think copying 100 lines is a reasonable upgrade path (perhaps number of lines is not the metric, as you would likely just copy a file into your repository). But perhaps here we can add an HF-specific macro to build the integration? The minimum change to keep things as-is would be:

```swift
import MLXLMCommon
import Tokenizers
import HuggingFaceAdaptorMacro

// name TBD, but let's say
let container = try await #huggingFaceLoadModelContainer(
    id: "mlx-community/Qwen3-4B-4bit"
)
```

Plus a change in their Package.swift. The macro would let us inject the code at build time and could ship as part of mlx-swift-lm without requiring a hugging face dependency (I think, though I have thought other things that turned out to be false). This would give a couple of lines of change while breaking the hard link dependency in mlx-swift-lm. So you would have two ways of integrating:
If this works I think it would solve my concerns and let us move toward the conformance repositories. What do you think about this? |
|
Jumping in to add another use case to see if it fits the path you discussed above. I have a custom package that provides the model download logic (a custom downloader implementation; yes, I don't use the ...). I think the download, progress, and other such logic should be out of mlx-swift-lm. Just basic pseudo-code above to demonstrate my use case. I wonder if it is supported by the possibilities discussed above? Much appreciation for the hard work so far ❤ |
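This use case is what the proposed protocol is meant to cover. A minimal sketch, using a stand-in for the `Downloader` protocol (the signature is assumed from the discussion above, not the final API, and `LocalMirrorDownloader` is hypothetical):

```swift
import Foundation

// Stand-in for the proposed Downloader protocol.
protocol Downloader: Sendable {
    func download(
        id: String,
        revision: String,
        matching: [String],
        useLatest: Bool,
        progressHandler: @Sendable (Progress) -> Void
    ) async throws -> URL
}

// A custom provider that resolves models from its own storage (e.g. a
// bucket mirrored to disk) instead of the Hugging Face Hub.
struct LocalMirrorDownloader: Downloader {
    let root: URL

    func download(
        id: String,
        revision: String,
        matching: [String],
        useLatest: Bool,
        progressHandler: @Sendable (Progress) -> Void
    ) async throws -> URL {
        // Nothing to fetch here: report completion and hand back the
        // directory. A real implementation would fetch missing files and
        // drive `progressHandler` incrementally.
        let progress = Progress(totalUnitCount: 1)
        progress.completedUnitCount = 1
        progressHandler(progress)
        return root.appendingPathComponent(id)
    }
}
```

The loading functions would accept any such conformance, so download/progress logic stays entirely outside mlx-swift-lm.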
|
@zallsold-lgtm, you can try out this PR branch for your use case. Everything is already in place. |
|
@davidkoski, I tried using macros to replace copying the integration code, and it doesn't work due to fundamental compiler issues related to the extensions and retroactive protocol conformances that we would need. Given that we've now ruled out the alternatives through exhaustive testing, users have these options, which I think are acceptable:
|
|
I'd just like to verify that this works with my Swift Tokenizers and Swift HF API packages. Will you make this available somewhere for me to test, or would you like to test that yourself? |
|
I resolved more merge conflicts. It is very difficult to resolve these conflicts with so many changes happening upstream. I hope I've done everything correctly. The more things get merged before this PR, the more opportunities for mistakes in resolving these conflicts there will be. |

MLX Swift LM currently has two fundamental problems:
This PR implements the following solutions:
- The `Downloader` protocol abstracts away the model hosting provider, making it easy to use other providers such as ModelScope or to define custom providers such as downloading from storage buckets.

Benchmarks
Model loading times on M3 MacBook Pro:
To run the benchmarks before the changes in this PR, check out commit 3752cc2.

You can run the benchmarks in a separate scheme in Xcode with RUN_BENCHMARKS=1, or from the command line:

```shell
TEST_RUNNER_RUN_BENCHMARKS=1 xcodebuild test -scheme mlx-swift-lm-Package -destination 'platform=macOS' -only-testing:Benchmarks
```

Usage
Loading from a local directory:
Convenience method from the `MLXLMHuggingFace` module (uses default Hub client):

Using a custom Hugging Face Hub client:
Using a custom downloader:
Embedding models and adapters follow the same patterns.
Cache strategy
The `Downloader` protocol includes a `useLatest` parameter (default `false`) that controls whether to check the network for updates:

- `useLatest: false`: Resolves refs (e.g. "main") to commit hashes locally via the cache's `refs/` directory and returns cached files immediately, with no network call. This avoids 100–200 ms of latency on every model load.
- `useLatest: true`: Always checks the network for the latest commit, then downloads any missing or updated files.

This improves on the Python `huggingface_hub` in two ways: Python always makes an `api.repo_info()` network call before returning cached files, even for commit hashes. Swift skips the network entirely for commit hashes (which are immutable, so cached files are always valid) and additionally resolves branch names locally via `resolveCachedSnapshot()` when freshness isn't needed. Users who want the latest files can opt in to the network call explicitly.

In Swift Hugging Face, this is implemented as a two-method design:

- `resolveCachedSnapshot()` resolves refs locally using cached metadata
- `downloadSnapshot()` only uses the fast path on commit hashes (which are immutable), while branch names always trigger a network call

Breaking changes
Loading API
The `hub` parameter (previously `HubApi`) has been replaced with `from` (any `Downloader`, or a `URL` for a local directory). Functions that previously defaulted to `defaultHubApi` no longer have a default: callers must either pass a `Downloader` explicitly or use the convenience methods in `MLXLMHuggingFace`/`MLXEmbeddersHuggingFace`, which default to `HubClient.default`.

For most users who were using the default Hub client, adding `import MLXLMHuggingFace` or `import MLXEmbeddersHuggingFace` and using the convenience overloads is sufficient.

Users who were passing a custom `HubApi` instance should create a `HubClient` instead and pass it as the `from` parameter. `HubClient` conforms to `Downloader` via `MLXLMHuggingFace`.

ModelConfiguration

- `tokenizerId` and `overrideTokenizer` have been replaced by `tokenizerSource: TokenizerSource?`, which supports `.id(String)` for remote sources and `.directory(URL)` for local paths.
- `preparePrompt` has been removed. This shouldn't be used anyway, since support for chat templates is available.
- `modelDirectory(hub:)` has been removed. For local directories, pass the `URL` directly to the loading functions. For remote models, the `Downloader` protocol handles resolution.

Tokenizer loading
- `loadTokenizer(configuration:hub:)` has been removed. Tokenizer loading now uses `AutoTokenizer.from(directory:)` from Swift Tokenizers directly.
- `replacementTokenizers` (the `TokenizerReplacementRegistry`) has been removed. Use `AutoTokenizer.register(_:for:)` from Swift Tokenizers instead.

defaultHubApi

The `defaultHubApi` global has been removed. Hugging Face Hub access is now provided by `HubClient.default` from the `HuggingFace` module.

Low-level APIs

- `downloadModel(hub:configuration:progressHandler:)` → `Downloader.download(id:revision:matching:useLatest:progressHandler:)`
- `loadTokenizerConfig(configuration:hub:)` → `AutoTokenizer.from(directory:)`
- `ModelFactory._load(hub:configuration:progressHandler:)` → `_load(configuration: ResolvedModelConfiguration)`
- `ModelFactory._loadContainer`: removed (base `loadContainer` now builds the container from `_load`)

Maintainership of Swift Tokenizers
I'm currently maintaining Swift Tokenizers, but I think a better home for it would be the ml-explore organization. Hugging Face's packages are tightly coupled to their platform, while Swift Tokenizers is designed for a clean separation of concerns and is more closely related to the model code in MLX Swift LM.
To do