Conversation

@DePasqualeOrg (Contributor) commented Dec 27, 2025

JSON parsing is one of the biggest performance bottlenecks in tokenizer loading. yyjson, a high-performance C library, offers significant speed gains on large tokenizer files: raw JSON parsing is 3.4x faster and building the Config is 2.1x faster, saving around 600 ms in a typical tokenizer load.

Changes

  • Add yyjson 0.12.0 as a dependency
  • Add YYJSONParser with direct yyjson → Config conversion (no intermediate Foundation objects; see the sketch after this list)
  • Update HubApi.configuration(fileURL:) to use yyjson
  • Remove JSONSerialization+BOM.swift (yyjson handles BOM correctly)
  • Add Benchmarks test target (run with RUN_BENCHMARKS=1 swift test --filter Benchmarks)
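
For illustration, here is a minimal sketch of the direct-conversion approach referenced above. It assumes the yyjson C module is importable from Swift via SwiftPM, and it builds plain Swift values in place of the real Config initializer, whose shape isn't shown in this thread:

import Foundation
import yyjson  // C module exposed through SwiftPM (assumption)

enum YYJSONSketch {
    // Parse raw bytes with yyjson and convert the document tree straight
    // to Swift values, with no intermediate Foundation objects.
    static func parse(_ data: Data) throws -> Any? {
        try data.withUnsafeBytes { (buffer: UnsafeRawBufferPointer) -> Any? in
            let bytes = buffer.bindMemory(to: CChar.self).baseAddress
            guard let doc = yyjson_read(bytes, buffer.count, 0) else {
                throw NSError(domain: "YYJSONSketch", code: 1)  // placeholder error
            }
            defer { yyjson_doc_free(doc) }  // release the parsed document
            return convert(yyjson_doc_get_root(doc))
        }
    }

    // Recursively convert a yyjson value into a Swift value.
    private static func convert(_ val: UnsafeMutablePointer<yyjson_val>?) -> Any? {
        guard let val else { return nil }
        if yyjson_is_str(val) { return String(cString: yyjson_get_str(val)) }
        if yyjson_is_bool(val) { return yyjson_get_bool(val) }
        if yyjson_is_int(val) { return yyjson_get_sint(val) }
        if yyjson_is_real(val) { return yyjson_get_real(val) }
        if yyjson_is_arr(val) {
            var items: [Any?] = []
            var iter = yyjson_arr_iter()
            yyjson_arr_iter_init(val, &iter)
            while let item = yyjson_arr_iter_next(&iter) {
                items.append(convert(item))
            }
            return items
        }
        if yyjson_is_obj(val) {
            var object: [String: Any?] = [:]
            var iter = yyjson_obj_iter()
            yyjson_obj_iter_init(val, &iter)
            while let key = yyjson_obj_iter_next(&iter) {
                let name = String(cString: yyjson_get_str(key))
                // updateValue stores explicit nils for JSON null values
                object.updateValue(convert(yyjson_obj_iter_get_val(key)), forKey: name)
            }
            return object
        }
        return nil  // JSON null
    }
}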

Performance

Tested with the 11.4 MB tokenizer.json from mlx-community/Qwen3-0.6B-Base-DQ5:

Benchmark          yyjson    JSONSerialization    Improvement
Raw JSON parsing   19 ms     66 ms                3.4x (47 ms saved)
JSON → Config      540 ms    1,160 ms             2.1x (620 ms saved)

This saves ~600 ms per tokenizer load on an M3 MacBook Pro.

All existing tests pass.

@DePasqualeOrg (Contributor Author):

@mattt, @pcuenca, I think this PR would be a good one to start with whenever you're ready, since #303 is based on it. For that reason, #303 looks bigger than it actually is. I added some refinements to all three of my PRs in this repo today, and I think they're now all ready for review.

Comment on lines -328 to 337
-    guard let parsed = try? JSONSerialization.bomPreservingJsonObject(with: data) else {
-        throw Hub.HubClientError.jsonSerialization(fileURL: fileURL, message: "JSON Serialization failed for \(fileURL). Please verify that you have set the HF_TOKEN environment variable.")
+    do {
+        return try YYJSONParser.parseToConfig(data)
+    } catch {
+        throw Hub.HubClientError.jsonSerialization(
+            fileURL: fileURL,
+            message: "JSON parsing failed for \(fileURL): \(error.localizedDescription). If this is a private model, verify that HF_TOKEN is set."
+        )
+    }
-    guard let dictionary = parsed as? [NSString: Any] else { throw Hub.HubClientError.parse }
-    return Config(dictionary)
 }
 }
Collaborator (@ZachNagengast):

2c on this:

I think there's an opportunity to protocolize JSON parsing, which would reduce the dependency footprint for this specific project while still enabling yyjson usage outside of it.

protocol JSONParser {
    func parseToConfig(_ data: Data) throws -> Config
}

Then

func configuration(fileURL: URL, parser: JSONParser = DefaultJSONParser()) throws -> Config {
    let data = try Data(contentsOf: fileURL)
    do {
        return try parser.parseToConfig(data)
    } catch {
        throw Hub.HubClientError.jsonSerialization(
            fileURL: fileURL,
            message: "JSON parsing failed for \(fileURL): \(error.localizedDescription). If this is a private model, verify that HF_TOKEN is set."
        )
    }
}

Then JSONParser could be passed to the HubApi init, or to an object that is passed into the configuration call.

let customParser = YYJSONParser()
let config = try hubApi.configuration(fileURL: someURL, parser: customParser)

Ideally this project would remain pure Swift with Swift dependencies, but still allow fast implementations via protocols.

Contributor Author (@DePasqualeOrg):

That's a nice idea, although the Python transformers library uses the Rust tokenizers library, which uses serde for JSON parsing. I think there is a good argument for just having a fast default like in the Python transformers, especially since what's available in Swift is so slow. People running MLX models in Swift are already using C++ libraries through C bridging. yyjson is in C, so Swift can call it directly with minimal overhead.
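
As a concrete illustration of that low bridging overhead: a C library like yyjson can compile as an ordinary SwiftPM target that Swift imports directly. This manifest is a sketch; the target names and paths are assumptions, not the PR's actual Package.swift:

// swift-tools-version: 5.9
// Package.swift -- an illustrative sketch; names and paths are assumed.
import PackageDescription

let package = Package(
    name: "swift-transformers",
    targets: [
        // yyjson builds as a plain C target; Swift imports the generated
        // module directly, with no hand-written bridging layer.
        .target(
            name: "yyjson",
            path: "Sources/yyjson",
            publicHeadersPath: "include"
        ),
        .target(
            name: "Hub",
            dependencies: ["yyjson"]
        )
    ]
)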

Collaborator (@mattt):

@DePasqualeOrg Amazing work! I just opened a PR demonstrating the effect of in-situ parsing on speed and memory here: DePasqualeOrg#2

@ZachNagengast I'm sympathetic to the idea of dependency injection, but in this case, it's hard to imagine a scenario in which an API consumer wouldn't opt in to faster JSON parsing. Assuming the performance is consistently better, and barring segfaults or incorrect behavior, this seems like a slam dunk.

If the additional dependency is a concern, I suppose we could compromise with a trait that's enabled by default and could be disabled on an opt-out basis.
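
For instance, with SwiftPM's package traits (available since Swift 6.1), an enabled-by-default, opt-out trait could look something like this sketch, where "FastJSON" is a hypothetical trait name:

// Package.swift sketch, assuming SwiftPM 6.1 package traits.
let package = Package(
    name: "swift-transformers",
    traits: [
        .trait(name: "FastJSON", description: "Use yyjson for fast JSON parsing"),
        .default(enabledTraits: ["FastJSON"])  // on by default; consumers can opt out
    ]
    // dependencies and targets omitted in this sketch
)

// An enabled trait also acts as a compilation condition in the package's sources:
#if FastJSON
typealias DefaultConfigParser = YYJSONParser
#endif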

Collaborator (@ZachNagengast):

A fast default would be great. On the other hand, Swift apps also have to weigh compilation time and distributable binary size, which should be optimized too. Testing the build on this branch, it appears to add 1.2 MB of C code, which, to be fair, compresses well to around 113 KB. Do you think this dependency could be moved via the protocol into the MLX repo, since that is already compiling C code?

Collaborator (@ZachNagengast):

Posted before reading your comment: the extra dependency is a concern, but it could be isolated with traits or a simple compiler check for canImport(yyjson), similar to this WIP branch that pulls Jinja out of the compilation: main...ZachNagengast:swift-transformers:optional-jinja-import-for-hub-and-tokenizers

Something like this would allow the Transformers library to import the fast solution by default, while more targeted implementations that just want Hub and Tokenizers could keep an optimal dependency footprint.
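
A minimal sketch of that canImport approach (FoundationJSONParser is a hypothetical name for the existing JSONSerialization-backed fallback):

#if canImport(yyjson)
import yyjson
// Fast path: the yyjson module is present, so use it by default.
typealias DefaultConfigParser = YYJSONParser
#else
// Fallback: a hypothetical parser backed by JSONSerialization.
typealias DefaultConfigParser = FoundationJSONParser
#endif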

@DePasqualeOrg (Contributor Author):

Thanks for this, @mattt. I dug into it, and it looks like both methods use identical memory (~68 MB) when measured in separate tests. The 0 KB measurement may have been due to memory reuse between sequential tests. Let me know what you think: https://github.com/DePasqualeOrg/swift-transformers/tree/benchmark-memory-use

@mattt (Collaborator) commented Jan 9, 2026

@DePasqualeOrg Running my own benchmarks, I found that YYJSON is actually ~8.7x faster than Foundation for parsing that ~10 MB tokenizer.json file:

Metric        Foundation   YYJSON    Improvement
Time (p50)    57.0 ms      6.5 ms    8.7x faster
Peak Memory   242 MB       52 MB     78% less

And according to Swift Benchmark, in-situ parsing correctly showed 0 allocations.
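
For context on how a harness reports numbers like these, here is a sketch of such a measurement, assuming the ordo-one package-benchmark library; the fixture path and the use of YYJSONParser are illustrative, not the thread's actual benchmark code:

import Benchmark  // ordo-one/package-benchmark (assumed harness)
import Foundation

let benchmarks = {
    Benchmark(
        "YYJSON parse",
        configuration: .init(metrics: [.wallClock, .peakMemoryResident, .mallocCountTotal])
    ) { benchmark in
        // Hypothetical fixture path; load once, outside the measured region.
        guard let data = try? Data(contentsOf: URL(fileURLWithPath: "tokenizer.json")) else { return }
        benchmark.startMeasurement()
        for _ in benchmark.scaledIterations {
            // blackHole prevents the optimizer from eliding the parse.
            blackHole(try? YYJSONParser.parseToConfig(data))
        }
    }
}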

All the more reason for us to move forward, in my opinion.

@pcuenca Any strong feelings about how to proceed?

@DePasqualeOrg (Contributor Author):

@mattt, I don't fully understand the implications of in-situ parsing, but I'm not sure there's a benefit. Here's the analysis from Claude Code, for the record:

The "0 allocations" result comes from measuring only the parse step, after the buffer is allocated and before the Config conversion. Since convertToConfig immediately copies all strings via String(cString:), the in-situ benefit is negated.
