# 2.21.2
## What's Changed
- Add VLM support and refactor common LM code into MLXLMCommon (breaking API changes) by @davidkoski in #151
  - based on models from https://github.com/Blaizzy/mlx-vlm
  - for #132
## Xcode 16
Xcode 16 is required to build the example applications and tools. Older versions of Xcode can still build the libraries via SwiftPM, so requirements are unchanged for any applications or libraries that depend on them.

This change is required because the xcodeproj now refers to the local Package.swift file, keeping builds consistent with those of external users. If needed, we can switch back to using the xcodeproj for internal library builds and SwiftPM for external library builds -- if there is a problem, please file an issue and it can be considered.
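For reference, building the libraries this way is a standard SwiftPM build, e.g. from the root of a checkout:

```
swift build
```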
## Additions
There are two new libraries:
- `MLXVLM` contains vision language models that combine images and text prompts to produce text results, e.g. "describe this image"
- `MLXLMCommon` contains the `LanguageModel` code that is shared between `MLXLLM` and `MLXVLM`
The API between LLM and VLM is identical aside from the preparation of the `UserInput`:
```swift
let parameters = GenerateParameters()

// LLM prompt
let input = UserInput(prompt: "tell me a story")

// VLM prompt
let input = UserInput(prompt: "describe the image", images: [.url(url)])

// inference is identical
let result = try await modelContainer.perform { [generate, input] context in
    let input = try await context.processor.prepare(input: input)
    return try generate(input: input, parameters: parameters, context: context) { token in
        // print tokens as they are generated, stop early, etc.
        return .more
    }
}
```

VLM example code is available in the llm-tool example:
```
./mlx-run llm-tool eval --help
OVERVIEW: evaluate prompt and images to generate text (VLM)

USAGE: llm-tool eval <options>

OPTIONS:
  --model <model>        Name of the huggingface model or absolute path to directory
  -p, --prompt <prompt>  The message to be processed by the model. Use @path,@path to load from files, e.g. @/tmp/prompt.txt
  --resize <resize>      Resize images to this size (width, height)
  --image <image>        Paths or urls for input images
  ...
```
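A typical invocation might look like this (the model id is only an illustration -- any VLM the tool can load should work):

```
./mlx-run llm-tool eval \
    --model mlx-community/Qwen2-VL-2B-Instruct-4bit \
    --prompt "describe the image" \
    --image /tmp/photo.jpg
```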
## Breaking Changes
These probably have no effect on code external to this repo:
- the mlx-swift-examples.xcodeproj now references the local Package.swift to build the libraries
- the example code now uses naming that matches external uses of mlx-swift-examples, e.g. `import LLM` -> `import MLXLLM`
- the library directories are renamed to match their target names, e.g. `LLM` -> `MLXLLM`
Breaking:
- some code will now need to import both `MLXLLM` and `MLXLMCommon` (particularly code that loads models); `MLXLMCommon` contains the common API between LLM and VLM:

```swift
import MLXLLM
import MLXLMCommon
```

- constants for models have moved from `ModelConfiguration` to `ModelRegistry` -- this is `MLXLLM.ModelRegistry` and there is also `MLXVLM.ModelRegistry`:
```diff
- let modelConfiguration = ModelConfiguration.phi3_5_4bit
+ let modelConfiguration = ModelRegistry.phi3_5_4bit
```

- the `loadModelContainer()` function is now `LLMModelFactory.shared.loadContainer()`
- there is a new `VLMModelFactory` with identical methods for loading VLMs (a sketch follows the diff below):
```diff
- let modelContainer = try await LLM.loadModelContainer(configuration: modelConfiguration)
- {
+ let modelContainer = try await LLMModelFactory.shared.loadContainer(
+     configuration: modelConfiguration
+ ) {
```
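Loading a VLM has the same shape; a minimal sketch, assuming a Qwen2-VL configuration is registered in `MLXVLM.ModelRegistry` (the constant name here is an assumption):

```swift
import MLXVLM

// the registry constant is illustrative -- check MLXVLM.ModelRegistry
// for the configurations that are actually defined
let vlmContainer = try await VLMModelFactory.shared.loadContainer(
    configuration: MLXVLM.ModelRegistry.qwen2VL2BInstruct4Bit
)
```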
- `ModelContainer.perform` is now throwing (and in `MLXLMCommon`):

```diff
- let result = await modelContainer.perform { model, tokenizer in
-     LLM.generate(
+ let result = try await modelContainer.perform { model, tokenizer in
+     try MLXLMCommon.generate(
```

- `ModelConfiguration` previously had a way to register new configurations. This is now on `LLMModelFactory` (and `VLMModelFactory` has the same):
```swift
LLMModelFactory.shared.modelRegistry.register(configurations: [modelConfiguration])
```

## Deprecations
An example at the end shows all of these deprecations in context.
Prefer to use the `ModelContext.processor` to prepare prompts. Previously users would pass in a bare `[Int]` of tokens, but in order to support more complex inputs (VLMs) the use of bare `[Int]` is deprecated and callers should use `UserInput` and `LMInput`.
For example, previously callers might have done something like this:
```swift
let messages = [["role": "user", "content": prompt]]
let promptTokens = try await modelContainer.perform { _, tokenizer in
    try tokenizer.applyChatTemplate(messages: messages)
}
```

Now that should be:
```swift
let input = try await context.processor.prepare(input: .init(prompt: prompt))
```

This initializes a `UserInput` from the prompt text and produces an `LMInput` that can be used to generate tokens.
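The `.init(prompt:)` shorthand is just a `UserInput`; the equivalent explicit form (a sketch using only the calls shown above) is:

```swift
// build the UserInput explicitly, then let the processor turn it into LMInput
let userInput = UserInput(prompt: prompt)
let input = try await context.processor.prepare(input: userInput)
```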
This call to `generate()` is now deprecated:
```swift
public func generate(
    promptTokens: [Int], parameters: GenerateParameters, model: any LanguageModel,
    tokenizer: Tokenizer,
    extraEOSTokens: Set<String>? = nil,
    didGenerate: ([Int]) -> GenerateDisposition
) throws -> GenerateResult
```

This consumed the `[Int]` variety of tokens. Now this is preferred:
```swift
public func generate(
    input: LMInput, parameters: GenerateParameters, context: ModelContext,
    didGenerate: ([Int]) -> GenerateDisposition
) throws -> GenerateResult
```

This method on `ModelContainer` is now deprecated:
```swift
/// Perform an action on the model and/or tokenizer. Callers _must_ eval any `MLXArray` before returning as
/// `MLXArray` is not `Sendable`.
@available(*, deprecated, message: "prefer perform(_:) that uses a ModelContext")
public func perform<R>(_ action: @Sendable (any LanguageModel, Tokenizer) throws -> R) rethrows -> R
```

Use this one instead (though the former still works):
```swift
/// Perform an action on the ``ModelContext``. Callers _must_ eval any `MLXArray` before returning as
/// `MLXArray` is not `Sendable`.
public func perform<R>(_ action: @Sendable (ModelContext) async throws -> R) async rethrows -> R
```

## Example
Putting all of these deprecations together, previously you might have generated text like this:
```swift
let messages = [["role": "user", "content": prompt]]
let promptTokens = try await modelContainer.perform { _, tokenizer in
    try tokenizer.applyChatTemplate(messages: messages)
}

let result = await modelContainer.perform { model, tokenizer in
    LLM.generate(
        promptTokens: promptTokens, parameters: generateParameters, model: model,
        tokenizer: tokenizer, extraEOSTokens: modelConfiguration.extraEOSTokens
    ) { tokens in ... }
}
```

Now do this:
```swift
let result = try await modelContainer.perform { context in
    let input = try await context.processor.prepare(input: .init(prompt: prompt))
    return try MLXLMCommon.generate(
        input: input, parameters: generateParameters, context: context
    ) { tokens in ... }
}
```

**Full Changelog**: 1.18.2...2.21.2