Commit 6ef303b
add VLM support, refactor common LM code into MLXLMCommon. breaking API changes (#151)
* implement VLM
- based on models from https://github.com/Blaizzy/mlx-vlm
There are two new libraries:
- `MLXVLM` contains vision language models that combine images and text prompts to produce text results, e.g. `describe this image`
- `MLXLMCommon` contains the `LanguageModel` code that is shared between `MLXLLM` and `MLXVLM`
The API between `LLM` and `VLM` is identical aside from the preparation of the `UserInput`.
```swift
let parameters = GenerateParameters()
// LLM prompt
let input = UserInput(prompt: "tell me a story")
// VLM prompt
let input = UserInput(prompt: "describe the image", images: [.url(url)])
// inference is identical
let result = try await modelContainer.perform { [generate, input] context in
let input = try await context.processor.prepare(input: input)
return try generate(input: input, parameters: parameters, context: context) { token in
// print tokens as they are generated, stop early, etc.
return .more
}
}
```
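The callback comment above mentions stopping early. As a rough, self-contained sketch of how a disposition-returning callback can end generation, the following uses stand-in types only: `SketchDisposition`, `mockGenerate`, and the fixed token stream are hypothetical and are not part of `MLXLMCommon`. The `.stop` case name is an assumption, since the message only shows `.more`.

```swift
// Hypothetical stand-in for the library's disposition type.
// Only `.more` appears in the commit message; `.stop` is an assumed name.
enum SketchDisposition {
    case more
    case stop
}

// Hypothetical stand-in for a generation loop. It emits token ids one at a
// time and, after each one, asks the callback whether to keep going.
func mockGenerate(
    tokenStream: [Int],
    didGenerate: ([Int]) -> SketchDisposition
) -> [Int] {
    var tokens: [Int] = []
    for token in tokenStream {
        tokens.append(token)
        // The callback sees every token produced so far.
        if didGenerate(tokens) == .stop {
            break
        }
    }
    return tokens
}

// Stop once three tokens have been produced.
let maxTokens = 3
let output = mockGenerate(tokenStream: [11, 12, 13, 14, 15]) { tokens in
    tokens.count >= maxTokens ? .stop : .more
}
print(output)
```

In real code the same closure shape would be passed as the trailing `didGenerate` argument of `generate(input:parameters:context:)`, with the token limit chosen by the caller.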
VLM example code is available in the `llm-tool` example:
```
./mlx-run llm-tool eval --help
OVERVIEW: evaluate prompt and images to generate text (VLM)
USAGE: llm-tool eval <options>
OPTIONS:
--model <model> Name of the huggingface model or absolute path to directory
-p, --prompt <prompt> The message to be processed by the model. Use @path,@path to load from files, e.g. @/tmp/prompt.txt
--resize <resize> Resize images to this size (width, height)
--image <image> Paths or urls for input images
...
```
Changes that probably do not affect code external to this repo:
- the mlx-swift-examples.xcodeproj now references the local `Package.swift` to build the libraries
- the example code now uses the naming matching external uses of mlx-swift-examples, e.g. `import LLM` -> `import MLXLLM`
- the library directories are now renamed to match their target names, e.g. `LLM` -> `MLXLLM`
Breaking:
- some code will now need to import both `MLXLLM` and `MLXLMCommon` (particularly code that loads models)
- `MLXLMCommon` contains the common API between LLM and VLM
```swift
import MLXLLM
import MLXLMCommon
```
- constants for models have moved from `ModelConfiguration` to `ModelRegistry`
- this is `MLXLLM.ModelRegistry` and there is also `MLXVLM.ModelRegistry`
```diff
- let modelConfiguration = ModelConfiguration.phi3_5_4bit
+ let modelConfiguration = ModelRegistry.phi3_5_4bit
```
- the `loadModelContainer()` function is now `LLMModelFactory.shared.loadContainer()`
- there is a new `VLMModelFactory` with identical methods for loading VLMs
```diff
- let modelContainer = try await LLM.loadModelContainer(configuration: modelConfiguration)
- {
+ let modelContainer = try await LLMModelFactory.shared.loadContainer(
+ configuration: modelConfiguration
+ ) {
```
- `ModelContainer.perform` is now throwing (and in MLXLMCommon):
```diff
- let result = await modelContainer.perform { model, tokenizer in
- LLM.generate(
+ let result = try await modelContainer.perform { model, tokenizer in
+ try MLXLMCommon.generate(
```
- `ModelConfiguration` previously had a way to register new configurations. This is now on `LLMModelFactory` (and `VLMModelFactory` has the same):
```swift
LLMModelFactory.shared.modelRegistry.register(configurations: [modelConfiguration])
```
An example at the end shows all of these deprecations in context.
**Prefer to use the `ModelContext.processor` to prepare prompts.** Previously users would pass in a bare `[Int]` of tokens, but in order to support more complex inputs (VLMs) the use of bare `[Int]` is deprecated and callers should use `UserInput` and `LMInput`.
For example, previously callers might have done something like this:
```swift
let messages = [["role": "user", "content": prompt]]
let promptTokens = try await modelContainer.perform { _, tokenizer in
try tokenizer.applyChatTemplate(messages: messages)
}
```
Now that should be:
```swift
let input = try await context.processor.prepare(input: .init(prompt: prompt))
```
This initializes a `UserInput` from the prompt text and produces an `LMInput` that can be used to generate tokens.
**This call to `generate()` is now deprecated:**
```swift
public func generate(
promptTokens: [Int], parameters: GenerateParameters, model: any LanguageModel,
tokenizer: Tokenizer,
extraEOSTokens: Set<String>? = nil,
didGenerate: ([Int]) -> GenerateDisposition
) throws -> GenerateResult
```
This consumed the `[Int]` variety of tokens. Now this is preferred:
```swift
public func generate(
input: LMInput, parameters: GenerateParameters, context: ModelContext,
didGenerate: ([Int]) -> GenerateDisposition
) throws -> GenerateResult
```
**This method on `ModelContainer` is now deprecated:**
```swift
/// Perform an action on the model and/or tokenizer. Callers _must_ eval any `MLXArray` before returning as
/// `MLXArray` is not `Sendable`.
@available(*, deprecated, message: "prefer perform(_:) that uses a ModelContext")
public func perform<R>(_ action: @Sendable (any LanguageModel, Tokenizer) throws -> R) rethrows
-> R
```
Use this one instead (though the former still works):
```swift
/// Perform an action on the ``ModelContext``. Callers _must_ eval any `MLXArray` before returning as
/// `MLXArray` is not `Sendable`.
public func perform<R>(_ action: @Sendable (ModelContext) async throws -> R) async rethrows -> R
```
Putting all of these deprecations together, previously you might have generated text like this:
```swift
let messages = [["role": "user", "content": prompt]]
let promptTokens = try await modelContainer.perform { _, tokenizer in
try tokenizer.applyChatTemplate(messages: messages)
}
let result = await modelContainer.perform { model, tokenizer in
LLM.generate(
promptTokens: promptTokens, parameters: generateParameters, model: model,
tokenizer: tokenizer, extraEOSTokens: modelConfiguration.extraEOSTokens
) { tokens in ... }
}
```
Now do this:
```swift
let result = try await modelContainer.perform { context in
let input = try await context.processor.prepare(input: .init(prompt: prompt))
return try MLXLMCommon.generate(
input: input, parameters: generateParameters, context: context
) { tokens in ... }
}
```
Co-authored-by: Awni Hannun <[email protected]>
1 parent 318044f commit 6ef303b
File tree
65 files changed, +5152 -2628 lines changed
- .circleci
- Applications
  - LLMEval
    - ViewModels
  - LoRATrainingExample
  - MNISTTrainer
- Libraries
  - LLM
  - MLXLLM
    - Models
  - MLXLMCommon
  - MLXMNIST
  - MLXVLM
    - Models
  - MNIST
  - StableDiffusion
- Tools
  - llm-tool
  - mnist-tool
- mlx-swift-examples.xcodeproj