To use a defined grammar during text generation, use the convenient `generate` method:
```swift
let result = try await generate(input: input, context: context, grammar: grammar)
print(result.output) // Generated text
```
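Here, `input` and `context` are the prepared prompt and loaded model context. As a rough sketch of how they might be obtained, assuming the MLXLLM/MLXLMCommon loading APIs (the factory call, model id, and prompt below are illustrative assumptions, not this library's documented setup):

```swift
import MLXLLM
import MLXLMCommon

// Sketch only: assumes MLXLLM's model factory and MLXLMCommon's
// input processor; the model id and prompt are illustrative.
let context = try await LLMModelFactory.shared.load(
    configuration: ModelConfiguration(id: "mlx-community/gemma-3-270m-it-4bit")
)
let input = try await context.processor.prepare(
    input: UserInput(prompt: "Describe the movie The Dark Knight as JSON.")
)
```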
You can also pass a `Generable` type as an argument to generate it:
```swift
let (result, model) = try await generate(input: input, context: context, generating: PersonInfo.self)
print(result.output) // Generated text
print(model) // Generated model
```
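`PersonInfo` stands in for any of your `Generable` types. As a hypothetical sketch (the exact way conformance is declared depends on the library), it might look like:

```swift
// Hypothetical type for the example above; assumes a @Generable macro
// or similar derived conformance in the style of Apple's FoundationModels.
@Generable
struct PersonInfo {
    let name: String
    let age: Int
    let hobbies: [String]
}
```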
With a `Generable` type, you can use streaming generation, which returns `PartiallyGenerated` content for your type:
```swift
let stream = try await generate(input: input, context: context, generating: PersonInfo.self)
for await content in stream {
    print("Partially generated:", content)
}
```
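Assuming the `PartiallyGenerated` content exposes the fields of `PersonInfo` as optionals that fill in as tokens arrive (a FoundationModels-style pattern, not confirmed by this README), you could consume individual fields like this:

```swift
// Sketch: assumes PartiallyGenerated mirrors PersonInfo with optional fields.
for await content in stream {
    if let name = content.name {
        print("Name so far:", name)
    }
}
```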
You can also create a logit processor manually and pass it to `TokenIterator`:
```swift
let processor = try await GrammarMaskedLogitProcessor.from(configuration: context.configuration, grammar: grammar)
```
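From there, the processor can be wired into a `TokenIterator`. A minimal sketch, assuming the MLXLMCommon initializer that accepts a logit processor and a sampler (signatures may differ between versions):

```swift
// Sketch only: assumes TokenIterator(input:model:processor:sampler:)
// from MLXLMCommon; the sampler choice is illustrative.
var iterator = try TokenIterator(
    input: input,
    model: context.model,
    processor: processor,
    sampler: CategoricalSampler(temperature: 0.7)
)
```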
You can find more usage examples in the `MLXStructuredCLI` target and in the unit tests.
### Performance
In synthetic tests with the Llama model and a vocabulary of 60,000 tokens, the performance drop was less than 10%. However, with real models, the results are worse. In practice, you can expect generation speed to be about 15% slower.

The exact slowdown depends on the model, vocabulary size, and the complexity of your grammar.

The example below defines a grammar from a schema describing a movie record:

```swift
let grammar = try Grammar.schema(.object(
    description: "Movie record",
    properties: [
        "title": .string(),
        "year": .integer(minimum: 1900, maximum: 2026),
        "genres": .array(items: .string(), maxItems: 3),
        "director": .string(),
        "actors": .array(items: .string(), maxItems: 5)
    ], required: [
        "title",
        "year",
        "genres",
        "director",
        "actors"
    ]
))
```
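The resulting grammar plugs into the same `generate` call shown earlier; a brief usage sketch (the surrounding `input` and `context` are as above):

```swift
let result = try await generate(input: input, context: context, grammar: grammar)
print(result.output) // JSON constrained to the movie-record schema
```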
For large proprietary models like ChatGPT, this is not a problem. With the right prompt, they can successfully generate valid JSON even without constrained decoding. However, with smaller models like Gemma3 270M (especially when quantized to 4-bit), the output almost always contains invalid JSON, even if the schema is provided in the prompt.
```plain
[
...
```

Here is the output using constrained decoding:

```plain
{
  "title": "The Dark Knight",
  "year": 2008,
  "genres": [
    "superhero",
    "crime"
  ],
  "director": "Christopher Nolan",
  "actors": [
    "Christian Bale",
    "Heath Ledger",
    "Michael Caine"
  ]
}
```
The output is fully valid JSON that exactly matches the provided schema. This shows that, with the right approach, even small models like Gemma3 270M 4-bit (which is just 150 MB) can produce correct structured output.