Commit c45afb5
feat: add Gemma 3 Vision-Language Model (VLM) support
- Add SigLIP Vision Encoder (27 transformer layers)
- Add Multi-Modal Projector (AvgPool + Linear projection)
- Add ImageProcessor for loading/resizing/normalizing images
- Add Gemma3VLM model class combining vision + text
- Extend CLI with --image flag for VLM generation
- Add /image command for interactive VLM usage
- Add isVLM() and generateWithImage() to Node.js API
- Auto-detect VLM models via vision_config in config.json
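The auto-detection step can be illustrated with a small sketch. This is not the code from the commit: `looksLikeVLM` and `ModelConfig` are hypothetical names, and the sketch assumes the check is simply "does `config.json` contain a `vision_config` object". The two sample configs are illustrative only.

```typescript
// Hypothetical shape of the fields we care about in config.json.
// Only `vision_config` is named in the commit message; the rest is illustrative.
interface ModelConfig {
  model_type?: string;
  vision_config?: unknown;
}

// Returns true when the parsed config carries a vision_config object.
// Assumption: presence of a non-null, non-array object is enough to treat the model as a VLM.
function looksLikeVLM(configJson: string): boolean {
  const config = JSON.parse(configJson) as ModelConfig;
  const vision = config.vision_config;
  return typeof vision === "object" && vision !== null && !Array.isArray(vision);
}

// Illustrative inputs: a text-only config and one with a vision section.
const textOnlyConfig = '{"model_type": "gemma3_text"}';
const visionConfig = '{"model_type": "gemma3", "vision_config": {"image_size": 896}}';

console.log(looksLikeVLM(textOnlyConfig));
console.log(looksLikeVLM(visionConfig));
```

A caller could use such a check to decide between a text-only path and an image-aware path such as the `generateWithImage()` method the commit adds. The method's actual signature is not shown in this commit page, so it is not sketched here.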
Supports Gemma 3 4B, 12B, 27B vision variants with:
- 896x896 image input
- 256 visual tokens per image
- Streaming output for both text and VLM generation

1 parent 740dc9d
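The 896x896 input size and the 256 visual tokens per image in the commit message fit together through the AvgPool step of the projector. The sketch below works through that arithmetic. It assumes a patch size of 14, which comes from the published Gemma 3 configuration rather than from this commit, and `avgPool2d` is a hypothetical helper that pools scalars, whereas the real projector pools per-patch feature vectors and then applies a linear projection.

```typescript
// Values stated in the commit message.
const IMAGE_SIZE = 896;
const TOKENS_PER_IMAGE = 256;
// Assumption: patch size taken from the published Gemma 3 config, not from this commit.
const PATCH_SIZE = 14;

const patchesPerSide = IMAGE_SIZE / PATCH_SIZE;     // 896 / 14 = 64 patches per side
const tokensPerSide = Math.sqrt(TOKENS_PER_IMAGE);  // sqrt(256) = 16 tokens per side
const poolKernel = patchesPerSide / tokensPerSide;  // 64 / 16 = 4x4 pooling window

// Non-overlapping average pooling over a square grid of scalars.
// Each output cell is the mean of a kernel x kernel window of the input.
function avgPool2d(grid: number[][], kernel: number): number[][] {
  const outSize = grid.length / kernel;
  const out: number[][] = [];
  for (let r = 0; r < outSize; r++) {
    const row: number[] = [];
    for (let c = 0; c < outSize; c++) {
      let sum = 0;
      for (let dr = 0; dr < kernel; dr++) {
        for (let dc = 0; dc < kernel; dc++) {
          sum += grid[r * kernel + dr][c * kernel + dc];
        }
      }
      row.push(sum / (kernel * kernel));
    }
    out.push(row);
  }
  return out;
}

// Stand-in for the 64x64 patch grid: each cell holds its own flat index.
const patchGrid: number[][] = [];
for (let r = 0; r < patchesPerSide; r++) {
  const row: number[] = [];
  for (let c = 0; c < patchesPerSide; c++) {
    row.push(r * patchesPerSide + c);
  }
  patchGrid.push(row);
}

const pooled = avgPool2d(patchGrid, poolKernel);
console.log(patchesPerSide * patchesPerSide); // number of patches before pooling
console.log(pooled.length * pooled[0].length); // number of cells after pooling
```

Under these assumptions the encoder produces 64 x 64 = 4096 patch embeddings, and a 4x4 average pool reduces them to 16 x 16 = 256, matching the token count in the commit message.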
File tree
10 files changed, +1506 -13 lines changed
- packages
  - node-mlx
    - native/src
    - src
    - swift/Sources
      - NodeMLXCore
        - Vision
      - NodeMLX