Commit a4057ab
Janus-Pro Mixed-task SFT Training (mindspore-lab#911)
* fix unnecessary flake8 check about the colon space
* fix typo and refactor readme
* fix mul issues during conversion tests
* fix redundancy
* sv3d conversion test
* pdat
* update linting
* rm flake8 change
* update sampling script: rm unnecessary device_id arg
* init janus
* update
* add vq16
* init januspro models ph
* vq load ckpt
* init commit for siglip
* update siglip
* same
* fix interpolate
* test vq fp32 mre<1e-5
* fix norm, decode image ok in bf16
* update siglip
* siglip testing, mae around 3e-2
* for merge
* same
* add VLChatProcessor and test ok
* add chat process test
* use pil resize
* support loading model.bin directly from `from_pretrained`, vlm under test
* add mlp projector (und. vision aligner)
* vlm under test: LlamaForCausalLM discrepancy with the hf
* add vqa test
* fix conflicts
* get correct GELU setup
* LlamaModel return_dict=True
* Merge from HaFred/mindone/janus
* gen_inf done, w/o kv cache, siglip precision needs to be aligned
* tqdm better vis
* vqa infer runnable, generated answer not complete
* reshape tensor
* fix length and graph mode for VQA, answer is complete and better aligned
* setup static cache, but self(input_embeds) output tokens still not correct
* fix t2v w/o kv cache, gen ok
* fix dim
* housekeeping t2i
* fix temperature=0 bug
* add readme and slightly refactor inference
* Update README.md
* rm file
* add file
* support `use_cache==True` with an explicit init of static cache in tuples, but the speed is even slower than `use_cache==False`
* add gradio and pyproject.toml
* add file for gradio
* update readme and fix gradio
* Update README.md
* small fix
* fix bug in graph mode vqa
* rm formula example
* patching others
* support loading sharded ckpt, undo saving .safetensors
* add vit block test
* rm attrdict and use addict for py3.10
* use ms.Tensor instead of mint.Tensor for compat
* fix vanilla attn in siglip
* use mint GELU in sigLIP
* fix above
* add vit block test
* align the naming of vqa_inference
* update readme
* add throughput calc
* t2i support graph mode w/ dynamic shape input
* make t2i generation print time traces, and support graph mode now with ops.multinomial
* refine multinomial
* add multinomial test
* padded kv cache for graph mode t2i
* auto select multinomial
* fix transformers FA dtype
* fix merge conflict
* fix vision encode precision, formula parsing ok
* revert llama min_dtype to eliminate en/zh mix in vqa
* for t2i llama_model forward, init a static cache outside the comp graph
* fp32, this is for the prev commit
* fix typo
* fix typo
* fix typo
* llamamodel.construct() handling dynamic shape in graph mode
* fix ms2.5 pynative error
* acc infer by mint.topk
* transformers use mint.multinomial for ms2.5
* better precision aligned (mint.nn.Conv2d), requires ms2.5
* update readme
* support t2i graph mode kv cache, bringing a speed boost
* fix an initial negligence
* t2i graph mode only `set_inputs` for no-cache cases, because with cache it's slower to `set_inputs` for dynamic shape
* fix typo and default
* log the tokens/s for vqa, excluding graph compilation time
* include performance tables
* update the correct performance for the graph mode-compatible code
* revert: no more .bin loading
* update readme, ms_compiler_cache enabled
* lint files
* lint files
* update main page readme
* lint files
* lint files
* linting
* update both inference files for faster model init
* fix linting
* update readme
* merge samit code
* match with new np
* dev loss
* mixed-task training `bs=6` ok, `bs=8` oom
* mixed-task sft training, support slice dataset or WeightedRandomSampler dataset
* lint
* update comment
* support graph mode mixed-data sft, with `use_value_and_grad==False`
* rm redundant code in siglip
* rn no grad trunc normal used in siglip
* lint
* graph mode sft launch script
* lint
* lint
* lint
* lint
* rm redundancy
* return precommit
* update readme and set vlm.construct() default as pynative version for any-task training
* fix readme
* update discrepancy
* train mode selection
* update modeling_vlm which already works under graph to further support pynative at the same time
* fix ci
* fix ci
* update for graphmode mixed-task sft
* update req
* update datasets
* update training script to support all testing examples in training.md
* fix lint
* fix lint
* refactor
* update readme, train script for better readability
* doc: put the steps in the right order for single-task sft graph mode patching
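Several commits above touch sampling, including a `temperature=0` bug fix. A minimal numpy sketch of the usual guard, assuming the bug was a division by zero when scaling logits before softmax; all names here are illustrative, not from the repo:

```python
import numpy as np

def sample_next_token(logits: np.ndarray, temperature: float = 1.0) -> int:
    """Sample a token id from logits; temperature=0 means greedy argmax."""
    if temperature == 0:
        # Dividing logits by zero would produce inf/nan, so fall back to argmax.
        return int(np.argmax(logits))
    scaled = logits / temperature
    # Softmax with max-subtraction for numerical stability.
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))
```

With `temperature=0` this degenerates to deterministic decoding instead of crashing, which matches the behavior one would want from the fix.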
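The `acc infer by mint.topk` and multinomial commits suggest top-k filtering followed by a size-1 multinomial draw. A hedged numpy sketch of that common pattern (the function name and signature are mine, not the repo's):

```python
import numpy as np

def top_k_sample(logits: np.ndarray, k: int, rng=None) -> int:
    """Keep the k largest logits, renormalize, then draw one sample
    (equivalent to a size-1 multinomial over the top-k distribution)."""
    if rng is None:
        rng = np.random.default_rng(0)
    top_idx = np.argpartition(logits, -k)[-k:]  # indices of the k largest logits
    top_logits = logits[top_idx]
    probs = np.exp(top_logits - top_logits.max())
    probs /= probs.sum()
    return int(top_idx[rng.choice(k, p=probs)])
```

Restricting the multinomial to k candidates is what makes a fused top-k kernel pay off at inference time.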
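The `setup static cache` and `padded kv cache for graph mode` commits refer to preallocating fixed-shape key/value buffers so the compiled graph never sees a changing tensor shape across decode steps. A self-contained sketch of that idea, assuming in-place writes plus masking of unused positions; this is not the repo's implementation:

```python
import numpy as np

class StaticKVCache:
    """Preallocated key/value buffers of fixed max length; new entries are
    written in place so tensor shapes never change between decode steps."""
    def __init__(self, max_len: int, num_heads: int, head_dim: int):
        self.k = np.zeros((max_len, num_heads, head_dim), dtype=np.float32)
        self.v = np.zeros_like(self.k)
        self.pos = 0  # next write position

    def update(self, new_k: np.ndarray, new_v: np.ndarray):
        n = new_k.shape[0]
        self.k[self.pos:self.pos + n] = new_k
        self.v[self.pos:self.pos + n] = new_v
        self.pos += n
        # Return the full padded buffers; attention must mask positions >= self.pos.
        return self.k, self.v
```

The trade-off noted in the log (cache sometimes slower than no cache) is plausible here: attention always runs over `max_len` padded positions, so short sequences pay for padding they do not use.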
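For the mixed-task SFT with a weighted random sampler, a common recipe is inverse-frequency per-sample weights so each task is drawn with roughly equal probability regardless of dataset size. A sketch under that assumption (helper name is illustrative):

```python
import numpy as np

def make_task_weights(task_labels):
    """Per-sample weights inversely proportional to task frequency, so a
    weighted random sampler draws each task about equally often."""
    labels = np.asarray(task_labels)
    tasks, counts = np.unique(labels, return_counts=True)
    freq = dict(zip(tasks.tolist(), counts.tolist()))
    return np.array([1.0 / freq[t] for t in labels])
```

The resulting array can be fed to any weighted sampler; each task's weights sum to the same total, so tasks are balanced even when one (e.g. t2i) has far more samples than another (e.g. vqa).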
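Logging tokens/s while excluding graph compilation time, as the log mentions for vqa, usually means timing decode steps only after a warmup that absorbs one-off compilation. An illustrative sketch, not the repo's code:

```python
import time

def timed_decode(step_fn, num_steps: int, warmup: int = 1) -> float:
    """Run decode steps and return tokens/s, excluding the first `warmup`
    steps (which include one-off graph compilation) from the timing."""
    for _ in range(warmup):
        step_fn()
    start = time.perf_counter()
    for _ in range(num_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return num_steps / elapsed  # assumes one token per step
```

Without the warmup, the first step's compilation cost would dominate and make the reported throughput meaningless for steady-state comparison.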
---------
Co-authored-by: SamitHuang <285365963@qq.com>

1 parent 6df5641 · commit a4057ab
File tree: 21 files changed (+858 / −256 lines)

- examples/janus
- docs
- janus
- models
- timm
- train
- scripts
- tests