Conversation
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 7440660 to f19cff8
Dolomite has become a legacy performance and compatibility dependency that was required before Granite was upstreamed to Hugging Face. On Python >= 3.12, Dolomite has been found not to work correctly. Therefore, we're dropping support. Signed-off-by: James Kunstle <jkunstle@redhat.com>
Force-pushed from f19cff8 to 0d977c9
```python
block_checkpointing(model, **kwargs)
# this function is for supporting gradient checkpointing for padding free
```

```python
default="HYBRID_SHARD",
help="Sharding strategy to be used for FSDP distributed training.",
)
parser.add_argument("--use_dolomite", action="store_true")
```
Above, I think we can remove

```python
if train_args.use_dolomite:
    command.append("--use_dolomite")
```
```python
model,
lora_config,
args.distributed_training_framework,
gradient_checkpointing=not args.use_dolomite,
```
Now that it's always True, I think you could simplify the code in `prepare_peft_model` by removing the argument and hard-coding the True value inside the function.
```python
)
args.lora_config = lora_config
elif not args.use_dolomite:
    model.gradient_checkpointing_enable()
```
Before the change, in the case of `lora_r > 0` we were not calling `gradient_checkpointing_enable`; now we do. Is this an intentional change?
```python
if train_args.use_dolomite and train_args.disable_flash_attn:
    raise RuntimeError(
        "ERROR: Trying to use dolomite padding-free transformer without flash attention is not supported"
if train_args.use_dolomite:
```
Consider using https://docs.pydantic.dev/latest/concepts/fields/#deprecated-fields instead.
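A minimal sketch of the deprecated-fields approach, assuming pydantic >= 2.7 (where `Field` accepts a `deprecated` parameter); the model and field names mirror the training args but are illustrative:

```python
from pydantic import BaseModel, Field


class TrainingArgs(BaseModel):
    # Illustrative sketch: instead of deleting the flag outright, keep it but
    # mark it deprecated so existing configs still parse and users accessing
    # the field get a DeprecationWarning.
    use_dolomite: bool = Field(
        default=False,
        deprecated="Dolomite support has been removed; this flag is ignored.",
    )


args = TrainingArgs()
```

With this, old configs that set `use_dolomite` keep validating, and the removal can be completed in a later release.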
```diff
-def check_flash_attn_enabled(disable_flash_attn: bool, use_dolomite: bool) -> bool:
+def check_flash_attn_enabled(disable_flash_attn: bool) -> bool:
     if not disable_flash_attn:
```
This code is now quite silly. :) How about

```python
if disable_flash_attn:
    return False
if supports...():
    return True
raise ...
```
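A runnable sketch of that shape, with a hypothetical `supports_flash_attention()` probe standing in for the elided capability check:

```python
def supports_flash_attention() -> bool:
    # Hypothetical stand-in for whatever capability probe the codebase
    # actually uses (e.g. inspecting GPU/driver/library support).
    return True


def check_flash_attn_enabled(disable_flash_attn: bool) -> bool:
    # Early-return structure suggested in the review: handle the explicit
    # opt-out first, then the happy path, then fail loudly.
    if disable_flash_attn:
        return False
    if supports_flash_attention():
        return True
    raise RuntimeError(
        "Flash attention is not supported on this system; "
        "re-run with --disable_flash_attn."
    )
```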
Not approving, to clarify whether we should indeed call
This will need a rebase after the model class change has landed.
Will re-open this PR based off of changes from #572 |
Fixes issue instructlab#3465. Deletes Python 3.12 tests, since dolomite is being removed in instructlab/training#589. Signed-off-by: Jon Thomas <jrthms@users.noreply.github.com>