Description
Motivation: users may not always know how to best use fms-acceleration plugins. While there exists quite a bit of documentation on how to use the plugins, typical users will not read carefully. The hope is that we can reduce the burden on users to apply plugins if "there is no loss (and only gain) from doing so".
Users: These include various types:
- Users of the product Docker images.
- Users of `fms-hf-tuning` as a tuning library.
Idea: Implement a default behavior:
- that can be tied to specified plugins that have been installed.
- if a plugin is not installed, it is construed that the plugin is unwanted and no `default` behavior should be considered.
- when `default` behavior is activated on an installed plugin, the plugin will activate without any conscious action from the user.
To clarify what we mean by no conscious action: an installed plugin with default behavior will activate even if the user has not specified any command line arguments to activate it.
Various options for instantiating default behavior
Manual
Provide a set of recommended command line args, or append them to the command line args that a user provides.
This is the simplest form of default behavior.
- requires no code change to `fms-acceleration` and its integration in `fms-hf-tuning`.
- will however require code changes at the Docker packaging level to provide switches to append the command line args.
- it is very explicit and clear. Easy to debug because there will be command line traces.
- there is no graceful failing. If the default args fail the run, then the user will have to re-run without the recommended defaults.
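As a rough illustration of the manual option, the following sketch appends recommended args to whatever the user passed, without overriding anything the user set explicitly. The flag names (`--example_plugin_a`, `--example_plugin_b`), the `RECOMMENDED_ARGS` table, and the `enabled` switch are all hypothetical placeholders, not real `fms-hf-tuning` arguments; the switch stands in for whatever the Docker packaging would expose.

```python
import shlex

# Hypothetical recommended defaults. These flag names are placeholders
# and are NOT real fms-hf-tuning command line arguments.
RECOMMENDED_ARGS = {
    "--example_plugin_a": ["enabled"],
    "--example_plugin_b": ["true"],
}


def append_recommended_args(user_args, recommended=RECOMMENDED_ARGS, enabled=True):
    """Return user_args plus every recommended flag the user did not pass."""
    if not enabled:
        # The packaging-level switch is off: leave the command line untouched.
        return list(user_args)
    # Collect flags the user already gave, handling the "--flag=value" form.
    given = {arg.split("=", 1)[0] for arg in user_args if arg.startswith("--")}
    merged = list(user_args)
    for flag, values in recommended.items():
        if flag not in given:
            merged.append(flag)
            merged.extend(values)
    return merged


if __name__ == "__main__":
    final_args = append_recommended_args(
        ["--some_user_arg", "value", "--example_plugin_a", "custom"]
    )
    # Printing the final command line keeps the explicit trace for debugging.
    print(shlex.join(final_args))
```

Because the merged command line is printed, the run keeps the explicit trace that makes this option easy to debug. If the recommended args break the run, the user re-runs with the switch turned off.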
Automatic
augment the AccelerationFrameworkConfig integration of fms-accel into fms-hf-tuning to automatically run an installed plugin with defaults.
This is going to be much more complicated and various considerations must be carefully made.
- requires quite a bit of code change to `fms-acceleration` and its integration in `fms-hf-tuning`, as the integration was not originally planned with default behaviors in mind.
- may require very few changes to the Docker packaging.
- We lose the explicit command line arguments. Debugging will require peering into the stderr logs. For example, we should set `TRANSFORMERS_VERBOSITY=info` to log which `AccelerationFrameworkPlugins` have been activated.
- need to account for graceful failing as best we can.
- need to be able to disable default behavior. If the run fails we need to allow the user to take over.
Some of the additional code items would be:
- implement a way to override default behavior. This includes the cases where:
  - the user manually specifies the command line args and sets their values (no default values are applied).
  - the user does not want automatic default behavior, and wants an installed plugin to remain inactive if no command line args are specified.
- implement feedback to indicate to the user that a plugin has been deactivated because it failed to instantiate default behavior.
- implement logic to allow a plugin to be set as default.
- select plugins that should have default behavior. These must:
  - be able to fail over safely most of the time.
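The items above could fit together roughly as in the sketch below. It is only an illustration under stated assumptions: `PluginSpec`, `resolve_plugins`, and the `disable_defaults` switch are hypothetical names and do not mirror the actual `fms-acceleration` classes or the `AccelerationFrameworkConfig` integration.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

logger = logging.getLogger(__name__)


@dataclass
class PluginSpec:
    # Hypothetical description of a plugin; not a real fms-acceleration class.
    name: str
    installed: bool
    default_capable: bool  # selected as safe to activate by default
    activate: Callable[[Optional[dict]], None]  # None means "use defaults"


def resolve_plugins(
    plugins: List[PluginSpec],
    user_configs: Dict[str, dict],
    disable_defaults: bool = False,
) -> List[str]:
    """Return the names of the plugins that ended up active."""
    active = []
    for plugin in plugins:
        if not plugin.installed:
            # Not installed is construed as unwanted: never apply defaults.
            continue
        user_cfg = user_configs.get(plugin.name)
        if user_cfg is not None:
            # Explicit user settings take over; errors propagate to the user.
            plugin.activate(user_cfg)
            active.append(plugin.name)
            continue
        if disable_defaults or not plugin.default_capable:
            # The user opted out, or the plugin was not selected as a default.
            continue
        try:
            plugin.activate(None)
            active.append(plugin.name)
        except Exception as exc:
            # Fail over: the run continues without this plugin, with feedback.
            logger.warning(
                "Plugin %s deactivated: default activation failed (%s)",
                plugin.name,
                exc,
            )
    logger.info("Active acceleration plugins: %s", active)
    return active
```

The design choice here is that only default activations are wrapped in the failure handler. A plugin the user configured explicitly still fails loudly, so the user is never silently overridden.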
Plugins that could be default
These may include:
- `padding_free`: this plugin switches out the data collator, or patches the model (if transformers < 4.44).
- `fused-ops-and-kernels`: this plugin includes some model-specific rules to replace functions with kernelized versions. If the model rules do not match the model architecture, it will fail over quite safely.
  - however, one complication is that the fused op for quantized LoRA requires manually setting the 4-bit method (e.g., auto_gptq or bnb), because there is no simple way to infer it at the moment.
  - One way to handle such complications is to simply not make the LoRA fused op a default, just the other kernels.
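A minimal sketch of that last suggestion: the default configuration turns on the other kernels and leaves the LoRA fused op off unless the 4-bit method is given explicitly. The function name, the config keys, and the set of accepted methods are illustrative assumptions and are not the actual `fused-ops-and-kernels` configuration schema.

```python
from typing import Dict, Optional

# Assumed set of 4-bit methods, taken from the two examples named above.
KNOWN_4BIT_METHODS = ("auto_gptq", "bnb")


def default_kernel_config(four_bit_method: Optional[str] = None) -> Dict[str, object]:
    """Build a hypothetical default config for the kernels plugin."""
    config: Dict[str, object] = {
        "other_kernels": True,    # safe to default: fails over if rules do not match
        "fused_lora": False,      # not a default: needs the 4-bit method
        "four_bit_method": None,
    }
    if four_bit_method is not None:
        if four_bit_method not in KNOWN_4BIT_METHODS:
            raise ValueError(f"unknown 4-bit method: {four_bit_method!r}")
        # Only an explicit user choice enables the LoRA fused op.
        config["fused_lora"] = True
        config["four_bit_method"] = four_bit_method
    return config
```

With this split, the default path never has to guess the quantization method, which is the part that cannot currently be inferred.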