-
Notifications
You must be signed in to change notification settings - Fork 774
feature: deferred loading and requirement pruning #1199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…t checks via plugin cache
Signed-off-by: Leon Derczynski <[email protected]>
jmartin-tech
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Final testing in progress, landing this may present some risk as testing resources for every generator changed is not feasible.
| self.client = self.replicate | ||
|
|
||
| def _clear_client(self): | ||
| self._clear_deps() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will need to watch these for #1471 related issues.
| except ImportError as e: | ||
| logging.critical("Missing libraries for audio modules.", exc_info=e) | ||
| raise GarakException("Missing Libraries for audio modules.") | ||
| from datasets import load_dataset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a change in behavior here, with extra_dependency_names loading soundfile it becomes a required dependency for this probe. Before this PR it was only required if no user provided files already exist in the data_path.
Not a blocker, just noting for awareness.
| quoted_module_list = "'" + "', '".join(absent_modules) + "'" | ||
| module_list = " ".join(absent_modules) | ||
| msg = f"⛔ Plugin '{calling_module}' requires Python modules which aren't installed/available: {quoted_module_list}" | ||
| hint = f"💡 Try 'pip install {module_list}' to get missing module." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noting, it might be nice to have this reference a group in the pyproject.toml. This may be added in #1475 when the as these groups are introduced.
garak/_plugins.py
Outdated
| ".".join(dependency_module_name.split(".")[: n + 1]) | ||
| for n in range(dependency_module_name.count(".") + 1) | ||
| ]: | ||
| if importlib.util.find_spec(dependency_path) is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing shows find_spec needs to be provided the runtime package name, currently dependency_path entries are the pypi package name.
This can be tested by attempting to load generators.huggingface.LLaVA.
⛔ Plugin 'generators.huggingface.LLaVA' requires Python modules which aren't installed/available: 'pillow'
💡 Try 'pip install pillow' to get missing module.
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/jemartin/Projects/nvidia/garak/garak/__main__.py", line 14, in <module>
main()
File "/home/jemartin/Projects/nvidia/garak/garak/__main__.py", line 9, in main
cli.main(sys.argv[1:])
File "/home/jemartin/Projects/nvidia/garak/garak/cli.py", line 596, in main
generator = _plugins.load_plugin(
^^^^^^^^^^^^^^^^^^^^^
File "/home/jemartin/Projects/nvidia/garak/garak/_plugins.py", line 428, in load_plugin
_import_failed(absent_modules, full_plugin_name)
File "/home/jemartin/Projects/nvidia/garak/garak/_plugins.py", line 493, in _import_failed
raise ModuleNotFoundError(msg)
ModuleNotFoundError: ⛔ Plugin 'generators.huggingface.LLaVA' requires Python modules which aren't installed/available: 'pillow'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this code in particular, I don't think this validation is needed here at this time. It may be more appropriate have handle a missing import exception around the module instantiation call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this code in particular, I don't think this validation is needed here at this time. It may be more appropriate have handle a missing import exception around the module instantiation call.
I believe the intent here is to summarise in one pass a list of all missing module names, which is determined using find_spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is simply not the right place for this, load_plugin is called at instantiation so this is only evaluating for this plugin which is fully processed by _load_deps so not needed here.
In the next iteration PR #1475 we might want to add an early preprocessor that takes a comprehensive look at the full run config to determine if all dependencies required for the full run are available however I am thinking that might turn out to be an overly complex goal that may get deferred or shelved in favor of allowing the run to skip probes that happen to be missing dependencies instead for blocking start of the run. More discussion of that can happen in that PR in later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, this was extra code added after initial PR was made when a feature was requested to list all missing modules rather than one at a time. Is that additional feature no longer a requirement to land?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion offered removes the early extra deps check and defers the load and exception to the actual instance creation step.
This pattern is highly coupled to the current extra_dependency_names design, however no more so than the original AFAICT. One drawback is this always reports all extra_dependency_names in the hint vs only the specific one that triggered the raise.
garak/_plugins.py
Outdated
| if category in PLUGIN_TYPES: | ||
| extra_dependency_names = PluginCache.instance()[category][full_plugin_name][ | ||
| "extra_dependency_names" | ||
| ] | ||
| if len(extra_dependency_names) > 0: | ||
| absent_modules = [] | ||
| for dependency_module_name in extra_dependency_names: | ||
| for ( | ||
| dependency_path | ||
| ) in [ # support both plain names and also multi-point names e.g. langchain.llms | ||
| ".".join(dependency_module_name.split(".")[: n + 1]) | ||
| for n in range(dependency_module_name.count(".") + 1) | ||
| ]: | ||
| if importlib.util.find_spec(dependency_path) is None: | ||
| absent_modules.append(dependency_module_name) | ||
| if len(absent_modules): | ||
| _import_failed(absent_modules, full_plugin_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the goal of load_plugin is to return an instance no need to check the dependencies early just let the instance creation raise:
| if category in PLUGIN_TYPES: | |
| extra_dependency_names = PluginCache.instance()[category][full_plugin_name][ | |
| "extra_dependency_names" | |
| ] | |
| if len(extra_dependency_names) > 0: | |
| absent_modules = [] | |
| for dependency_module_name in extra_dependency_names: | |
| for ( | |
| dependency_path | |
| ) in [ # support both plain names and also multi-point names e.g. langchain.llms | |
| ".".join(dependency_module_name.split(".")[: n + 1]) | |
| for n in range(dependency_module_name.count(".") + 1) | |
| ]: | |
| if importlib.util.find_spec(dependency_path) is None: | |
| absent_modules.append(dependency_module_name) | |
| if len(absent_modules): | |
| _import_failed(absent_modules, full_plugin_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this in tension with the requested feature of listing all missing modules at once rather than piecemeal? I realise the granularity is different, but don't we want to cause the minimum number of user round trips between execution and dep installation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, load_plugin is only called when actually instantiating a plugin, testing in this location we will not evaluate all plugins required for the run. Since generators are the primary plugin type using this pattern just removing this is acceptable for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should
- merge these threads
- get clear about reqs for this PR
I don't mind if we advise all missing modules at once or piecemeal. The latter has better UX. Agree this feature should belong in the right PR and code location, if the feature is going to manifest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood, based on the targeted changes in scope this PR should remove this block.
Also the revisions already to made to load_deps will provide a consistent verbose list of all packages required for the plugin that is attempting to load. This covers the same scope as what this block does with added context available.
Co-authored-by: Jeffrey Martin <[email protected]> Signed-off-by: Leon Derczynski <[email protected]>
Signed-off-by: Jeffrey Martin <[email protected]>
87b69a2 to
0be265b
Compare
Support construction-time loading of optional modules. Includes
optional requirements moved to- deferred to feature: disable optional imports by default #1475pyproject.tomloptions and pruned fromrequirements.txt_load_depsand_clear_depspattern, used in generator constructor and_load_client/_clear_clientTodo / in scope:
garak._plugins? How - assign func member? create mixin?)generators.coheregenerators.langchaingenerators.litellmgenerators.mistralgenerators.nemoguardrailsgenerators.nemollmgenerators.ollamagenerators.optimum(huggingface.OptimumPipeline) - exception fires, pkg install seems borkedgenerators.replicateprobes.audioNot done:
generatorsOut of scope:
pyproject.tomlResolves #101