@leondz
Collaborator

@leondz leondz commented May 7, 2025

Support construction-time loading of optional modules. Includes

  • many generators now using this pattern
  • pattern for loading modules at run-time and failing if absent
  • optional requirements moved to pyproject.toml options and pruned from requirements.txt - deferred to feature: disable optional imports by default #1475
  • _load_deps and _clear_deps pattern, used in generator constructor and _load_client / _clear_client
  • tests to check that optional deps are in the right place and not in the wrong place
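
A minimal sketch of the _load_deps / _clear_deps pattern listed above, not the garak implementation. The names extra_dependency_names, _load_deps and _clear_deps come from this PR; the classes ExampleGenerator and MissingDepGenerator, the stdlib stand-ins json and decimal, and the exact error wording are assumptions for illustration.

```python
import importlib


class ExampleGenerator:
    # Stand-ins for optional third-party packages (assumption for illustration).
    extra_dependency_names = ["json", "decimal"]

    def __init__(self):
        self._load_deps()

    def _load_deps(self):
        # Import each optional module at construction time and bind it to the
        # instance under its final name component, e.g. "a.b" -> self.b.
        for name in self.extra_dependency_names:
            try:
                module = importlib.import_module(name)
            except ImportError as e:
                raise ImportError(
                    f"{type(self).__name__} requires {self.extra_dependency_names}"
                ) from e
            setattr(self, name.split(".")[-1], module)

    def _clear_deps(self):
        # Drop the module references again, for instance so the object can be
        # pickled (module objects cannot be).
        for name in self.extra_dependency_names:
            setattr(self, name.split(".")[-1], None)


class MissingDepGenerator(ExampleGenerator):
    # Hypothetical module name that is not installed.
    extra_dependency_names = ["no_such_optional_module_xyz"]
```

Constructing ExampleGenerator binds the modules to the instance, while constructing MissingDepGenerator raises ImportError at construction time rather than at module import time.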

Todo / in scope:

  • skipping tests requiring absent optional imports
  • take a stance on whether to skip or stop on missing module, per plugin type (likely skip for probe, stop for main generator, beyond that tba)
  • Consider moving load/clear dep function defs up and out for all classes (where - garak._plugins? How - assign func member? create mixin?)
  • validation: exception throws correctly / doesn't throw if dependency present; module still runs
    • generators.cohere
    • generators.langchain
    • generators.litellm
    • generators.mistral
    • generators.nemoguardrails
    • generators.nemollm
    • generators.ollama
    • generators.optimum (huggingface.OptimumPipeline) - exception fires, pkg install seems borked
    • generators.replicate
    • probes.audio
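
One possible shape for the "skipping tests requiring absent optional imports" item, sketched with the standard library's unittest so that it is self-contained. The helper module_available, the test class, and the module name no_such_optional_module_xyz are hypothetical; the project's own test suite may use a different runner and mechanism.

```python
import importlib.util
import unittest


def module_available(name: str) -> bool:
    # find_spec returns None when a top-level module cannot be found.
    return importlib.util.find_spec(name) is not None


class OptionalDependencyTests(unittest.TestCase):
    @unittest.skipUnless(module_available("json"), "json not installed")
    def test_runs_when_present(self):
        import json

        self.assertEqual(json.loads("[1]"), [1])

    @unittest.skipUnless(
        module_available("no_such_optional_module_xyz"),
        "optional module not installed",
    )
    def test_skipped_when_absent(self):
        self.fail("should have been skipped")


def run_example() -> unittest.TestResult:
    # Run the two tests above and return the collected result.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(OptionalDependencyTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

The second test is reported as skipped rather than failed, which keeps the suite green on installs without the optional extras.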

Not done:

  • load/clear deps for plugin types other than generators
  • gh actions for testing optional components
  • deeper validation

Out of scope:

  • handling of versioning outside of pyproject.toml

Resolves #101

@leondz leondz requested a review from jmartin-tech May 7, 2025 11:00
@leondz leondz added the architecture Architectural upgrades label May 7, 2025
@leondz leondz requested review from jmartin-tech and removed request for jmartin-tech May 7, 2025 11:04
@leondz leondz added this to the 0.11.0 milestone May 8, 2025
@leondz leondz self-assigned this May 8, 2025
@leondz leondz removed this from the release 0.11.0 milestone May 8, 2025
@leondz leondz assigned jmartin-tech and unassigned leondz Dec 1, 2025
Collaborator

@jmartin-tech jmartin-tech left a comment

Final testing is in progress. Landing this may present some risk, since obtaining testing resources for every changed generator is not feasible.

        self.client = self.replicate

    def _clear_client(self):
        self._clear_deps()
Collaborator

Will need to watch these for #1471 related issues.

except ImportError as e:
    logging.critical("Missing libraries for audio modules.", exc_info=e)
    raise GarakException("Missing Libraries for audio modules.")
from datasets import load_dataset
Collaborator

There is a change in behavior here: with extra_dependency_names loading soundfile, it becomes a required dependency for this probe. Before this PR, it was only required if no user-provided files already existed in the data_path.

Not a blocker, just noting for awareness.

quoted_module_list = "'" + "', '".join(absent_modules) + "'"
module_list = " ".join(absent_modules)
msg = f"⛔ Plugin '{calling_module}' requires Python modules which aren't installed/available: {quoted_module_list}"
hint = f"💡 Try 'pip install {module_list}' to get missing module."
Collaborator

Just noting, it might be nice to have this reference a group in the pyproject.toml. This may be added in #1475 as these groups are introduced.

    ".".join(dependency_module_name.split(".")[: n + 1])
    for n in range(dependency_module_name.count(".") + 1)
]:
    if importlib.util.find_spec(dependency_path) is None:
Collaborator

Testing shows find_spec needs to be given the runtime (import) package name; currently, dependency_path entries are the PyPI package names.

This can be tested by attempting to load generators.huggingface.LLaVA.

⛔ Plugin 'generators.huggingface.LLaVA' requires Python modules which aren't installed/available: 'pillow'
💡 Try 'pip install pillow' to get missing module.
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/jemartin/Projects/nvidia/garak/garak/__main__.py", line 14, in <module>
    main()
  File "/home/jemartin/Projects/nvidia/garak/garak/__main__.py", line 9, in main
    cli.main(sys.argv[1:])
  File "/home/jemartin/Projects/nvidia/garak/garak/cli.py", line 596, in main
    generator = _plugins.load_plugin(
                ^^^^^^^^^^^^^^^^^^^^^
  File "/home/jemartin/Projects/nvidia/garak/garak/_plugins.py", line 428, in load_plugin
    _import_failed(absent_modules, full_plugin_name)
  File "/home/jemartin/Projects/nvidia/garak/garak/_plugins.py", line 493, in _import_failed
    raise ModuleNotFoundError(msg)
ModuleNotFoundError: ⛔ Plugin 'generators.huggingface.LLaVA' requires Python modules which aren't installed/available: 'pillow'

Collaborator

Looking at this code in particular, I don't think this validation is needed here at this time. It may be more appropriate to handle a missing import exception around the module instantiation call.

Collaborator Author

> Looking at this code in particular, I don't think this validation is needed here at this time. It may be more appropriate to handle a missing import exception around the module instantiation call.

I believe the intent here is to summarise in one pass a list of all missing module names, which is determined using find_spec.
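
A sketch of such a one-pass summary, under stated assumptions: the function name find_absent_modules is hypothetical, and the early break is a choice made here, not something taken from the PR. It relies on the documented behaviour that importlib.util.find_spec imports the parent of a dotted name and raises ModuleNotFoundError when that parent cannot be imported.

```python
import importlib.util


def find_absent_modules(dependency_names):
    """Return the import names from dependency_names that cannot be located."""
    absent = []
    for name in dependency_names:
        parts = name.split(".")
        # Check "a", then "a.b", ...; stop at the first missing prefix, because
        # find_spec on a dotted name imports the parent and raises if that fails.
        for n in range(len(parts)):
            prefix = ".".join(parts[: n + 1])
            try:
                found = importlib.util.find_spec(prefix) is not None
            except ModuleNotFoundError:
                found = False
            if not found:
                absent.append(name)
                break
    return absent
```

Stopping at the first missing prefix also means each dependency name appears at most once in the returned list.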

Collaborator

This is simply not the right place for this. load_plugin is called at instantiation, so this only evaluates the one plugin, which is already fully processed by _load_deps, so the check is not needed here.

In the next iteration, PR #1475, we might want to add an early preprocessor that takes a comprehensive look at the full run config to determine whether all dependencies required for the full run are available. However, I am thinking that might turn out to be an overly complex goal that gets deferred or shelved in favor of allowing the run to skip probes that happen to be missing dependencies, instead of blocking the start of the run. More discussion of that can happen later in that PR.

Collaborator Author

Alright, this was extra code added after the initial PR was made, when a feature was requested to list all missing modules rather than one at a time. Is that additional feature no longer a requirement to land?

Collaborator

@jmartin-tech jmartin-tech left a comment

The suggestion offered removes the early extra deps check and defers the load and exception to the actual instance creation step.

This pattern is highly coupled to the current extra_dependency_names design, though no more so than the original, AFAICT. One drawback is that it always reports all extra_dependency_names in the hint, rather than only the specific one that triggered the raise.
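
A rough sketch of what deferring to instance creation could look like, including the drawback noted above that the hint lists every entry in extra_dependency_names. The function load_instance, the class NeedsOptional, and the message wording are assumptions for illustration, not the code merged in this PR.

```python
import importlib


class NeedsOptional:
    # Hypothetical plugin whose optional dependency is not installed.
    extra_dependency_names = ["no_such_optional_module_xyz"]

    def __init__(self):
        for name in self.extra_dependency_names:
            importlib.import_module(name)


def load_instance(plugin_class):
    """Instantiate plugin_class; on a missing import, re-raise with a hint."""
    try:
        return plugin_class()
    except ModuleNotFoundError as e:
        # The hint names all declared extras, not only the one that failed.
        deps = getattr(plugin_class, "extra_dependency_names", [])
        module_list = " ".join(deps)
        hint = f"Try 'pip install {module_list}' to get missing modules."
        raise ModuleNotFoundError(f"{plugin_class.__name__}: {e}. {hint}") from e
```

No dependency check runs before construction here; the constructor's own import failure is the only trigger for the message.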

Comment on lines 412 to 428
if category in PLUGIN_TYPES:
    extra_dependency_names = PluginCache.instance()[category][full_plugin_name][
        "extra_dependency_names"
    ]
    if len(extra_dependency_names) > 0:
        absent_modules = []
        for dependency_module_name in extra_dependency_names:
            for (
                dependency_path
            ) in [  # support both plain names and also multi-point names e.g. langchain.llms
                ".".join(dependency_module_name.split(".")[: n + 1])
                for n in range(dependency_module_name.count(".") + 1)
            ]:
                if importlib.util.find_spec(dependency_path) is None:
                    absent_modules.append(dependency_module_name)
        if len(absent_modules):
            _import_failed(absent_modules, full_plugin_name)
Collaborator

Since the goal of load_plugin is to return an instance, there is no need to check the dependencies early; just let the instance creation raise:

Suggested change (removes the following block)
if category in PLUGIN_TYPES:
    extra_dependency_names = PluginCache.instance()[category][full_plugin_name][
        "extra_dependency_names"
    ]
    if len(extra_dependency_names) > 0:
        absent_modules = []
        for dependency_module_name in extra_dependency_names:
            for (
                dependency_path
            ) in [  # support both plain names and also multi-point names e.g. langchain.llms
                ".".join(dependency_module_name.split(".")[: n + 1])
                for n in range(dependency_module_name.count(".") + 1)
            ]:
                if importlib.util.find_spec(dependency_path) is None:
                    absent_modules.append(dependency_module_name)
        if len(absent_modules):
            _import_failed(absent_modules, full_plugin_name)

Collaborator Author

Is this in tension with the requested feature of listing all missing modules at once rather than piecemeal? I realise the granularity is different, but don't we want to cause the minimum number of user round trips between execution and dep installation?

Collaborator

Not really. load_plugin is only called when actually instantiating a plugin, so testing in this location will not evaluate all plugins required for the run. Since generators are the primary plugin type using this pattern, just removing this is acceptable for now.

Collaborator Author

We should

  1. merge these threads
  2. get clear about reqs for this PR

I don't mind if we advise all missing modules at once or piecemeal. The latter has better UX. Agree this feature should belong in the right PR and code location, if the feature is going to manifest.

Collaborator

Understood. Based on the targeted changes in scope, this PR should remove this block.

Also, the revisions already made to load_deps will provide a consistent, verbose list of all packages required for the plugin that is attempting to load. This covers the same scope as this block, with added context available.

jmartin-tech added a commit to jmartin-tech/garak that referenced this pull request Dec 12, 2025
@jmartin-tech jmartin-tech force-pushed the update/optional_imports branch from 87b69a2 to 0be265b Compare December 12, 2025 21:51
@jmartin-tech jmartin-tech merged commit c360fee into NVIDIA:main Dec 12, 2025
15 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Dec 12, 2025

Labels

architecture Architectural upgrades

Projects

None yet

Development

Successfully merging this pull request may close these issues.

tests: test garak's loading performance improved / consistent plugin lazy loading

3 participants