- Testing duration $\ge$ 10 mins.
- Sample concatenation permutation is enabled.

## Plugin System for `mlperf-inf-mm-q3vl benchmark`

The `mlperf-inf-mm-q3vl` package supports a plugin system that allows third-party
packages to register additional subcommands under `mlperf-inf-mm-q3vl benchmark`. This
uses Python's standard entry points mechanism.

The purpose of this feature is to let benchmark result submitters customize and adapt
`mlperf-inf-mm-q3vl` to the inference system they want to benchmark, **without**
directly modifying the source code of `mlperf-inf-mm-q3vl`, which is frozen once the
benchmark is finalized.

### How it works

1. **Plugin Discovery**: When the CLI starts, it automatically discovers all registered
   plugins via the `mlperf_inf_mm_q3vl.benchmark_plugins` entry point group (see the
   sketch after this list).
2. **Plugin Loading**: Each plugin's entry point function is called to retrieve either a
   single command or a Typer app.
3. **Command Registration**: The plugin's commands are automatically added to the
   `benchmark` subcommand group.

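Conceptually, the discovery and registration steps look like the following sketch, which
uses Python's standard `importlib.metadata` API. This is only an illustration of the
mechanism, not the actual implementation inside `mlperf-inf-mm-q3vl`, and the
`benchmark_app` object is hypothetical.

```python
# Illustrative sketch of entry-point discovery; the real CLI wiring may differ.
from importlib.metadata import entry_points

from pydantic_typer import Typer

benchmark_app = Typer(help="MLPerf benchmark subcommands.")  # hypothetical app object

for ep in entry_points(group="mlperf_inf_mm_q3vl.benchmark_plugins"):
    plugin = ep.load()()  # call the registered entry-point function
    if isinstance(plugin, tuple):
        # (Typer app, command name): mount the app as a nested subcommand group.
        sub_app, name = plugin
        benchmark_app.add_typer(sub_app, name=name)
    else:
        # A single command function: the entry-point name becomes the subcommand name.
        benchmark_app.command(name=ep.name)(plugin)
```
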
### Example: creating a `mlperf-inf-mm-q3vl-foo` plugin package for `mlperf-inf-mm-q3vl benchmark foo`

#### Step 1: Package Structure

Create a new Python package with the following structure:

```
mlperf-inf-mm-q3vl-foo/
├── pyproject.toml
└── src/
    └── mlperf_inf_mm_q3vl_foo/
        ├── __init__.py
        ├── schema.py
        ├── deploy.py
        └── plugin.py
```

Note that this is only a minimal, illustrative example. Users are free to structure and
name their Python packages and modules however they wish.

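For completeness, here is one possible sketch of `schema.py`. The `FooEndpoint` fields
shown (`base_url`, `api_key`) are placeholders and purely assumptions; the real schema
depends entirely on what the Foo inference system needs.

```python
"""Schema describing how to reach a Foo inference system (illustrative sketch)."""

from pydantic import BaseModel


class FooEndpoint(BaseModel):
    """Hypothetical endpoint specification for the Foo backend."""

    # Placeholder fields; define whatever the Foo system actually needs.
    base_url: str = "http://localhost:8000"
    api_key: str | None = None
```
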
#### Step 2: Implement the `mlperf-inf-mm-q3vl-foo` plugin

Create your plugin entry point function in `plugin.py`:

```python
"""Plugin to support benchmarking the Foo inference system."""

from collections.abc import Callable
from typing import Annotated

from loguru import logger
from typer import Option

from mlperf_inf_mm_q3vl.log import setup_loguru_for_benchmark
from mlperf_inf_mm_q3vl.schema import Dataset, Settings, Verbosity

from .schema import FooEndpoint


def register_foo_benchmark() -> Callable:
    """Entry point for the plugin to benchmark the Foo inference system.

    This function is called when the CLI discovers the plugin.
    It should return either:
    - A single command function (decorated with appropriate options)
    - A tuple of (Typer app, command name) for more complex hierarchies
    """

    def benchmark_foo(
        *,
        settings: Settings,
        dataset: Dataset,
        # Add your foo-specific parameters here
        foo: FooEndpoint,
        custom_param: Annotated[
            int,
            Option(help="Custom parameter for the Foo backend."),
        ] = 2,
        random_seed: Annotated[
            int,
            Option(help="The seed for the random number generator."),
        ] = 12345,
        verbosity: Annotated[
            Verbosity,
            Option(help="The verbosity level of the logger."),
        ] = Verbosity.INFO,
    ) -> None:
        """Deploy and benchmark using the Foo backend.

        This command deploys a model using the Foo backend
        and runs the MLPerf benchmark against it.
        """
        # NOTE: adjust this import to wherever run_benchmark lives in mlperf_inf_mm_q3vl.
        from mlperf_inf_mm_q3vl.benchmark import run_benchmark

        from .deploy import FooDeployer

        setup_loguru_for_benchmark(settings=settings, verbosity=verbosity)
        logger.info(
            "Start benchmarking the Foo inference system with endpoint spec {} and custom param {}",
            foo,
            custom_param,
        )
        # Your implementation here
        with FooDeployer(endpoint=foo, settings=settings, custom_param=custom_param):
            # FooDeployer makes sure that Foo is deployed and currently healthy.
            # Run the benchmark using the core run_benchmark function.
            run_benchmark(
                settings=settings,
                dataset=dataset,
                endpoint=foo,
                random_seed=random_seed,
            )

    # Return the command function.
    # The entry point name will be used as the subcommand name.
    return benchmark_foo
```
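
The example above uses a `FooDeployer` context manager defined in `deploy.py`. A minimal
sketch of what that class might look like follows; its contents are assumptions about the
Foo system, and a real implementation would launch the server, wait for it to become
healthy, and tear it down on exit.

```python
"""Deployment helper for the Foo inference system (illustrative sketch)."""

from loguru import logger

from mlperf_inf_mm_q3vl.schema import Settings

from .schema import FooEndpoint


class FooDeployer:
    """Hypothetical context manager that deploys Foo and tears it down afterwards."""

    def __init__(self, *, endpoint: FooEndpoint, settings: Settings, custom_param: int) -> None:
        self.endpoint = endpoint
        self.settings = settings
        self.custom_param = custom_param

    def __enter__(self) -> "FooDeployer":
        # Start (or connect to) the Foo inference system and wait until it is healthy.
        logger.info("Deploying Foo at {} ...", self.endpoint)
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        # Shut the deployment down and clean up any resources it holds.
        logger.info("Tearing down the Foo deployment.")
```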

#### Step 3: Configure `pyproject.toml`

Register the plugin in its package's `pyproject.toml`:

```toml
[project]
name = "mlperf-inf-mm-q3vl-foo"
version = "0.1.0"
description = "Enable mlperf-inf-mm-q3vl to benchmark the Foo inference system."
requires-python = ">=3.12"
dependencies = [
    "mlperf-inf-mm-q3vl @ git+https://github.com/mlcommons/inference.git#subdirectory=multimodal/qwen3-vl/",
    # Add your backend-specific dependencies here
]

[project.entry-points."mlperf_inf_mm_q3vl.benchmark_plugins"]
# The key here becomes the subcommand name.
foo = "mlperf_inf_mm_q3vl_foo.plugin:register_foo_benchmark"

[build-system]
requires = ["setuptools>=80"]
build-backend = "setuptools.build_meta"
```

#### Step 4: Install and use `mlperf-inf-mm-q3vl benchmark foo`

```bash
# Install your plugin package
pip install mlperf-inf-mm-q3vl-foo

# The new subcommand is now available
mlperf-inf-mm-q3vl benchmark foo --help
mlperf-inf-mm-q3vl benchmark foo \
    --settings-file settings.toml \
    --dataset shopify-global-catalogue \
    --custom-param 3
```

#### Advanced: Nested Subcommands

If you want to create multiple subcommands under a single plugin (e.g.,
`mlperf-inf-mm-q3vl benchmark foo standard` and
`mlperf-inf-mm-q3vl benchmark foo optimized`), return a tuple of `(Typer app, name)`:

```python
from pydantic_typer import Typer


def register_foo_benchmark() -> tuple[Typer, str]:
    """Entry point that creates nested subcommands."""
    # Create a Typer app for your plugin
    foo_app = Typer(help="Benchmarking options for the Foo inference system.")

    @foo_app.command(name="standard")
    def foo_standard() -> None:  # add parameters as in benchmark_foo above
        """Run the standard Foo benchmark."""
        # Implementation
        ...

    @foo_app.command(name="optimized")
    def foo_optimized() -> None:  # add parameters as in benchmark_foo above
        """Run the optimized Foo benchmark with maximum performance."""
        # Implementation
        ...

    # Return a tuple of (app, command name)
    return (foo_app, "foo")
```

This will create:
- `mlperf-inf-mm-q3vl benchmark foo standard`
- `mlperf-inf-mm-q3vl benchmark foo optimized`

### Best Practices

1. Dependencies: Declare `mlperf-inf-mm-q3vl` as a dependency in your plugin package.
2. Documentation: Provide clear docstrings for your plugin commands; they appear in the
   `--help` output.
3. Schema Reuse: Reuse the core `Settings`, `Dataset`, and other schemas from
   `mlperf_inf_mm_q3vl.schema` for consistency and to minimize boilerplate code.
4. Lazy Imports: If your plugin has heavy dependencies, import them inside functions
   rather than at module level to avoid slowing down CLI startup.

## Developer Guide
