
Question: Specifying a custom set of metrics to helm-run. #3915

@Erotemic

Description

I'd like to run a subset of the HEIM benchmarks, but without computing all of the default metrics. Currently I'm running:

helm-run --run-entries mscoco:model=huggingface/stable-diffusion-v1-4 \
    --suite my-heim-suite \
    --max-eval-instances 10 --num-threads 1

I see in helm.benchmark.run_specs.heim_run_specs there is get_mscoco_spec, which is registered as a run spec function and ultimately calls into get_core_heim_metric_specs to create the metric class instances. In this example the final constructed run spec contains these metrics:

metric_specs=[
    MetricSpec(class_name='helm.benchmark.metrics.image_generation.lpips_metrics.LearnedPerceptualImagePatchSimilarityMetric', args={}),
    MetricSpec(class_name='helm.benchmark.metrics.image_generation.multi_scale_ssim_metrics.MultiScaleStructuralSimilarityIndexMeasureMetric', args={}),
    MetricSpec(class_name='helm.benchmark.metrics.image_generation.psnr_metrics.PeakSignalToNoiseRatioMetric', args={}),
    MetricSpec(class_name='helm.benchmark.metrics.image_generation.uiqi_metrics.UniversalImageQualityIndexMetric', args={}),
    MetricSpec(class_name='helm.benchmark.metrics.image_generation.aesthetics_metrics.AestheticsMetric', args={}),
    MetricSpec(class_name='helm.benchmark.metrics.image_generation.clip_score_metrics.CLIPScoreMetric', args={}),
    MetricSpec(class_name='helm.benchmark.metrics.image_generation.efficiency_metrics.EfficiencyMetric', args={}),
    MetricSpec(class_name='helm.benchmark.metrics.image_generation.fractal_dimension_metric.FractalDimensionMetric', args={}),
    MetricSpec(class_name='helm.benchmark.metrics.image_generation.watermark_metrics.WatermarkMetric', args={}),
    MetricSpec(class_name='helm.benchmark.metrics.basic_metrics.BasicGenerationMetric', args={'names': []}),
    MetricSpec(class_name='helm.benchmark.metrics.basic_metrics.BasicReferenceMetric', args={}),
    MetricSpec(class_name='helm.benchmark.metrics.basic_metrics.InstancesPerSplitMetric', args={}),
]

But I don't think there is any way for me to modify that run entry description to change the metrics. As far as I can tell, the only keys I can specify there are the RUN_EXPANDERS keys, i.e.:

    'instructions'
    'prompt'
    'newline'
    'stop'
    'format_prompt'
    'follow_format_instructions'
    'add_to_stop'
    'global_prefix'
    'num_train_trials'
    'max_train_instances'
    'max_eval_instances'
    'num_outputs'
    'num_trials'
    'model'
    'model_deployment'
    'data_augmentation'
    'tokenizer'
    'num_prompt_tokens'
    'num_output_tokens'
    'chatml'
    'eval_split'
    'output_format_instructions'
    'temperature'
    'increase_temperature'
    'increase_max_tokens'
    'process_output'

Or a key that matches an argument of the function registered with run_spec_function("mscoco"), which in this case is get_mscoco_spec.
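To illustrate that mechanism with a simplified sketch (this is not HELM's actual parser, just my mental model of how a run entry string becomes a run spec function call), an entry like `mscoco:model=...,compute_fid=True` splits into a scenario name plus keyword arguments:

```python
# Simplified illustration of the "key that matches a function argument"
# behavior. NOT HELM's real parsing code.

def parse_run_entry(entry: str) -> tuple:
    """Split 'name:k1=v1,k2=v2' into (name, kwargs)."""
    name, _, arg_str = entry.partition(":")
    kwargs = {}
    for pair in filter(None, arg_str.split(",")):
        key, _, value = pair.partition("=")
        kwargs[key] = value
    return name, kwargs

name, kwargs = parse_run_entry(
    "mscoco:model=huggingface/stable-diffusion-v1-4,compute_fid=True"
)
# name selects the run spec function; each key must match either a
# RUN_EXPANDERS key or a parameter of get_mscoco_spec.
```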

So my thought is that I have to create a custom run spec, and I worked from the information here: https://crfm-helm.readthedocs.io/en/latest/adding_new_scenarios/#custom-run-spec-function (this doc could use a bit more detail, and ideally a small tutorial).

I created a file in the current working directory:

echo '
from helm.benchmark.run_spec import RunSpec, run_spec_function
from helm.benchmark.adaptation.adapter_spec import AdapterSpec
from helm.benchmark.metrics.metric import MetricSpec
from helm.benchmark.run_specs.classic_run_specs import get_basic_metric_specs
from helm.benchmark.scenarios.scenario import ScenarioSpec
from typing import List
from helm.benchmark.run_specs.heim_run_specs import get_image_generation_adapter_spec


def get_my_core_heim_metric_specs() -> List[MetricSpec]:
    """Evaluate every image with these set of metrics."""
    return [
        MetricSpec(class_name="helm.benchmark.metrics.image_generation.clip_score_metrics.CLIPScoreMetric", args={}),
    ] + get_basic_metric_specs(names=[])


@run_spec_function("my_mscoco")
def get_my_mscoco_spec(
    for_efficiency: bool = False,
    compute_fid: bool = False,
    run_human_eval: bool = False,
    num_human_examples: int = 100,
    use_perturbed: bool = False,
    skip_photorealism: bool = False,
    skip_subject: bool = False,
) -> RunSpec:
    scenario_spec = ScenarioSpec(
        class_name="helm.benchmark.scenarios.image_generation.mscoco_scenario.MSCOCOScenario", args={}
    )

    adapter_spec: AdapterSpec
    metric_specs: List[MetricSpec]
    run_spec_name: str

    adapter_spec = get_image_generation_adapter_spec(num_outputs=4)
    metric_specs = get_my_core_heim_metric_specs()
    run_spec_name = "my_mscoco"

    return RunSpec(
        name=run_spec_name,
        scenario_spec=scenario_spec,
        adapter_spec=adapter_spec,
        metric_specs=metric_specs,
        groups=[run_spec_name],
    )
' > helm_my_run_specs.py

Running helm-run by itself did not pick up the new file, even though it conformed to the naming scheme, but I think that is a Python path issue. When I explicitly set PYTHONPATH=., it found the new spec, and I was able to run:

PYTHONPATH=. helm-run --run-entries my_mscoco:model=huggingface/stable-diffusion-v1-4 --suite my-heim-suite --max-eval-instances 10 --num-threads 1 --log-config ./helm_debug_log_config.yaml

But this outputs the results to a different folder. I'm not sure whether setting the run spec name to "mscoco" instead would cause conflicts.
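My guess about why PYTHONPATH=. was needed (an assumption; I haven't traced HELM's actual discovery code) is that module discovery scans directories on sys.path for importable modules matching the naming scheme, so the current working directory is only visible when it is on sys.path. Something roughly like:

```python
import pkgutil

def find_run_spec_modules(prefix: str = "helm_") -> list:
    # Scan every directory on sys.path for importable top-level modules
    # whose names start with the given prefix. The current working
    # directory is only scanned if it is on sys.path (hence PYTHONPATH=.).
    return sorted(
        info.name for info in pkgutil.iter_modules() if info.name.startswith(prefix)
    )
```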

I'm wondering if:

  1. Is this the best way to compute a subset of metrics for an existing run spec?
  2. Would it be worth adding something to HELM that makes it easier to point at a specific file containing custom run specs, rather than relying on a fragile naming scheme? (Also, I'm not even sure where the helm_*_run_specs pattern matching happens; the only discovery mechanism I see is discover_run_spec_functions.) It might be more intuitive if an argument or environment variable let the user specify the module to import that contains their customized run specs.
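As a sketch of what I mean by point 2 (the HELM_RUN_SPEC_MODULES variable name is hypothetical, not an existing HELM feature):

```python
import importlib
import os

def import_user_run_spec_modules(env_var: str = "HELM_RUN_SPEC_MODULES") -> list:
    """Import each comma-separated module named in the environment variable.

    Because @run_spec_function registers run specs at import time, simply
    importing the user's module would make their custom specs available,
    with no reliance on a file naming convention.
    """
    modules = []
    for name in filter(None, os.environ.get(env_var, "").split(",")):
        modules.append(importlib.import_module(name.strip()))
    return modules
```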
