Description
I'm interested in running a subset of the HEIM benchmarks, but I'm not interested in all of the metrics. Currently I'm running:
```shell
helm-run --run-entries mscoco:model=huggingface/stable-diffusion-v1-4 \
    --suite my-heim-suite \
    --max-eval-instances 10 --num-threads 1
```

I see in `helm.benchmark.run_specs.heim_run_specs` there is `get_mscoco_spec`, which is registered as a run spec function and ultimately calls into `get_core_heim_metric_specs` to create the metric class instances. In this example the final constructed run spec contains these metrics:
```
metric_specs=[MetricSpec(class_name='helm.benchmark.metrics.image_generation.lpips_metrics.LearnedPerceptualImagePatchSimilarityMetric', args={}), MetricSpec(class_name='helm.benchmark.metrics.image_generation.multi_scale_ssim_metrics.MultiScaleStructuralSimilarityIndexMeasureMetric', args={}), MetricSpec(class_name='helm.benchmark.metrics.image_generation.psnr_metrics.PeakSignalToNoiseRatioMetric', args={}), MetricSpec(class_name='helm.benchmark.metrics.image_generation.uiqi_metrics.UniversalImageQualityIndexMetric', args={}), MetricSpec(class_name='helm.benchmark.metrics.image_generation.aesthetics_metrics.AestheticsMetric', args={}), MetricSpec(class_name='helm.benchmark.metrics.image_generation.clip_score_metrics.CLIPScoreMetric', args={}), MetricSpec(class_name='helm.benchmark.metrics.image_generation.efficiency_metrics.EfficiencyMetric', args={}), MetricSpec(class_name='helm.benchmark.metrics.image_generation.fractal_dimension_metric.FractalDimensionMetric', args={}), MetricSpec(class_name='helm.benchmark.metrics.image_generation.watermark_metrics.WatermarkMetric', args={}), MetricSpec(class_name='helm.benchmark.metrics.basic_metrics.BasicGenerationMetric', args={'names': []}), MetricSpec(class_name='helm.benchmark.metrics.basic_metrics.BasicReferenceMetric', args={}), MetricSpec(class_name='helm.benchmark.metrics.basic_metrics.InstancesPerSplitMetric', args={})]
```
But I don't think there is any way for me to modify that run entry description to affect the metrics. I think the only keys I can specify there are the keys from `RUN_EXPANDERS`, i.e.:
'instructions'
'prompt'
'newline'
'stop'
'format_prompt'
'follow_format_instructions'
'add_to_stop'
'global_prefix'
'num_train_trials'
'max_train_instances'
'max_eval_instances'
'num_outputs'
'num_trials'
'model'
'model_deployment'
'data_augmentation'
'tokenizer'
'num_prompt_tokens'
'num_output_tokens'
'chatml'
'eval_split'
'output_format_instructions'
'temperature'
'increase_temperature'
'increase_max_tokens'
'process_output'
Or a key that matches an argument of the function wrapped with `run_spec_function("mscoco")`, which in this case is `get_mscoco_spec`.
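To illustrate what I mean, here is a toy sketch of my mental model of how the `key=value` pairs in a run entry get routed (this is NOT HELM's actual parser, and the `RUN_EXPANDERS` set here is a made-up subset for illustration):

```python
# Toy sketch (not HELM's real code): everything after "scenario:" is split
# into key=value pairs; keys matching a RUN_EXPANDERS entry are handled as
# run expanders, and the rest are passed as keyword arguments to the
# function registered via @run_spec_function.

RUN_EXPANDERS = {"model", "max_eval_instances", "num_outputs"}  # illustrative subset

def parse_run_entry(entry: str):
    scenario, _, rest = entry.partition(":")
    expander_args, spec_fn_kwargs = {}, {}
    for pair in filter(None, rest.split(",")):
        key, _, value = pair.partition("=")
        if key in RUN_EXPANDERS:
            expander_args[key] = value
        else:
            spec_fn_kwargs[key] = value
    return scenario, expander_args, spec_fn_kwargs

print(parse_run_entry("mscoco:model=huggingface/stable-diffusion-v1-4,compute_fid=True"))
# -> ('mscoco', {'model': 'huggingface/stable-diffusion-v1-4'}, {'compute_fid': 'True'})
```

Under this model there is simply no key I can pass that would remove metrics, since neither the expanders nor `get_mscoco_spec`'s arguments control the metric list.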
So my thought is that I have to create a custom run spec, and I followed the information here: https://crfm-helm.readthedocs.io/en/latest/adding_new_scenarios/#custom-run-spec-function (this doc could use a bit more detail, and ideally a small tutorial).
I created a file `helm_my_run_specs.py` in the current working directory:

```python
from typing import List

from helm.benchmark.adaptation.adapter_spec import AdapterSpec
from helm.benchmark.metrics.metric import MetricSpec
from helm.benchmark.run_spec import RunSpec, run_spec_function
from helm.benchmark.run_specs.classic_run_specs import get_basic_metric_specs
from helm.benchmark.run_specs.heim_run_specs import get_image_generation_adapter_spec
from helm.benchmark.scenarios.scenario import ScenarioSpec


def get_my_core_heim_metric_specs() -> List[MetricSpec]:
    """Evaluate every image with this set of metrics."""
    return [
        MetricSpec(
            class_name="helm.benchmark.metrics.image_generation.clip_score_metrics.CLIPScoreMetric",
            args={},
        ),
    ] + get_basic_metric_specs(names=[])


@run_spec_function("my_mscoco")
def get_my_mscoco_spec(
    for_efficiency: bool = False,
    compute_fid: bool = False,
    run_human_eval: bool = False,
    num_human_examples: int = 100,
    use_perturbed: bool = False,
    skip_photorealism: bool = False,
    skip_subject: bool = False,
) -> RunSpec:
    scenario_spec = ScenarioSpec(
        class_name="helm.benchmark.scenarios.image_generation.mscoco_scenario.MSCOCOScenario", args={}
    )
    adapter_spec: AdapterSpec = get_image_generation_adapter_spec(num_outputs=4)
    metric_specs: List[MetricSpec] = get_my_core_heim_metric_specs()
    run_spec_name: str = "my_mscoco"
    return RunSpec(
        name=run_spec_name,
        scenario_spec=scenario_spec,
        adapter_spec=adapter_spec,
        metric_specs=metric_specs,
        groups=[run_spec_name],
    )
```

Using helm-run by itself did not see the new file even though it conformed to the naming scheme, but I think that is a Python path issue. When I explicitly set `PYTHONPATH=.` it picked up the new spec, and I was able to run:
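An alternative I considered, instead of re-defining the whole run spec, was to post-process an existing run spec and keep only the metrics I want. A minimal sketch of that idea (using stand-in dataclasses here so it is self-contained; as far as I can tell HELM's real `RunSpec` and `MetricSpec` are frozen dataclasses too, so `dataclasses.replace` should work on them the same way):

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Tuple

# Stand-ins for helm.benchmark.metrics.metric.MetricSpec and
# helm.benchmark.run_spec.RunSpec -- just enough structure for the sketch.
@dataclass(frozen=True)
class MetricSpec:
    class_name: str
    args: Dict = field(default_factory=dict)

@dataclass(frozen=True)
class RunSpec:
    name: str
    metric_specs: Tuple[MetricSpec, ...] = ()

def keep_metrics(run_spec: RunSpec, substrings: List[str]) -> RunSpec:
    """Return a copy of run_spec keeping only metrics whose class name
    contains any of the given substrings."""
    kept = tuple(
        m for m in run_spec.metric_specs
        if any(s in m.class_name for s in substrings)
    )
    return replace(run_spec, metric_specs=kept)

spec = RunSpec(
    name="mscoco",
    metric_specs=(
        MetricSpec("helm.benchmark.metrics.image_generation.clip_score_metrics.CLIPScoreMetric"),
        MetricSpec("helm.benchmark.metrics.image_generation.watermark_metrics.WatermarkMetric"),
    ),
)
trimmed = keep_metrics(spec, ["clip_score"])
print([m.class_name for m in trimmed.metric_specs])
# -> ['helm.benchmark.metrics.image_generation.clip_score_metrics.CLIPScoreMetric']
```

But I did not find any hook in helm-run where I could apply such a filter, which is why I fell back to registering a whole new run spec function.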
```shell
PYTHONPATH=. helm-run --run-entries my_mscoco:model=huggingface/stable-diffusion-v1-4 \
    --suite my-heim-suite \
    --max-eval-instances 10 --num-threads 1 --log-config ./helm_debug_log_config.yaml
```
But this outputs the results in a different folder, named after the new run spec. I'm not sure whether setting the name to "mscoco" instead would cause issues.
I'm wondering:
- Is this the best way to compute a subset of metrics for an existing run spec?
- Would it be worth adding something to HELM to make it easier to point at a specific file that contains the custom run specs, rather than relying on a fragile naming scheme? (Also, I'm not even sure where the `helm_*_run_specs` pattern matching is happening; the only thing I see that discovers run specs is `discover_run_spec_functions`.) It might be a lot more intuitive if there were an argument or environment variable that let the user specify the module to import that contains their customized run specs.
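For the second point, what I had in mind is something like the following sketch: an environment variable naming the module to import, instead of filename-pattern discovery. (`HELM_RUN_SPECS_MODULE` is a made-up variable name; importing the module would be enough, since the `@run_spec_function` decorators run at import time and register the specs.)

```python
import importlib
import os

# Hypothetical: let the user name the module that registers their custom
# run spec functions. HELM_RUN_SPECS_MODULE is a made-up environment
# variable for this sketch, not an existing HELM feature.
def load_custom_run_specs() -> None:
    module_name = os.environ.get("HELM_RUN_SPECS_MODULE")
    if module_name:
        # Importing the module executes its @run_spec_function decorators,
        # which is all that is needed to register the run specs.
        importlib.import_module(module_name)

# Demo: any importable module works to show the mechanism.
os.environ["HELM_RUN_SPECS_MODULE"] = "json"
load_custom_run_specs()
```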