Fix generated builtin evaluator issues flagged in PR #531 #547

@nadheesh

Context

Several CodeRabbit findings from PR #531 relate to generated code (builtin_evaluators.go) or to the generator script and SDK that produce it. Fixing them requires either changes to the Python SDK evaluators followed by regeneration, or behavioral changes that warrant their own review cycle.

Issues to address

Generated code (builtin_evaluators.go — generated by scripts/generate-builtin-evaluators.sh)

  1. Guard zero/negative task constraints before division (Major)

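    • Lines 34, 131: Dividing by max_iterations or max_tokens raises ZeroDivisionError if a task overrides the value with 0; a negative override yields a meaningless negative ratio instead of an error.
    • Fix: Add a guard such as if max_iterations is None or max_iterations <= 0 (and the equivalent for max_tokens) in the Python SDK evaluator source.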
  2. instruction_following prompt drops success criteria (Minor)

    • Line 338: Prints the label "Success criteria:" but doesn't interpolate the value after it.
  3. Literal \n instead of real newlines in prompt text (Minor)

    • Lines 357, 376: \\nTask: renders as literal backslash-n, not a line break.
  4. context config defined but not interpolated in Safety/Tone prompts (Major)

    • Lines 414, 434: The context config param exists, but the prompt renders an empty "Context:" label.
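
For item 1, a minimal sketch of the kind of guard the fix describes. The helper name budget_usage_ratio and the choice to return None for an unusable limit are assumptions for illustration, not the SDK's actual API; the real evaluators may prefer to skip the check or fall back to a default instead.

```python
from typing import Optional


def budget_usage_ratio(used: float, limit: Optional[float]) -> Optional[float]:
    """Return used / limit, or None when the limit cannot serve as a divisor.

    Hypothetical helper: the name and the None fallback are illustrative only.
    """
    # A missing, zero, or negative limit is treated as "no usable constraint",
    # so the division below can never raise ZeroDivisionError.
    if limit is None or limit <= 0:
        return None
    return used / limit


# Hypothetical task overrides, including the problematic 0 and negative values.
for max_iterations in (10, 0, -5, None):
    print(max_iterations, budget_usage_ratio(4, max_iterations))
```

Returning None rather than raising keeps a misconfigured task from aborting the whole evaluation run; whether that is the right behavior is a design choice for the SDK review.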

Generator script (generate-builtin-evaluators.sh)

  1. Prompt template extraction silently drops valid build_prompt implementations (Major)

    • Line 181: Only f-strings returned directly are extracted; implementations that build the prompt in a variable and end with return prompt produce an empty Source.
  2. Output directory not created before heredoc redirection (Major)

    • Line 457: The directory for LLM_JUDGE_BASE_FILE may not exist when --output points to a custom location, so the redirection fails.
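
For item 1, one possible direction is to resolve the returned expression on the Python AST instead of matching only return f"..." textually. The sketch below assumes extraction can run in Python against the source of a build_prompt function; extract_prompt_source is a hypothetical name, and how the current script actually extracts templates is not shown in this issue.

```python
import ast
import textwrap
from typing import Optional


def extract_prompt_source(func_source: str) -> Optional[str]:
    """Return the source text of the expression that build_prompt returns.

    Handles both `return f"..."` and `prompt = f"..."; return prompt`.
    Hypothetical helper for illustration; not part of the existing generator.
    """
    source = textwrap.dedent(func_source)  # allow indented method bodies
    tree = ast.parse(source)
    func = next(
        (n for n in ast.walk(tree)
         if isinstance(n, ast.FunctionDef) and n.name == "build_prompt"),
        None,
    )
    if func is None:
        return None

    # Remember simple `name = <expr>` assignments so a returned name can be resolved.
    assignments = {}
    for node in ast.walk(func):
        if (isinstance(node, ast.Assign) and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)):
            assignments[node.targets[0].id] = node.value

    for node in ast.walk(func):
        if isinstance(node, ast.Return) and node.value is not None:
            value = node.value
            # `return prompt` -> look up what `prompt` was assigned.
            if isinstance(value, ast.Name) and value.id in assignments:
                value = assignments[value.id]
            return ast.get_source_segment(source, value)
    return None


direct = 'def build_prompt(task):\n    return f"Task: {task}"\n'
via_variable = (
    'def build_prompt(task):\n'
    '    prompt = f"Task: {task}"\n'
    '    return prompt\n'
)
print(extract_prompt_source(direct))
print(extract_prompt_source(via_variable))
```

This only resolves a single simple assignment; prompts assembled across several statements or branches would still need explicit handling, or at least a loud failure rather than an empty Source.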

SDK (amp_evaluation)

  1. Enforce required function-level Params during initialization (Major, base.py:740)

    • _init_function_params() doesn't raise when a required Param (no default) isn't provided, delaying errors to runtime.
  2. f-string docs clarification (Minor, evaluator_schema.py:1356)

    • AI copilot guide says "loops" are supported inside {}, but only comprehension expressions work there, not loop statements.
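
For item 1, a minimal sketch of failing fast at initialization. The Param class, the _MISSING sentinel, and init_function_params below are simplified stand-ins written for this issue, not the actual definitions in base.py.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable

# Sentinel meaning "no default was declared" (None may be a legitimate default).
_MISSING = object()


@dataclass(frozen=True)
class Param:
    """Simplified stand-in for the SDK's Param; fields are assumptions."""
    name: str
    default: Any = _MISSING

    @property
    def required(self) -> bool:
        return self.default is _MISSING


def init_function_params(declared: Iterable[Param],
                         provided: Dict[str, Any]) -> Dict[str, Any]:
    """Resolve param values, raising immediately if a required one is absent."""
    declared = list(declared)
    missing = [p.name for p in declared if p.required and p.name not in provided]
    if missing:
        raise ValueError(f"Missing required params: {', '.join(missing)}")
    # Provided values win; otherwise fall back to the declared default.
    return {p.name: provided.get(p.name, p.default) for p in declared}


params = [Param("threshold", 0.5), Param("rubric")]
print(init_function_params(params, {"rubric": "be concise"}))
try:
    init_function_params(params, {})
except ValueError as exc:
    print(exc)
```

Collecting all missing names before raising reports every problem in one error instead of one per run.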

Source PR
