Skip tasks with insufficient samples for pass@k instead of raising ex… by evtimovi · Pull Request #113 · facebookresearch/prompt-siren

evtimovi · 2026-02-04T17:07:48Z

When computing pass@k metrics with k > 1, tasks that have fewer than k samples are now gracefully skipped with a warning log message instead of raising a ValueError that would terminate the entire results processing.

Changes:

- Replace ValueError with warning log when n_samples < k for a task
- Add logging module import and logger instance
- Collect skipped groups and log them with full context (dataset, agent, attack, task_id, and sample count)
- Add check for empty DataFrame after filtering in aggregate_results
- Update docstrings to reflect new behavior (Note instead of Raises)
- Also includes refactoring: remove job_name from group_cols to allow aggregating across multiple runs of the same experiment
- Add generic variant_name support alongside legacy template_short_name
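
For readers who have not seen the diff, the skip-and-warn behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function names `pass_at_k` and `pass_at_k_by_task` and the dict-of-lists input are assumptions made for the example (the PR itself works on a DataFrame grouped by dataset, agent, attack, and task_id). The estimator is the standard unbiased pass@k formula, 1 - C(n - c, k) / C(n, k).

```python
import logging
from math import comb

logger = logging.getLogger(__name__)


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for n samples, c of which passed."""
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_at_k_by_task(samples: dict[str, list[bool]], k: int) -> dict[str, float]:
    """Compute pass@k per task, skipping tasks with fewer than k samples.

    `samples` maps a task id to its per-sample pass/fail outcomes
    (a hypothetical input shape chosen for this sketch).
    """
    scores: dict[str, float] = {}
    skipped: list[tuple[str, int]] = []
    for task_id, outcomes in samples.items():
        n = len(outcomes)
        if n < k:
            # Collect instead of raising, so one short task does not
            # abort processing of all the others.
            skipped.append((task_id, n))
            continue
        scores[task_id] = pass_at_k(n, sum(outcomes), k)
    if skipped:
        logger.warning(
            "Skipping %d task(s) with fewer than k=%d samples: %s",
            len(skipped), k, skipped,
        )
    return scores
```

With `k=2`, a task that has only one sample is left out of the returned dict and reported in a single warning, while tasks with two or more samples are scored as usual.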


dedeswim · 2026-02-04T19:22:03Z

I fear this might lead to silently computing wrong metrics? Should we have a non-strict mode that behaves like this via a flag instead?
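
One way the suggested flag could look, as a hedged sketch only: the helper name `filter_tasks_for_k`, its `strict` parameter, and the dict of per-task sample counts are hypothetical and do not come from the repository. Strict mode keeps the old raise-on-short-task behavior as the default, and the permissive path has to be requested explicitly.

```python
import logging

logger = logging.getLogger(__name__)


def filter_tasks_for_k(sample_counts: dict[str, int], k: int, strict: bool = True) -> list[str]:
    """Return the task ids that have at least k samples.

    In strict mode (the default in this sketch) any task with fewer than
    k samples raises ValueError; otherwise such tasks are dropped and a
    warning is logged.
    """
    short = {task: n for task, n in sample_counts.items() if n < k}
    if short and strict:
        raise ValueError(f"{len(short)} task(s) have fewer than k={k} samples: {short}")
    if short:
        logger.warning("Skipping tasks with fewer than k=%d samples: %s", k, short)
    return [task for task, n in sample_counts.items() if n >= k]
```

Defaulting to `strict=True` means a metric computed over a reduced task set can only appear when the caller has opted in.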

meta-codesync · 2026-02-05T12:46:36Z

@evtimovi has imported this pull request. If you are a Meta employee, you can view this in D92393526.

Differential Revision: D92393526
Pulled By: evtimovi

meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) · Feb 4, 2026

evtimovi wants to merge 1 commit into main from permissive_results_at_k
