Skip to content

Comments

Skip tasks with insufficient samples for pass@k instead of raising ex…#113

Open
evtimovi wants to merge 1 commit intomainfrom
permissive_results_at_k
Open

Skip tasks with insufficient samples for pass@k instead of raising ex…#113
evtimovi wants to merge 1 commit intomainfrom
permissive_results_at_k

Conversation

@evtimovi
Copy link
Contributor

@evtimovi evtimovi commented Feb 4, 2026

When computing pass@k metrics with k > 1, tasks that have fewer than k samples are now gracefully skipped with a warning log message instead of raising a ValueError that would terminate the entire results processing.

Changes:

  • Replace ValueError with warning log when n_samples < k for a task
  • Add logging module import and logger instance
  • Collect skipped groups and log them with full context (dataset, agent, attack, task_id, and sample count)
  • Add check for empty DataFrame after filtering in aggregate_results
  • Update docstrings to reflect new behavior (Note instead of Raises)
  • Also includes refactoring: remove job_name from group_cols to allow aggregating across multiple runs of the same experiment
  • Add generic variant_name support alongside legacy template_short_name

…ception

When computing pass@k metrics with k > 1, tasks that have fewer than k
samples are now gracefully skipped with a warning log message instead of
raising a ValueError that would terminate the entire results processing.

Changes:
- Replace ValueError with warning log when n_samples < k for a task
- Add logging module import and logger instance
- Collect skipped groups and log them with full context (dataset, agent,
  attack, task_id, and sample count)
- Add check for empty DataFrame after filtering in aggregate_results
- Update docstrings to reflect new behavior (Note instead of Raises)
- Also includes refactoring: remove job_name from group_cols to allow
  aggregating across multiple runs of the same experiment
- Add generic variant_name support alongside legacy template_short_name
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Feb 4, 2026
@dedeswim
Copy link
Collaborator

dedeswim commented Feb 4, 2026

I fear this might lead to silently computing wrong metrics? Should we have a non-strict mode that behaves like this via a flag instead?

@meta-codesync
Copy link

meta-codesync bot commented Feb 5, 2026

@evtimovi has imported this pull request. If you are a Meta employee, you can view this in D92393526.

facebook-github-bot pushed a commit that referenced this pull request Feb 6, 2026
#113)

Summary:
When computing pass@k metrics with k > 1, tasks that have fewer than k samples are now gracefully skipped with a warning log message instead of raising a ValueError that would terminate the entire results processing.

Changes:
- Replace ValueError with warning log when n_samples < k for a task
- Add logging module import and logger instance
- Collect skipped groups and log them with full context (dataset, agent, attack, task_id, and sample count)
- Add check for empty DataFrame after filtering in aggregate_results
- Update docstrings to reflect new behavior (Note instead of Raises)
- Also includes refactoring: remove job_name from group_cols to allow aggregating across multiple runs of the same experiment
- Add generic variant_name support alongside legacy template_short_name


Differential Revision: D92393526

Pulled By: evtimovi
facebook-github-bot pushed a commit that referenced this pull request Feb 6, 2026
#113)

Summary:
When computing pass@k metrics with k > 1, tasks that have fewer than k samples are now gracefully skipped with a warning log message instead of raising a ValueError that would terminate the entire results processing.

Changes:
- Replace ValueError with warning log when n_samples < k for a task
- Add logging module import and logger instance
- Collect skipped groups and log them with full context (dataset, agent, attack, task_id, and sample count)
- Add check for empty DataFrame after filtering in aggregate_results
- Update docstrings to reflect new behavior (Note instead of Raises)
- Also includes refactoring: remove job_name from group_cols to allow aggregating across multiple runs of the same experiment
- Add generic variant_name support alongside legacy template_short_name


Differential Revision: D92393526

Pulled By: evtimovi
facebook-github-bot pushed a commit that referenced this pull request Feb 6, 2026
#113)

Summary:
When computing pass@k metrics with k > 1, tasks that have fewer than k samples are now gracefully skipped with a warning log message instead of raising a ValueError that would terminate the entire results processing.

Changes:
- Replace ValueError with warning log when n_samples < k for a task
- Add logging module import and logger instance
- Collect skipped groups and log them with full context (dataset, agent, attack, task_id, and sample count)
- Add check for empty DataFrame after filtering in aggregate_results
- Update docstrings to reflect new behavior (Note instead of Raises)
- Also includes refactoring: remove job_name from group_cols to allow aggregating across multiple runs of the same experiment
- Add generic variant_name support alongside legacy template_short_name


Differential Revision: D92393526

Pulled By: evtimovi
facebook-github-bot pushed a commit that referenced this pull request Feb 6, 2026
#113)

Summary:
When computing pass@k metrics with k > 1, tasks that have fewer than k samples are now gracefully skipped with a warning log message instead of raising a ValueError that would terminate the entire results processing.

Changes:
- Replace ValueError with warning log when n_samples < k for a task
- Add logging module import and logger instance
- Collect skipped groups and log them with full context (dataset, agent, attack, task_id, and sample count)
- Add check for empty DataFrame after filtering in aggregate_results
- Update docstrings to reflect new behavior (Note instead of Raises)
- Also includes refactoring: remove job_name from group_cols to allow aggregating across multiple runs of the same experiment
- Add generic variant_name support alongside legacy template_short_name


Differential Revision: D92393526

Pulled By: evtimovi
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants