Fix get_target_trial_index for LILO experiments (#5038) #5038

Closed
ItsMrLin wants to merge 2 commits into facebook:main from ItsMrLin:export-D96574746

@ItsMrLin (Contributor) commented Mar 16, 2026

Summary:

In LILO (LLM-In-the-Loop Optimization) experiments, the optimization
config objective is `pairwise_pref_query`, a derived metric that only
LILO labeling trials carry data for. As a result, `get_target_trial_index()`
selects these labeling trials (which have COMPLETE pairwise data) as
the relativization reference instead of non-LILO trials (which have base
metric data). The target trial's status quo (SQ) then lacks base metrics,
causing `TransformToNewSQ` and downstream model fitting to fail.

Fix:

  1. Exclude LILO labeling trials (`trial_type == LILO_LABELING`) from
     the target trial candidate set.
  2. For LILO experiments, accept INCOMPLETE metric availability so that
     non-LILO trials (which have base-metric data but lack the pairwise
     preference metric) can serve as relativization references.

Reviewed By: Balandat

Differential Revision: D96574746
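
The two-part fix in the summary above can be illustrated with a small, self-contained sketch. This is not the Ax implementation: `ToyTrial`, `Availability`, `metric_availability`, and `pick_target_trial_index` are hypothetical names. The sketch assumes that the required metric set contains both the base metrics and `pairwise_pref_query`, and that ties among acceptable trials are broken by taking the highest index. Neither assumption is confirmed by the PR text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

LILO_LABELING = "LILO_LABELING"  # trial type named in the PR summary


class Availability(Enum):
    """Hypothetical three-level availability of required metrics on a trial."""
    NOT_AVAILABLE = 0
    INCOMPLETE = 1
    COMPLETE = 2


@dataclass
class ToyTrial:
    """Stand-in for an Ax trial; only the fields this sketch needs."""
    index: int
    trial_type: Optional[str]
    metrics_with_data: set = field(default_factory=set)


def metric_availability(trial: ToyTrial, required_metrics: set) -> Availability:
    """Classify how many of the required metrics the trial has data for."""
    present = required_metrics & trial.metrics_with_data
    if not present:
        return Availability.NOT_AVAILABLE
    if present == required_metrics:
        return Availability.COMPLETE
    return Availability.INCOMPLETE


def pick_target_trial_index(
    trials: list, required_metrics: set, is_lilo_experiment: bool
) -> Optional[int]:
    """Pick a relativization reference trial, or None if no trial qualifies."""
    accepted = {Availability.COMPLETE}
    if is_lilo_experiment:
        # Fix 2: non-LILO trials lack the pairwise metric, so allow INCOMPLETE.
        accepted.add(Availability.INCOMPLETE)
    candidates = [
        t
        for t in trials
        # Fix 1: labeling trials never serve as the target trial.
        if t.trial_type != LILO_LABELING
        and metric_availability(t, required_metrics) in accepted
    ]
    if not candidates:
        return None
    # Assumption: prefer the most recent (highest-index) acceptable trial.
    return max(candidates, key=lambda t: t.index).index
```

With two base-metric trials (indices 0 and 1) and a labeling trial (index 2), the sketch returns 1 for a LILO experiment. Without the exclusion in fix 1, the labeling trial would have been picked even though it carries no base-metric data.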

@meta-cla meta-cla bot added the CLA Signed label Mar 16, 2026
meta-codesync bot commented Mar 16, 2026

@ItsMrLin has exported this pull request. If you are a Meta employee, you can view the originating Diff in D96574746.

ItsMrLin added a commit to ItsMrLin/Ax that referenced this pull request Mar 16, 2026
Summary:

In LILO (LLM-In-the-Loop Optimization) experiments, the optimization
config objective is `pairwise_pref_query` — a derived metric that only
LILO labeling trials carry data for.  `get_target_trial_index()` then
selects these labeling trials (which have COMPLETE pairwise data) as
the relativization reference instead of Sobol trials (which have base
metric data).  The target trial's SQ then lacks base metrics, causing
TransformToNewSQ and downstream model fitting to fail.

Fix:
1. Exclude LILO labeling trials (`trial_type == LILO_LABELING`) from
   the target trial candidate set.
2. For LILO experiments, add a fallback that checks metric availability
   against all experiment metrics (not just opt config), so Sobol trials
   are found even when the opt config includes the pairwise metric.

Differential Revision: D96574746
ItsMrLin added a commit to ItsMrLin/Ax that referenced this pull request Mar 17, 2026
codecov-commenter commented Mar 17, 2026

Codecov Report

❌ Patch coverage is 94.44444% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.74%. Comparing base (2d4750a) to head (7f1a99c).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ax/storage/sqa_store/decoder.py | 60.00% | 2 Missing ⚠️ |
| ax/storage/sqa_store/encoder.py | 75.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5038   +/-   ##
=======================================
  Coverage   96.73%   96.74%           
=======================================
  Files         606      606           
  Lines       66242    66296   +54     
=======================================
+ Hits        64080    64136   +56     
+ Misses       2162     2160    -2     


@ItsMrLin ItsMrLin force-pushed the export-D96574746 branch 2 times, most recently from 1064d78 to 649afea Compare March 18, 2026 00:10
ItsMrLin added a commit to ItsMrLin/Ax that referenced this pull request Mar 18, 2026
ItsMrLin added a commit to ItsMrLin/Ax that referenced this pull request Mar 18, 2026
Summary:
Pull Request resolved: facebook#5038

@meta-codesync meta-codesync bot changed the title from "Fix get_target_trial_index for LILO experiments" to "Fix get_target_trial_index for LILO experiments (#5038)" Mar 18, 2026
ItsMrLin added a commit to ItsMrLin/Ax that referenced this pull request Mar 18, 2026
ItsMrLin added a commit to ItsMrLin/Ax that referenced this pull request Mar 18, 2026
ItsMrLin added a commit to ItsMrLin/Ax that referenced this pull request Mar 18, 2026
Summary:

Move `LLMMessage` dict conversion from the `experiment.llm_messages`
getter/setter to the storage encoders/decoders, following Ax convention
that domain objects hold domain types and serialization happens at the
storage boundary.

**`experiment.py`**: The setter now stores `LLMMessage` objects directly
in `_properties`. The getter handles both `LLMMessage` objects (new path)
and plain dicts (backward compat with previously stored data).

**JSON store**: No explicit changes needed — the encoder's generic
dataclass fallback auto-serializes `LLMMessage` with a `__type` tag,
and `LLMMessage` is already registered in `CORE_DECODER_REGISTRY`.

**SQA store**: The encoder converts `LLMMessage` → dict via
`dataclasses.asdict()` in the properties copy before DB write (same
pattern as `pruning_target_parameterization`). The decoder converts
dicts → `LLMMessage` after loading properties, in both
`_init_experiment_from_sqa` and `_init_mt_experiment_from_sqa`.

Reviewed By: lena-kashtelyan

Differential Revision: D96434290
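
The storage-boundary pattern described in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the Ax code: the `LLMMessage` fields (`role`, `content`), the `llm_messages` properties key, and the helper names `encode_properties` and `decode_properties` are all hypothetical. Only `dataclasses.asdict()` is taken from the summary.

```python
import dataclasses
from dataclasses import dataclass
from typing import Any


@dataclass
class LLMMessage:
    """Hypothetical stand-in; the real LLMMessage fields may differ."""
    role: str
    content: str


LLM_MESSAGES_KEY = "llm_messages"  # assumed properties key


def encode_properties(properties: dict) -> dict:
    """Return a copy of properties in which LLMMessage objects become dicts.

    The original mapping is left untouched, so the in-memory experiment
    keeps holding domain objects.
    """
    encoded = dict(properties)  # shallow copy before modifying for the DB write
    messages = encoded.get(LLM_MESSAGES_KEY)
    if messages is not None:
        encoded[LLM_MESSAGES_KEY] = [
            dataclasses.asdict(m) if isinstance(m, LLMMessage) else m
            for m in messages
        ]
    return encoded


def decode_properties(properties: dict) -> dict:
    """Return a copy of properties in which stored dicts become LLMMessage."""
    decoded = dict(properties)
    messages = decoded.get(LLM_MESSAGES_KEY)
    if messages is not None:
        decoded[LLM_MESSAGES_KEY] = [
            # Tolerate both shapes, mirroring the backward-compatible getter.
            LLMMessage(**m) if isinstance(m, dict) else m
            for m in messages
        ]
    return decoded
```

Converting on a copy keeps serialization concerns out of the domain object, which matches the convention the summary cites: the experiment holds `LLMMessage` instances, and only the encoder and decoder see plain dicts.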
ItsMrLin added a commit to ItsMrLin/Ax that referenced this pull request Mar 18, 2026
ItsMrLin added a commit to ItsMrLin/Ax that referenced this pull request Mar 18, 2026
@meta-codesync meta-codesync bot closed this in 0ecc5d9 Mar 18, 2026
meta-codesync bot commented Mar 18, 2026

This pull request has been merged in 0ecc5d9.

Labels

CLA Signed, fb-exported, Merged, meta-exported
