Open
Labels
- analysis: Data analysis and metrics
- difficulty:intermediate: Intermediate difficulty - requires domain knowledge
- enhancement: New feature or request
- priority:medium: Medium priority - important but not blocking
- publication: Publication preparation and research paper
Description
Implement an analysis stratified by query difficulty for the results section of the publication.
Goal: Determine if different bandits excel at different query difficulty levels.
Implementation Tasks:
- Difficulty Annotation: Annotate queries with difficulty scores
- Stratified Analysis: Compare bandit performance by difficulty bucket
- Interaction Effects: Test if bandit ranking changes with difficulty
Difficulty Scoring Approaches:
- Option 1: Oracle variance (high variance = difficult query where models disagree)
- Option 2: Oracle quality (low quality = difficult query where even best model struggles)
- Option 3: Hybrid (combine oracle variance with low oracle quality)
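The three scoring options above can be sketched as follows. This is a minimal illustration, not the intended implementation: `annotate_difficulty_sketch` and the `oracle_scores` mapping (query ID to one quality score per candidate model, each in [0, 1]) are hypothetical stand-ins for `BenchmarkResults`, and the equal weighting in the hybrid option is an assumption.

```python
from statistics import pvariance
from typing import Dict, List


def annotate_difficulty_sketch(
    oracle_scores: Dict[str, List[float]],
    method: str = "variance",
) -> Dict[str, float]:
    """Map each query ID to a difficulty score (higher means harder).

    oracle_scores is a hypothetical stand-in for BenchmarkResults:
    query ID -> quality scores in [0, 1], one per candidate model.
    """
    difficulty: Dict[str, float] = {}
    for query_id, scores in oracle_scores.items():
        variance = pvariance(scores)   # Option 1: models disagree
        shortfall = 1.0 - max(scores)  # Option 2: even the best model struggles
        if method == "variance":
            difficulty[query_id] = variance
        elif method == "quality":
            difficulty[query_id] = shortfall
        elif method == "hybrid":
            # Assumed weighting. The variance of values in [0, 1] is at most
            # 0.25, so multiply by 4 to put both terms on a 0-1 scale.
            difficulty[query_id] = 0.5 * (4.0 * variance) + 0.5 * shortfall
        else:
            raise ValueError(f"unknown method: {method!r}")
    return difficulty
```

Under this sketch, a query on which all models score 0.2 is easy by the variance option but hard by the quality option, which is the case the hybrid option is meant to cover.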
Implementation Location: conduit_bench/analysis/difficulty_analysis.py
Required Functions:
from typing import Dict

import pandas as pd

# BenchmarkResults is the project's results type; its import path is not given in this issue.


def annotate_difficulty(
    oracle_results: BenchmarkResults,
    method: str = "variance"
) -> Dict[str, float]:
    """Assign a difficulty score to each query."""


def stratified_performance_analysis(
    results: BenchmarkResults,
    difficulty_scores: Dict[str, float],
    buckets: int = 5
) -> pd.DataFrame:
    """Compare bandit performance by difficulty bucket (quintiles when buckets=5)."""


def test_interaction_effects(
    results: BenchmarkResults,
    difficulty_scores: Dict[str, float]
) -> Dict[str, float]:
    """Test if bandit ranking changes significantly by difficulty."""

Visualization Requirements:
- Performance by difficulty quintile (grouped bar chart)
- Difficulty distribution (histogram)
- Interaction heatmap (algorithm × difficulty → quality)
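The numbers behind the grouped bar chart and the heatmap are per-bandit mean quality within each difficulty bucket. A minimal standard-library sketch of that table follows. The names `assign_buckets` and `stratified_means` and the nested-dict input (bandit name to per-query quality) are hypothetical, and the rank-based equal-frequency bucketing is an assumed choice; the real function is expected to return a `pd.DataFrame`.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional


def assign_buckets(difficulty_scores: Dict[str, float], buckets: int = 5) -> Dict[str, int]:
    """Rank-based equal-frequency buckets: 0 is easiest, buckets - 1 is hardest."""
    # Ties are broken by query ID so the assignment is deterministic.
    ranked = sorted(difficulty_scores, key=lambda q: (difficulty_scores[q], q))
    n = len(ranked)
    return {q: i * buckets // n for i, q in enumerate(ranked)}


def stratified_means(
    quality: Dict[str, Dict[str, float]],
    difficulty_scores: Dict[str, float],
    buckets: int = 5,
) -> Dict[str, List[Optional[float]]]:
    """Return bandit name -> mean quality per bucket (None for an empty bucket)."""
    bucket_of = assign_buckets(difficulty_scores, buckets)
    table: Dict[str, List[Optional[float]]] = {}
    for bandit, per_query in quality.items():
        grouped = defaultdict(list)
        for query_id, value in per_query.items():
            if query_id in bucket_of:  # skip queries without a difficulty score
                grouped[bucket_of[query_id]].append(value)
        table[bandit] = [mean(grouped[b]) if grouped[b] else None for b in range(buckets)]
    return table
```

Each row of the returned table is one bar group (or one heatmap row); the bucket index is the x-axis position.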
Success Criteria:
- Difficulty annotation implemented and validated
- Stratified analysis functions implemented
- Statistical interaction testing implemented
- Visualization functions ready
- Documentation and examples added
Timeline: 6 hours
Dependencies: Issue #33 (experiment results with oracle)
Publication Impact: Addresses "do bandits have different strengths/weaknesses?" research question
Key Insight: LinUCB might excel at difficult queries (uses context), while UCB1 might excel at easy queries (faster convergence).
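One way to check a hypothesis like this is to test whether the quality gap between two bandits varies with difficulty. The sketch below uses a permutation test on the Pearson correlation between difficulty and the per-query gap. This is an assumed choice of test, since the issue does not prescribe one, and `interaction_permutation_test` with its per-query quality dicts is a hypothetical interface rather than the `BenchmarkResults` API.

```python
import random
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def interaction_permutation_test(
    quality_a: Dict[str, float],
    quality_b: Dict[str, float],
    difficulty_scores: Dict[str, float],
    n_permutations: int = 2000,
    seed: int = 0,
) -> Dict[str, float]:
    """Test whether the gap (bandit A minus bandit B) depends on difficulty."""
    queries = sorted(set(quality_a) & set(quality_b) & set(difficulty_scores))
    difficulty = [difficulty_scores[q] for q in queries]
    gap = [quality_a[q] - quality_b[q] for q in queries]
    observed = pearson(difficulty, gap)

    # Null hypothesis: difficulty is unrelated to the gap, so shuffling the
    # difficulty labels should give correlations as extreme as the observed one.
    rng = random.Random(seed)
    shuffled = list(difficulty)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson(shuffled, gap)) >= abs(observed):
            extreme += 1
    # The +1 terms count the observed arrangement itself, so p is never 0.
    p_value = (extreme + 1) / (n_permutations + 1)
    return {"correlation": observed, "p_value": p_value}
```

A positive correlation with a small p-value would mean bandit A gains on bandit B as queries get harder. With more than two bandits, the test would need to be repeated per pair with a multiple-comparison correction, or replaced by a model with an explicit algorithm-by-difficulty term.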