Implement query difficulty analysis #36

@evanvolgas

Description

Implement difficulty-stratified query analysis for the results section of the publication.

Goal: Determine whether different bandits excel at different query difficulty levels.

Implementation Tasks:

  1. Difficulty Annotation: Annotate queries with difficulty scores
  2. Stratified Analysis: Compare bandit performance by difficulty bucket
  3. Interaction Effects: Test if bandit ranking changes with difficulty
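As a rough illustration of task 2, the sketch below buckets queries by difficulty and averages quality per bandit within each bucket. It assumes the results are available as a long-format DataFrame with columns `query_id`, `bandit`, and `quality`; that layout and the function name `stratify_by_difficulty` are hypothetical, since the structure of `BenchmarkResults` is not described here.

```python
from typing import Dict

import pandas as pd


def stratify_by_difficulty(
    results: pd.DataFrame,
    difficulty_scores: Dict[str, float],
    buckets: int = 5,
) -> pd.DataFrame:
    """Mean quality per bandit within each difficulty bucket.

    `results` is assumed (hypothetically) to be long-format with the
    columns query_id, bandit, quality.
    """
    scores = pd.Series(difficulty_scores, name="difficulty")
    # Rank first so tied scores cannot produce duplicate bin edges in qcut.
    bucket_of_query = (
        pd.qcut(scores.rank(method="first"), q=buckets, labels=False) + 1
    )  # 1 = easiest bucket, `buckets` = hardest
    # Queries without a difficulty score get NaN and drop out of the groupby.
    annotated = results.assign(bucket=results["query_id"].map(bucket_of_query))
    return (
        annotated.groupby(["bucket", "bandit"])["quality"]
        .agg(["mean", "count"])
        .reset_index()
    )
```

Reporting the per-bucket `count` alongside the mean makes it easier to spot buckets that are too small to support a comparison.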

Difficulty Scoring Approaches:

  • Option 1: Oracle variance (high variance = difficult query where models disagree)
  • Option 2: Oracle quality (low quality = difficult query where even best model struggles)
  • Option 3: Hybrid: variance + quality
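The three options can be sketched as follows. This is only an illustration under stated assumptions: the oracle results are represented as a plain mapping from query id to the quality scores of every candidate model, scores lie in [0, 1], and the hybrid is a weighted sum with a hypothetical `weight` parameter. None of these details are fixed by this issue.

```python
from statistics import pvariance
from typing import Dict, List


def difficulty_from_scores(
    model_scores: Dict[str, List[float]],
    method: str = "variance",
    weight: float = 0.5,
) -> Dict[str, float]:
    """Score each query's difficulty from per-model quality scores in [0, 1].

    `model_scores` is a hypothetical stand-in for the oracle results:
    query id -> quality of every candidate model on that query.
    """
    difficulty = {}
    for query_id, scores in model_scores.items():
        # Option 1: models disagree -> high variance -> harder query.
        # Values in [0, 1] have population variance <= 0.25, so dividing
        # by 0.25 rescales the term to [0, 1].
        variance_term = pvariance(scores) / 0.25
        # Option 2: even the best model scores low -> harder query.
        quality_term = 1.0 - max(scores)
        if method == "variance":
            difficulty[query_id] = variance_term
        elif method == "quality":
            difficulty[query_id] = quality_term
        elif method == "hybrid":
            # Option 3 (assumed form): weighted sum of the two terms.
            difficulty[query_id] = (
                weight * variance_term + (1 - weight) * quality_term
            )
        else:
            raise ValueError(f"unknown method: {method!r}")
    return difficulty
```

The two base options can disagree: a query where one model scores 1.0 and another 0.0 is maximally hard under the variance option but easy under the quality option, which is one argument for validating the annotation before relying on it.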

Implementation Location: conduit_bench/analysis/difficulty_analysis.py

Required Functions:

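from __future__ import annotations  # lets the stubs load before BenchmarkResults is imported

from typing import Dict

import pandas as pd

# BenchmarkResults is the project's results container; its import path is not given here.


def annotate_difficulty(
    oracle_results: BenchmarkResults,
    method: str = "variance",
) -> Dict[str, float]:
    """Assign a difficulty score to each query, keyed by query ID."""


def stratified_performance_analysis(
    results: BenchmarkResults,
    difficulty_scores: Dict[str, float],
    buckets: int = 5,
) -> pd.DataFrame:
    """Compare bandit performance by difficulty bucket (quintiles by default)."""


def test_interaction_effects(
    results: BenchmarkResults,
    difficulty_scores: Dict[str, float],
) -> Dict[str, float]:
    """Test whether bandit ranking changes significantly with difficulty."""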

Visualization Requirements:

  • Performance by difficulty quintile (grouped bar chart)
  • Difficulty distribution (histogram)
  • Interaction heatmap (algorithm × difficulty → quality)
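The data behind the interaction heatmap can be prepared with a pivot. The sketch below assumes a long-format table with the hypothetical columns `bandit`, `bucket`, and `mean` (mean quality per bandit and difficulty bucket), and adds a per-bucket ranking so that rank changes across buckets are visible at a glance.

```python
import pandas as pd


def interaction_matrix(stratified: pd.DataFrame) -> pd.DataFrame:
    """Pivot bucket means into an algorithm x difficulty grid of quality."""
    return stratified.pivot(index="bandit", columns="bucket", values="mean")


def rank_within_bucket(matrix: pd.DataFrame) -> pd.DataFrame:
    """Rank bandits inside each difficulty bucket (1 = best quality)."""
    return matrix.rank(ascending=False, axis=0)


# Tiny hand-made example; the numbers are illustrative, not results.
example = pd.DataFrame(
    {
        "bucket": [1, 1, 2, 2],
        "bandit": ["linucb", "ucb1", "linucb", "ucb1"],
        "mean": [0.8, 0.9, 0.6, 0.3],
    }
)
matrix = interaction_matrix(example)
ranks = rank_within_bucket(matrix)
```

The resulting grid can be handed to any heatmap routine, for example matplotlib's `imshow` or seaborn's `heatmap`.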

Success Criteria:

  • Difficulty annotation implemented and validated
  • Stratified analysis functions implemented
  • Statistical interaction testing implemented
  • Visualization functions ready
  • Documentation and examples added

Timeline: 6 hours

Dependencies: Issue #33 (experiment results with oracle)

Publication Impact: Addresses "do bandits have different strengths/weaknesses?" research question

Key Insight (hypothesis to test): LinUCB might excel at difficult queries because it uses query context, while UCB1 might excel at easy queries because it converges faster.
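One way to check a claim of this kind is a permutation test on a single pairwise contrast. The sketch below is an assumption about how the test could be set up, not the method specified in this issue: each input value is a per-query quality difference between two bandits (for example LinUCB minus UCB1), split into easy and hard queries, and the statistic is the difference between the hard and easy means. It covers one pair of bandits only, not a full ranking.

```python
import random
from statistics import mean
from typing import List, Tuple


def interaction_permutation_test(
    easy_diffs: List[float],
    hard_diffs: List[float],
    n_permutations: int = 2000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (observed interaction, two-sided permutation p-value).

    Each element is a per-query quality difference between two bandits
    on an easy or a hard query (hypothetical input layout).
    """
    observed = mean(hard_diffs) - mean(easy_diffs)
    pooled = easy_diffs + hard_diffs  # new list, inputs stay untouched
    n_easy = len(easy_diffs)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null, easy/hard labels are exchangeable across queries.
        rng.shuffle(pooled)
        permuted = mean(pooled[n_easy:]) - mean(pooled[:n_easy])
        if abs(permuted) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value
```

A small p-value with a positive statistic would suggest that the first bandit's advantage grows with difficulty. With several bandits and buckets, the p-values from repeated pairwise tests would need a multiple-comparison correction.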

Metadata

Labels:

  • analysis (Data analysis and metrics)
  • difficulty:intermediate (Intermediate difficulty - requires domain knowledge)
  • enhancement (New feature or request)
  • priority:medium (Medium priority - important but not blocking)
  • publication (Publication preparation and research paper)
