Skip to content

Latest commit

 

History

History
235 lines (189 loc) · 7.5 KB

File metadata and controls

235 lines (189 loc) · 7.5 KB

Integration Test Fix Implementation Plan

Created: 2025-01-26
Status: Ready for Implementation
Total Failures: 130 out of 941 integration tests

Executive Summary

Following the successful resolution of authentication and HTTP client infrastructure issues, 130 integration tests remain failing due to test-specific issues. This plan outlines a systematic approach to fix all failures through 5 phases, prioritizing high-impact fixes first.

Failure Analysis

Failure Distribution:
├─ Type/Assertion Issues (40%) ─── ~52 tests
├─ Missing Functionality (30%) ──── ~39 tests  
├─ OAuth2 Tests (20%) ──────────── ~26 tests
└─ Test Implementation (10%) ────── ~13 tests

Phase-by-Phase Implementation Plan

Phase 1: Quick Wins - Type Mapping & Assertion Fixes

Target: Fix ~40-50 tests with minimal code changes
Estimated Impact: Reduce failures from 130 to ~90

Issues to Address:

  1. Struct vs Map Mismatches

    • Problem: Tests expect %ExLLM.Types.Model{}, providers return plain maps
    • Solution: Update assertions OR normalize provider responses
    • Example: assert %ExLLM.Types.Model{} = model fails when model is a map
  2. Finish Reason Normalization

    • Problem: Provider-specific values ("stop" vs "end_turn" vs "stop_sequence")
    • Solution: Create mapping layer or update test expectations
    • Affected providers: OpenAI, Anthropic, Gemini
  3. Error Type Flexibility

    • Problem: Tests expect :context_length_error, get :invalid_messages
    • Solution: Use assert error in [...] pattern for valid alternatives

Implementation Strategy:

# Create shared test helpers
defmodule ExLLM.TestHelpers do
  def assert_model_response(response) do
    # Handle both struct and map responses
  end
  
  def normalize_finish_reason(reason, provider) do
    # Map provider-specific reasons to standard ones
  end
end

Phase 2: Core Functionality Implementation

Target: Fix ~30-40 tests by implementing missing features
Estimated Impact: Reduce failures from ~90 to ~50

Issues to Address:

  1. Cost Calculation Returning nil

    • Root Cause: Pricing data not loaded or calculation not implemented
    • Files: lib/ex_llm/core/cost.ex, provider pricing configs
    • Solution:
      • Ensure pricing YAML files are loaded
      • Implement calculation logic for all token types
      • Add fallback for missing pricing data
  2. Multimodal Token Estimation

    • Error: FunctionClauseError for %{type: "text", text: "..."}
    • File: lib/ex_llm/core/cost.ex:61
    • Solution:
      def estimate_tokens(%{type: "text", text: text}) when is_binary(text) do
        estimate_tokens(text)
      end
  3. Streaming Metrics Collection

    • Problem: No metrics collected during streaming tests
    • Files: enhanced_streaming_coordinator.ex, metrics_plug.ex
    • Solution: Debug middleware initialization and event propagation

Phase 3: Test Implementation Fixes

Target: Fix ~10-15 tests with code corrections
Estimated Impact: Reduce failures from ~50 to ~35

Issues to Address:

  1. Stream Chunk Access Pattern

    # Wrong - causes "ExLLM.Types.StreamChunk.fetch/2 is undefined"
    chunk[:tool_calls]
    
    # Correct - use struct field access
    chunk.tool_calls
  2. Overly Strict Assertions

    # Wrong - too specific
    assert response.finish_reason == "stop"
    
    # Correct - allow valid variations
    assert response.finish_reason in ["stop", "end_turn", "stop_sequence"]

Phase 4: OAuth2 Test Strategy

Target: Handle ~20-30 OAuth2-dependent tests
Estimated Impact: Reduce failures from ~35 to ~15

Decision Matrix:

OAuth2 Complexity Assessment:
├─ Token refresh implemented? → No
├─ Setup scripts available? → Yes (scripts/setup_oauth2.exs)
├─ CI/CD complexity? → High
└─ Recommendation → Skip in CI, document manual testing

Implementation Options:

  1. Option A: Full implementation (Complex)

    • Implement token refresh in test helper
    • Add CI secrets for OAuth credentials
    • Estimated effort: 4-6 hours
  2. Option B: Mock OAuth2 (Medium)

    • Create mock responses for OAuth endpoints
    • Maintain test coverage without real auth
    • Estimated effort: 2-3 hours
  3. Option C: Skip with documentation (Simple)

    • Tag tests with :requires_oauth2
    • Document manual testing process
    • Estimated effort: 30 minutes

Recommendation: Start with Option C, plan for Option B if needed

Phase 5: Edge Cases & Cleanup

Target: Fix remaining ~10-15 edge case failures
Estimated Impact: Reduce failures from ~15 to 0

Typical Edge Cases:

  • Provider-specific timeout handling
  • Rate limit responses during tests
  • Malformed response handling
  • Async test race conditions

Implementation Workflow

START
  │
  ├─→ [1] Diagnostic Run
  │     ├─→ Generate failure report
  │     └─→ Categorize by error type
  │
  ├─→ [2] Phase 1: Type/Assertion Fixes
  │     ├─→ Create test helper module
  │     ├─→ Find common patterns
  │     └─→ Batch apply fixes
  │
  ├─→ [3] Phase 2: Core Functionality
  │     ├─→ Fix cost calculation
  │     ├─→ Add multimodal patterns
  │     └─→ Debug metrics pipeline
  │
  ├─→ [4] Phase 3: Test Implementation
  │     ├─→ Fix access patterns
  │     └─→ Update assertions
  │
  ├─→ [5] Phase 4: OAuth2 Handling
  │     ├─→ Assess complexity
  │     └─→ Implement chosen strategy
  │
  └─→ [6] Phase 5: Final Cleanup
        ├─→ Handle edge cases
        └─→ Verify all tests pass

Success Metrics

Phase Expected Remaining Fixed Success Criteria
Start 130 - Baseline established
1 ~90 40 Type issues resolved
2 ~50 40 Core functions work
3 ~35 15 Test code corrected
4 ~15 20 OAuth2 handled
5 0 15 All tests pass

Key Files to Modify

Phase 1 Files:

  • test/support/shared/provider_integration_test.exs
  • test/ex_llm/providers/*_public_api_test.exs
  • Create: test/support/test_helpers.ex

Phase 2 Files:

  • lib/ex_llm/core/cost.ex
  • config/models/*.yml (pricing data)
  • lib/ex_llm/providers/shared/streaming_coordinator.ex

Phase 3 Files:

  • Various test files with syntax issues
  • test/ex_llm/providers/*_test.exs

Phase 4 Files:

  • test/test_helper.exs (exclusion rules)
  • OAuth2-specific test files

First Actions Checklist

  • Run diagnostic to categorize failures
  • Create test helper module structure
  • Identify first type mismatch to fix
  • Implement and measure impact
  • Document patterns for similar fixes

Risk Mitigation

  1. Risk: Fixes break currently passing tests

    • Mitigation: Run full test suite after each phase
  2. Risk: OAuth2 complexity delays progress

    • Mitigation: Skip strategy ready as fallback
  3. Risk: Missing functionality requires major refactoring

    • Mitigation: Implement minimal viable solutions first

Notes

  • Authentication and HTTP infrastructure issues have been resolved
  • This plan focuses only on test-specific issues
  • OAuth2 tests may be deferred to avoid blocking progress
  • Success is measured by reduction in test failures