Skip to content

Latest commit

 

History

History
227 lines (189 loc) · 9.98 KB

File metadata and controls

227 lines (189 loc) · 9.98 KB

Combat Parser Development & Optimization Log

🎯 Project Overview

Building an enhanced combat log parser for WoW Arena gameplay analysis that extracts precise performance metrics from terabytes of video data and corresponding combat logs.

📊 Data Landscape

  • Video Archive: 11,355+ matches dating back to 2023
  • Combat Logs: Available from January 2025 onwards only
  • Challenge: Match videos to correct combat logs with precise timing

🔧 Key Development Phases

Phase 1: Enhanced Timestamp Matching ✅

Problem: Original parse_logs_fast.py used unreliable timestamp matching with 5-10 minute errors.

Solution: Created test_timestamp_matching.py and enhanced matching system:

  • Method Detection: Auto-detect new format (has 'start' field) vs old format (needs combat log parsing)
  • High Reliability: JSON 'start' field (±30 second windows)
  • Medium Reliability: Combat log parsing (±2 minute windows)
  • Low Reliability: Filename estimation (±5 minute windows)
  • Result: 100% success rate on enhanced timestamp matching

Key Files:

  • improved_timestamp_matcher.py - Core matching logic
  • master_index_enhanced.csv - Output with precise timestamps and reliability scores

Phase 2: Combat Log Format Analysis ✅

Discovery: Combat log timestamp format different than expected.

Original Expected: M/DD HH:MM:SS.ffffff (combined with filename date) Actual Format: M/D/YYYY HH:MM:SS.fff-Z (full date with timezone)

Fix: Updated parse_log_line_timestamp() to handle:

# Parse full timestamp: "1/2/2025 18:04:33.345-5"
timestamp_clean = full_timestamp.split('-')[0].strip()
result = datetime.strptime(timestamp_clean, "%m/%d/%Y %H:%M:%S.%f")

Phase 3: Smart Arena Boundary Detection ✅

Problem: Parser was matching events from WRONG arena matches (previous/next matches).

Root Cause: Simple time window approach caught events from adjacent matches.

Solution: Implemented smart arena matching algorithm:

Algorithm Steps:

  1. Extract Expected Info from filename:

    • 2025-01-01_20-21-29_-_Phlurbotomy_-_3v3_Ruins_of_Lordaeron_(Win).mp4
    • Expected: 3v3 bracket on Ruins of Lordaeron map
  2. Collect Arena Events in extended window (±10 minutes):

    • Find all ARENA_MATCH_START and ARENA_MATCH_END events
    • Parse zone IDs to map names (572 = Ruins of Lordaeron, 1825 = Hook Point, etc.)
  3. Smart Matching Strategy:

    • Strategy 1: Look backward from video timestamp for most recent matching arena start
    • Strategy 2: If no backward match, look forward from video timestamp
    • Validation: Match both bracket (3v3) AND map name (Ruins of Lordaeron)
  4. Find Precise Boundaries:

    • Once correct start found, find corresponding ARENA_MATCH_END
    • Use these EXACT boundaries for event counting

Zone ID Mapping (Complete):

zone_map = {
    '980': "Tol'viron",
    '1552': "Ashamane's Fall", 
    '2759': "Cage of Carnage",
    '1504': "Black Rook",
    '2167': "Robodrome",
    '2563': "Nokhudon", 
    '1911': "Mugambala",
    '2373': "Empyrean Domain",
    '1134': "Tiger's Peak",
    '1505': "Nagrand",
    '1825': "Hook Point",
    '2509': "Maldraxxus",
    '572': "Ruins of Lordaeron",
    '617': "Dalaran Sewers",
    '2547': "Enigma Crucible"
}

🎯 Solo Shuffle Handling ✅

Challenge: Solo Shuffle behaves differently from standard arena:

  • 6 Rounds: Each round triggers ARENA_MATCH_START but no ARENA_MATCH_END between rounds
  • Single Session: All 6 rounds played on same arena map
  • Different JSON Structure: Contains soloShuffleTimeline with round-by-round results
  • Complex Scoring: 3+ rounds won = overall win, <3 rounds = loss
  • Bracket Name: "Rated Solo Shuffle" in combat logs vs "Solo Shuffle" in JSON

Solution:

  1. Enhanced Bracket Matching: Handle "Solo Shuffle""Rated Solo Shuffle" equivalence
  2. Session-Level Matching: Match entire shuffle session instead of individual rounds
  3. Round Timeline Correlation: Verify shuffle timeline against multiple ARENA_MATCH_START events
  4. Death Correlation: Cross-verify deaths across all 6 rounds

Phase 4: Enhanced Event Detection ✅

Improvements Made:

Pet vs Player Cast Separation:

  • Problem: Pet casts counted as player casts
  • Solution: Only count src == player_name for cast_success_own
  • Exception: Track pet Devour Magic dispels separately as purges_own

Spell Tracking:

  • spells_cast: Track which spells player cast ("Shadow Bolt", "Fear", etc.)
  • spells_purged: Track which auras pet dispelled ("Sun Sear", "Blessing of Protection", etc.)

Purge Detection Fix:

  • Wrong: Looking for SPELL_CAST_SUCCESS with "Devour Magic"
  • Correct: Looking for SPELL_DISPEL events:
if event_type == 'SPELL_DISPEL' and len(parts) >= 13:
    if pet_name and src == pet_name and spell_name == "Devour Magic":
        purged_aura = parts[12].strip('"')  # "Sun Sear"
        features['purges_own'] += 1
        features['spells_purged'].append(purged_aura)

Combat Log Event Structure:

  • SPELL_DISPEL: parts[12] contains the aura that was purged
  • SPELL_CAST_SUCCESS: parts[10] contains the spell that was cast
  • SPELL_INTERRUPT: parts[10] contains the interrupt spell used

🐛 Debug Process ✅

Created debug_enhanced_combat_parser_fixed.py with extensive logging:

  • Phase 1: Smart arena boundary detection with validation
  • Phase 2: Event parsing within precise boundaries
  • Debug Output: First 10 events of each type with timestamps
  • Validation: Shows expected vs found arena info, spell lists

📈 Expected Metrics Tracked

features = {
    'cast_success_own': 0,           # Player spells cast successfully
    'interrupt_success_own': 0,      # Interrupts you performed  
    'times_interrupted': 0,          # Times you were interrupted
    'precog_gained_own': 0,          # Precognition buffs you gained
    'precog_gained_enemy': 0,        # Precognition buffs enemies gained
    'purges_own': 0,                 # Auras your pet dispelled
    'spells_cast': [],               # List of spells you cast
    'spells_purged': []              # List of auras your pet purged
}

🚀 Phase 5: Production-Ready Version ✅

Final Implementation: Created enhanced_combat_parser_production.py

Key Production Features:

  • Efficient Processing: Progress tracking every 100 matches, error logging to file
  • Smart Log Selection: Time-based matching to find correct combat logs
  • Reliability-Based Processing: Different time windows based on timestamp reliability
  • Complete Feature Set: All fixes applied from debug version
  • Scalable: Designed to process full 11,355+ match dataset

Production Optimizations:

  1. Batch Progress Reporting: Updates every 100 matches processed
  2. Error Logging: Errors logged to parsing_errors.log instead of console spam
  3. Efficiency Checks: Skip already processed logs
  4. Memory Management: Process files one at a time to avoid memory issues
  5. Filtered Dataset: Only process 2025+ matches (when combat logs available)

Processing Strategy by Reliability:

  • High Reliability (JSON start field): ±30 second windows with smart arena detection
  • Medium Reliability (Combat log parsing): ±2 minute windows with smart arena detection
  • Low Reliability (Filename estimation): ±5 minute windows with smart arena detection

🔄 Current Status ✅

Completed:

  • ✅ Enhanced timestamp matching (100% success)
  • ✅ Combat log format analysis and fixes
  • ✅ Smart arena boundary detection algorithm
  • ✅ Enhanced event detection with spell tracking
  • ✅ Solo Shuffle support
  • ✅ Production-ready parser with all optimizations

Ready to Run:

  • 🚀 Production parser ready for full dataset processing
  • 📊 Will process 2025+ matches (when combat logs available)
  • 💾 Output: match_features_enhanced.csv with enhanced metrics
  • 🔧 All fixes applied and tested

Next Steps:

  1. Run production parser: python enhanced_combat_parser_production.py
  2. Validate output in match_features_enhanced.csv
  3. Begin AI model training with enhanced feature dataset
  4. Build analytics dashboard for performance insights

🗃️ Key Files Created

  • improved_timestamp_matcher.py - Smart timestamp matching ✅
  • debug_enhanced_combat_parser_fixed.py - Debug version with extensive logging ✅
  • enhanced_combat_parser_production.py - PRODUCTION VERSION
  • master_index_enhanced.csv - Enhanced video index with precise timestamps ✅
  • match_features_enhanced.csv - Enhanced combat features (output target) 🎯
  • parse_logs_fast.py - Original parser (reference for working event detection) ✅

💡 Key Learnings

  1. Precision Matters: 5-10 minute timestamp errors cause wrong arena matching
  2. Combat Log Structure: Full date format, not time-only with filename date
  3. Arena Validation: Must match both bracket type AND map name
  4. Event Types: Different events have different field structures (SPELL_DISPEL vs SPELL_CAST_SUCCESS)
  5. Pet Handling: Pets need special logic for casts vs purges vs interrupts
  6. Production Efficiency: Progress tracking and error handling essential for large datasets
  7. Reliability-Based Processing: Different strategies needed based on timestamp quality

🎯 Production Deployment

Command to Run:

cd "E:/Footage/Footage/WoW - Warcraft Recorder/Wow Arena Matches"
python enhanced_combat_parser_production.py

Expected Output:

  • Processing ~3,000+ matches from 2025 (combat log availability period)
  • Progress updates every 100 matches
  • Errors logged to parsing_errors.log
  • Results in match_features_enhanced.csv
  • Processing time: Estimated 30-60 minutes for full dataset

Success Metrics:

  • High success rate on arena boundary detection
  • Accurate event counting within precise time windows
  • Enhanced feature extraction (casts, interrupts, purges, spells)
  • Ready for AI model training pipeline