Data Processing Utilities

Origin: Patterns extracted from extendo and NadeStacked data processing workflows
Dependencies: Python standard library
Purpose: Reusable utilities for cleaning, transforming, and aggregating gaming data

Modules

stats_aggregators.py

Functions for calculating gaming statistics from segment data:

  • win_rate(), pick_rate(), ban_rate() - Win/pick/ban percentages
  • kdr(), adr(), kpr() - Kill/death ratio, average damage per round, kills per round
  • headshot_percentage(), accuracy_percentage() - Shooting accuracy stats
  • StatsAggregator class for multi-segment analysis
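The actual signatures live in stats_aggregators.py; purely as an illustration, a pure-function pair like win_rate() and kdr() over a segment dict might look like the sketch below. The 'Matches'/'Wins'/'Kills'/'Deaths' stat keys are assumptions for the example, not the module's confirmed schema:

```python
def win_rate(segment: dict) -> float:
    """Wins as a percentage of matches played; 0.0 when no matches."""
    stats = segment.get("stats", {})
    matches = int(stats.get("Matches", 0) or 0)
    wins = int(stats.get("Wins", 0) or 0)
    return (wins / matches) * 100 if matches else 0.0


def kdr(segment: dict) -> float:
    """Kill/death ratio; falls back to raw kills when deaths are zero."""
    stats = segment.get("stats", {})
    kills = int(stats.get("Kills", 0) or 0)
    deaths = int(stats.get("Deaths", 0) or 0)
    return kills / deaths if deaths else float(kills)
```

Because the helpers take plain dicts and return plain floats, they can be unit-tested without any API fixtures.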

transformers.py

Data cleaning and transformation utilities:

  • DataCleaner - Safe type conversion and cleaning
  • DataNormalizer - Standardize data from different API sources
  • DataGrouper - Group data by teams, maps, time periods
  • DataFilter - Filter data by skill level, recency, custom criteria
  • DataAggregator - Calculate frequencies, averages, aggregations
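To make the "safe conversion" idea concrete, here is a minimal sketch of what a DataCleaner might do internally. The method bodies are assumptions based on the documented behavior (fall back to a default on bad input), not the module's real implementation:

```python
class DataCleaner:
    """Sketch of safe type conversion with fallbacks (hypothetical internals)."""

    def safe_int(self, value, default=0):
        # Any unconvertible value (None, '', 'abc') yields the default.
        try:
            return int(value)
        except (TypeError, ValueError):
            return default

    def normalize_percentage(self, value, default=0.0):
        # Accept '47%', '47', 47, or 0.47 and return a 0-100 float.
        if isinstance(value, str):
            value = value.strip().rstrip("%")
        try:
            number = float(value)
        except (TypeError, ValueError):
            return default
        # Heuristic: values in [0, 1] are treated as fractions.
        return number * 100 if 0 <= number <= 1 else number
```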

Usage Examples

Statistics Calculation

from voodoo_box.data.stats_aggregators import adr, kdr, StatsAggregator

# Calculate individual stats
player_adr = adr(player_segment)
player_kd = kdr(player_segment)

# Aggregate across multiple segments
aggregator = StatsAggregator()
for segment in player_segments:
    aggregator.add_segment(segment)

avg_rating = aggregator.average_stat("Rating")
total_matches = aggregator.total_stat("Matches")

Data Transformation

from voodoo_box.data.transformers import DataNormalizer, DataGrouper, DataFilter

# Normalize player data from API
normalized_players = [
    DataNormalizer.normalize_player_data(player) 
    for player in raw_api_response
]

# Group players by team
teams = DataGrouper.group_players_by_team(normalized_players)

# Filter by skill level
high_skill = DataFilter.filter_players_by_skill(
    normalized_players, 
    min_level=7, 
    max_level=10
)

Data Cleaning

from voodoo_box.data.transformers import DataCleaner

cleaner = DataCleaner()

# Safe conversions with fallbacks
elo = cleaner.safe_int(player_data.get('elo'), default=1000)
accuracy = cleaner.normalize_percentage(player_data.get('accuracy'))
nickname = cleaner.clean_string(player_data.get('name'))

# Extract nested values
skill_level = cleaner.extract_nested_value(
    player_data, 
    'games.cs2.skill_level', 
    default=1
)
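The dotted-path lookup above can be implemented as a short dictionary walk. This standalone sketch mirrors the documented behavior (return the default when any level of the path is missing) but is not the module's actual code:

```python
def extract_nested_value(data: dict, path: str, default=None):
    """Walk a dotted key path such as 'games.cs2.skill_level'."""
    current = data
    for key in path.split("."):
        # Bail out with the default if the path dead-ends.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current
```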

Common Patterns

Player Data Pipeline

# Complete data processing pipeline
def process_player_data(raw_players):
    # 1. Normalize structure
    normalized = [DataNormalizer.normalize_player_data(p) for p in raw_players]
    
    # 2. Filter out errors
    valid_players = [p for p in normalized if not p.get('has_error')]
    
    # 3. Group by teams
    teams = DataGrouper.group_players_by_team(valid_players)
    
    # 4. Calculate team statistics
    for team_name, players in teams.items():
        aggregator = StatsAggregator()
        for player in players:
            if player.get('stats'):
                aggregator.add_segment({'stats': player['stats']})
        
        team_avg_rating = aggregator.average_stat('Rating')
        print(f"{team_name} average rating: {team_avg_rating}")
    
    return teams

Match Analysis

# Analyze recent match performance
recent_matches = DataFilter.filter_recent_matches(all_matches, days=7)
map_groups = DataGrouper.group_by_map(recent_matches)

for map_name, matches in map_groups.items():
    map_win_rate = sum(1 for m in matches if m.get('won')) / len(matches) * 100
    print(f"{map_name}: {map_win_rate:.1f}% win rate ({len(matches)} matches)")

Integration Notes

  • All functions handle missing/invalid data gracefully with sensible defaults
  • Module-level functions are pure (no side effects) for easy testing; stateful helpers such as StatsAggregator confine their state to the instance
  • Designed to work with Faceit API response formats but adaptable to other sources
  • Can be used individually or combined in data processing pipelines