Origin: Patterns extracted from extendo and NadeStacked data processing workflows
Dependencies: Python standard library
Purpose: Reusable utilities for cleaning, transforming, and aggregating gaming data
Functions for calculating gaming statistics from segment data:
- `win_rate()`, `pick_rate()`, `ban_rate()` - Win/pick/ban percentages
- `kdr()`, `adr()`, `kpr()` - Kill/Death ratio, Average Damage per Round, Kills per Round
- `headshot_percentage()`, `accuracy_percentage()` - Shooting accuracy stats
- `StatsAggregator` class for multi-segment analysis
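The rate helpers are not exercised in the examples below. The following is a minimal sketch that assumes each takes a single segment dict, mirroring the `adr()`/`kdr()` calls shown later in this document; the `"stats"` keys in the sample segment are illustrative assumptions, not a documented schema.

```python
# Minimal sketch: assumes each rate helper takes one segment dict, mirroring the
# adr()/kdr() calls shown later. The "stats" keys below are illustrative assumptions.
from voodoo_box.data.stats_aggregators import headshot_percentage, win_rate

segment = {
    "stats": {
        "Matches": "120",
        "Wins": "66",
        "Kills": "2100",
        "Headshots": "980",
    }
}

print(win_rate(segment))             # e.g. 55.0 -> percentage of matches won
print(headshot_percentage(segment))  # e.g. 46.7 -> headshots as a share of kills
```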
Data cleaning and transformation utilities:
- `DataCleaner` - Safe type conversion and cleaning
- `DataNormalizer` - Standardize data from different API sources
- `DataGrouper` - Group data by teams, maps, time periods
- `DataFilter` - Filter data by skill level, recency, custom criteria
- `DataAggregator` - Calculate frequencies, averages, aggregations
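`DataAggregator` is the one transformer not shown in the examples below. The sketch that follows is hypothetical: the method names `count_frequencies` and `calculate_average` are assumptions chosen to illustrate the frequency/average role described above, not confirmed API.

```python
# Hypothetical sketch only: count_frequencies()/calculate_average() are assumed
# method names used to illustrate the role of DataAggregator, not confirmed API.
from voodoo_box.data.transformers import DataAggregator

matches = [
    {"map": "de_mirage", "kills": 21},
    {"map": "de_inferno", "kills": 17},
    {"map": "de_mirage", "kills": 25},
]

map_frequency = DataAggregator.count_frequencies(matches, key="map")
avg_kills = DataAggregator.calculate_average(matches, key="kills")

print(map_frequency)  # e.g. {"de_mirage": 2, "de_inferno": 1}
print(avg_kills)      # e.g. 21.0
```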
```python
from voodoo_box.data.stats_aggregators import adr, kdr, StatsAggregator

# Calculate individual stats
player_adr = adr(player_segment)
player_kd = kdr(player_segment)

# Aggregate across multiple segments
aggregator = StatsAggregator()
for segment in player_segments:
    aggregator.add_segment(segment)

avg_rating = aggregator.average_stat("Rating")
total_matches = aggregator.total_stat("Matches")
```

```python
from voodoo_box.data.transformers import DataNormalizer, DataGrouper, DataFilter
# Normalize player data from API
normalized_players = [
    DataNormalizer.normalize_player_data(player)
    for player in raw_api_response
]
# Group players by team
teams = DataGrouper.group_players_by_team(normalized_players)
# Filter by skill level
high_skill = DataFilter.filter_players_by_skill(
    normalized_players,
    min_level=7,
    max_level=10,
)
```

```python
from voodoo_box.data.transformers import DataCleaner
cleaner = DataCleaner()
# Safe conversions with fallbacks
elo = cleaner.safe_int(player_data.get('elo'), default=1000)
accuracy = cleaner.normalize_percentage(player_data.get('accuracy'))
nickname = cleaner.clean_string(player_data.get('name'))
# Extract nested values
skill_level = cleaner.extract_nested_value(
    player_data,
    'games.cs2.skill_level',
    default=1,
)
```

```python
# Complete data processing pipeline
from voodoo_box.data.stats_aggregators import StatsAggregator
from voodoo_box.data.transformers import DataGrouper, DataNormalizer
def process_player_data(raw_players):
    # 1. Normalize structure
    normalized = [DataNormalizer.normalize_player_data(p) for p in raw_players]

    # 2. Filter out errors
    valid_players = [p for p in normalized if not p.get('has_error')]

    # 3. Group by teams
    teams = DataGrouper.group_players_by_team(valid_players)

    # 4. Calculate team statistics
    for team_name, players in teams.items():
        aggregator = StatsAggregator()
        for player in players:
            if player.get('stats'):
                aggregator.add_segment({'stats': player['stats']})
        team_avg_rating = aggregator.average_stat('Rating')
        print(f"{team_name} average rating: {team_avg_rating}")

    return teams
```

```python
# Analyze recent match performance
from voodoo_box.data.transformers import DataFilter, DataGrouper
recent_matches = DataFilter.filter_recent_matches(all_matches, days=7)
map_groups = DataGrouper.group_by_map(recent_matches)
for map_name, matches in map_groups.items():
    win_rate = sum(1 for m in matches if m.get('won')) / len(matches) * 100
    print(f"{map_name}: {win_rate:.1f}% win rate ({len(matches)} matches)")
```

- All functions handle missing/invalid data gracefully with sensible defaults (see the sketch after this list)
- Utilities are pure functions (no side effects) for easy testing
- Designed to work with Faceit API response formats but adaptable to other sources
- Can be used individually or combined in data processing pipelines
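To illustrate the graceful-fallback point above, here is a sketch reusing the `DataCleaner` calls from the earlier example; the assumption is that missing or malformed input falls back to the supplied `default` rather than raising.

```python
# Sketch of the fallback behavior described in the notes above. The assumption is
# that missing/invalid input returns the supplied default instead of raising.
from voodoo_box.data.transformers import DataCleaner

cleaner = DataCleaner()

elo = cleaner.safe_int(None, default=1000)             # missing value -> 1000
elo = cleaner.safe_int("not a number", default=1000)   # malformed value -> 1000

skill_level = cleaner.extract_nested_value(
    {},                        # player payload with no nested data
    'games.cs2.skill_level',
    default=1,                 # absent path -> 1
)
```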