src/pages/portfolio/statsbomb.mdx
Lines changed: 15 additions & 13 deletions
@@ -20,7 +20,7 @@ import Accordion from '../../components/Accordion.astro';
20
20
</Button>
21
21
22
22
<Heading level={1} as="h1" class="mb-4">Statsbomb Sports Data Collection</Heading>
23
-
<Body size="lg" as="p" class="text-neutral mb-6">Real-time sports data collection scaled from 100 to 1000+ collectors through architecture as data: domain logic as configuration, UI workflows as state machines, and claims-based metadata. What we built worked—expanding to new sports without rewriting code. The lesson came from what took longer: building distributed team ownership two years late meant racing time when external constraints intervened.</Body>
23
+
<Body size="lg" as="p" class="text-neutral mb-6">Real-time sports data collection achieved 10x growth through architecture-as-data: domain logic as configuration, UI workflows as state machines, and claims-based metadata. What we built worked—expanding to new sports without rewriting code. The lesson came from what took longer: building distributed team ownership two years late meant racing time when external constraints intervened.</Body>
24
24
25
25
<div class="flex flex-wrap gap-2 mb-4">
26
26
{/* System Characteristics */}
@@ -72,7 +72,7 @@ Years of daily conversations with collectors revealed their actual workflows and
72
72
73
73
The dataspec itself needed to be data, not code, so product managers could express rules in one place that drove the entire system.
74
74
75
-
Event storming sessions mapped the domain: match metadata, event collection, people coordination, media management, contextual aggregation. One insight stood out: arbitrary aggregation from atomic facts. Individual
75
+
Event storming sessions mapped the domain. Five bounded contexts emerged: match metadata, event collection, people coordination, media management, contextual aggregation. One insight stood out: arbitrary aggregation from atomic facts. Individual
76
76
passes and dribbles were atomic events. Multiple dribbles by the same player became a
77
77
"carry"—a player-level durational fact. Team possession spanned all team durational facts
78
78
(carry, foul-won, opponent-out) into a higher-level aggregation. Turnovers derived from possession changes.
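
A minimal sketch of aggregation from atomic facts, under stated assumptions: the `AtomicEvent` and `Carry` shapes, the field names, and both function names are hypothetical and not taken from the real dataspec. It folds runs of two or more consecutive dribbles by one player into a carry, and approximates a turnover as any change of team between consecutive events.

```typescript
// Hypothetical shapes; field names are illustrative only.
type AtomicEvent = {
  kind: "pass" | "dribble";
  player: string;
  team: string;
  t: number; // seconds from kick-off
};

type Carry = {
  player: string;
  team: string;
  start: number;
  end: number;
  dribbles: number; // how many atomic dribbles the carry spans
};

// Fold runs of consecutive dribbles by the same player into carries.
// Assumption: a carry needs at least two dribbles ("multiple dribbles").
function deriveCarries(events: AtomicEvent[]): Carry[] {
  const runs: Carry[] = [];
  let current: Carry | null = null;
  for (const e of events) {
    if (e.kind === "dribble" && current !== null && current.player === e.player) {
      // Same player keeps dribbling: extend the open run.
      current.end = e.t;
      current.dribbles += 1;
    } else {
      // Anything else closes the open run and may start a new one.
      if (current !== null) runs.push(current);
      current =
        e.kind === "dribble"
          ? { player: e.player, team: e.team, start: e.t, end: e.t, dribbles: 1 }
          : null;
    }
  }
  if (current !== null) runs.push(current);
  return runs.filter((run) => run.dribbles >= 2);
}

// Assumption: possession changes whenever the acting team changes.
function countTurnovers(events: AtomicEvent[]): number {
  let turnovers = 0;
  let previousTeam: string | null = null;
  for (const e of events) {
    if (previousTeam !== null && e.team !== previousTeam) turnovers += 1;
    previousTeam = e.team;
  }
  return turnovers;
}
```

Both derived facts are computed from the same atomic list, so new aggregations can be added without changing what collectors record.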
@@ -93,7 +93,7 @@ Product managers couldn't define new collection requirements without engineering
93
93
94
94
Statsbomb provided granular sports analytics to professional clubs, broadcasters, and betting operators—markets where data velocity creates competitive advantage. Teams analyzing opponent patterns hours after matches finished fell behind. Broadcasters needed same-day insights for Monday coverage. Betting markets priced faster with real-time event feeds.
95
95
96
-
The 75% efficiency gain (16 hours → 4 hours) wasn't just operational—it unlocked new revenue streams. Faster collection meant Monday match analysis by Tuesday morning. Multi-sport expansion (soccer to American football) without proportional engineering cost meant entering adjacent markets with existing infrastructure. Scaling from 100 to 1000+ collectors without linear staffing growth preserved margins while growing coverage.
96
+
The 75% efficiency gain wasn't just operational—it unlocked new revenue streams. Faster collection meant Monday match analysis by Tuesday morning. Multi-sport expansion (soccer to American football) without proportional engineering cost meant entering adjacent markets with existing infrastructure. 10x operational scale without linear staffing growth preserved margins while growing coverage.
97
97
98
98
The architectural separation between rules and execution created a strategic moat: product managers could customize data specifications per client without engineering rewrites. A Premier League club wanting detailed positioning data and a Championship club needing only basic events both ran on the same system—different configurations, same codebase.
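
As a rough illustration of "different configurations, same codebase", the sketch below treats a dataspec as plain data and runs one validator against it. The `Dataspec` shape, the two example specs, the field names, and `missingFields` are all hypothetical, chosen only to show the idea.

```typescript
// Hypothetical dataspec: required field names per event type.
type Dataspec = Record<string, string[]>;

// Illustrative per-client configurations (not real client specs).
const detailedSpec: Dataspec = {
  pass: ["player", "x", "y", "freezeFrame"],
  shot: ["player", "x", "y", "freezeFrame"],
};

const basicSpec: Dataspec = {
  pass: ["player"],
  shot: ["player"],
};

type CollectedEvent = { type: string; fields: Record<string, unknown> };

// One validator for every client; only the spec passed in differs.
function missingFields(spec: Dataspec, event: CollectedEvent): string[] {
  const required = spec[event.type];
  if (required === undefined) {
    return [`unknown event type: ${event.type}`];
  }
  return required.filter((name) => !(name in event.fields));
}
```

The same event can pass under `basicSpec` and fail under `detailedSpec`, which is the separation of rules from execution described above.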
99
99
</section>
@@ -459,14 +459,14 @@ Freeze frames (positioning data for all 22 players) were semi-automated. Collect
459
459
460
460
</Accordion>
461
461
462
-
This tool became the foundation. It demonstrated that architecture as data worked in practice: the dataspec drove UI validation, the DSL defined legal event sequences, the state machines enforced correctness. 16 sequential hours → 4 man-hours during the live match with near real-time (~20s latency). The backend scaled to support what the UX tool showed collectors needed.
462
+
This tool became the foundation. The dataspec drove UI validation, the DSL defined legal event sequences, the state machines enforced correctness. Concurrent collection during live matches with sub-minute latency. The backend scaled to support what the UX tool showed collectors needed.
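
One way to picture "the DSL defined legal event sequences" is a transition table held as data and checked by a small state machine. This is a sketch under assumptions: the event names, the `legalNext` table, and `firstIllegalTransition` are hypothetical and do not reproduce the project's actual DSL.

```typescript
// Hypothetical transition table: which event may legally follow which.
const legalNext: Record<string, string[]> = {
  "kick-off": ["pass"],
  pass: ["pass", "dribble", "shot"],
  dribble: ["pass", "dribble", "shot"],
  shot: ["kick-off", "pass"],
};

// Returns the index of the first event that breaks the rules, or -1 if none does.
function firstIllegalTransition(sequence: string[]): number {
  let previous: string | null = null;
  let index = 0;
  for (const event of sequence) {
    if (previous !== null) {
      const allowed = legalNext[previous];
      // An event with no table entry has no legal successors.
      if (allowed === undefined || allowed.indexOf(event) === -1) {
        return index;
      }
    }
    previous = event;
    index += 1;
  }
  return -1;
}
```

Because the rules live in the table, changing what counts as a legal sequence means editing data instead of the checking code.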
463
463
</div>
464
464
465
465
{/* Event Collection Subsection */}
466
466
<div class="mb-10">
467
467
<Heading level={3} as="h3" class="mb-4"><span>Backend Evolution: From Batch to Real-Time</span></Heading>
468
468
469
-
Breaking matches down by decision rather than team increased correctness without additional effort. Computer vision assisted input, contextual keyboard mappings reduced cognitive load, and linting caught 99% of errors automatically. Collectors focused on judgment over correction—handling the 1% edge cases where human expertise mattered rather than catching preventable mistakes.
469
+
Breaking matches down by decision rather than team increased correctness without additional effort. Computer vision assisted input, contextual keyboard mappings reduced cognitive load, and automated linting caught preventable errors. Collectors focused on judgment over correction—handling edge cases where human expertise mattered rather than catching mistakes.
@@ -643,15 +643,17 @@ The system prevented duplicate entities by design, caught conflicts automaticall
643
643
644
644
<Accordion summary="Quantitative Results">
645
645
646
-
**Real-time collection with 75% efficiency gain:** Collection shifted from 16 sequential hours post-match to 4 man-hours during the live match with near real-time (~20s latency).
646
+
**Real-time collection:** Concurrent collection during live matches replaced post-match sequential processing.
647
647
648
648
**Minimal engineering bottleneck:** When we expanded to American football, product managers wrote new dataspecs and grouping rules with minimal code changes.
649
649
650
-
**99% error prevention:** Linting and contextual validation caught errors automatically.
651
-
Collectors focused on the 1% judgment calls where human expertise mattered—event type conflicts,
650
+
**Automated validation:** Linting and contextual validation caught errors automatically.
651
+
Collectors focused on judgment calls where human expertise mattered—event type conflicts,
652
652
ambiguous data points—rather than catching preventable mistakes.
653
653
654
-
**Non-linear leverage:** Operations scaled from 100 to 1000+ collectors without proportional staffing increases.
654
+
**Non-linear leverage:** Operations scaled 10x without proportional staffing increases.
655
+
656
+
The system scaled horizontally. Product managers shipped features.
655
657
656
658
</Accordion>
657
659
@@ -674,9 +676,9 @@ The architecture worked. But building it was only half the story.
674
676
675
677
<Accordion summary="Year One: Finding the Core Partnership">
676
678
677
-
Week one was me and Ali hand-writing sequencing rules. Early months: Adham joined and became my right hand. The foundational architecture—data as configuration, state machines, DSLs—emerged from our collaborative exploration. Adham pushed me to chase ideas I wasn't confident enough to pursue alone.
679
+
Week one: Ali and I hand-wrote sequencing rules. Early months: Adham joined as co-architect. The foundational architecture—data as configuration, state machines, DSLs—emerged from our collaborative exploration. Adham pushed me to chase ideas I wasn't confident enough to pursue alone.
678
680
679
-
Month one backend: a single Go endpoint called `sync`—batch, offline. We started building the live-collection-app together with an aircraft engineer friend who was shifting careers to software. No formal process. Just the urgent need to replace Dartfish.
681
+
Month one backend: a single Go endpoint called `sync`—batch, offline. We started building the live-collection-app with a junior engineer transitioning from aerospace. No formal process. Just the urgent need to replace Dartfish.
680
682
681
683
</Accordion>
682
684
@@ -755,7 +757,7 @@ Architecture shapes what's possible. But people make it real.
755
757
756
758
**What Worked in This Context**
757
759
758
-
Architecture emerged from observing where collectors struggled—watching hesitation, workarounds, and slowdowns. Separation of rules from execution paid off at 1000+ collectors (felt like over-engineering at 100). Designing for 80% common cases with flexible extension points prevented premature optimization.
760
+
Architecture emerged from observing where collectors struggled—watching hesitation, workarounds, and slowdowns. Separation of rules from execution paid off at 1000+ collectors. At 100, it felt like over-engineering. Designing for 80% common cases with flexible extension points prevented premature optimization.
759
761
760
762
**What I'd Change**
761
763
@@ -786,7 +788,7 @@ The human side of building production systems: our team in Cairo (2018-2022), wh
*Photos from 2018-2021: Team collaboration, technical discussions, and the people who took foundational concepts beyond their initial vision. (Captions to be added - photo gallery component coming soon.)*
791
+
*Photos from 2018-2021: Team collaboration, technical discussions, and the people who took foundational concepts beyond their initial vision.*