Developer Dashboard

"Games are won by players who focus on the playing field—not by those whose eyes are glued to the scoreboard." — Warren Buffett, who never had to justify a DevRel budget


Developer Relations has a metrics problem. Not a shortage of them—Lord knows we've got GitHub stars, Discord members, and enough UTM-tagged links to wallpaper a conference hall. The problem is that none of them actually mean anything.

The industry is slowly waking up. According to the State of Developer Relations 2024, the percentage of teams that "don't measure" dropped from 14% to 9%. Progress. But here's what practitioners are actually saying: they want "codified best practices or tooling for producing data and metrics to argue for ROI/impact." They want data-driven DevRel with metrics that matter to stakeholders. They want, in short, to not get laid off.

This dashboard is our answer to that prayer—a comprehensive analytics platform that tracks the full developer journey from first commit to on-chain transaction. Not vanity metrics. Not vibes. Actual, defensible, board-deck-ready numbers.

Built for the Sui blockchain ecosystem, it ingests data from Electric Capital, enriches it with GitHub and blockchain APIs, and produces the metrics that DevRel teams have been begging for:

  • Monthly Active Developers (MADs) — the North Star metric everyone eventually settles on
  • Retention cohorts — because acquisition without retention is just expensive churn
  • Event ROI — hackathon to activated developer, tracked and measured
  • Package deployment to transaction volume — the elusive "developer to revenue" pipeline

Developer Dashboard Home

Is it perfect? No. We can't measure whether that conference talk sparked joy in a developer's heart. We can, however, tell you if they shipped code afterward. And in a world where DevRel budgets live or die by accountability, that's the metric that matters.


Screenshots

  • Developers: Developer Analytics — Searchable, filterable list of every contributor with activity levels, tenure, and location. Find your power users, identify churn risks, or discover who's been quietly shipping while you were busy with keynotes.
  • Activity: Activity Heatmap — When are developers actually working? This day-of-week × hour-of-day visualization reveals timezone distribution and working patterns. Turns out, developers don't actually commit at 2 AM as much as we romanticize.
  • Locations: Geographic Distribution — 122 countries, continental breakdowns, and city-level data. Finally, evidence for that "we have a global community" claim you've been making in pitch decks.
  • Profile: Developer Profiles — Individual contributor deep-dives showing commit history, repository involvement, and activity timelines. Because sometimes you need to know exactly who built what.

Local development (when DB_HOST=localhost) automatically bypasses authentication for convenience.


Features & Functionality

Report Dashboard

A comprehensive overview of ecosystem health and developer engagement trends.

  • Key metrics: Monthly Active Developers (MADs), full-time developers, total repositories, and commits
  • Interactive time-series charts showing MAD trends (exclusive vs multichain developers)
  • Developer segmentation by activity level (full-time, part-time, one-time)
  • Developer segmentation by tenure (newcomers, emerging, established)
  • Monthly commits visualization by developer type
  • Year-over-year growth analysis with historical comparisons

Complex Queries: Aggregates data across multiple ecosystems (Sui, Walrus, SNS) using the eco_mads rollup table (or eco_mads_excl_employees materialized view when filtering employees), which pre-calculates daily metrics from 28-day rolling windows across the eco_developer_activities table.
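The rolling-window idea behind those pre-calculated metrics can be sketched in a few lines. This is an illustrative reduction, not the actual eco_mads SQL — the row shape and field names here are assumptions:

```typescript
// Sketch: deriving a 28-day rolling Monthly Active Developers (MAD) count
// from per-day activity rows. Field names are illustrative, not the real
// eco_developer_activities schema.
interface ActivityRow {
  developerId: string;
  day: string; // ISO date, e.g. "2024-05-01"
}

const DAY_MS = 24 * 60 * 60 * 1000;

// MADs for `asOf`: distinct developers active in the trailing 28-day window.
function madsAsOf(rows: ActivityRow[], asOf: string): number {
  const end = Date.parse(asOf);
  const start = end - 27 * DAY_MS; // 28 days, inclusive of `asOf`
  const active = new Set<string>();
  for (const r of rows) {
    const t = Date.parse(r.day);
    if (t >= start && t <= end) active.add(r.developerId);
  }
  return active.size;
}
```

The rollup table simply precomputes this count for every day so the dashboard never scans raw activity at request time.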


Developer Analytics

See who is building in the ecosystem and track individual contributor profiles.

  • Searchable, filterable list of all developers with pagination
  • Advanced filters: GitHub username, name, country, city, contribution rank
  • Sortable columns: repos, commits, tenure, activity level
  • Individual developer profiles showing:
    • Complete commit history with repository breakdown
    • Geographic location
    • Activity timeline
    • Contribution rank and tenure
    • Related repositories
  • Employee badges identifying Mysten Labs team members

Data Enrichment: Developer profiles are enriched from GitHub GraphQL API (fetch-profiles script) to add avatars, display names, and account metadata. Location data comes from Electric Capital's canonical_developer_locations table.
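Batching many profiles into one GraphQL request is what keeps enrichment fast. A minimal sketch of the alias-based batching pattern (the exact query shape used by fetch-profiles is an assumption; GraphQL field aliases like `u0`, `u1` are standard):

```typescript
// Build one GraphQL query that fetches many users at once via field aliases,
// instead of one HTTP request per user. Query shape is illustrative.
function buildProfileQuery(logins: string[]): string {
  const fields = logins
    .map(
      (login, i) =>
        `u${i}: user(login: ${JSON.stringify(login)}) { login name avatarUrl }`
    )
    .join("\n  ");
  return `query {\n  ${fields}\n}`;
}
```

One request like this replaces dozens of REST calls, which is how the scripts stay inside GitHub's rate limits.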


Repository Analytics

Track which projects are active and understand the technology stack of the ecosystem.

  • Repository growth time-series chart
  • Searchable, filterable repository list
  • Filters: programming language, license type, topics, parent repository
  • Repository metadata: stars, forks, watchers, contributors
  • Individual repository profiles showing:
    • Commit activity over time
    • Top contributors with commit counts
    • Related repositories (by shared developers)

Data Enrichment: Repository metadata is fetched from GitHub GraphQL API (fetch-repos script) to add descriptions, languages, licenses, topics, and update status.


Geographic Analytics

Understand the global distribution of the developer community.

  • Commits by continent over time (stacked area chart)
  • Top countries ranked by developer count (paginated)
  • Top cities ranked by developer count (paginated)
  • Continental distribution pie chart
  • Click-through filtering to view developers by location

Complex Queries: Joins canonical_developer_locations with eco_developer_activities to aggregate developer counts by geographic regions, filtering through the ecosystem hierarchy.


Package Analytics

Track Sui blockchain packages and their on-chain activity using a hybrid data system that combines comprehensive registry data with usage analytics.

Features:

  • Comprehensive package browsing (278K+ packages from Move Registry)
  • Transaction count and usage analytics for top 1,000 packages
  • Package version history and dependency tracking
  • GitHub repository attribution with multi-source linking
  • Developer attribution via repository contributions and publisher addresses
  • Package detail pages showing:
    • All versions with analytics and dependency counts
    • Linked GitHub repositories from multiple sources
    • Contributors and commit history
    • Publisher addresses and verification status

Data Sources & Architecture

The dashboard uses a hybrid package system that merges data from two complementary sources:

1. MVR (Move Registry) via Snowflake

  • Coverage: 278,574 packages (comprehensive Sui package registry)
  • Strengths: Complete package catalog, direct git links (when published), dependency graph, daily call analytics
  • Import: npx tsx scripts/import-mvr-data.ts (daily from Snowflake)
  • Tables: mvr_packages, mvr_git_infos, mvr_package_infos, mvr_package_analytics, mvr_pkg_depends
  • Use Case: Package discovery, analytics, dependency analysis

2. Blockberry API

  • Coverage: Top 1,000 packages by transaction count
  • Strengths: Real-world usage data (transaction counts), excellent GitHub repo linking via fuzzy matching + Move Registry API
  • Import: npm run sui-packages-pipeline (weekly recommended)
  • Tables: packages, packages_repos, sui_addresses
  • Use Case: Usage analytics, repo attribution, event tracking

Unified Views (automatically merge both sources):

  • packages_complete_unified - Master view with MVR + Blockberry + Analytics + Repos
  • unified_package_repos - Combined repo links from both sources (MVR direct + Blockberry fuzzy)
  • mvr_packages_enriched - MVR data enhanced with Blockberry metadata
  • See scripts/create-unified-packages-views.sql for full schema

Package-Repository Linking

The system uses multiple strategies to link packages to GitHub repositories:

Source             Method                               Packages   Confidence   Priority
MVR Direct         Repository URL in package metadata   98         1.00         Highest
Move Registry API  Reverse resolution via Blockberry    43         1.00         High
Fuzzy Name Match   String similarity matching           429        0.40-1.00    Medium
Total Unique       Combined from all sources            529        0.79 avg     -

Improvement: The unified system provides 20% more packages with repo links (529 vs 439) compared to using Blockberry alone.
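To give a feel for the fuzzy-matching tier, here is a minimal token-overlap similarity usable as a 0-1 confidence score. The real link-sui-packages heuristics are more involved; this function is purely illustrative:

```typescript
// Illustrative name-similarity scorer for package-to-repo fuzzy matching.
// Tokenizes names on non-alphanumeric boundaries and computes Jaccard
// overlap; the production matcher's exact algorithm is not shown here.
function nameTokens(s: string): Set<string> {
  return new Set(s.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
}

function nameConfidence(pkg: string, repo: string): number {
  const a = nameTokens(pkg);
  const b = nameTokens(repo);
  const inter = [...a].filter((t) => b.has(t)).length;
  const union = new Set([...a, ...b]).size;
  return union === 0 ? 0 : inter / union;
}
```

A score of 1.0 means the token sets match exactly; low-overlap matches land in the 0.40-1.00 band the table describes once a minimum threshold is applied.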

Data Quality Indicators

Each package in the unified system includes quality metadata:

  • data_source_quality: 'both' (MVR + Blockberry) | 'mvr_only' | 'blockberry_only' | 'mvr_minimal'
  • has_blockberry_data: Transaction counts and project metadata available
  • has_mvr_git_link: Direct git link from package publisher
  • repo_sources: Array of linking methods used (['mvr_direct', 'name_match'])
  • confidence_score: Repo link confidence (0.40-1.00)

Current Coverage

  • Total packages: 278,843 (comprehensive MVR catalog)
  • With transaction data: 1,000 (Blockberry top packages)
  • With repo links: 529 unique packages → 290 repositories
  • With analytics: 6,507 packages tracked
  • Total transactions tracked: 829M+ (Blockberry data)

Events Tracking

Why: Measure the impact of hackathons, workshops, and community events on developer engagement.

What:

  • CSV import tool with drag-and-drop upload
  • Real-time import progress with live logging
  • Enhanced column scanning: Parser scans ALL columns for GitHub URLs and emails, not just named columns
  • Intelligent multi-step data resolution:
    • Matches GitHub usernames to canonical developers
    • Matches repository URLs to tracked repos
    • Email matching against developer database
    • Username extraction from failed repository URLs (e.g., extracts "username" from github.com/username/deleted-repo)
    • Creates new developer/repo records from GitHub API if not found
    • Suggests related repositories based on developer activity
    • Suggests related developers based on repository commits
  • CSV storage and version history:
    • Raw CSV content stored in database (event_csv_uploads table)
    • Multiple upload versions preserved with timestamps
    • Row-level resolution tracking (event_csv_row_status table)
    • View CSV data in event detail page with status icons
    • Download original CSV files
    • Re-import capability using stored data
  • Event detail pages showing:
    • All participants with their profiles
    • All repositories created/contributed to
    • Developer engagement metrics
    • "View CSV" tab: Interactive table with resolved/unresolved status per row
    • Clickable links from resolved rows to developer/repository detail pages
    • Resolution statistics (resolved vs unresolved counts)

Data Enrichment: The CSV import process (via /api/sui/events/upload) uses event-data-resolver.ts to match rows to the database, and github-api-resolver.ts to fetch missing data from GitHub. Server-Sent Events (SSE) provide real-time progress updates. Each row's resolution outcome is tracked in event_csv_row_status for audit and debugging.
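The "username extraction from failed repository URLs" step mentioned above can be sketched with a single regex. This is an assumption about the shape of that step, not the actual event-data-resolver.ts source:

```typescript
// When github.com/<user>/<repo> no longer resolves (deleted repo), the
// <user> segment can still identify the participant. Regex is illustrative.
function extractGithubUsername(url: string): string | null {
  const m = url.match(/github\.com\/([A-Za-z0-9-]+)(?:\/|$)/);
  return m ? m[1] : null;
}
```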


Wallet Activity Tracking

Goal: Connect on-chain wallet behavior back to developers and packages so we can measure transaction counts, gas usage, and app coverage alongside developer engagement.

  • Schema: scripts/create-wallet-activity-schema.sql creates the foundational tables:
    • wallet_entities: maps wallet addresses to developers, events, packages, repos, etc.
    • wallet_daily_activity: raw daily counts pulled from Snowflake daily_active_wallets
    • Materialized views wallet_activity_summary, developer_wallet_activity, package_wallet_activity
  • Snowflake Import (npm run import-daily-wallets):
    • Streams daily chunks (default ~120 days lookback, override with WALLET_IMPORT_LOOKBACK_DAYS)
    • Deduplicates (wallet, day) rows server-side and upserts into PostgreSQL
    • Refreshes the materialized views after each run
  • Event Wallet Extraction (npm run extract-event-wallets):
    • Replays every CSV saved in event_csv_uploads, scanning each row for 0x... addresses
    • Ties wallet hits back to event_name, csv_upload_id, and row_index
    • Links to canonical_developer_id when the CSV row resolved successfully
  • MVR Package Senders (npm run link-mvr-wallets):
    • Loads mvr_packages.sender values into wallet_entities with package metadata
    • Allows us to attribute on-chain activity directly to packages/repos tracked elsewhere in the dashboard
  • Querying:
    • developer_wallet_activity → total transactions/gas per canonical developer (for linked wallets)
    • package_wallet_activity → same metrics grouped by package sender wallet
    • wallet_activity_summary → raw wallet rollups for auditing or exploratory analysis
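The wallet-extraction scan boils down to checking every CSV cell for 0x-prefixed addresses. A minimal sketch, assuming full 32-byte (64 hex character) Sui addresses — the exact validation in extract-event-wallets may differ:

```typescript
// Scan a CSV row's cells for Sui-style wallet addresses (0x + 64 hex chars),
// deduplicating case-insensitively. Validation rules are an assumption.
const SUI_ADDRESS = /0x[0-9a-fA-F]{64}/g;

function extractWallets(row: string[]): string[] {
  const found = new Set<string>();
  for (const cell of row) {
    for (const m of cell.match(SUI_ADDRESS) ?? []) found.add(m.toLowerCase());
  }
  return [...found];
}
```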

Unified Search

Quickly find developers or repositories across the entire ecosystem.

  • Full-text search across developers (name, login, email) and repositories (name, description)
  • Keyboard shortcut: Press / anywhere in the app
  • Relevance-ranked results
  • Click-through navigation to profiles

Database: Uses a materialized view (search_index) with PostgreSQL full-text search (tsvector/tsquery) combining data from developer_profiles and repository_profiles. Must be refreshed daily with npm run refresh-search-index.


Analytics & Insights

Advanced analytics dashboards providing deep insights into ecosystem health, developer engagement, and repository quality.

Developer Insights (/insights/developers)

Comprehensive developer engagement and retention analytics:

  • DAU/WAU/MAU Metrics: Daily, Weekly, and Monthly Active Developers with health indicators
    • WAU/MAU ratio tracking (healthy threshold: ≥0.35)
    • Trend analysis vs. prior week
    • Health badges (green for healthy, red for unhealthy)
    • Time-series charts with dual Y-axes
  • Short-Term Retention Analysis: 30/60/90-day cohort retention rates
    • Tabbed interface for different time windows
    • Bar charts showing retention percentage by cohort month
    • Tables with cohort size, retained count, and churn metrics
    • Color-coded retention rates (green ≥50%, yellow 30-49%, red <30%)
  • Long-Term Cohort Analysis: Yearly retention curves
    • Developer cohorts by joining year (2020+)
    • Retention tracking over multiple years
    • Visual cohort performance comparison
  • Commit Activity Heatmap: Day-of-week × hour-of-day patterns
  • Geographic Distribution: Developer timezone analysis
  • Code Quality Metrics: Commit patterns and bot activity
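The thresholds above translate directly into code. A sketch of the health-badge and color-coding logic, using exactly the numbers from the bullets (the dashboard's own implementation may differ in detail):

```typescript
// WAU/MAU stickiness check: the bullets define >= 0.35 as healthy.
function wauMauHealthy(wau: number, mau: number): boolean {
  return mau > 0 && wau / mau >= 0.35;
}

// Retention color coding from the bullets: green >= 50%, yellow 30-49%, red < 30%.
function retentionColor(pct: number): "green" | "yellow" | "red" {
  if (pct >= 50) return "green";
  if (pct >= 30) return "yellow";
  return "red";
}
```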

Backend APIs:

  • /api/sui/insights/developers/active-ratios - DAU/WAU/MAU calculations
  • /api/sui/insights/developers/retention-short-term - 30/60/90-day retention
  • /api/sui/insights/developers/cohorts - Long-term cohort retention
  • /api/sui/insights/developers/commit-heatmap - Activity heatmap
  • /api/sui/insights/developers/timezones - Timezone distribution
  • /api/sui/insights/developers/code-quality - Code quality metrics

Ecosystem Analytics (/insights/ecosystem)

Ecosystem-wide health, talent pool, and contributor funnel analysis:

  • Contributor Growth & Funneling: Monthly contributor dynamics
    • Stacked bars: New vs. Retained contributors
    • Line overlays: Repos touched, New repos created
    • Contributors per repo ratio tracking
    • Repo conversion rate: % pushing to newly created projects (>50% highlighted)
    • Last 24 months of funnel metrics
    • Identifies whether contributors are concentrating on existing repos or fueling new projects
  • Developer Churn Risk: Prediction model for at-risk developers
  • Talent Pool Insights: Multichain vs. exclusive developer analysis
  • Ecosystem Health Score: Composite health metrics
  • Event ROI Analysis: Impact measurement for community events

Backend APIs:

  • /api/sui/insights/developers/contributor-funnel - Contributor funneling metrics
  • /api/sui/insights/ecosystem/churn-risk - Churn predictions
  • /api/sui/insights/ecosystem/talent-pool - Talent pool analysis
  • /api/sui/insights/ecosystem/health - Health scores
  • /api/sui/insights/ecosystem/event-roi - Event impact

Repository Insights (/insights/repositories)

Repository lifecycle, technology distribution, and health analysis:

  • Repository Age Distribution: Lifecycle metrics and creation velocity
    • Age histogram with color-coded buckets (<3mo, 3-6mo, 6-12mo, 1-2y, 2y+)
    • Monthly new repositories timeline (last 12 months)
    • Summary statistics: Total repos, Median age, % created last 6 months
    • Young projects percentage (<6 months old)
  • Repository Health Scorecard: Quality assessment
    • Health scores (0-100) based on 4 factors:
      • License (20%): Valid SPDX license
      • Recent Activity (30%): Pushed within 90 days
      • Status (25%): Not archived
      • Documentation (25%): Description and topics
    • Distribution: Healthy (≥75), Moderate (50-74), Poor (<50)
    • Component score breakdown for transparency
  • License Distribution: Pie chart of license adoption
  • Technology Stack: Bar chart of programming languages
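The weighted scorecard above is simple enough to restate as code. Weights and buckets are taken verbatim from the bullets; the field names are illustrative:

```typescript
// Repository health score (0-100) from the four weighted factors above:
// license 20%, recent activity 30%, not archived 25%, documentation 25%.
function healthScore(repo: {
  hasLicense: boolean;          // valid SPDX license
  pushedWithin90Days: boolean;  // recent activity
  archived: boolean;            // archived repos score 0 on this factor
  hasDocs: boolean;             // description and topics present
}): number {
  return (
    (repo.hasLicense ? 20 : 0) +
    (repo.pushedWithin90Days ? 30 : 0) +
    (repo.archived ? 0 : 25) +
    (repo.hasDocs ? 25 : 0)
  );
}

// Distribution buckets: Healthy (>=75), Moderate (50-74), Poor (<50).
function healthBucket(score: number): "healthy" | "moderate" | "poor" {
  if (score >= 75) return "healthy";
  if (score >= 50) return "moderate";
  return "poor";
}
```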

Backend APIs:

  • /api/sui/insights/repositories/age-distribution - Age buckets and velocity
  • /api/sui/insights/health-scores - Repository health scores
  • /api/sui/insights/licenses - License distribution
  • /api/sui/insights/languages - Language distribution

Package Insights (/insights/packages)

Package deployment trends and usage analytics:

  • Top Packages by Usage: Transaction count rankings
  • Package Activity Timeline: Monthly call patterns
  • Package Statistics: Deployment trends and module counts

Backend APIs:

  • /api/sui/insights/packages/top-packages - Top packages ranking
  • /api/sui/insights/packages/activity-timeline - Usage timeline
  • /api/sui/insights/packages/stats - Package statistics

Event Impact Tracker

Dedicated "Impact" tab on each event profile (/events/[name]) measuring participation funnel and developer activation:

  • Participation Funnel: 6-stage visualization
    • Total Participants → Resolved → New Developers → Activated (7d/14d/28d/60d)
    • Percentages for each stage with visual flow indicators
  • Developer Activation Timeline: Bar chart comparing commits 60 days before vs. after event
    • Percentage change with color-coded indication
    • Event date as zero-day marker
  • Geographic Distribution: Countries and cities table with developer counts
  • New Repositories: Repos created ±14/28 days around event
    • Creation date, GitHub links, tier classification
    • Highlights "new project" tier based on temporal criteria
  • Impacted Packages: Packages with high-confidence repo links
    • Package name, project, transaction count
    • Clickable rows to package detail pages
  • Customizable Windows: ?window=7|14|28|30|60|90 parameter for activation analysis

Backend API:

  • /api/sui/events/[name]/impact - Complete event impact metrics

Methodology:

  • New Developers: First Sui activity within 14 days before event
  • Activated Developers: New developers with activity 1-N days after event
  • New Projects: Repos created 14 days before to 28 days after event
  • High-Confidence Links: Uses confidence_score = 1.0 for package attribution
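The window arithmetic behind those definitions, sketched in code. This is an illustrative reading of the methodology bullets, not the impact API's actual implementation:

```typescript
// Date-window classification for event impact, per the methodology above.
const DAY = 24 * 60 * 60 * 1000;

// "New developer": first Sui activity within 14 days before the event.
function isNewDeveloper(firstActivity: string, eventDate: string): boolean {
  const diff = Date.parse(eventDate) - Date.parse(firstActivity);
  return diff >= 0 && diff <= 14 * DAY;
}

// "Activated": activity 1..N days after the event (N = the ?window param).
function isActivated(activityDate: string, eventDate: string, windowDays: number): boolean {
  const diff = Date.parse(activityDate) - Date.parse(eventDate);
  return diff >= 1 * DAY && diff <= windowDays * DAY;
}
```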

Employee Filtering

Separate community developers from company employees for accurate ecosystem analysis.

  • Global filter to include or exclude Mysten Labs employees
  • Persists across page navigation (localStorage)
  • Visual employee badges on developer profiles
  • Identification by email domain (@sui.io, @mystenlabs.com) or GitHub org membership

Data Source: employees table populated by identify-employees script, which checks developer emails and GitHub organization memberships.
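The email-domain half of that check is trivial to sketch (domains from the bullet above; the org-membership half requires the GitHub API and is omitted):

```typescript
// Minimal sketch of employee detection by email domain. The domain list
// comes from the text; identify-employees also checks GitHub org membership.
const EMPLOYEE_DOMAINS = ["sui.io", "mystenlabs.com"];

function isEmployeeEmail(email: string): boolean {
  const domain = email.split("@").pop()?.toLowerCase() ?? "";
  return EMPLOYEE_DOMAINS.includes(domain);
}
```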

Performance: When employee filtering is enabled, queries use the eco_mads_excl_employees materialized view for 80-90% faster performance. See MATERIALIZED_VIEWS.md for details.


Technical Stack

  • Framework: Next.js 16 (App Router, React Server Components)
  • Language: TypeScript 5.9
  • UI Library: React 19
  • Database: PostgreSQL
  • Styling: Tailwind CSS 3.4
  • Components: Shadcn/ui (Radix UI primitives)
  • Charts: Recharts 2.15
  • Icons: Lucide React
  • Database Client: node-postgres (pg)
  • CSV Parsing: PapaParse
  • Theme: next-themes (dark mode support)

Prerequisites

  • Node.js: Version 18 or higher
  • PostgreSQL: Database with Electric Capital schema
  • GitHub Token: Personal access token with read:user and public_repo scopes
  • Database Access: Credentials for the EC database (host, port, username, password)

Vercel Password Protection

This application is deployed on Vercel and utilizes Vercel's built-in password protection feature for staging and preview deployments. This provides an additional layer of access control before the application's internal authentication (NextAuth) takes over. This is an intentional security measure and not a vulnerability.


Installation

1. Clone the Repository

git clone <repository-url>
cd ec-developer-dashboard

2. Install Dependencies

npm install

3. Configure Environment Variables

Create a .env file in the root directory:

# Database Configuration
DB_NAME=ec-data
DB_USER=ec
DB_PASSWORD=your_password
DB_HOST=localhost
DB_PORT=5432

# GitHub API Token
GITHUB_TOKEN=ghp_your_token_here

To get a GitHub token:

  1. Go to GitHub Settings → Developer settings → Personal access tokens → Tokens (classic)
  2. Generate new token with scopes: read:user, public_repo
  3. Copy token to your .env file

Why you need it: GitHub API rate limits are 60 requests/hour without authentication vs 5,000 requests/hour with a token. The profile and repository enrichment scripts require this higher limit.


Electric Capital Data Import (One-Time)

Before setting up the database optimizations, you need to download and import the Electric Capital data. This is a one-time process that populates your database with the foundational developer activity data.

1. Download Parquet Data Files

npm run download-data

What it does: Downloads all required parquet data files from Electric Capital's R2 storage to the ./data directory
Runtime: 10-30 minutes depending on internet speed (~5-10 GB total)
Files downloaded: 21 parquet files, including:

  • Ecosystem metadata (ecosystems, organizations, repos)
  • Developer data (canonical_developers, developer_activities)
  • Commit history (commits)
  • Aggregated metrics (eco_mads, eco_developer_activities, etc.)
  • Geographic data (canonical_developer_locations, country_continent_mapping)

URL source: URLs are defined in urls.txt in the project root

Progress tracking: Shows download progress with percentage completion and file sizes

2. Import Data to PostgreSQL

npm run import-data

What it does: Uses DuckDB to efficiently load parquet files into PostgreSQL, filtering data for the Sui, Walrus, and SNS ecosystems
Runtime: 30-60 minutes depending on database performance
Requirements:

  • .env file with database credentials (DB_NAME, DB_USER, DB_PASSWORD, DB_HOST, DB_PORT)
  • Downloaded parquet files in ./data directory (run download-data first)
  • @duckdb/node-api package (auto-installed with dependencies)

How it works:

  1. Connects to PostgreSQL through DuckDB's Postgres extension
  2. Loads base ecosystem tables (ecosystems, ecosystems_child_ecosystems)
  3. Resolves target ecosystem IDs for Sui (9547), Walrus (16687), and SNS (11444)
  4. Imports ecosystem-scoped tables filtered by ecosystem_id
  5. Derives canonical developer IDs from imported ecosystem data
  6. Imports developer-scoped tables filtered by canonical_developer_id
  7. Imports repository and organization reference tables
  8. Imports supporting lookup tables (country_continent_mapping)

Filtering: The script intelligently filters data to only import records relevant to the Sui ecosystem family, significantly reducing database size and import time compared to importing all Electric Capital data.

Advanced Options:

# Custom data directory
npx tsx scripts/import-data.ts --data-dir /path/to/data

# Custom PostgreSQL schema
npx tsx scripts/import-data.ts --schema custom_schema

# Import different ecosystems
npx tsx scripts/import-data.ts --target-ecosystems "Ethereum" "Base" "Optimism"

Idempotency: The script uses TRUNCATE and INSERT (or CREATE TABLE) so it can be run multiple times safely. Running it again will replace all existing data.

Data Architecture: The import follows Electric Capital's 4-layer pipeline:

  • Source Layer: Raw data (ecosystems, canonical_developers)
  • Base Layer: Cleaned data (commits, ecosystems_repos)
  • Stage Layer: Transformations (ecosystems_repos_recursive, developer_activities)
  • Rollup Layer: Aggregated metrics (eco_mads, eco_developer_28d_activities)

Database Setup (One-Time)

After importing the Electric Capital data, run these scripts once to set up database optimizations:

1. Create Performance Indexes

npm run create-indexes

What it does: Creates ~15 indexes on frequently-queried columns to optimize dashboard performance.
Runtime: 10-30 minutes depending on database size
When to run: During initial setup or during low-traffic periods

2. Create Search Index

npm run create-search-index

What it does: Creates a materialized view combining developer and repository data for full-text search.
Runtime: ~1 second
When to run: Once during initial setup

3. Create Performance Materialized Views

npm run create-mvs

What it does: Creates pre-aggregated materialized views for faster dashboard queries (80-90% performance improvement when filtering employees).
Runtime: ~5 seconds
When to run: Once during initial setup
Details: See MATERIALIZED_VIEWS.md for complete documentation


Initial Data Enrichment (One-Time)

These scripts fetch additional metadata from GitHub to enhance the dashboard:

1. Fetch Developer Profiles

npm run fetch-profiles

What it does: Fetches GitHub profile data (avatar, name, email) for all developers
Runtime: ~10 minutes for initial run (batched, rate-limited)
API: GitHub GraphQL API (50 users per request, 16 requests/minute)
Idempotent: Safe to run multiple times (only fetches missing profiles)
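The batching and pacing pattern shared by the enrichment scripts can be sketched as follows. The 50-per-request and 16-per-minute numbers come from the text; the helper itself is illustrative, not the script's source:

```typescript
// Split a work list into fixed-size batches (50 logins per GraphQL request),
// then pace requests to stay under the rate limit (~16 requests/minute).
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Delay between requests to hit ~16/minute.
const PAUSE_MS = Math.ceil(60_000 / 16);
```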

2. Fetch Repository Metadata

npm run fetch-repos

What it does: Fetches GitHub repository metadata (description, language, license, topics)
Runtime: ~10 minutes for ~8,000 repositories
API: GitHub GraphQL API (50 repos per request, 16 requests/minute)
Idempotent: Safe to run multiple times (only fetches missing data)

3. Identify Employees

npm run identify-employees

What it does: Identifies Mysten Labs/Sui employees by email domain and GitHub org membership
Runtime: ~1 minute
Output: Populates the employees table
When to run: After initial profile fetch, and periodically to catch new hires

4. Import Package Data (Hybrid System)

The dashboard uses a dual-source package system that combines comprehensive registry data (MVR) with usage analytics (Blockberry):

4a. MVR (Move Registry) Import - Comprehensive Data

npm run create-mvr-schema          # One-time: Create database tables
npx tsx scripts/import-mvr-data.ts # Import from Snowflake

What it does:

  • Imports comprehensive Sui package registry (278K+ packages)
  • Includes direct git links, dependency graphs, and call analytics
  • Queries Snowflake database for latest Move Registry data

Runtime: ~10-20 minutes (large dataset)
Requirements: Snowflake credentials in environment variables
Output: Populates mvr_packages, mvr_git_infos, mvr_package_infos, mvr_package_analytics, mvr_pkg_depends tables
When to run: Initial setup, then daily to get latest registry updates
Idempotent: Safe to run multiple times (deduplicates data)

Environment Variables Required:

SNOWFLAKE_ACCOUNT=your_account
SNOWFLAKE_USERNAME=your_username
SNOWFLAKE_PASSWORD=your_password
SNOWFLAKE_WAREHOUSE=your_warehouse
SNOWFLAKE_DATABASE=your_database
SNOWFLAKE_SCHEMA=your_schema

4b. Blockberry Import - Usage Analytics

npm run create-sui-schema          # One-time: Create database tables (if not exists)
npm run sui-packages-pipeline      # Run full pipeline (import, link, report)

Individual Steps:

npm run import-sui-packages        # Fetch top 1K packages from Blockberry API
npm run link-sui-packages          # Link packages to GitHub repos (fuzzy matching)
npm run sui-package-report         # Generate ecosystem report

What it does:

  • Fetches top 1,000 Sui packages by transaction count from Blockberry API
  • Links packages to GitHub repositories using fuzzy name matching + Move Registry API
  • Tracks publisher addresses and transaction statistics
  • Generates detailed ecosystem reports

Runtime: ~5-10 minutes for full pipeline
Requirements: BLOCKBERRY_API_KEY in environment variables
Output: Populates packages, packages_repos, and sui_addresses tables
When to run: Initial setup, then weekly to refresh usage data
Idempotent: Safe to run multiple times (replaces existing data)

Environment Variables Required:

BLOCKBERRY_API_KEY=your_api_key_here

API Limits: Blockberry free tier limits to 1,000 packages. Contact Blockberry for higher limits.

4c. Create Unified Views - Merge Data Sources

psql -U ec -d ec-data -f scripts/create-unified-packages-views.sql

What it does:

  • Creates views that intelligently merge MVR and Blockberry data
  • Combines repo links from both sources (529 unique packages vs 439 from Blockberry alone)
  • Provides maximum resolution package data with quality indicators

Views Created:

  • unified_package_repos - Combined repo links (MVR direct + Blockberry fuzzy)
  • mvr_packages_enriched - MVR data enhanced with Blockberry metadata
  • packages_complete_unified - Master view with all data sources merged
  • package_data_coverage - Statistics on data quality and coverage

When to run: After both MVR and Blockberry imports are complete
Runtime: <1 second (creates views, doesn't move data)


Regular Maintenance (Scheduled)

Daily: Update All Data

Electric Capital updates their data files daily. To keep your dashboard current, run the complete update workflow:

npm run update

What it does (in order):

  1. Downloads fresh Electric Capital parquet files (~5-10 GB)
  2. Imports updated data to PostgreSQL (filtering for Sui ecosystems)
  3. Fetches updated GitHub developer profiles (avatars, names, emails)
  4. Fetches updated repository metadata (descriptions, languages, topics)
  5. Refreshes search index (full-text search materialized view)
  6. Identifies new Mysten Labs/Sui employees
  7. Refreshes performance materialized views (eco_mads_excl_employees)
  8. Updates Sui package data from Blockberry (top 1K packages)
  9. Imports Move Registry data from Snowflake (packages, analytics, dependencies)
  10. Links Sui packages to GitHub repositories (uses MVR + Blockberry)
  11. Extracts wallet addresses from all stored event CSV uploads
  12. Links Move Registry package senders into wallet_entities
  13. Imports daily wallet activity from Snowflake into wallet_daily_activity
  14. Refreshes MVR summary/list SQL reports

Runtime: 1-2 hours depending on network speed and database performance

Schedule: Set up as a daily cron job at 2 AM UTC (after EC data pipeline updates)

Example cron entry:

0 2 * * * cd /path/to/dashboard && npm run update >> /var/log/dashboard-update.log 2>&1

Note: The npm run update script still runs the Blockberry import/link steps (now #8 and #10). If Blockberry data only needs weekly refreshes in your environment, move those commands to a separate cron and comment them out of the daily pipeline.

Weekly: Update Package Usage Analytics (Optional)

Update Blockberry package transaction counts and fuzzy repo linking:

npm run sui-packages-pipeline  # Import, link, and report

What it does:

  • Fetches top 1,000 packages by transaction count from Blockberry API
  • Updates transaction counts and usage statistics
  • Refreshes fuzzy name matching to GitHub repositories
  • Generates ecosystem report

Runtime: ~5-10 minutes

When: Weekly is sufficient (transaction ranks change slowly)

Individual Update Steps

If you need to run specific updates independently:

# Daily updates
npm run download-data              # Download EC parquet files
npm run import-data                # Import to PostgreSQL
npm run fetch-profiles             # Update developer profiles
npm run fetch-repos                # Update repository metadata
npm run refresh-search-index       # Refresh search index
npm run identify-employees         # Identify new employees
npm run refresh-mvs                # Refresh materialized views
npm run import-mvr-data            # Update MVR package data (daily, Snowflake)
npm run link-sui-packages          # Refresh package-to-repo links (uses MVR data)
npm run extract-event-wallets      # Re-scan CSV uploads for wallet addresses
npm run link-mvr-wallets           # Sync Move Registry senders into wallet_entities
npm run import-daily-wallets       # Sync wallet activity from Snowflake

# Weekly updates
npm run sui-packages-pipeline      # Update Blockberry packages (weekly)

Development

Start Development Server

npm run dev

Open http://localhost:3000 in your browser.

Build for Production

npm run build
npm start

Lint Code

npm run lint

Testing

The dashboard uses Playwright for end-to-end testing, with a comprehensive suite of chart rendering, accessibility, visual regression, and functional tests.

Running Tests

# Run all tests
npx playwright test

# Run tests in headed mode (see browser)
npx playwright test --headed

# Run specific test file
npx playwright test tests/insights-charts.spec.ts

# Run tests matching pattern
npx playwright test --grep "Developer Insights"

# Update visual regression snapshots
npx playwright test --update-snapshots

# Show test report
npx playwright show-report

Test Organization

The test suite is organized into several categories:

1. Chart Rendering Tests (tests/insights-charts.spec.ts)

  • Validates all Recharts visualizations render correctly
  • Checks for chart data presence (pie slices, bar counts, line curves)
  • Ensures no console errors during chart rendering

2. Functional Tests (tests/new-features.spec.ts)

  • API endpoint validation
  • UI component interaction
  • Tab navigation
  • Data integrity checks

3. Page Object Models (tests/page-objects/)

  • Reusable page objects for all insights pages
  • Type-safe element selectors using data-testid attributes
  • Helper methods for common interactions
  • Example: tests/insights-with-page-objects.spec.ts

4. Visual Regression Tests (tests/visual-regression.spec.ts)

  • Screenshot comparisons for charts and full pages
  • Detects unintended visual changes
  • Configurable diff thresholds for anti-aliasing tolerance

5. Accessibility Tests (tests/accessibility.spec.ts)

  • WCAG 2.0/2.1 Level A & AA compliance using axe-core
  • Keyboard navigation verification
  • ARIA attributes validation
  • Color contrast checking
  • Mobile touch target sizing

Test ID Convention

All testable UI elements use semantic data-testid attributes for stable selectors:

// KPI Cards
<Card data-testid="daily-active-devs-card">...</Card>

// Charts
<ResponsiveContainer data-testid="developer-activity-metrics-chart">...</ResponsiveContainer>

// Tables
<Table data-testid="top-packages-table">...</Table>

Naming Pattern: {feature}-{component-type}

  • Examples: license-distribution-chart, event-roi-table, top-repositories-card
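A hypothetical helper (not part of the codebase) showing how IDs following this convention could be derived:

```typescript
// Derive a stable data-testid from a feature name and component type,
// following the {feature}-{component-type} convention.
function testId(feature: string, componentType: "card" | "chart" | "table"): string {
  return `${feature.toLowerCase().replace(/\s+/g, "-")}-${componentType}`;
}

// testId("License Distribution", "chart") → "license-distribution-chart"
```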

Writing New Tests

When adding new features, always update tests:

  1. Add data-testid attributes to new UI elements following the naming convention
  2. Update page objects in tests/page-objects/ with new elements
  3. Add functional tests verifying the feature works correctly
  4. Add visual tests if charts or complex UI are involved
  5. Run accessibility tests to catch WCAG violations early

Example workflow:

// 1. Add new chart component with data-testid
<Card data-testid="new-metric-card">
  <ResponsiveContainer data-testid="new-metric-chart">...</ResponsiveContainer>
</Card>

// 2. Add to the page object
readonly newMetricCard: Locator
this.newMetricCard = page.getByTestId('new-metric-card')

// 3. Write a test
test('New metric chart renders correctly', async ({ page }) => {
  const insightsPage = new DeveloperInsightsPage(page)
  await insightsPage.navigate()
  await expect(insightsPage.newMetricCard).toBeVisible()
})

# 4. Run the tests
npx playwright test tests/insights-charts.spec.ts

Continuous Integration

Tests run automatically on:

  • Pull requests (all test suites)
  • Main branch commits (full suite + visual regression)

Local pre-commit hook recommended:

# .git/hooks/pre-commit
#!/bin/sh
npx playwright test --reporter=list

Test Data Requirements

Tests require a running development server with database access:

# Terminal 1: Start dev server
npm run dev

# Terminal 2: Run tests
npx playwright test

Note: Tests use real data from your local database. Ensure data is populated using npm run update before running tests.

Debugging Failed Tests

# Run in debug mode with Playwright Inspector
npx playwright test --debug

# Run specific test with trace
npx playwright test --trace on

# View trace file
npx playwright show-trace trace.zip

Accessibility Testing Setup

Install axe-core for accessibility tests:

npm install -D @axe-core/playwright

Accessibility tests check for:

  • Proper heading hierarchy
  • ARIA labels and roles
  • Color contrast ratios
  • Keyboard navigation
  • Screen reader compatibility

Database Architecture

Key Tables & Relationships

The database follows a 4-layer pipeline architecture:

1. Source Layer (Raw Data)

  • raw_commits: Unprocessed commit data from repositories
  • canonical_developers: Deduplicated unique developers (master record)
  • ecosystems_child_ecosystems: Direct parent-child ecosystem relationships

2. Base Layer (Cleaned Data)

  • commits: Standardized commit records
  • ecosystems_repos: Cleaned ecosystem-to-repository mappings
  • canonical_developer_locations: Enriched developer location data

3. Stage Layer (Transformations)

  • ecosystems_repos_recursive: 🔑 Critical! Resolves parent-child hierarchy to list ALL repositories including descendants
  • developer_activities: Links each commit to a canonical developer (foundational "who did what" log)
  • developer_profiles: GitHub profile data (login, avatar, email)
  • repository_profiles: GitHub repository metadata

4. Rollup Layer (Final Metrics)

  • eco_mads: 🔑 Primary output! Daily MAD counts with segmentation by activity, tenure, and exclusivity
  • eco_mads_excl_employees: 🚀 Materialized view! Same as eco_mads but excluding employees (80-90% faster queries)
  • eco_developer_activities: Maps every developer commit to every ecosystem it belongs to (including all parents)
  • eco_developer_28d_activities: 28-day rolling window aggregations for MADs calculation
  • eco_developer_contribution_ranks: Developer activity classification (full-time, part-time, one-time)
  • eco_developer_tenures: Developer tenure categories (0-1y, 1-2y, 2y+)

Application Tables

  • event_data: Imported hackathon/workshop participant data
  • event_csv_uploads: Raw CSV storage with version history (content, filename, uploaded_at)
  • event_csv_row_status: Per-row resolution tracking (resolved status, developer_id, repo_id, notes)
  • employees: Identified Mysten Labs/Sui employees with metadata
  • sui_move_registry: Move package registry data

Query Patterns & Methodology

Ecosystem Selection: Sui vs Walrus

The dashboard tracks three specific ecosystems stored in the database:

  • Sui (ecosystem_id: 9547) - Main Sui blockchain
  • Walrus (ecosystem_id: 16687) - Walrus storage network
  • Sui Name Service (ecosystem_id: 11444) - SNS naming service

"Sui (All)" filter: Aggregates data across all three ecosystems by summing metrics:

SELECT SUM(all_devs) FROM eco_mads
WHERE ecosystem_id IN (9547, 16687, 11444)
GROUP BY day

"Walrus" filter: Shows only Walrus data:

SELECT all_devs FROM eco_mads
WHERE ecosystem_id = 16687

Why we sum: While Walrus and SNS are children of Sui in the hierarchy, the eco_mads table stores them as separate entries. We sum across all three to get aggregate metrics. Minor developer overlap (<5%) is acceptable given Walrus's small size (~74 developers).


Monthly Active Developers (MADs)

Definition: A developer who authored at least one commit within a rolling 28-day period.

Calculation Method:

  1. For each day, count developers active in that day + previous 27 days
  2. Segment by activity type (exclusive/multichain), tenure (0-1y/1-2y/2y+), and contribution level (full-time/part-time/one-time)
  3. Store daily counts in eco_mads table
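The per-day rolling-window count in step 1 can be sketched in TypeScript (an in-memory illustration with assumed record shapes; the real pipeline computes this in SQL):

```typescript
// Count distinct developers active on `day` or in the preceding 27 days.
// `activity` is a flat list of (ISO date, developer id) records.
function madOnDay(
  activity: { day: string; developerId: number }[],
  day: string,
  windowDays = 28
): number {
  const end = new Date(day).getTime();
  const start = end - (windowDays - 1) * 86_400_000; // inclusive window start
  const active = new Set<number>();
  for (const a of activity) {
    const t = new Date(a.day).getTime();
    if (t >= start && t <= end) active.add(a.developerId);
  }
  return active.size;
}
```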

Monthly Reporting: We use the AVERAGE of daily counts within complete calendar months (not single-day snapshots). This smooths volatility and matches Electric Capital's methodology.

SELECT
  DATE_TRUNC('month', day) as month,
  AVG(all_devs) as avg_mads,
  AVG(exclusive_devs) as avg_exclusive,
  AVG(multichain_devs) as avg_multichain
FROM eco_mads
WHERE ecosystem_id IN (9547, 16687, 11444)
  AND day < DATE_TRUNC('month', CURRENT_DATE)  -- Exclude current partial month
GROUP BY month
ORDER BY month DESC

Growth Calculations

Baseline Date: January 2023

All growth percentages (1-year, 2-year, 3-year) compare current values to January 2023 as a fixed baseline. This date represents when the Sui ecosystem moved from test/early phase to stable operations.

Why January 2023:

  • Data became stable and consistent
  • Avoids artificially inflated growth from extremely small baseline numbers (Sept 2022 had only 20 established developers)
  • Provides meaningful "since ecosystem launch" metrics

Formula:

ROUND(((current_value::numeric / baseline_value) - 1) * 100, 1)

Example:

  • Baseline (Jan 2023): 500 developers
  • Current (Oct 2025): 1000 developers
  • Growth: ((1000 / 500) - 1) × 100 = 100%
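The same formula as a small TypeScript helper (illustrative; the dashboard computes this in SQL):

```typescript
// Growth percentage relative to a fixed baseline, rounded to one decimal place.
function growthSinceBaseline(current: number, baseline: number): number {
  return Math.round((current / baseline - 1) * 1000) / 10;
}
```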

Note: All three growth columns (1Y, 2Y, 3Y) show the same percentage because they all reference the same January 2023 baseline. This is intentional and represents growth "since launch."


Developer Classification

By Activity Level

Electric Capital's methodology (from their FAQ):

Points System (over 84-day rolling window):

  • 10+ active days in a 28-day window = 10 points
  • 2-9 active days in a 28-day window = 4 points
  • 1 active day in a 28-day window = 1 point

Classifications (stored in eco_developer_contribution_ranks):

  • Full-Time (14+ points): Consistent daily/weekly contributors
  • Part-Time (2-13 points): Regular but not daily contributors
  • One-Time (1 point): Single contribution in 84-day window

The dashboard uses the values Electric Capital assigns in the database rather than recalculating them.
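For illustration only, the points system above can be sketched as follows (the dashboard does not run this — it reads Electric Capital's stored classifications):

```typescript
type Rank = "full-time" | "part-time" | "one-time";

// `activeDaysPerWindow` holds the active-day counts for each 28-day window
// inside the 84-day rolling window (three windows, assumed non-overlapping).
function contributionRank(activeDaysPerWindow: number[]): Rank {
  const points = activeDaysPerWindow.reduce((sum, days) => {
    if (days >= 10) return sum + 10;
    if (days >= 2) return sum + 4;
    if (days >= 1) return sum + 1;
    return sum;
  }, 0);
  if (points >= 14) return "full-time";
  if (points >= 2) return "part-time";
  return "one-time";
}
```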

By Tenure

Based on time since developer's first commit to any blockchain ecosystem (not Sui specifically):

  • Newcomer (devs_0_1y): 0-365 days since first blockchain commit
  • Emerging (devs_1_2y): 365-730 days since first blockchain commit
  • Established (devs_2y_plus): 730+ days since first blockchain commit

Stored in eco_developer_tenures table.
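A sketch of the bucketing (boundary handling at exactly 365/730 days is assumed half-open; column names match eco_developer_tenures):

```typescript
// Map days since a developer's first blockchain commit to a tenure bucket.
function tenureBucket(daysSinceFirstCommit: number): "devs_0_1y" | "devs_1_2y" | "devs_2y_plus" {
  if (daysSinceFirstCommit < 365) return "devs_0_1y";
  if (daysSinceFirstCommit < 730) return "devs_1_2y";
  return "devs_2y_plus";
}
```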

By Ecosystem Engagement

Calculated daily based on which ecosystems a developer committed to:

  • Exclusive: Contributed to only one blockchain ecosystem on that day
  • Multichain: Contributed to multiple blockchain ecosystems on that day

Stored in eco_developer_activities with is_exclusive boolean flag.


Commit Counting

Important: The database contains ALL commits from developers active in our ecosystems, including their work on other projects.

Note on Forks and Clones: Electric Capital does not count clones or forks of repositories in their data collection. Only original repositories are tracked. This means forked repositories do not appear in the database and their commits do not contribute to any metrics in Electric Capital's reporting or this dashboard.

Our Methodology:

  1. Get complete list of repositories belonging to Sui ecosystems using ecosystems_repos_recursive (includes parent-child hierarchy)
  2. Count only commits in those repositories
  3. Filter out employee commits when "Exclude Employees" is selected

WITH OurRepos AS (
  SELECT DISTINCT repo_id
  FROM ecosystems_repos_recursive
  WHERE ecosystem_id IN (9547, 16687, 11444)
)
SELECT COUNT(*) AS total_commits
FROM commits
WHERE repo_id IN (SELECT repo_id FROM OurRepos)
  AND committed_at >= '2022-01-01'

Expected Values (as of October 2025):

  • Sui ecosystem commits: ~900,000
  • Global database commits: 11,433,162

Recursive Hierarchy Queries

Challenge: Ecosystems have parent-child relationships (e.g., "All Web3" → "Ethereum" → "Base" → "Cryptex"). Activity in child ecosystems should roll up to parents.

Solution: The ecosystems_repos_recursive and ecosystems_child_ecosystems_recursive tables use recursive CTEs to flatten the hierarchy.

Example: Finding all repositories in the Sui ecosystem (including Walrus and SNS):

-- This automatically includes repos from Walrus (16687) and SNS (11444)
-- because they are children of Sui (9547)
SELECT DISTINCT repo_id
FROM ecosystems_repos_recursive
WHERE ecosystem_id = 9547

How it works: During database materialization, a recursive query walks the ecosystems_child_ecosystems tree and copies all descendant repositories up to parent ecosystems.
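What that materialization computes can be illustrated with a small in-memory walk (a hypothetical helper, not the actual SQL): collect a root ecosystem's own repos plus the repos of all its descendants.

```typescript
// Recursively gather repos for an ecosystem and everything below it.
// children: ecosystem id -> direct child ecosystem ids
// repos:    ecosystem id -> that ecosystem's own repositories
function flattenRepos(
  children: Map<number, number[]>,
  repos: Map<number, Set<string>>,
  root: number
): Set<string> {
  const out = new Set(repos.get(root) ?? []);
  for (const child of children.get(root) ?? []) {
    for (const repo of flattenRepos(children, repos, child)) out.add(repo);
  }
  return out;
}
```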


Data Quality & Assumptions

Known Limitations

  1. Developer Overlap: Developers working on multiple ecosystems (e.g., both Sui and Walrus) are counted in each ecosystem's metrics. When summed, they appear once per ecosystem. Given Walrus's small size (~74 devs), overlap is minimal.

  2. Commit Attribution: The database contains ALL commits from developers active in our ecosystems, including their work on other projects. We filter to only count commits in repos belonging to our three ecosystems.

  3. Forks and Clones Excluded: Electric Capital does not track forked or cloned repositories. Only original repositories appear in the database. This means activity in forks does not contribute to developer counts or commit metrics.

  4. Early Data Volatility: Data before January 2023 shows high volatility due to test/development phase. This is why we use January 2023 as our growth baseline.

  5. Established Developer Definition: "Established" means 2+ years of activity in crypto development (not specific to Sui). Early in the ecosystem lifecycle (2022-early 2023), very few developers met this criterion, leading to high growth percentages.

Data Verification

Expected Metrics (as of October 2025):

  • Total commits (Sui ecosystems): ~900,000
  • Total commits (all ecosystems in DB): 11,433,162
  • Monthly Active Developers: ~1,000-1,100
  • Total countries with developers: 122
  • Total repositories (Sui): ~8,000

Verification Query:

-- Should return ~900K
WITH OurRepos AS (
  SELECT DISTINCT repo_id
  FROM ecosystems_repos_recursive
  WHERE ecosystem_id IN (9547, 16687, 11444)
)
SELECT COUNT(*) FROM commits
WHERE repo_id IN (SELECT repo_id FROM OurRepos);

Comparison with Electric Capital Reports

Our dashboard aligns closely with Electric Capital's public reports:

  • ✅ Commit counts: ~900K (EC reports ~1M, difference due to data timing)
  • ✅ 1-year MAD growth: Results closely match EC's published percentages
  • ✅ Monthly averaging methodology matches EC's approach
  • ✅ Use of complete months (not partial) matches EC's reporting

Advanced Features

Real-Time Import Progress (Server-Sent Events)

The event import feature uses SSE to stream progress updates from the server:

Client-side (event-import-modal.tsx):

const eventSource = new EventSource(`/api/sui/events/upload?...`)
eventSource.onmessage = (event) => {
    const data = JSON.parse(event.data)
    if (data.type === "progress") setProgress(data.progress)
    if (data.type === "log") appendLog(data.message)
}

Server-side (app/api/sui/events/upload/route.ts):

const stream = new ReadableStream({
    start(controller) {
        controller.enqueue(`data: ${JSON.stringify({ type: "progress", progress: 50 })}\n\n`)
        controller.enqueue(`data: ${JSON.stringify({ type: "log", message: "Processing..." })}\n\n`)
        controller.close()  // signal end-of-stream once the import finishes
    },
})

This provides a responsive UX for long-running data import operations.


Intelligent Data Resolution

When importing event CSV data, the system performs multi-pass resolution:

  1. Database Lookup: Check for existing developers/repos by username/URL
  2. GitHub API Fetch: Fetch missing profiles/repos from GitHub
  3. Relationship Suggestions:
    • For each developer, suggest repos they've committed to
    • For each repo, suggest developers who've committed to it
  4. Deduplication: Prevent duplicate event entries via unique constraint

Implementation: See lib/event-data-resolver.ts and lib/github-api-resolver.ts
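The first two passes can be sketched like this (hypothetical signature with stand-in lookups — `findLocal` and `fetchRemote` are placeholders, not the actual functions in lib/event-data-resolver.ts):

```typescript
type Dev = { login: string; id: number };

// Pass 1: check the local database cache; pass 2: fall back to a remote
// GitHub lookup. Both lookups are injected so the sketch stays self-contained.
async function resolveDeveloper(
  username: string,
  findLocal: (login: string) => Dev | undefined,
  fetchRemote: (login: string) => Promise<Dev | null>
): Promise<Dev | null> {
  return findLocal(username) ?? (await fetchRemote(username));
}
```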


State Preservation on Navigation

To provide a seamless UX, filter and pagination state is preserved when navigating to detail pages:

Before Navigation (e.g., in developers/page.tsx):

const handleRowClick = (developerId: number) => {
    const state = {
        githubIdSearch,
        nameSearch,
        countrySearch,
        citySearch,
        rankFilter,
        ecosystem,
        page: developersPage,
        sortBy,
        sortOrder,
        fromDevelopersPage: true,
    }
    window.history.replaceState(state, "")
    router.push(`/developers/${developerId}`)
}

On Return:

useEffect(() => {
    const historyState = window.history.state
    if (historyState?.fromDevelopersPage) {
        setGithubIdSearch(historyState.githubIdSearch || "")
        setNameSearch(historyState.nameSearch || "")
        // ... restore all filters
    }
}, [])

Users can click into developer profiles and use the back button to return to their exact filtered/paginated view.


Troubleshooting

GitHub API Rate Limits

Symptom: Scripts fail with "API rate limit exceeded"

Solution:

  • Ensure GITHUB_TOKEN is set in .env
  • Without token: 60 requests/hour
  • With token: 5,000 requests/hour
  • Check current limit: curl -H "Authorization: token $GITHUB_TOKEN" https://api.github.com/rate_limit

Database Connection Issues

Symptom: "Connection refused" or "Authentication failed"

Solution:

  • Verify PostgreSQL is running: psql -U ec -d ec-data -c "SELECT 1"
  • Check .env credentials match database setup
  • Ensure DB_HOST and DB_PORT are correct (defaults: localhost:5432)
  • Check firewall rules if connecting to remote database

Slow Query Performance

Symptom: Dashboard pages load slowly (>5 seconds)

Solution:

  • Ensure indexes are created: npm run create-indexes
  • Check PostgreSQL query performance: EXPLAIN ANALYZE <query>
  • Consider running VACUUM ANALYZE on large tables
  • Monitor database connection pool size (max 20 connections)

Search Not Working

Symptom: Search returns no results or stale data

Solution:

  • Refresh search index: npm run refresh-search-index
  • Verify materialized view exists: SELECT * FROM search_index LIMIT 1
  • Check for errors in search API: Check browser console and server logs

Package Data Issues

Symptom: Missing package data, low repo linking coverage, or outdated transaction counts

Solution:

For MVR Data (comprehensive packages):

  • Run MVR import: npx tsx scripts/import-mvr-data.ts
  • Check Snowflake credentials in .env
  • Verify tables exist: SELECT COUNT(*) FROM mvr_packages
  • Expected: 278K+ packages

For Blockberry Data (usage analytics):

  • Run Blockberry import: npm run sui-packages-pipeline
  • Check Blockberry API key in .env
  • Verify tables exist: SELECT COUNT(*) FROM packages
  • Expected: 1,000 packages (top by transaction count)

For Unified Views (merged data):

  • Create/refresh views: psql -U ec -d ec-data -f scripts/create-unified-packages-views.sql
  • Check coverage: SELECT * FROM package_data_coverage
  • Expected: 529+ packages with repo links

For Low Repo Linking Coverage:

  • Verify both imports ran: Check both mvr_packages and packages tables
  • Check unified view: SELECT COUNT(*) FROM unified_package_repos
  • Run fuzzy linking: npm run link-sui-packages
  • Adjust confidence threshold in queries if needed (default 0.7)

Documentation

Core Documentation

Feature Documentation

License

Copyright © 2025 Sui Foundation. All rights reserved.
