A comprehensive web dashboard for visualizing and analyzing the performance of Large Language Models (LLMs) across low-resource languages. This application provides interactive visualizations, comparative analysis, and detailed metrics for evaluating AI-generated text quality.
This project aims to close the gap in AI development and evaluation by systematically assessing the quality of sentences generated by different LLMs across multiple African languages. Through careful review of AI-generated text along several critical dimensions, we gather feedback from expert reviewers that helps us understand the current strengths and weaknesses of AI-generated text in these languages.
- 8 Nigerian Languages: Bura-Pabir, Fulani, Hausa, Igbo, Marghi, Nigerian Pidgin, Shuwa Arabic, Yoruba
- 3 Metrics (1-7 scale):
- Clarity: How easy it is to read and understand the text
- Naturalness: Whether the sentence sounds like native speech
- Correctness: Technical accuracy (spelling, grammar, verb tenses)
- Primary/Secondary Data Sources: Compare evaluations from different reviewers
- Interactive Charts: Bar charts with error bars for each language
- Model Filtering: Filter by provider (Anthropic, Google, OpenAI) and individual models
- Language Navigation: Browse by individual language or view all languages
- 12 African Languages organized by geographic regions:
- East African: Amharic, Swahili, Luo
- West African: Yoruba, Hausa, Kanuri, Twi, Wolof, Yemba
- Southern African: Chichewa
- Central African: Luganda, Ewondo
- 5 Metrics:
- Readability (1-7 scale): How easy it is to read and understand the translation
- Adequacy (1-7 scale): How accurately the translation captures the original meaning
- Grammatical Correct (%) (0-100%): Percentage of grammatically correct sentences
- Real Words (%) (0-100%): Percentage of sentences with only real words
- Notable Error (%) (0-100%): Percentage of sentences with notable errors (lower is better)
- Primary/Secondary Reviewer: Compare evaluations from different reviewers
- Language Group Filtering: Browse by geographic region or individual language
- Interactive Charts: Bar charts with error bars, percentage scales for percentage metrics
- Dark/Light Theme: Toggle between themes
- Responsive Design: Optimized for desktop and mobile devices
- Summary Statistics:
- Overall Leader (best performing model)
- Languages Analyzed count
- Models Compared count
- Total Samples count
- Data Export: Download filtered data as CSV or JSON
- Round Toggle: Switch between Round 2 and Round 3 analyses
- Metric Tooltips: Hover over metrics to see definitions
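The bar charts with error bars listed above amount to computing a mean and a standard error per model from the 1-7 ratings. A minimal, dependency-free sketch of that aggregation (the function name and the `{ model, score }` row shape are illustrative, not the actual implementation in the app):

```javascript
// Aggregate 1-7 ratings into per-model summary stats suitable for a
// bar chart with error bars: mean bar height, stderr as the error bar.
function summarizeByModel(rows) {
  const groups = new Map();
  for (const { model, score } of rows) {
    if (!groups.has(model)) groups.set(model, []);
    groups.get(model).push(score);
  }
  const summary = [];
  for (const [model, scores] of groups) {
    const n = scores.length;
    const mean = scores.reduce((a, b) => a + b, 0) / n;
    // Sample variance (n - 1 denominator), guarding the n = 1 case.
    const variance =
      scores.reduce((a, b) => a + (b - mean) ** 2, 0) / (n > 1 ? n - 1 : 1);
    const stderr = Math.sqrt(variance / n);
    summary.push({ model, mean, stderr, n });
  }
  return summary;
}
```

The same shape of output also feeds the summary cards: the "Overall Leader" is simply the model with the highest mean for the selected metric.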
```
web/
├── public/
│   ├── data/                       # Round 3 CSV files
│   └── round2/                     # Round 2 CSV files (by language)
├── src/
│   ├── components/
│   │   ├── Controls.jsx            # Round 3 filtering controls
│   │   ├── ControlsRound2.jsx      # Round 2 filtering controls
│   │   ├── LanguageChart.jsx       # Round 3 chart component
│   │   ├── LanguageChartRound2.jsx # Round 2 chart component
│   │   └── Toast.jsx               # Toast notification component
│   ├── pages/
│   │   ├── LandingPage.jsx         # Landing page with analysis cards
│   │   ├── Dashboard.jsx           # Round 3 dashboard
│   │   └── DashboardRound2.jsx     # Round 2 dashboard
│   ├── utils/
│   │   ├── data.js                 # Round 3 data loading and processing
│   │   └── dataRound2.js           # Round 2 data loading and processing
│   ├── App.jsx                     # Main app component with routing
│   └── main.jsx                    # Entry point
└── package.json
```
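The CSV files under `public/` are fetched at runtime and turned into row objects by `src/utils/data.js` and `dataRound2.js`. The app uses PapaParse for this (see Tech Stack below); the sketch here is a simplified, dependency-free stand-in that ignores quoted fields, just to show the shape of the loading step (the file path in the comment is illustrative):

```javascript
// Simplified stand-in for the CSV loading in src/utils/data.js.
// The real code uses PapaParse; this naive parser does not handle
// quoted fields containing commas or newlines.
function parseCsv(text) {
  const [header, ...lines] = text.trim().split('\n');
  const fields = header.split(',');
  return lines.map((line) => {
    const values = line.split(',');
    // Pair each header field with the value in the same column.
    return Object.fromEntries(fields.map((f, i) => [f, values[i]]));
  });
}

// In the app, a file under public/data/ would be fetched like:
// const rows = parseCsv(await (await fetch('/data/some-language.csv')).text());
```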
- Node.js (v18 or higher)
- npm or yarn
- Navigate to the `web` directory:

  ```bash
  cd web
  ```

- Install dependencies:

  ```bash
  npm install
  # or
  yarn install
  ```

- Start the development server:

  ```bash
  npm run dev
  # or
  yarn dev
  ```

The application will be available at http://localhost:5173
Build the application:

```bash
npm run build
# or
yarn build
```

Preview the production build:

```bash
npm run preview
# or
yarn preview
```

The landing page provides:
- Project Motivation: Overview of the project goals
- Available Analyses: Cards for each round of analysis
- Click on a round card to see a preview with:
- Languages included
- Model performance summary
- Navigation options (View Full Analysis, Browse by Language/Group)
- Metrics: Toggle between Clarity, Naturalness, and Correctness
- Data Source: Switch between Primary and Secondary evaluations
- Providers: Filter by Anthropic, Google, or OpenAI
- Models: Select specific models within each provider
- Language Selector: When viewing a single language, switch between languages without returning to the landing page
- Metrics: Toggle between Readability, Adequacy, Grammatical Correct (%), Real Words (%), and Notable Error (%)
- Reviewer: Switch between Primary and Secondary reviewers
- Providers: Filter by Anthropic, Google, or OpenAI
- Models: Select specific models within each provider
- Language Group/Language Selector: Browse by geographic region or individual language
- Interactive Charts: Hover over bars to see detailed statistics
- Summary Cards: View overall leader, language count, model count, and total samples
- Download Data: Export current filtered view as CSV or JSON
- Round Toggle: Switch between Round 2 and Round 3 from the dashboard
- Theme Toggle: Switch between dark and light themes
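For the Download Data feature, the filtered rows need to be serialized before being handed to the browser. A minimal sketch of the CSV half of that export, assuming rows are plain objects with uniform keys (function name is illustrative):

```javascript
// Serialize the currently filtered rows to a CSV string for download.
// Values containing commas, quotes, or newlines are quoted, with
// embedded quotes doubled per the usual CSV convention.
function toCsv(rows) {
  if (rows.length === 0) return '';
  const fields = Object.keys(rows[0]);
  const quote = (v) => {
    const s = String(v ?? '');
    return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const lines = [fields.join(',')];
  for (const row of rows) {
    lines.push(fields.map((f) => quote(row[f])).join(','));
  }
  return lines.join('\n');
}
```

In the browser, the resulting string would typically be wrapped in a `Blob` and downloaded via a temporary `<a download>` link; the JSON export is just `JSON.stringify` over the same rows.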
- React 19: UI framework
- Vite: Build tool and dev server
- React Router DOM: Client-side routing
- Recharts: Interactive chart library
- PapaParse: CSV parsing
- Lodash: Data manipulation utilities
- Lucide React: Icon library
BSD 3-Clause License
Copyright (c) 2025, Dimagi Inc., Cory Zue
See LICENSE for full license text.
This is an internal project for evaluating LLM performance on low-resource languages. For questions or contributions, please contact the project maintainers.