A comprehensive web dashboard for visualizing and analyzing the performance of Large Language Models (LLMs) across low-resource languages. This application provides interactive visualizations, comparative analysis, and detailed metrics for evaluating AI-generated text quality.
This project aims to close the gap in AI development and evaluation by systematically assessing the quality of sentences generated by different LLMs across multiple African languages. Through careful review of AI-generated text along several critical dimensions, we gather feedback from expert reviewers that helps us understand the current strengths and weaknesses of AI-generated text in these languages.
- 8 Nigerian Languages: Bura-Pabir, Fulani, Hausa, Igbo, Marghi, Nigerian Pidgin, Shuwa Arabic, Yoruba
- 3 Metrics (1-7 scale):
- Clarity: How easy it is to read and understand the text
- Naturalness: Whether the sentence sounds like native speech
- Correctness: Technical accuracy (spelling, grammar, verb tenses)
- Primary/Secondary Data Sources: Compare evaluations from different reviewers
- Interactive Charts: Bar charts with error bars for each language
- Model Filtering: Filter by provider (Anthropic, Google, OpenAI) and individual models
- Language Navigation: Browse by individual language or view all languages
- 12 African Languages organized by geographic regions:
- East African: Amharic, Swahili, Luo
- West African: Yoruba, Hausa, Kanuri, Twi, Wolof, Yemba
- Southern African: Chichewa
- Central African: Luganda, Ewondo
- 5 Metrics:
- Readability (1-7 scale): How easy it is to read and understand the translation
- Adequacy (1-7 scale): How accurately the translation captures the original meaning
- Grammatical Correct (%) (0-100%): Percentage of grammatically correct sentences
- Real Words (%) (0-100%): Percentage of sentences with only real words
- Notable Error (%) (0-100%): Percentage of sentences with notable errors (lower is better)
- Primary/Secondary Reviewer: Compare evaluations from different reviewers
- Language Group Filtering: Browse by geographic region or individual language
- Interactive Charts: Bar charts with error bars, percentage scales for percentage metrics
- Dark/Light Theme: Toggle between themes
- Responsive Design: Optimized for desktop and mobile devices
- Summary Statistics:
- Overall Leader (best performing model)
- Languages Analyzed count
- Models Compared count
- Total Samples count
- Data Export: Download filtered data as CSV or JSON
- Round Toggle: Switch between Round 2 and Round 3 analyses
- Metric Tooltips: Hover over metrics to see definitions
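The bar charts with error bars listed above amount to computing a mean and a standard error per model from the 1-7 ratings. A minimal, dependency-free sketch of that aggregation (the function name and the `{ model, score }` row shape are illustrative, not the actual implementation in the app):

```javascript
// Aggregate 1-7 ratings into per-model summary stats suitable for a
// bar chart with error bars: mean bar height, stderr as the error bar.
function summarizeByModel(rows) {
  const groups = new Map();
  for (const { model, score } of rows) {
    if (!groups.has(model)) groups.set(model, []);
    groups.get(model).push(score);
  }
  const summary = [];
  for (const [model, scores] of groups) {
    const n = scores.length;
    const mean = scores.reduce((a, b) => a + b, 0) / n;
    // Sample variance (n - 1 denominator), guarding the n = 1 case.
    const variance =
      scores.reduce((a, b) => a + (b - mean) ** 2, 0) / (n > 1 ? n - 1 : 1);
    const stderr = Math.sqrt(variance / n);
    summary.push({ model, mean, stderr, n });
  }
  return summary;
}
```

The same shape of output also feeds the summary cards: the "Overall Leader" is simply the model with the highest mean for the selected metric.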
```
web/
├── public/
│   ├── data/                       # Round 3 CSV files
│   └── round2/                     # Round 2 CSV files (by language)
├── src/
│   ├── components/
│   │   ├── Controls.jsx            # Round 3 filtering controls
│   │   ├── ControlsRound2.jsx      # Round 2 filtering controls
│   │   ├── LanguageChart.jsx       # Round 3 chart component
│   │   ├── LanguageChartRound2.jsx # Round 2 chart component
│   │   └── Toast.jsx               # Toast notification component
│   ├── pages/
│   │   ├── LandingPage.jsx         # Landing page with analysis cards
│   │   ├── Dashboard.jsx           # Round 3 dashboard
│   │   └── DashboardRound2.jsx     # Round 2 dashboard
│   ├── utils/
│   │   ├── data.js                 # Round 3 data loading and processing
│   │   └── dataRound2.js           # Round 2 data loading and processing
│   ├── App.jsx                     # Main app component with routing
│   └── main.jsx                    # Entry point
└── package.json
```
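The CSV files under `public/` are fetched at runtime and turned into row objects by `src/utils/data.js` and `dataRound2.js`. The app uses PapaParse for this (see Tech Stack below); the sketch here is a simplified, dependency-free stand-in that ignores quoted fields, just to show the shape of the loading step (the file path in the comment is illustrative):

```javascript
// Simplified stand-in for the CSV loading in src/utils/data.js.
// The real code uses PapaParse; this naive parser does not handle
// quoted fields containing commas or newlines.
function parseCsv(text) {
  const [header, ...lines] = text.trim().split('\n');
  const fields = header.split(',');
  return lines.map((line) => {
    const values = line.split(',');
    // Pair each header field with the value in the same column.
    return Object.fromEntries(fields.map((f, i) => [f, values[i]]));
  });
}

// In the app, a file under public/data/ would be fetched like:
// const rows = parseCsv(await (await fetch('/data/some-language.csv')).text());
```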
- Node.js (v18 or higher)
- npm or yarn
- Navigate to the `web` directory:

  ```bash
  cd web
  ```

- Install dependencies:

  ```bash
  npm install
  # or
  yarn install
  ```

- Start the development server:

  ```bash
  npm run dev
  # or
  yarn dev
  ```

The application will be available at http://localhost:5173
Build the application:

```bash
npm run build
# or
yarn build
```

Preview the production build:

```bash
npm run preview
# or
yarn preview
```

The landing page provides:
- Project Motivation: Overview of the project goals
- Available Analyses: Cards for each round of analysis
- Click on a round card to see a preview with:
- Languages included
- Model performance summary
- Navigation options (View Full Analysis, Browse by Language/Group)
- Metrics: Toggle between Clarity, Naturalness, and Correctness
- Data Source: Switch between Primary and Secondary evaluations
- Providers: Filter by Anthropic, Google, or OpenAI
- Models: Select specific models within each provider
- Language Selector: When viewing a single language, switch between languages without returning to the landing page
- Metrics: Toggle between Readability, Adequacy, Grammatical Correct (%), Real Words (%), and Notable Error (%)
- Reviewer: Switch between Primary and Secondary reviewers
- Providers: Filter by Anthropic, Google, or OpenAI
- Models: Select specific models within each provider
- Language Group/Language Selector: Browse by geographic region or individual language
- Interactive Charts: Hover over bars to see detailed statistics
- Summary Cards: View overall leader, language count, model count, and total samples
- Download Data: Export current filtered view as CSV or JSON
- Round Toggle: Switch between Round 2 and Round 3 from the dashboard
- Theme Toggle: Switch between dark and light themes
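For the Download Data feature, the filtered rows need to be serialized before being handed to the browser. A minimal sketch of the CSV half of that export, assuming rows are plain objects with uniform keys (function name is illustrative):

```javascript
// Serialize the currently filtered rows to a CSV string for download.
// Values containing commas, quotes, or newlines are quoted, with
// embedded quotes doubled per the usual CSV convention.
function toCsv(rows) {
  if (rows.length === 0) return '';
  const fields = Object.keys(rows[0]);
  const quote = (v) => {
    const s = String(v ?? '');
    return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const lines = [fields.join(',')];
  for (const row of rows) {
    lines.push(fields.map((f) => quote(row[f])).join(','));
  }
  return lines.join('\n');
}
```

In the browser, the resulting string would typically be wrapped in a `Blob` and downloaded via a temporary `<a download>` link; the JSON export is just `JSON.stringify` over the same rows.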
- React 19: UI framework
- Vite: Build tool and dev server
- React Router DOM: Client-side routing
- Recharts: Interactive chart library
- PapaParse: CSV parsing
- Lodash: Data manipulation utilities
- Lucide React: Icon library
BSD 3-Clause License
Copyright (c) 2025, Dimagi Inc., Cory Zue
See LICENSE for full license text.
This is an internal project for evaluating LLM performance on low-resource languages. For questions or contributions, please contact the project maintainers.