This repository provides a cleaned and structured dataset of African languages, formatted for developers, data scientists, and researchers. The dataset is sourced from SIL (Summer Institute of Linguistics) and is available in multiple formats to ensure ease of use across different applications.
The dataset is available for direct download in the following formats:
Format | File Name | Best For |
---|---|---|
Excel | African_Languages.xlsx | Business & analysis tools (Excel, Google Sheets) |
CSV | African_Languages.csv | General-purpose data usage & databases |
JSON | African_Languages.json | Web & API applications |
Parquet | African_Languages.parquet | Big data & fast analytics |
SQLite | African_Languages.db | Structured database queries |
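
As a minimal sketch of reading two of these files with only the Python standard library, the helpers below load the CSV and JSON exports into lists of dictionaries. The function names `load_csv` and `load_json` are illustrative, and the JSON layout (a top-level array of record objects) is an assumption, not something this README confirms.

```python
import csv
import json
from pathlib import Path


def load_csv(path):
    """Read the CSV export into a list of dicts keyed by column name."""
    # utf-8-sig also accepts a leading byte-order mark, if one is present.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))


def load_json(path):
    """Read the JSON export (ASSUMED to be a top-level array of objects)."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of records")
    return data


if __name__ == "__main__":
    # Runs only if the file has been downloaded next to this script.
    csv_path = Path("African_Languages.csv")
    if csv_path.exists():
        rows = load_csv(csv_path)
        print(len(rows), "rows loaded")
```

If the JSON file turns out to use a different layout (for example, an object keyed by language code), `load_json` raises a `ValueError` rather than returning data in an unexpected shape.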
The dataset includes the following columns:
Column Name | Description |
---|---|
language_name | The full name of the language |
language_code | The ISO-based 3-letter language code extracted from the source dataset |
country_code | The ISO 2-letter code of the country where the language is spoken |
country | The name of the country where the language is spoken, based on the ISO 2-letter country code |
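
To show how these columns fit together, here is a small sketch that groups language names by `country_code`. It assumes each row is a dictionary keyed by the column names above and that a language spoken in several countries appears once per country; neither is confirmed by this README. The `languages_by_country` helper and the sample rows are illustrative and are not taken from the dataset files.

```python
from collections import defaultdict


def languages_by_country(rows):
    """Map each country_code to the sorted language names listed for it."""
    grouped = defaultdict(set)
    for row in rows:
        grouped[row["country_code"]].add(row["language_name"])
    return {code: sorted(names) for code, names in grouped.items()}


# Illustrative rows only; real rows would come from one of the dataset files.
sample = [
    {"language_name": "Swahili", "language_code": "swh", "country_code": "TZ", "country": "Tanzania"},
    {"language_name": "Swahili", "language_code": "swh", "country_code": "KE", "country": "Kenya"},
    {"language_name": "Kikuyu", "language_code": "kik", "country_code": "KE", "country": "Kenya"},
]
print(languages_by_country(sample))
```

Using a set per country keeps duplicate rows from repeating a language name in the result.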
The data is sourced from SIL (Summer Institute of Linguistics). You can learn more about SIL and its work on the SIL International website.