Google Research Datasets
- 1.2k followers
- Mountain View, CA
- http://research.google
Repositories
- Amplify_SSA
An annotated dataset of 9,003 adversarial queries in seven Sub-Saharan African languages.
- SAFARI
A dataset of stereotypes collected in four Sub-Saharan African countries for use in evaluating AI models.
- SCALE-Cultural-Data
A dataset of globally situated cultural artifacts covering 29 countries and many key aspects of culture.
- artydiqa
ArTyDi-QA is a dataset for Question Answering (QA) and Question Generation (QG) in Modern Standard Arabic (MSA), adapted from TyDiQA. It supports two tasks: extractive QA, in which a model must locate the answer span in a passage or recognize that the question is unanswerable, and QG, in which a model must formulate a question from a given context and answer.
- ssa-ai-terminologies
A glossary of AI terms in Swahili, Zulu, Xhosa, Afrikaans, English (as the common core), and other languages widely spoken in Africa. Distributed as a single JSON file with "Basic" and "Advanced" levels, it aims to improve AI literacy.
- MGSM-Rev2
To improve the MGSM benchmark, we corrected two erroneous English questions and rephrased others to remove ambiguity. We then used Gemini to retranslate all questions and to verify that every question in the benchmark is now answerable.
- wit-retrieval
- cultural_familiarity_annotations
A dataset of AI-generated stories accompanied by human ratings of their cultural fluency and relevance.
- tydiqa-wana
- conceptual-12m
Conceptual 12M is a dataset of (image-URL, caption) pairs collected for vision-and-language pre-training.
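To make the extractive QA setup described for ArTyDi-QA concrete, here is a minimal sketch of how an answer span and an unanswerable question might be represented and scored. The record layout (`QAExample` with character offsets `answer_start` and `answer_text`) and the helpers `gold_span` and `exact_match` are hypothetical illustrations, not the dataset's actual schema or official evaluation code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QAExample:
    # Hypothetical record layout; NOT the actual ArTyDi-QA schema.
    context: str
    question: str
    answer_start: Optional[int]  # character offset into context; None if unanswerable
    answer_text: Optional[str]   # gold answer text; None if unanswerable


def gold_span(ex: QAExample) -> Optional[str]:
    """Return the gold answer span, or None when the question is unanswerable."""
    if ex.answer_start is None or ex.answer_text is None:
        return None
    end = ex.answer_start + len(ex.answer_text)
    span = ex.context[ex.answer_start:end]
    # Sanity check: the stored offset must point at the stored answer text.
    if span != ex.answer_text:
        raise ValueError("answer_text does not match context at answer_start")
    return span


def exact_match(prediction: Optional[str], ex: QAExample) -> bool:
    """Exact-match scoring; an empty prediction is correct only if unanswerable."""
    gold = gold_span(ex)
    if gold is None:
        return not prediction
    return prediction is not None and prediction.strip() == gold.strip()


# Usage with a made-up Arabic example (not taken from the dataset).
ex = QAExample(
    context="القاهرة هي عاصمة مصر.",
    question="ما هي عاصمة مصر؟",
    answer_start=0,
    answer_text="القاهرة",
)
print(gold_span(ex), exact_match("القاهرة", ex))
```

Treating an empty prediction as the "unanswerable" answer is one common convention for extractive QA; the dataset's own evaluation may handle this differently.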