Google Research Datasets

natural-questions Public

Natural Questions (NQ) contains real user questions issued to Google search, and answers found from Wikipedia by annotators. NQ is designed for the training and evaluation of automatic question ans…

Python, 1.1k stars, 159 forks

conceptual-captions Public

Conceptual Captions is a dataset of (image-URL, caption) pairs designed for training and evaluating machine-learned image captioning systems.

Shell, 560 stars, 27 forks

Objectron Public

Objectron is a dataset of short, object-centric video clips. The videos also contain AR session metadata, including camera poses, sparse point clouds, and planes. In each video, the came…

Jupyter Notebook, 2.3k stars, 264 forks

wit Public archive

WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.

1.1k stars, 45 forks

paws Public

This dataset contains 108,463 human-labeled and 656k noisily labeled pairs that feature the importance of modeling structure, context, and word order information for the problem of paraphrase ident…

Python, 560 stars, 54 forks

dstc8-schema-guided-dialogue Public

The Schema-Guided Dialogue Dataset

Python, 596 stars, 131 forks
