GIFARC: Synthetic Dataset for Leveraging Human-Intuitive Analogies to Elevate AI Reasoning

GIST-DSLab/GIFARC

By embedding robust human-intuitive analogies into ARC-style tasks, GIFARC guides AI agents to evaluate a task analogically before engaging in brute-force pattern search, thus reducing problem complexity and yielding more concise, human-understandable solutions.


A GIF will turn into ... GIFARC!

TL;DR

  • 1,614 ARC-style puzzles, each built from a GIF and its analogy.
  • Pair‑wise ground‑truth mappings + rich textual rationales for supervised or in‑context use.
  • Easy-to-run generation pipeline: extend or remix analogy families with new GIFs in a few minutes.
  • Friendly Hugging Face dataset & interactive web demo for instant exploration.

Table of Contents

  1. Quick Start
  2. Dataset Card
  3. Pipeline Overview
  4. Project Structure
  5. Citing GIFARC
  6. Acknowledgements
  7. License

Quick Start

1. Install

We highly recommend using Docker. For Docker setup instructions, see docs/SETUP.md.

git clone <GIT_url>
cd GIFARC
pip install -r requirements.txt
pip install -r requirements-dev.txt

2. Pull the Dataset

from datasets import load_dataset
ds = load_dataset("DumDev/gif_arc")
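
After loading, a quick summary can confirm that the download worked. The following is a minimal sketch, not part of the repository: `load_gifarc` and `summarize` are hypothetical helper names, and it assumes that the split is named `train` and that each record exposes the `examples` field described in the Dataset Card.

```python
from collections import Counter


def load_gifarc(split="train"):
    """Load GIFARC from the Hugging Face Hub (needs the `datasets` package and network access)."""
    from datasets import load_dataset  # imported lazily so summarize() works without it
    # Assumption: the split is named "train", as listed in the Dataset Card table.
    return load_dataset("DumDev/gif_arc", split=split)


def summarize(records):
    """Return the task count and a histogram of example pairs per task."""
    pairs_per_task = Counter()
    n_tasks = 0
    for record in records:
        n_tasks += 1
        # Assumption: each record has an "examples" list of [input, output] pairs.
        pairs_per_task[len(record["examples"])] += 1
    return {"tasks": n_tasks, "pairs_per_task": dict(pairs_per_task)}
```

Calling `summarize(load_gifarc())` should then report the number of tasks and how many example pairs each task carries, provided the field names match the assumptions above.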

3. Generate Your Own GIFARC

Once your setup is done, open description_executor.ipynb and run the code there.

4. Check the Web Demo

GIFARC Web Demo.


Dataset Card

Split | #Tasks | #Unique GIFs | Size
Train | 1,614  | 1,614        | < 100 MB

Every task package looks as follows:

{
  "source": "<source code>", # python code string
  "examples": [
      [<input_grid_1>,<output_grid_1>], # pair 1
      [<input_grid_2>,<output_grid_2>], # pair 2
      ...
    ], 
  "seeds": [
      "<file_name_1>",
      "<file_name_2>",
      ...,
      "<file_name_N>",
      "<Concept_and_description>"
    ], 
  "url": "<minified_url>"
}
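
The layout above can be checked programmatically before a task is used. This is a minimal sketch under stated assumptions: `validate_task` is a hypothetical helper, grids are assumed to be non-empty rectangular lists of rows, and the toy record (including its seed file names) is invented purely for illustration.

```python
def validate_task(task):
    """Check one task dict against the layout above; return (pairs, seed_files, concept)."""
    for key in ("source", "examples", "seeds", "url"):
        if key not in task:
            raise ValueError(f"missing key: {key}")
    pairs = []
    for i, pair in enumerate(task["examples"]):
        if len(pair) != 2:
            raise ValueError(f"example {i} is not an [input, output] pair")
        grid_in, grid_out = pair
        for name, grid in (("input", grid_in), ("output", grid_out)):
            # Assumption: a grid is a non-empty rectangular list of rows.
            if not grid or len({len(row) for row in grid}) != 1:
                raise ValueError(f"example {i} {name} grid is not rectangular")
        pairs.append((grid_in, grid_out))
    # Per the layout above, the last seed entry holds the concept and description.
    *seed_files, concept = task["seeds"]
    return pairs, seed_files, concept


# Toy record for illustration only; real records come from the dataset.
toy = {
    "source": "def main(grid): return grid",
    "examples": [[[[0, 1], [1, 0]], [[1, 0], [0, 1]]]],
    "seeds": ["seed_1.py", "seed_2.py", "Concept: mirror"],
    "url": "<minified_url>",
}
pairs, seed_files, concept = validate_task(toy)
```

Separating the trailing concept entry from the seed file names keeps the two kinds of information in `seeds` from being mixed up downstream.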

See the full dataset card for licensing, intended use, and data statements.
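
Because the `source` field is a Python code string, it can be inspected without being executed. The sketch below uses the standard `ast` module to list the top-level functions a task defines; `list_functions` is a hypothetical helper, and the example source (with its function names) is an invented stand-in, since real tasks may define different functions.

```python
import ast


def list_functions(source):
    """Return the names of top-level functions defined in a task's source string."""
    tree = ast.parse(source)  # parses only; nothing in the source is executed
    return [node.name for node in tree.body if isinstance(node, ast.FunctionDef)]


# Hypothetical source string; real tasks may define different functions.
example_source = '''
def generate_input():
    return [[0, 1], [1, 0]]

def main(input_grid):
    return [row[::-1] for row in input_grid]
'''
print(list_functions(example_source))
```

Parsing first is a cautious way to look at generated code before deciding whether to run it in a sandboxed environment.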


Pipeline Overview

  • Modular and easy generation – after putting your GIFs in data/GIF, run all cells in description_executor.ipynb to generate your own data.
  • Stable environment – Docker and devcontainer configurations make setup easy.
  • All intermediate artifacts are cached for reproducibility.

Detailed instructions live in GENERATION.md.
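
Before running the notebook, it can help to confirm that the input folder actually contains GIFs. The following is a minimal sketch, not part of the pipeline: `find_gifs` is a hypothetical helper, and it assumes the GIFs sit directly inside data/GIF with a `.gif` extension.

```python
from pathlib import Path


def find_gifs(gif_dir="data/GIF"):
    """Return the sorted .gif files directly inside gif_dir (case-insensitive extension)."""
    root = Path(gif_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} does not exist; create it and add GIFs first")
    return sorted(p for p in root.iterdir() if p.is_file() and p.suffix.lower() == ".gif")
```

Running `find_gifs()` from the repository root lists the files the notebook would be expected to pick up; an empty list suggests the GIFs were placed somewhere else.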


Project Structure

./GIFARC
├── data
│   └── GIF
├── description_executor.ipynb # use this to execute
├── docker-compose.yml
├── docs
│   ├── EXPERIMENTS.md
│   ├── GENERATION.md
│   ├── project_directory_tree.txt
│   └── SETUP.md
├── loggings
├── README.md
├── requirements-dev.txt
├── requirements.txt
├── results # this will generate automatically
└── src
    ├── execution.py
    ├── experiments.py
    ├── generate_descriptions.py
    ├── generate_problems.py
    ├── generate_visualization_html.py
    ├── GIFARC_data_batch
    ├── GIFARC_utils
    ├── misc
    ├── parse_batch_description_samples.py
    ├── prompts
    ├── seeds
    ├── utility
    └── visualize_problems.py

Citing GIFARC

@misc{gifarc2025,
  title   = {GIFARC: Synthetic Dataset for Leveraging Human-Intuitive Analogies to Elevate AI Reasoning},
  author  = { Anonymous },
  year    = {2025},
  note    = {Under review at NeurIPS Datasets \& Benchmarks 2025},
  url     = {}
}

Acknowledgements

  • GIPHY for powering the GIF search API.
  • BARC – our generation pipeline stands on the shoulders of this excellent project.
  • GIFARC wouldn’t be possible without the open‑source community and our amazing reviewers.

License

Distributed under the MIT License.
