Knowledge Update Playground (KUP)


Welcome to Knowledge Update Playground (KUP), an automatic framework for generating realistic knowledge update/conflict datasets and evaluating how well Large Language Models (LLMs) adapt to knowledge changes during continued pre-training.

🚀 Overview

KUP helps researchers and practitioners:

  • Generate realistic knowledge update pairs to simulate real-world knowledge shifts and conflicts.
  • Evaluate LLMs’ adaptability to knowledge updates during fine-tuning or continued pre-training.
  • Train LLMs using both continued pre-training and supervised fine-tuning following the setup in Synthetic Continued Pre-training.

This playground is designed to benchmark how well LLMs handle incremental knowledge, especially in dynamic environments.

Note: The main branch is fully functional. We are actively improving code readability, structure, and usability on the prod branch to make the project more production-ready.


📄 Dataset

The KUP dataset contains 5,000 high-quality knowledge update/conflict pairs, automatically synthesized and verified to represent realistic knowledge shifts.

🔗 Hugging Face Dataset:
https://huggingface.co/datasets/aochongoliverli/KUP
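
For a quick look at the data, the dataset can likely be pulled with the Hugging Face `datasets` library. The following is a minimal sketch, not part of the KUP codebase: it assumes `datasets` is installed (`pip install datasets`), that the repository id matches the URL above, and that the dataset loads without an explicit configuration name. The helper names `load_kup` and `split_sizes` are hypothetical, and no split or column names are assumed.

```python
from collections.abc import Mapping, Sized

# Repository id taken from the dataset URL above.
DATASET_ID = "aochongoliverli/KUP"


def load_kup():
    """Download all available splits of KUP from the Hugging Face Hub.

    Hypothetical helper: needs network access and the `datasets` package.
    If the dataset defines several configurations, `load_dataset` will ask
    for a configuration name, which this sketch does not guess.
    """
    # Imported lazily so that split_sizes() stays usable without `datasets`.
    from datasets import load_dataset

    return load_dataset(DATASET_ID)


def split_sizes(splits: Mapping[str, Sized]) -> dict[str, int]:
    """Return the number of rows per split, whatever the splits are called."""
    return {name: len(rows) for name, rows in splits.items()}
```

Calling `split_sizes(load_kup())` should report how many rows each split contains, and `column_names` on any returned split lists the available fields, so the actual schema can be checked before writing code that depends on it.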


📥 Installation

git clone https://github.com/your-username/KUP.git
cd KUP
pip install -r requirements.txt