boschresearch/KnowCoL

Seeing and Knowing in the Wild: Open-domain Visual Entity Recognition with Large-scale Knowledge Graphs via Contrastive Learning

Purpose of the project

This software is a research prototype, developed solely for and published as part of the publication "Seeing and Knowing in the Wild: Open-domain Visual Entity Recognition with Large-scale Knowledge Graphs via Contrastive Learning", which has been accepted at AAAI 2026. This repository provides the official implementation of KnowCoL, the Knowledge-guided Contrastive Learning framework presented in the paper.

Introduction

KnowCoL integrates visual features, text queries, and structured knowledge (e.g., Wikidata relations and Wikipedia descriptions) into a shared semantic space, enabling strong zero-shot generalization and robust disambiguation. The approach significantly improves recognition accuracy for rare and unseen entities while remaining lightweight compared to generative baselines.
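To make the idea of a shared semantic space more concrete, here is a minimal sketch of a generic InfoNCE-style contrastive objective in plain Python. It is not the KnowCoL loss: the actual objective is defined in the paper and in the code under knowcol/. The function names and the temperature value of 0.07 are assumptions chosen for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so that dot products are cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(queries, targets, temperature=0.07):
    """Mean cross-entropy of matching queries[i] to targets[i] within the batch.

    Generic illustration only; the temperature default is an assumed value.
    """
    queries = [normalize(q) for q in queries]
    targets = [normalize(t) for t in targets]
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, t) / temperature for t in targets]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        total += log_sum - logits[i]  # equals -log softmax(logits)[i]
    return total / len(queries)
```

The loss is small when each query embedding is closest to its own target and large when it is closer to another item in the batch, which is the property that pulls matching pairs together in the shared space.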

Requirements

Create and activate your environment

conda env create -f environment.yml
conda activate my-env

Dataset

Download the following resources:

  • the OVEN dataset from HuggingFace here
  • the Wikidata subgraph for the OVEN benchmark here
  • the Wikipedia knowledge base for the OVEN benchmark here

Place the downloaded data under the directory expected by the datamodule, e.g.:

dataset/
├── oven_data/               # Processed OVEN annotations
├── oven_images/             # Image files associated with OVEN
├── test_data/               # Test split for evaluation
├── wikidata_subgraph_v1/    # Extracted Wikidata subgraph
│   ├── entity.txt           # List of entity IDs and labels
│   ├── relation.txt         # List of relation IDs and names
│   ├── triplet_h.jsonl      # Head-anchored knowledge graph triples
│   └── triplet_t.jsonl      # Tail-anchored knowledge graph triples
└── knowledge_base
    ├── wikipedia_images_full # Lead images from Wikipedia
    ├── Wiki6M_ver_1_1.jsonl  # Image paths of the entities
    └── wikidata_relation_1_1.jsonl # Text descriptions of the entities
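Before starting a training run, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the helper name `missing_paths` is hypothetical, and the list of paths is copied from the tree above, with `dataset` assumed to be the root directory.

```python
from pathlib import Path

# Relative paths taken from the directory tree above.
EXPECTED_PATHS = [
    "oven_data",
    "oven_images",
    "test_data",
    "wikidata_subgraph_v1/entity.txt",
    "wikidata_subgraph_v1/relation.txt",
    "wikidata_subgraph_v1/triplet_h.jsonl",
    "wikidata_subgraph_v1/triplet_t.jsonl",
    "knowledge_base/wikipedia_images_full",
    "knowledge_base/Wiki6M_ver_1_1.jsonl",
    "knowledge_base/wikidata_relation_1_1.jsonl",
]


def missing_paths(root):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    # "dataset" is the root directory name used in the tree above.
    missing = missing_paths("dataset")
    if missing:
        print("Missing entries:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected entries are present.")
```

If the datamodule in your checkout expects a different root, pass that directory to `missing_paths` instead.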

Training

python3 knowcol/training.py

Configuration options:

  • model.beta1: hyperparameter beta1
  • model.beta2: hyperparameter beta2
  • datamodule.batch_size: batch size for training
  • trainer.max_epochs: number of epochs to train ...
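The dotted option names suggest a nested configuration with the groups `model`, `datamodule`, and `trainer`. How the repository actually parses these options is not stated here, so the sketch below only illustrates the naming convention: `apply_overrides` is a hypothetical helper, and all values are placeholders rather than the paper's settings.

```python
def apply_overrides(config, overrides):
    """Set dotted keys such as 'model.beta1' inside a nested dict, in place."""
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            # Create intermediate groups on demand.
            node = node.setdefault(name, {})
        node[leaf] = value
    return config


# Placeholder values for illustration only; they are not the paper's settings.
config = apply_overrides(
    {},
    {
        "model.beta1": 0.5,
        "model.beta2": 0.5,
        "datamodule.batch_size": 32,
        "trainer.max_epochs": 10,
    },
)
print(config)
```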

Testing

python3 knowcol/evaluations/oven_eval.py

Specify the checkpoint and the model in the Python file (knowcol/evaluations/oven_eval.py) before running it.

Reference

If you find this work interesting, please consider citing:

@article{Zhou_Halilaj_Monka_Schmid_Zhu_Wu_Nazer_Staab_2026,
    title={Seeing and Knowing in the Wild: Open-domain Visual Entity Recognition with Large-scale Knowledge Graphs via Contrastive Learning},
    volume={40},
    url={https://ojs.aaai.org/index.php/AAAI/article/view/38370},
    DOI={10.1609/aaai.v40i16.38370},
    number={16},
    journal={Proceedings of the AAAI Conference on Artificial Intelligence},
    author={Zhou, Hongkuan and Halilaj, Lavdim and Monka, Sebastian and Schmid, Stefan and Zhu, Yuqicheng and Wu, Jingcheng and Nazer, Nadeem and Staab, Steffen},
    year={2026},
    month={Mar.},
    pages={13638--13646}
}

