Seeing and Knowing in the Wild: Open-domain Visual Entity Recognition with Large-scale Knowledge Graphs via Contrastive Learning
This software is a research prototype, developed solely for and published as part of the paper "Seeing and Knowing in the Wild: Open-domain Visual Entity Recognition with Large-scale Knowledge Graphs via Contrastive Learning", which has been accepted at AAAI 2026. This repository provides the official implementation of KnowCoL, the Knowledge-guided Contrastive Learning framework presented in the paper.
KnowCoL integrates visual features, text queries, and structured knowledge (e.g., Wikidata relations and Wikipedia descriptions) into a shared semantic space, enabling strong zero-shot generalization and robust disambiguation. The approach significantly improves recognition accuracy for rare and unseen entities while remaining lightweight compared to generative baselines.
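To make the idea of a shared semantic space concrete, here is a minimal sketch of a generic InfoNCE-style contrastive loss, where each image-plus-query embedding is pulled toward its matching entity embedding and pushed away from the others in the batch. This is an illustration only, not the loss used in KnowCoL: the function names, the temperature value, and the toy vectors are all assumptions, and the roles of the repository's own hyperparameters are not modeled here.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(queries, entities, temperature=0.07):
    """Mean InfoNCE loss; queries[i] is assumed to match entities[i].

    Hypothetical helper for illustration, not part of the KnowCoL code base.
    """
    q = [l2_normalize(v) for v in queries]
    e = [l2_normalize(v) for v in entities]
    total = 0.0
    for i, qi in enumerate(q):
        # Similarity of query i to every entity in the batch.
        logits = [sum(a * b for a, b in zip(qi, ej)) / temperature for ej in e]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of the correct entity.
        total += log_denom - logits[i]
    return total / len(q)


# Toy 2-D embeddings: correctly paired vs. deliberately mismatched.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.1], [0.1, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.1, 1.0], [1.0, 0.1]])
```

In this toy setup the correctly paired batch yields a much lower loss than the mismatched one, which is the signal a contrastive objective uses to align modalities.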
Create and activate the conda environment:

conda env create -f environment.yml
conda activate my-env

- Download the OVEN dataset from HuggingFace here.
- Download the Wikidata subgraph for the OVEN benchmark here.
- Download the Wikipedia knowledge base for the OVEN benchmark here.

Place the downloaded data under the directory expected by the datamodule, for example:
dataset/
├── oven_data/ # Processed OVEN annotations
├── oven_images/ # Image files associated with OVEN
├── test_data/ # Test split for evaluation
├── wikidata_subgraph_v1/ # Extracted Wikidata subgraph
│ ├── entity.txt # List of entity IDs and labels
│ ├── relation.txt # List of relation IDs and names
│ ├── triplet_h.jsonl # Head-anchored knowledge graph triples
│ └── triplet_t.jsonl # Tail-anchored knowledge graph triples
└── knowledge_base/
  ├── wikipedia_images_full/ # Lead images from Wikipedia
  ├── Wiki6M_ver_1_1.jsonl # Image paths of the entities
  └── wikidata_relation_1_1.jsonl # Text descriptions of the entities

To train the model, run:

python3 knowcol/training.py

Config options:
- model.beta1: hyperparameter beta1
- model.beta2: hyperparameter beta2
- datamodule.batch_size: batch size for training
- trainer.max_epochs: number of epochs to train
- ...
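The dotted option names suggest a nested configuration (a `model` group, a `datamodule` group, a `trainer` group). How `knowcol/training.py` actually accepts overrides is not stated here, so the sketch below only illustrates how `key.subkey=value` strings can map onto a nested dictionary. The helper names and the default values are placeholders, not the repository's real defaults.

```python
import ast
import copy


def parse_value(text):
    """Interpret '0.5' or '20' as numbers; fall back to the raw string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def apply_overrides(config, overrides):
    """Return a copy of config with 'a.b=value' overrides applied.

    Hypothetical helper for illustration; not the repository's config loader.
    """
    result = copy.deepcopy(config)
    for item in overrides:
        key, _, raw = item.partition("=")
        *parents, leaf = key.split(".")
        node = result
        for name in parents:
            # Walk (or create) the nested groups named by the dotted path.
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw)
    return result


# Placeholder values only; the real defaults live in the repository's config.
defaults = {
    "model": {"beta1": 1.0, "beta2": 1.0},
    "datamodule": {"batch_size": 32},
    "trainer": {"max_epochs": 10},
}

updated = apply_overrides(defaults, ["model.beta1=0.5", "trainer.max_epochs=20"])
```

The original dictionary is left untouched, which makes it easy to compare a run's effective settings against the defaults.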
To evaluate, run:

python3 knowcol/evaluations/oven_eval.py

Specify the checkpoint and model in the Python file before running it.
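Before launching training or evaluation, it can help to confirm that the dataset directory matches the layout described earlier. The following is a small sketch that lists which expected entries are missing. The root path `dataset` and the helper name are assumptions; adjust them to wherever the datamodule actually looks for data.

```python
from pathlib import Path

# Relative paths taken from the directory tree in this README.
EXPECTED = [
    "oven_data",
    "oven_images",
    "test_data",
    "wikidata_subgraph_v1/entity.txt",
    "wikidata_subgraph_v1/relation.txt",
    "wikidata_subgraph_v1/triplet_h.jsonl",
    "wikidata_subgraph_v1/triplet_t.jsonl",
    "knowledge_base/wikipedia_images_full",
    "knowledge_base/Wiki6M_ver_1_1.jsonl",
    "knowledge_base/wikidata_relation_1_1.jsonl",
]


def missing_entries(root):
    """Return the expected files and folders that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    # 'dataset' is an assumed location, matching the tree shown above.
    for rel in missing_entries("dataset"):
        print(f"missing: {rel}")
```

An empty result means every entry from the tree is present; it does not validate the contents of the files themselves.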
If you find this work interesting, please consider citing it:
@article{
Zhou_Halilaj_Monka_Schmid_Zhu_Wu_Nazer_Staab_2026,
title={Seeing and Knowing in the Wild: Open-domain Visual Entity Recognition with Large-scale Knowledge Graphs via Contrastive Learning},
volume={40},
url={https://ojs.aaai.org/index.php/AAAI/article/view/38370},
DOI={10.1609/aaai.v40i16.38370},
number={16},
journal={Proceedings of the AAAI Conference on Artificial Intelligence},
author={Zhou, Hongkuan and Halilaj, Lavdim and Monka, Sebastian and Schmid, Stefan and Zhu, Yuqicheng and Wu, Jingcheng and Nazer, Nadeem and Staab, Steffen},
year={2026},
month={Mar.},
pages={13638-13646}
}