CQA_EntityLinking

Code and data set for the IJCAI 2022 paper "Community Question Answering Entity Linking via Leveraging Auxiliary Data".

Requirements

torch == 1.8.0+
transformers == 4.5.1

Dataset: QuoraEL

We construct a new dataset, QuoraEL, which contains 504 CQA texts in total. The Wikipedia dump (July 2019 version) is used as the reference KB. Our data are in the folder data sets. CQAEL_dataset.json contains the QuoraEL data described above. Details of the other files can be found in the format-conversion code. Because our data set folder is too large, we release it here.

Data format

For each question, the following items are covered:

question title, question url, ID of question, answers, mentions in question title, topics

topics includes topic name, topic url, and the questions under this topic.

For each answer, the following items are covered:

answer url, answer id, upvote count, answer content, mentions in answer content, user name, user url, user history answers, user history questions

For each mention, the following items are covered:

mention text, corresponding entity, candidates, gold entity index

candidates is a string, and each candidate in candidates has the form:

<ENTITY>\t<WIKIPEDIA_ID>\t<PRIOR_PROB>

The gold entity index is '-1' if the mention cannot be linked to any candidate. There are 8030 mentions that can be linked to some candidate.
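A single candidate entry in the tab-separated format above can be split into its three fields. The following is a minimal sketch, not code from this repository: `Candidate`, `parse_candidate`, and `gold_candidate` are hypothetical helper names, and the example values are made up for illustration. How multiple candidates are delimited inside the candidates string is not stated here, so the sketch only parses one entry.

```python
from typing import List, NamedTuple, Optional, Union


class Candidate(NamedTuple):
    """One candidate entry (hypothetical container, not from the repository)."""
    entity: str
    wikipedia_id: str
    prior_prob: float


def parse_candidate(entry: str) -> Candidate:
    """Split one '<ENTITY>\\t<WIKIPEDIA_ID>\\t<PRIOR_PROB>' entry on tabs."""
    entity, wikipedia_id, prior_prob = entry.strip().split("\t")
    return Candidate(entity, wikipedia_id, float(prior_prob))


def gold_candidate(
    candidates: List[Candidate], gold_index: Union[int, str]
) -> Optional[Candidate]:
    """Return the gold candidate, or None when the index is -1 (unlinkable).

    The index is passed through int() because it may be stored as a string.
    """
    index = int(gold_index)
    if index == -1:
        return None
    return candidates[index]


# Illustrative values only; they are not taken from QuoraEL.
example = parse_candidate("Some Entity\t12345\t0.5")
```

Keeping the Wikipedia ID as a string avoids assuming that every ID in the data is purely numeric.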

Load data

The data set is stored in JSON format and can be loaded with Python's standard json module.

import json

# PATH_OF_DATASET_FILE is a placeholder, e.g. the path to CQAEL_dataset.json
with open(PATH_OF_DATASET_FILE, 'r', encoding='utf-8') as fp:
    data = json.load(fp)
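The exact key names in the JSON file are not listed in this README, so it can help to inspect the top-level structure after loading. The helper below is a small sketch under that assumption; `describe` is a hypothetical name, and the in-memory sample stands in for the real file.

```python
import json
from typing import Any


def describe(data: Any, max_keys: int = 10) -> str:
    """Summarize the top-level shape of a loaded JSON object."""
    if isinstance(data, dict):
        keys = list(data)[:max_keys]
        return f"dict with {len(data)} keys, first keys: {keys}"
    if isinstance(data, list):
        first = type(data[0]).__name__ if data else "n/a"
        return f"list with {len(data)} items, first item type: {first}"
    return type(data).__name__


# Stand-in sample; replace with the object returned by json.load(fp).
sample = json.loads('[{"a": 1}]')
summary = describe(sample)
```

Printing `describe(data)` on the loaded data set shows whether the top level is a list or a dict before any code relies on specific keys.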

Files

models folder: Code for our model and the baseline models. Baselines include Deep-ED, Ment-Norm, FGS2EE, Zeshel, REL, BLINK, and GENRE. Some data files can be downloaded via the links in their original repositories.
dataset folder: Our data are in the subfolder cqa-el. CQAEL_dataset.json contains the QuoraEL data described above. Details of the other files can be found in the format-conversion code.

For more details about the data set and the experiment settings, please refer to our paper.
