KG-HTC: Integrating Knowledge Graphs into LLMs for Zero-shot Hierarchical Text Classification

Introduction

Because a directed acyclic graph (DAG) can represent the hierarchical structure of labels in hierarchical text classification (HTC), combining large language models (LLMs) with knowledge graphs is especially well suited to this task.

We represent the label taxonomy as a DAG-based knowledge graph and compute the cosine similarity between the embedding of the input text and the embeddings of the labels at each level. Labels whose similarity exceeds a preset threshold are selected as candidates at every hierarchical level, so that only labels semantically close to the input text are kept. Using these candidate labels, the system then dynamically retrieves the most relevant subgraph of the full label knowledge graph for the given text.
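The candidate selection and subgraph retrieval step can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the repository's code: the toy two-dimensional vectors stand in for real text and label embeddings, the threshold values are made up, and "retrieving the subgraph" is approximated as keeping only the edges whose two endpoints are both candidates. The names `cosine`, `select_candidates`, and `retrieve_subgraph` are hypothetical.

```python
import math

# Toy, hand-made vectors stand in for real embeddings (assumption: in the
# actual pipeline they would come from the embedding model configured in .env).
TEXT_VEC = [1.0, 0.0]
LABEL_VECS_BY_LEVEL = {
    1: {"Science": [0.9, 0.1], "Sports": [0.0, 1.0]},
    2: {"Physics": [1.0, 0.2], "Football": [0.1, 1.0]},
}
THRESHOLDS = {1: 0.5, 2: 0.5}  # hypothetical per-level thresholds
# Hypothetical full label graph as (parent, child) edges.
EDGES = [("Science", "Physics"), ("Sports", "Football"), ("Science", "Chemistry")]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_candidates(text_vec, label_vecs_by_level, thresholds):
    """For each level, keep labels whose similarity to the text reaches that level's threshold."""
    return {
        level: {
            label
            for label, vec in label_vecs.items()
            if cosine(text_vec, vec) >= thresholds[level]
        }
        for level, label_vecs in label_vecs_by_level.items()
    }


def retrieve_subgraph(edges, candidates):
    """Assumed retrieval rule: keep edges whose parent and child are both candidates."""
    selected = set().union(*candidates.values())
    return [(p, c) for p, c in edges if p in selected and c in selected]


candidates = select_candidates(TEXT_VEC, LABEL_VECS_BY_LEVEL, THRESHOLDS)
subgraph = retrieve_subgraph(EDGES, candidates)
print(candidates)  # {1: {'Science'}, 2: {'Physics'}}
print(subgraph)    # [('Science', 'Physics')]
```

Raising a level's threshold shrinks its candidate set and therefore the retrieved subgraph; how the real thresholds were chosen is not specified here.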

On the retrieved subgraph, an upward propagation algorithm systematically enumerates every hierarchical path from the leaf nodes to the root; each path is a complete hierarchical label sequence in reversed (leaf-to-root) order. These structured sequences are then concatenated into a prompt, which is fed to an LLM to perform zero-shot classification.
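The upward propagation and prompt construction can be illustrated with a small sketch. It assumes the retrieved subgraph is given as a list of hypothetical (parent, child) edges and climbs from every leaf to every root by depth-first recursion, so a node with several parents (which a DAG allows) yields several paths. The functions `upward_paths` and `build_prompt` and the prompt wording are invented placeholders; the templates the authors actually use presumably live in the prompts folder.

```python
# Hypothetical retrieved subgraph as (parent, child) edges. "Optics" has two
# parents, which a DAG allows, so it yields two leaf-to-root paths.
SUBGRAPH = [
    ("Science", "Physics"),
    ("Science", "Chemistry"),
    ("Physics", "Optics"),
    ("Engineering", "Optics"),
]


def upward_paths(edges):
    """Enumerate every leaf-to-root path (a reversed hierarchical label sequence)."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, []).append(parent)
        parents.setdefault(parent, [])
    has_children = {parent for parent, _ in edges}
    leaves = sorted(node for node in parents if node not in has_children)

    paths = []

    def climb(node, path):
        path = path + [node]
        if not parents[node]:  # reached a root
            paths.append(path)
            return
        for parent in sorted(parents[node]):  # propagate upward to every parent
            climb(parent, path)

    for leaf in leaves:
        climb(leaf, [])
    return paths


def build_prompt(text, paths):
    """Concatenate the paths into a prompt (placeholder wording, not the repo's template)."""
    path_lines = "\n".join(" -> ".join(path) for path in paths)
    return (
        "Candidate label paths (leaf -> root):\n"
        f"{path_lines}\n\n"
        f"Text: {text}\n"
        "Answer with the single most appropriate path."
    )


paths = upward_paths(SUBGRAPH)
print(paths)
# [['Chemistry', 'Science'], ['Optics', 'Engineering'], ['Optics', 'Physics', 'Science']]
print(build_prompt("Lasers amplify light by stimulated emission.", paths))
```

Because every leaf-to-root path is enumerated, the number of paths can grow quickly in densely connected DAGs, which is one reason to prune the graph to candidate labels before this step.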

How to run the code

Import database/graph_records.json into Neo4j.
Create a .env file and fill in your (Azure) OpenAI API settings and Neo4j credentials:

API_KEY = 'Input your key'
API_VERSION = ''
AZURE_ENDPOINT = ''
DEPLOYMENT_NAME = ''

EMBEDDING_DEPLOYMENT_NAME = ''

NEO4J_USERNAME = ''
NEO4J_PASSWORD = ''
NEO4J_URI = ''
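Before launching an experiment, it can help to confirm that every variable above is actually filled in. The sketch below is a hypothetical stdlib-only helper (`parse_env`, `missing_keys`, and `load_env` are invented names), not necessarily how the repository loads its configuration; it assumes the simple `KEY = 'value'` format shown above.

```python
import os

# Variable names taken from the .env template above.
REQUIRED_KEYS = [
    "API_KEY", "API_VERSION", "AZURE_ENDPOINT", "DEPLOYMENT_NAME",
    "EMBEDDING_DEPLOYMENT_NAME",
    "NEO4J_USERNAME", "NEO4J_PASSWORD", "NEO4J_URI",
]


def parse_env(text):
    """Parse KEY = 'value' lines; blank lines and # comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip().strip("'\"")  # drop surrounding quotes
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return required keys that are absent or left empty."""
    return [key for key in required if not values.get(key)]


def load_env(path=".env"):
    """Read the file, export its values without overriding existing ones, and validate."""
    with open(path, encoding="utf-8") as handle:
        values = parse_env(handle.read())
    for key, value in values.items():
        os.environ.setdefault(key, value)
    missing = missing_keys(values)
    if missing:
        raise ValueError(f"Missing values in {path}: {', '.join(missing)}")
    return values
```

Calling `load_env()` from the repository root would fail fast with a list of unfilled variables, instead of failing later inside an API or database call.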

Run an experiment with Python, for example on DBpedia:

python code_dbpedia/gpt_dbpedia.py

Evaluation

We evaluate our approach on three public datasets and achieve new state-of-the-art results on all of them. Without using any annotated data, KG-HTC substantially improves the model's ability to distinguish long-tail and sparse labels.

Repository structure

ablation_full_kg
ablation_qwen
code_amazon
code_dbpedia
code_llm_only
code_wos
database
dataset/dbpedia
prompts
script_ablation_full
script_ablation_qwen
script_main
src
.gitattributes
.gitignore
README.md

QianboZang/KG-HTC

Folders and files

Latest commit

History

Repository files navigation

KG-HTC: Integrating Knowledge Graphs into LLMs for Zero-shot Hierarchical Text Classification

Introduction

How to run the code

Evaluation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages