LLMs4OL is a community development initiative collocated with the International Semantic Web Conference (ISWC) to explore the potential of Large Language Models (LLMs) in Ontology Learning (OL), a vital process for enhancing the web with structured knowledge to improve interoperability. By leveraging LLMs, the challenge aims to advance understanding and innovation in OL, aligning with the goals of the Semantic Web to create a more intelligent and user-friendly web.

.. list-table::
   :widths: 20 20 60
   :header-rows: 1

   * - **Edition**
     - **Task**
     - **Description**
   * - ``LLMs4OL'25``
     - **Text2Onto**
     - Extract ontological terms and types from unstructured text.

       **ID**: ``text-to-onto``

       **Info**: This task focuses on extracting the foundational elements of an ontology (terms and types) from unstructured text documents. It involves recognizing domain-relevant vocabulary (Term Extraction, SubTask 1) and categorizing it appropriately (Type Extraction, SubTask 2), bridging the gap between natural language and structured knowledge representation.

       **Example**: **COVID-19** is a term of the type **Disease**.
   * - ``LLMs4OL'24``, ``LLMs4OL'25``
     - **Term Typing**
     - Discover the generalized type for a lexical term.

       **ID**: ``term-typing``

       **Info**: Assigning a generalized type to each lexical term means mapping lexical items to their most appropriate semantic categories or ontological classes. For example, in the biomedical domain, the term ``aspirin`` should be classified under ``Pharmaceutical Drug``. This task is crucial for organizing extracted terms into structured ontologies and improving knowledge reuse.

       **Example**: Assign the type ``"disease"`` to the term ``"myocardial infarction"``.
   * - ``LLMs4OL'24``, ``LLMs4OL'25``
     - **Taxonomy Discovery**
     - Discover the taxonomic hierarchy between type pairs.

       **ID**: ``taxonomy-discovery``

       **Info**: Taxonomy discovery identifies hierarchical (``is-a``) relationships between types, enabling the construction of taxonomic structures. Given a pair of terms or types, the task determines whether one is a subclass of the other. For example, discovering that ``Sedan`` is a subclass of ``Car`` structures domain knowledge in a way that supports reasoning and inference in ontology-driven applications.

       **Example**: Recognize that ``"lung cancer"`` is a subclass of ``"cancer"``, which is a subclass of ``"disease"``.
   * - ``LLMs4OL'24``, ``LLMs4OL'25``
     - **Non-Taxonomic Relation Extraction**
     - Identify non-taxonomic, semantic relations between types.

       **ID**: ``non-taxonomic-re``

       **Example**: Identify that *"virus"* ``causes`` *"infection"* or *"aspirin"* ``treats`` *"headache"*.
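
For concreteness, a single labeled instance for the ``term-typing`` task above is simply a term paired with its list of types. The field names in the sketch below follow the ``{"term": ..., "types": [...]}`` convention referenced in the learner examples later on this page and are illustrative rather than a normative schema.

.. code-block:: python

   # Illustrative sketch of one labeled term-typing instance; the field names
   # mirror the {"term": ..., "types": [...]} convention used in the learner
   # examples below and are not a normative schema.
   instance = {
       "term": "myocardial infarction",
       "types": ["disease"],
   }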

.. note::

   * Proceedings of the 1st LLMs4OL Challenge @ ISWC 2024 are available at `https://www.tib-op.org/ojs/index.php/ocp/issue/view/169 <https://www.tib-op.org/ojs/index.php/ocp/issue/view/169>`_
   * Proceedings of the 2nd LLMs4OL Challenge @ ISWC 2025 are available at `https://www.tib-op.org/ojs/index.php/ocp/issue/view/185 <https://www.tib-op.org/ojs/index.php/ocp/issue/view/185>`_

.. toctree::
   :maxdepth: 1
   :caption: LLMs4OL Challenge Series Participant Learners

The :class:`LearnerPipeline` class integrates the random-forest term-typing learner with a retriever, runs training, and evaluates performance on the test split.

.. code-block:: python

   # Import core modules from the OntoLearner library
   from ontolearner import GeoNames, train_test_split, LearnerPipeline
   from ontolearner.learner.term_typing import AlexbekRFLearner  # RF learner over text + graph features

   # Load the GeoNames ontology and extract labeled term-typing data
   ontology = GeoNames()
   ontology.load()
   data = ontology.extract()

   # Split the labeled term-typing data into train and test sets
   train_data, test_data = train_test_split(
       data,
       test_size=0.2,
       random_state=42,
   )

   # Configure the RF-based learner (embeddings + optional graph features)
   rf_learner = AlexbekRFLearner(
       device="cpu",             # switch to "cuda" if you have a GPU
       batch_size=16,
       max_length=512,           # max tokenizer length for embedding model inputs
       threshold=0.30,           # probability cutoff for assigning each type
       use_graph_features=True,  # set False for pure RF on text embeddings only
   )

   # Build the pipeline and pass raw structured objects end-to-end
   pipe = LearnerPipeline(
       retriever=rf_learner,
       retriever_id="intfloat/e5-base-v2",  # or "Qwen/Qwen3-Embedding-4B" if you have sufficient GPU memory
       ontologizer_data=True,               # True if data is already {"term": ..., "types": [...], ...}
       device="cpu",
       batch_size=16,
   )

   # Run the full learning pipeline on the term-typing task
   outputs = pipe(
       train_data=train_data,
       test_data=test_data,
       task="term-typing",
       evaluate=True,
       ontologizer_data=True,
   )

   # Display the evaluation summary and runtime
   print("Metrics:", outputs.get("metrics"))
   print("Elapsed time:", outputs["elapsed_time"])
   print(outputs)

Term Typing (RAG-based)
-----------------------

The RAG-based term-typing setup also uses GeoNames. We again load the ontology and extract the labeled term-typing data before splitting it into train and test sets.

.. code-block:: python

   from ontolearner import GeoNames, train_test_split
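
As a rough guide, the remainder of the RAG-based run can be sketched by analogy with the random-forest pipeline above. The ``llm_id`` argument and the model identifiers below are illustrative assumptions rather than a confirmed :class:`LearnerPipeline` signature; consult the learner reference for the exact API.

.. code-block:: python

   # Minimal sketch, by analogy with the RF example above. The ``llm_id``
   # argument and the model identifiers are illustrative assumptions, not a
   # confirmed OntoLearner API.
   from ontolearner import GeoNames, train_test_split, LearnerPipeline

   ontology = GeoNames()
   ontology.load()
   data = ontology.extract()

   train_data, test_data = train_test_split(data, test_size=0.2, random_state=42)

   pipe = LearnerPipeline(
       retriever_id="intfloat/e5-base-v2",   # embedding model used for retrieval
       llm_id="Qwen/Qwen2.5-0.5B-Instruct",  # hypothetical identifier for the generator LLM
       ontologizer_data=True,
       device="cpu",
       batch_size=16,
   )

   outputs = pipe(
       train_data=train_data,
       test_data=test_data,
       task="term-typing",
       evaluate=True,
       ontologizer_data=True,
   )

   print("Metrics:", outputs.get("metrics"))
   print("Elapsed time:", outputs["elapsed_time"])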

Taxonomy Discovery (Cross-Attention-based)
------------------------------------------

Here, :class:`LearnerPipeline` trains the cross-attention model on the training edges, predicts taxonomic relations on the test set, and reports evaluation metrics.

.. code-block:: python

   from ontolearner import GeoNames, train_test_split, LearnerPipeline
   from ontolearner.learner.taxonomy_discovery import AlexbekCrossAttnLearner
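
The snippet above shows only the imports; a minimal sketch of the full run, modelled on the random-forest term-typing example earlier on this page, might look as follows. The ``AlexbekCrossAttnLearner`` constructor arguments and the way the learner is wired into :class:`LearnerPipeline` are illustrative assumptions, so consult the learner reference for the exact signature.

.. code-block:: python

   # Minimal sketch, modelled on the RF term-typing example above. Constructor
   # arguments and pipeline wiring are illustrative assumptions, not a
   # confirmed OntoLearner API. Imports are repeated from the snippet above so
   # this sketch stands alone.
   from ontolearner import GeoNames, train_test_split, LearnerPipeline
   from ontolearner.learner.taxonomy_discovery import AlexbekCrossAttnLearner

   ontology = GeoNames()
   ontology.load()
   data = ontology.extract()

   train_data, test_data = train_test_split(data, test_size=0.2, random_state=42)

   ca_learner = AlexbekCrossAttnLearner(
       device="cpu",      # assumed to mirror the RF learner's device argument
       batch_size=16,
   )

   pipe = LearnerPipeline(
       retriever=ca_learner,                # assumed wiring, as in the RF example
       retriever_id="intfloat/e5-base-v2",
       ontologizer_data=True,
       device="cpu",
       batch_size=16,
   )

   outputs = pipe(
       train_data=train_data,
       test_data=test_data,
       task="taxonomy-discovery",           # task ID from the challenge task table
       evaluate=True,
       ontologizer_data=True,
   )

   print("Metrics:", outputs.get("metrics"))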