
Commit ac0c232

📝 update challenge learners docs

1 parent 15f628b · commit ac0c232

File tree: 6 files changed, +312 −748 lines changed

docs/source/learners/llms4ol.rst

Lines changed: 39 additions & 14 deletions
@@ -1,40 +1,61 @@
-LLMs4OL Challenge Learners
-==========================
 
-LLMs4OL is a community development initiative collocated with the 23rd International Semantic Web Conference (ISWC) to explore the potential of Large Language Models (LLMs) in Ontology Learning (OL), a vital process for enhancing the web with structured knowledge to improve interoperability. By leveraging LLMs, the challenge aims to advance understanding and innovation in OL, aligning with the goals of the Semantic Web to create a more intelligent and user-friendly web.
+.. sidebar:: Challenge Series Websites
+
+    * `1st LLMs4OL @ ISWC 2024 <https://sites.google.com/view/llms4ol>`_
+    * `2nd LLMs4OL @ ISWC 2025 <https://sites.google.com/view/llms4ol2025>`_
+
+
+.. raw:: html
+
+    <div align="center">
+        <img src="https://raw.githubusercontent.com/sciknoworg/OntoLearner/refs/heads/dev/docs/source/learners/images/challenge-logo.png" alt="challenge-logo" width="10%"/>
+    </div>
+
+
+LLMs4OL Challenge
+==================================================================================================================
+
+
+
+
+LLMs4OL is a community development initiative collocated with the International Semantic Web Conference (ISWC) to explore the potential of Large Language Models (LLMs) in Ontology Learning (OL), a vital process for enhancing the web with structured knowledge to improve interoperability. By leveraging LLMs, the challenge aims to advance understanding and innovation in OL, aligning with the goals of the Semantic Web to create a more intelligent and user-friendly web.
 
 
 .. list-table::
-    :widths: 20 80
+    :widths: 20 20 60
     :header-rows: 1
 
-    * - **Task**
+    * - **Edition**
+      - **Task**
       - **Description**
-    * - **Text2Onto**
+    * - ``LLMs4OL'25``
+      - **Text2Onto**
       - Extract ontological terms and types from unstructured text.
 
        **ID**: ``text-to-onto``
 
        **Info**: This task focuses on extracting foundational elements (Terms and Types) from unstructured text documents to build the initial structure of an ontology. It involves recognizing domain-relevant vocabulary (Term Extraction, SubTask 1) and categorizing it appropriately (Type Extraction, SubTask 2). It bridges the gap between natural language and structured knowledge representation.
 
        **Example**: **COVID-19** is a term of the type **Disease**.
-    * - **Term Typing**
+    * - ``LLMs4OL'24``, ``LLMs4OL'25``
+      - **Term Typing**
       - Discover the generalized type for a lexical term.
 
        **ID**: ``term-typing``
 
        **Info**: The process of assigning a generalized type to each lexical term involves mapping lexical items to their most appropriate semantic categories or ontological classes. For example, in the biomedical domain, the term ``aspirin`` should be classified under ``Pharmaceutical Drug``. This task is crucial for organizing extracted terms into structured ontologies and improving knowledge reuse.
 
        **Example**: Assign the type ``"disease"`` to the term ``"myocardial infarction"``.
-    * - **Taxonomy Discovery**
+    * - ``LLMs4OL'24``, ``LLMs4OL'25``
+      - **Taxonomy Discovery**
       - Discover the taxonomic hierarchy between type pairs.
 
        **ID**: ``taxonomy-discovery``
 
        **Info**: Taxonomy discovery focuses on identifying hierarchical relationships between types, enabling the construction of taxonomic structures (i.e., ``is-a`` relationships). Given a pair of terms or types, the task determines whether one is a subclass of the other. For example, discovering that ``Sedan is a subclass of Car`` contributes to structuring domain knowledge in a way that supports reasoning and inferencing in ontology-driven applications.
 
        **Example**: Recognize that ``"lung cancer"`` is a subclass of ``"cancer"``, which is a subclass of ``"disease"``.
-    * - **Non-Taxonomic Relation Extraction**
+    * - ``LLMs4OL'24``, ``LLMs4OL'25``
+      - **Non-Taxonomic Relation Extraction**
      - Identify non-taxonomic, semantic relations between types.
 
        **ID**: ``non-taxonomic-re``

@@ -44,13 +65,17 @@ LLMs4OL is a community development initiative collocated with the 23rd Internati
        **Example**: Identify that *"virus"* ``causes`` *"infection"* or *"aspirin"* ``treats`` *"headache"*.
 
 
+.. note::
+
+    * Proceedings of 1st LLMs4OL Challenge @ ISWC 2024 available at `https://www.tib-op.org/ojs/index.php/ocp/issue/view/169 <https://www.tib-op.org/ojs/index.php/ocp/issue/view/169>`_
+    * Proceedings of 2nd LLMs4OL Challenge @ ISWC 2025 available at `https://www.tib-op.org/ojs/index.php/ocp/issue/view/185 <https://www.tib-op.org/ojs/index.php/ocp/issue/view/185>`_
 
 .. toctree::
     :maxdepth: 1
-    :caption: LLMs4OL Learners
+    :caption: LLMs4OL Challenge Series Participants Learners
     :titlesonly:
 
-    rwthdbis_learner
-    skhnlp_learner
-    alexbek_learner
-    sbunlp_learner
+    llms4ol_challenge/rwthdbis_learner
+    llms4ol_challenge/skhnlp_learner
+    llms4ol_challenge/alexbek_learner
+    llms4ol_challenge/sbunlp_learner
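The table above fixes the task IDs that every learner page in this changeset keys on. For orientation, a minimal sketch of the shared load/extract/split preamble those pages use, grounded in the examples elsewhere in this commit; the sample record shape follows the ``{"term": ..., "types": [...]}`` convention the learner docs reference:

.. code-block:: python

    from ontolearner import GeoNames, train_test_split

    # Load an example ontology and extract labeled task data
    ontology = GeoNames()
    ontology.load()
    data = ontology.extract()

    # Split into train/test for any of the task IDs above
    train_data, test_data = train_test_split(data, test_size=0.2, random_state=42)

    # A term-typing record matching the Term Typing example row (illustrative)
    example = {"term": "myocardial infarction", "types": ["disease"]}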

docs/source/learners/alexbek_learner.rst renamed to docs/source/learners/llms4ol_challenge/alexbek_learner.rst

Lines changed: 29 additions & 172 deletions
@@ -51,7 +51,7 @@ For term typing, we use GeoNames as an example ontology. Labeled term–type pai
 
 .. code-block:: python
 
-    from ontolearner import GeoNames, train_test_split, evaluation_report
+    from ontolearner import GeoNames, train_test_split
 
     # Load the GeoNames ontology and extract labeled term-typing data
     ontology = GeoNames()
@@ -74,15 +74,15 @@ The task IDs are: ``term-typing``, ``taxonomy-discovery``, ``non-taxonomic-re``.
 
 .. code-block:: python
 
-    from ontolearner.learner.term_typing import AlexbekRFLearner
-
     task = "term-typing"
 
 We first configure the Alexbek random-forest learner.
 This learner builds features from text embeddings (and optionally graph structure) and trains a random-forest classifier for term typing.
 
 .. code-block:: python
 
+    from ontolearner.learner.term_typing import AlexbekRFLearner
+
     rf_learner = AlexbekRFLearner(
         device="cpu",  # switch to "cuda" if available
         batch_size=16,
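The learner's probability cutoff (``threshold``, set to 0.30 in the example configuration) decides which types get assigned to a term. A minimal sketch of that semantics — illustrative only, not AlexbekRFLearner's internal code:

.. code-block:: python

    # Illustrative only: how a per-type probability cutoff (threshold=0.30)
    # turns classifier probabilities into multi-label type predictions.
    type_labels = ["mountain", "river", "city"]   # hypothetical GeoNames-style types
    probabilities = [0.72, 0.35, 0.05]            # one term's per-type scores
    predicted = [t for t, p in zip(type_labels, probabilities) if p >= 0.30]
    print(predicted)  # ['mountain', 'river']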
@@ -91,6 +91,12 @@ This learner builds features from text embeddings (and optionally graph structur
         use_graph_features=True  # set False for pure text-based features
     )
 
+Learn and Predict
+~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    from ontolearner import evaluation_report
     # Fit the RF-based learner on the training split
     rf_learner.fit(train_data, task=task)
@@ -102,62 +108,6 @@
     metrics = evaluation_report(y_true=truth, y_pred=predicts, task=task)
     print(metrics)
 
-Pipeline Usage
-~~~~~~~~~~~~~~
-
-The :class:`LearnerPipeline` class integrates the random-forest term-typing learner with a retriever, runs training, and evaluates performance on the test split.
-
-.. code-block:: python
-
-    # Import core modules from the OntoLearner library
-    from ontolearner import GeoNames, train_test_split, LearnerPipeline
-    from ontolearner.learner.term_typing import AlexbekRFLearner  # RF learner over text+graph features
-
-    # Load the GeoNames ontology and extract labeled term-typing data
-    ontology = GeoNames()
-    ontology.load()
-    data = ontology.extract()
-
-    # Split the labeled term-typing data into train and test sets
-    train_data, test_data = train_test_split(
-        data,
-        test_size=0.2,
-        random_state=42,
-    )
-
-    # Configure the RF-based learner (embeddings + optional graph features)
-    rf_learner = AlexbekRFLearner(
-        device="cpu",              # switch to "cuda" if you have a GPU
-        batch_size=16,
-        max_length=512,            # max tokenizer length for embedding model inputs
-        threshold=0.30,            # probability cutoff for assigning each type
-        use_graph_features=True,   # set False for pure RF on text embeddings only
-    )
-
-    # Build the pipeline and pass raw structured objects end-to-end.
-    pipe = LearnerPipeline(
-        retriever=rf_learner,
-        retriever_id="intfloat/e5-base-v2",  # or "Qwen/Qwen3-Embedding-4B" if you have sufficient GPU memory
-        ontologizer_data=True,  # True if data is already {"term": ..., "types": [...], ...}
-        device="cpu",
-        batch_size=16,
-    )
-
-    # Run the full learning pipeline on the term-typing task
-    outputs = pipe(
-        train_data=train_data,
-        test_data=test_data,
-        task="term-typing",
-        evaluate=True,
-        ontologizer_data=True,
-    )
-
-    # Display evaluation summary and runtime
-    print("Metrics:", outputs.get("metrics"))
-    print("Elapsed time:", outputs["elapsed_time"])
-    print(outputs)
-
-
 Term Typing (RAG-based)
 -----------------------
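Both the kept and removed examples end by printing a ``metrics`` mapping from ``evaluation_report``; a removed pipeline later in this diff notes its shape as ``{'precision': ..., 'recall': ..., 'f1_micro': ...}``. A small sketch of consuming such a mapping (values invented for display, actual keys come from ``evaluation_report``):

.. code-block:: python

    # Illustrative values only; real numbers come from evaluation_report.
    metrics = {"precision": 0.81, "recall": 0.77, "f1_micro": 0.79}
    for name, value in sorted(metrics.items()):
        print(f"{name}: {value:.2f}")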

@@ -168,7 +118,7 @@ The RAG-based term-typing setup also uses GeoNames. We again load the ontology a
 
 .. code-block:: python
 
-    from ontolearner import GeoNames, train_test_split, evaluation_report
+    from ontolearner import GeoNames, train_test_split
 
     ontology = GeoNames()
     ontology.load()
@@ -190,15 +140,15 @@ The task IDs are: ``term-typing``, ``taxonomy-discovery``, ``non-taxonomic-re``.
 
 .. code-block:: python
 
-    from ontolearner.learner.term_typing import AlexbekRAGLearner
-
     task = "term-typing"
 
 Next, we configure a Retrieval-Augmented Generation (RAG) term-typing classifier.
 An encoder retrieves top-k similar training examples, and a generative LLM predicts types conditioned on the query term plus retrieved examples.
 
 .. code-block:: python
 
+    from ontolearner.learner.term_typing import AlexbekRAGLearner
+
     rag_learner = AlexbekRAGLearner(
         llm_model_id="Qwen/Qwen2.5-0.5B-Instruct",
         retriever_model_id="sentence-transformers/all-MiniLM-L6-v2",
@@ -211,6 +161,13 @@
     # Load the underlying LLM and retriever for RAG-based term typing
     rag_learner.load(llm_id=rag_learner.llm_model_id)
 
+Learn and Predict
+~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    from ontolearner import evaluation_report
+
     # Index the training data for retrieval and prepare prompts
     rag_learner.fit(train_data, task=task)
 
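As the surrounding prose explains, the RAG learner first retrieves the top-k most similar training examples for a query term. A minimal sketch of that retrieval step using the same encoder named in the configuration; this illustrates the idea, not AlexbekRAGLearner's implementation, and the terms are hypothetical:

.. code-block:: python

    from sentence_transformers import SentenceTransformer, util

    encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    train_terms = ["Alps", "Danube", "Berlin"]   # hypothetical training terms
    query_term = "Rhine"

    # Retrieve the top-k most similar training examples for the query term
    hits = util.semantic_search(
        encoder.encode(query_term, convert_to_tensor=True),
        encoder.encode(train_terms, convert_to_tensor=True),
        top_k=3,
    )[0]
    for hit in hits:
        print(train_terms[hit["corpus_id"]], round(hit["score"], 3))

The retrieved examples are then packed into the LLM prompt alongside the query term, which is what ``top_k=3`` in the learner configuration controls.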
@@ -222,59 +179,6 @@
     metrics = evaluation_report(y_true=truth, y_pred=predicts, task=task)
     print(metrics)
 
-Pipeline Usage
-~~~~~~~~~~~~~~
-
-We place the RAG learner in the ``llm`` slot of :class:`LearnerPipeline`.
-The pipeline handles retrieval, LLM calls, and evaluation end-to-end.
-
-.. code-block:: python
-
-    # Import core modules from the OntoLearner library
-    from ontolearner import GeoNames, train_test_split, LearnerPipeline
-    from ontolearner.learner.term_typing import AlexbekRAGLearner
-
-    # Load the GeoNames ontology.
-    ontology = GeoNames()
-    ontology.load()
-
-    # Extract labeled items and split into train/test sets for evaluation
-    train_data, test_data = train_test_split(
-        ontology.extract(),
-        test_size=0.2,
-        random_state=42,
-    )
-
-    # Configure a Retrieval-Augmented Generation (RAG) term-typing classifier.
-    rag_learner = AlexbekRAGLearner(
-        llm_model_id="Qwen/Qwen2.5-0.5B-Instruct",
-        retriever_model_id="sentence-transformers/all-MiniLM-L6-v2",
-        device="cuda",
-        top_k=3,
-        max_new_tokens=256,
-        output_dir="./results/",
-    )
-
-    # Build the pipeline and pass raw structured objects end-to-end.
-    pipe = LearnerPipeline(
-        llm=rag_learner,
-        llm_id="Qwen/Qwen2.5-0.5B-Instruct",
-        ontologizer_data=True,
-    )
-
-    # Run the full learning pipeline on the term-typing task
-    outputs = pipe(
-        train_data=train_data,
-        test_data=test_data,
-        task="term-typing",
-        evaluate=True,
-        ontologizer_data=True,
-    )
-
-    # Display the evaluation results and runtime
-    print("Metrics:", outputs.get("metrics"))  # e.g., {'precision': ..., 'recall': ..., 'f1_micro': ..., ...}
-    print("Elapsed time (s):", outputs.get("elapsed_time"))
-
 
 Taxonomy Discovery
 ------------------
@@ -286,7 +190,7 @@ For taxonomy discovery, we again use the GeoNames ontology. It exposes parent–
 
 .. code-block:: python
 
-    from ontolearner import GeoNames, train_test_split, evaluation_report
+    from ontolearner import GeoNames, train_test_split
 
     ontology = GeoNames()
     ontology.load()
@@ -307,15 +211,15 @@ The task IDs are: ``term-typing``, ``taxonomy-discovery``, ``non-taxonomic-re``.
 
 .. code-block:: python
 
-    from ontolearner import AlexbekCrossAttnLearner
-
     task = "taxonomy-discovery"
 
 Next, we configure the Alexbek cross-attention learner.
 It uses embeddings of type labels and a lightweight cross-attention layer to predict *is-a* relations.
 
 .. code-block:: python
 
+    from ontolearner import AlexbekCrossAttnLearner
+
     cross_learner = AlexbekCrossAttnLearner(
         embedding_model="sentence-transformers/all-MiniLM-L6-v2",
         device="cpu",
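The cross-attention learner scores a candidate *is-a* pair from the embeddings of the two type labels. A toy sketch of that mechanism, assuming random stand-in embeddings; it mirrors the idea of a lightweight cross-attention layer, not the learner's actual architecture:

.. code-block:: python

    import torch

    # 384 matches all-MiniLM-L6-v2 embeddings; 8 heads as in the configuration
    dim, num_heads = 384, 8
    attention = torch.nn.MultiheadAttention(dim, num_heads, batch_first=True)

    child = torch.randn(1, 1, dim)    # stand-in embedding for the child type label
    parent = torch.randn(1, 1, dim)   # stand-in embedding for the parent type label

    # Let the child representation attend to the parent, then score the pair
    fused, _ = attention(query=child, key=parent, value=parent)
    is_a_score = torch.sigmoid(fused.mean())  # toy is-a plausibility score
    print(is_a_score.item())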
@@ -329,6 +233,13 @@
         seed=42,
     )
 
+Learn and Predict
+~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+    from ontolearner import evaluation_report
+
     # Train the cross-attention model on taxonomic edges
     cross_learner.fit(train_data, task=task)
 
@@ -339,57 +250,3 @@
     truth = cross_learner.tasks_ground_truth_former(data=test_data, task=task)
     metrics = evaluation_report(y_true=truth, y_pred=predicts, task=task)
     print(metrics)
-
-Pipeline Usage
-~~~~~~~~~~~~~~
-
-Here, :class:`LearnerPipeline` trains the cross-attention model on train edges, predicts taxonomic relations on the test set, and reports evaluation metrics.
-
-.. code-block:: python
-
-    from ontolearner import GeoNames, train_test_split, LearnerPipeline
-    from ontolearner.learner.taxonomy_discovery import AlexbekCrossAttnLearner
-
-    # Load & split
-    ontology = GeoNames()
-    ontology.load()
-    data = ontology.extract()
-    train_data, test_data = train_test_split(
-        data,
-        test_size=0.2,
-        random_state=42,
-    )
-
-    # Configure the cross-attention learner
-    cross_learner = AlexbekCrossAttnLearner(
-        embedding_model="sentence-transformers/all-MiniLM-L6-v2",
-        device="cpu",
-        num_heads=8,
-        lr=5e-5,
-        weight_decay=0.01,
-        num_epochs=1,
-        batch_size=256,
-        neg_ratio=1.0,
-        output_dir="./results/crossattn/",
-        seed=42,
-    )
-
-    # Build pipeline
-    pipeline = LearnerPipeline(
-        llm=cross_learner,        # cross-attention learner
-        llm_id="cross-attn",      # label for bookkeeping
-        ontologizer_data=False,
-    )
-
-    # Train + predict + evaluate
-    outputs = pipeline(
-        train_data=train_data,
-        test_data=test_data,
-        task="taxonomy-discovery",
-        evaluate=True,
-        ontologizer_data=False,
-    )
-
-    print("Metrics:", outputs.get("metrics"))
-    print("Elapsed time:", outputs["elapsed_time"])
-    print(outputs)
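Across all three learners, the new "Learn and Predict" sections converge on one fit → predict → evaluate contract. A condensed sketch of that flow for term typing; note the ``predict`` call's signature is assumed by analogy with ``fit``, and ``tasks_ground_truth_former`` is assumed to exist on the RF learner as it does on the cross-attention learner:

.. code-block:: python

    from ontolearner import GeoNames, train_test_split, evaluation_report
    from ontolearner.learner.term_typing import AlexbekRFLearner

    task = "term-typing"
    ontology = GeoNames()
    ontology.load()
    train_data, test_data = train_test_split(ontology.extract(), test_size=0.2, random_state=42)

    learner = AlexbekRFLearner(device="cpu", batch_size=16)
    learner.fit(train_data, task=task)
    predicts = learner.predict(test_data, task=task)                      # assumed signature
    truth = learner.tasks_ground_truth_former(data=test_data, task=task)  # assumed, by analogy
    metrics = evaluation_report(y_true=truth, y_pred=predicts, task=task)
    print(metrics)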
