Alexbek Learner
================

.. sidebar:: Alexbek Learner Examples

    * Text2Onto: `llm_learner_alexbek_text2onto.py <https://github.com/sciknoworg/OntoLearner/blob/main/examples/llm_learner_alexbek_text2onto.py>`_
    * Term Typing: `llm_learner_alexbek_rf_term_typing.py <https://github.com/sciknoworg/OntoLearner/blob/main/examples/llm_learner_alexbek_rf_term_typing.py>`_
    * Taxonomy Discovery: `llm_learner_alexbek_cross_attn_taxonomy_discovery.py <https://github.com/sciknoworg/OntoLearner/blob/main/examples/llm_learner_alexbek_cross_attn_taxonomy_discovery.py>`_

The team presented a comprehensive system for addressing Tasks A, B, and C of the LLMs4OL 2025 challenge, which together span the full ontology construction pipeline: term extraction, typing, and taxonomy discovery. Their approach combines retrieval-augmented prompting, zero-shot classification, and attention-based graph modeling, each tailored to the demands of its task.

.. note::

    Read more about the model at `Alexbek at LLMs4OL 2025 Tasks A, B, and C: Heterogeneous LLM Methods for Ontology Learning (Few-Shot Prompting, Ensemble Typing, and Attention-Based Taxonomies) <https://www.tib-op.org/ojs/index.php/ocp/article/view/2899>`_.

.. hint::

    The original implementation is available in the `LLMs4OL-Challenge-Alexbek <https://github.com/BelyaevaAlex/LLMs4OL-Challenge-Alexbek>`_ repository.

Overview
--------

.. raw:: html

    <div align="center">
        <img src="https://raw.githubusercontent.com/sciknoworg/OntoLearner/refs/heads/dev/docs/source/learners/images/alexbek-learner.png" alt="Alexbek Team" width="90%"/>
    </div>
    <br>

For **Task A (Text2Onto)**, they jointly extract domain-specific terms and their ontological types using a retrieval-augmented generation (RAG) pipeline. Training data is reformulated into a correspondence between documents, terms, and types, while test-time inference leverages semantically similar training examples. This single-pass method requires no model fine-tuning and uses lexical augmentation.

For **Task B (Term Typing)**, which involves assigning types to given terms, they adopt a dual strategy. In the few-shot setting (for domains with labeled training data), they reuse the RAG scheme with few-shot prompting. In the zero-shot or label-scarce setting, they use a classifier that combines cosine similarity scores from multiple embedding models using confidence-based weighting (e.g., via random forests or RAG-style retrieval).

For **Task C (Taxonomy Discovery)**, they model the task as graph inference. Using embeddings of type labels, they train a lightweight cross-attention layer to predict *is-a* relations by approximating a soft adjacency matrix.

Methodological Summary:

1. **Retrieval-Augmented Text2Onto.** Training data is restructured into document–term–type correspondences. At inference time, the system retrieves semantically similar training examples and feeds them, together with the query document, into a small generative LLM to jointly predict candidate terms and their types (see the retrieval sketch after this list).

2. **Hybrid Term Typing.**

   * **Random-Forest Variant.** Uses dense text embeddings (and optionally graph-based features from the ontology) as input to a random-forest classifier, producing multi-label type assignments per term.
   * **RAG-Based Variant.** Combines a bi-encoder retriever with a generative LLM: for each query term, top-*k* labeled examples are retrieved and concatenated into the prompt. The LLM then predicts types in a structured format (e.g., JSON), which are parsed and evaluated.

3. **Cross-Attention Taxonomy Discovery.** Type labels (or term representations) are embedded using a sentence-transformer model and passed through a lightweight cross-attention layer. The resulting network approximates a soft adjacency matrix over types and is trained to distinguish positive (true parent–child) from negative (corrupted) edges.

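To make the Text2Onto recipe concrete, the sketch below assembles a retrieval-augmented prompt with a sentence-transformers retriever. The documents, terms, and type codes are illustrative stand-ins, and the prompt format is an assumption for exposition, not the packaged implementation.

.. code-block:: python

    from sentence_transformers import SentenceTransformer, util

    # Toy (document, terms, types) correspondences standing in for training data
    train_docs = [
        ("Lakes and reservoirs are inland water bodies.", ["lake", "reservoir"], ["H.LK", "H.RSV"]),
        ("A mountain pass connects two valleys.", ["mountain pass"], ["T.PASS"]),
    ]
    query_doc = "Glaciers feed many alpine streams."

    retriever = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    doc_embs = retriever.encode([d for d, _, _ in train_docs], convert_to_tensor=True)
    query_emb = retriever.encode(query_doc, convert_to_tensor=True)

    # Retrieve the most similar training documents as few-shot context
    hits = util.semantic_search(query_emb, doc_embs, top_k=2)[0]
    examples = "\n\n".join(
        f"Document: {train_docs[h['corpus_id']][0]}\n"
        f"Terms: {train_docs[h['corpus_id']][1]}\n"
        f"Types: {train_docs[h['corpus_id']][2]}"
        for h in hits
    )

    # The assembled prompt would be passed to a small instruct LLM in a single pass
    prompt = (
        "Extract domain-specific terms and their ontological types as JSON.\n\n"
        f"{examples}\n\nDocument: {query_doc}\n"
    )
    print(prompt)

Because the few-shot examples are selected per query document, the method adapts to new domains without any fine-tuning.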

Term Typing (Random-Forest)
---------------------------

Loading Ontological Data
~~~~~~~~~~~~~~~~~~~~~~~~

For term typing, we use GeoNames as an example ontology. Labeled term–type pairs are extracted and split into train and test sets.

.. code-block:: python

    from ontolearner import GeoNames, train_test_split

    # Load the GeoNames ontology and extract labeled term-typing data
    ontology = GeoNames()
    ontology.load()
    data = ontology.extract()

    # Split the labeled term-typing data into train and test sets
    train_data, test_data = train_test_split(
        data,
        test_size=0.2,
        random_state=42,
    )

Initialize Learner
~~~~~~~~~~~~~~~~~~

Before defining the learner, choose the ontology learning task to perform.
The available tasks are described in `LLMs4OL Paradigms <https://ontolearner.readthedocs.io/learning_tasks/llms4ol.html>`_.
The task IDs are: ``term-typing``, ``taxonomy-discovery``, ``non-taxonomic-re``.

.. code-block:: python

    task = "term-typing"

We first configure the Alexbek random-forest learner.
This learner builds features from text embeddings (and optionally graph structure) and trains a random-forest classifier for term typing.

.. code-block:: python

    from ontolearner.learner.term_typing import AlexbekRFLearner

    rf_learner = AlexbekRFLearner(
        device="cpu",             # switch to "cuda" if available
        batch_size=16,
        max_length=512,           # max tokenizer length for embedding inputs
        threshold=0.30,           # probability cutoff for assigning each type
        use_graph_features=True,  # set False for pure text-based features
    )

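For intuition about ``threshold``, the following self-contained sketch shows multi-label typing with one-vs-rest random forests over term embeddings. The feature dimensions, the random label matrix, and the use of scikit-learn's ``MultiOutputClassifier`` are illustrative assumptions, not the learner's internals.

.. code-block:: python

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.multioutput import MultiOutputClassifier

    rng = np.random.default_rng(42)
    X = rng.normal(size=(100, 384))        # stand-in for 384-d term embeddings
    Y = rng.integers(0, 2, size=(100, 5))  # 5 candidate types, multi-label targets

    clf = MultiOutputClassifier(RandomForestClassifier(n_estimators=100, random_state=42))
    clf.fit(X, Y)

    # predict_proba returns one (n_samples, 2) array per type; keep P(type present)
    probs = np.stack([p[:, 1] for p in clf.predict_proba(X)], axis=1)
    assigned = probs >= 0.30               # mirrors the learner's `threshold`
    print(assigned[:3])

A term receives every type whose estimated probability clears the cutoff, so lowering the threshold trades precision for recall.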

Learn and Predict
~~~~~~~~~~~~~~~~~

.. code-block:: python

    from ontolearner import evaluation_report

    # Fit the RF-based learner on the training split
    rf_learner.fit(train_data, task=task)

    # Predict types for the held-out test terms
    predicts = rf_learner.predict(test_data, task=task)

    # Build gold labels and evaluate
    truth = rf_learner.tasks_ground_truth_former(data=test_data, task=task)
    metrics = evaluation_report(y_true=truth, y_pred=predicts, task=task)
    print(metrics)

Term Typing (RAG-based)
-----------------------

Loading Ontological Data
~~~~~~~~~~~~~~~~~~~~~~~~

The RAG-based term-typing setup also uses GeoNames. We again load the ontology and split labeled term–type instances into train and test sets.

.. code-block:: python

    from ontolearner import GeoNames, train_test_split

    ontology = GeoNames()
    ontology.load()
    data = ontology.extract()

    # Extract labeled items and split into train/test sets for evaluation
    train_data, test_data = train_test_split(
        data,
        test_size=0.2,
        random_state=42,
    )

Initialize Learner
~~~~~~~~~~~~~~~~~~

As in the previous section, select the task ID, again ``term-typing`` (see `LLMs4OL Paradigms <https://ontolearner.readthedocs.io/learning_tasks/llms4ol.html>`_ for the full list).

.. code-block:: python

    task = "term-typing"

Next, we configure a Retrieval-Augmented Generation (RAG) term-typing classifier.
An encoder retrieves the top-*k* most similar training examples, and a generative LLM predicts types conditioned on the query term plus the retrieved examples.

.. code-block:: python

    from ontolearner.learner.term_typing import AlexbekRAGLearner

    rag_learner = AlexbekRAGLearner(
        llm_model_id="Qwen/Qwen2.5-0.5B-Instruct",
        retriever_model_id="sentence-transformers/all-MiniLM-L6-v2",
        device="cuda",  # or "cpu"
        top_k=3,
        max_new_tokens=256,
        output_dir="./results/",
    )

    # Load the underlying LLM and retriever for RAG-based term typing
    rag_learner.load(llm_id=rag_learner.llm_model_id)

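The RAG variant expects the LLM to answer in a structured format that can be parsed back into type labels. Below is a hedged sketch of such parsing; the exact prompt and output schema used by the learner are internal details, so the JSON shape here is an assumption.

.. code-block:: python

    import json
    import re

    # Example raw completion from the LLM (illustrative)
    raw = 'Here are the types: {"term": "Lake Baikal", "types": ["H.LK"]}'

    # Grab the first JSON object in the completion and parse it defensively
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    types = json.loads(match.group(0)).get("types", []) if match else []
    print(types)  # ['H.LK']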

Learn and Predict
~~~~~~~~~~~~~~~~~

.. code-block:: python

    from ontolearner import evaluation_report

    # Index the training data for retrieval and prepare prompts
    rag_learner.fit(train_data, task=task)

    # Predict types for the held-out test terms
    predicts = rag_learner.predict(test_data, task=task)

    # Build gold labels and evaluate
    truth = rag_learner.tasks_ground_truth_former(data=test_data, task=task)
    metrics = evaluation_report(y_true=truth, y_pred=predicts, task=task)
    print(metrics)

Taxonomy Discovery
------------------

Loading Ontological Data
~~~~~~~~~~~~~~~~~~~~~~~~

For taxonomy discovery, we again use the GeoNames ontology. It exposes parent–child relations that can be embedded and fed to a cross-attention model.

.. code-block:: python

    from ontolearner import GeoNames, train_test_split

    ontology = GeoNames()
    ontology.load()
    data = ontology.extract()

    train_data, test_data = train_test_split(
        data,
        test_size=0.2,
        random_state=42,
    )

Initialize Learner
~~~~~~~~~~~~~~~~~~

As before, select the task ID, this time ``taxonomy-discovery`` (see `LLMs4OL Paradigms <https://ontolearner.readthedocs.io/learning_tasks/llms4ol.html>`_ for the full list).

.. code-block:: python

    task = "taxonomy-discovery"

Next, we configure the Alexbek cross-attention learner.
It uses embeddings of type labels and a lightweight cross-attention layer to predict *is-a* relations.

.. code-block:: python

    from ontolearner import AlexbekCrossAttnLearner

    cross_learner = AlexbekCrossAttnLearner(
        embedding_model="sentence-transformers/all-MiniLM-L6-v2",
        device="cpu",
        num_heads=8,
        lr=5e-5,
        weight_decay=0.01,
        num_epochs=1,
        batch_size=256,
        neg_ratio=1.0,
        output_dir="./results/crossattn/",
        seed=42,
    )

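For intuition about the architecture, here is a minimal sketch of the cross-attention idea: attention weights over type-label embeddings act as a soft adjacency matrix whose entries score candidate *is-a* edges. The layer shapes and the use of ``nn.MultiheadAttention`` are assumptions for illustration, not the learner's exact internals.

.. code-block:: python

    import torch
    import torch.nn as nn

    num_types, dim = 10, 384                     # MiniLM embeddings are 384-d
    type_embs = torch.randn(1, num_types, dim)   # stand-in label embeddings (batched)

    attn = nn.MultiheadAttention(embed_dim=dim, num_heads=8, batch_first=True)
    _, weights = attn(type_embs, type_embs, type_embs,
                      need_weights=True, average_attn_weights=True)

    # (num_types, num_types): entry [i, j] scores a candidate edge i -> j
    soft_adjacency = weights.squeeze(0)
    print(soft_adjacency.shape)

During training, weights for true parent–child pairs are pushed up while corrupted negative pairs (their proportion controlled by ``neg_ratio``) are pushed down.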

Learn and Predict
~~~~~~~~~~~~~~~~~

.. code-block:: python

    from ontolearner import evaluation_report

    # Train the cross-attention model on taxonomic edges
    cross_learner.fit(train_data, task=task)

    # Predict taxonomic relations on the test set
    predicts = cross_learner.predict(test_data, task=task)

    # Build gold labels and evaluate
    truth = cross_learner.tasks_ground_truth_former(data=test_data, task=task)
    metrics = evaluation_report(y_true=truth, y_pred=predicts, task=task)
    print(metrics)