Merged
Commits
55 commits
b6fc1ee
Merge pull request #268 from sciknoworg/dev
HamedBabaei Aug 22, 2025
c93d0c4
:bookmark: v1.4.0
HamedBabaei Aug 22, 2025
11ad22e
:sparkles: update automated taging
HamedBabaei Aug 22, 2025
a5e0ead
:bug: fix metadata commit
HamedBabaei Aug 22, 2025
8777c16
:bug: fix metadata release
HamedBabaei Aug 22, 2025
a819812
:bookmark: v1.4.1
HamedBabaei Aug 22, 2025
b6bcb69
:bug: repo push
HamedBabaei Aug 22, 2025
50d2d2a
:bug: repo push
HamedBabaei Aug 22, 2025
d1a418a
:bug: add repo push
HamedBabaei Aug 22, 2025
f7943a4
:bug: add repo push
HamedBabaei Aug 22, 2025
2855213
:pencil2: add remote set-url
HamedBabaei Aug 22, 2025
9b29724
:pencil2: add token
HamedBabaei Aug 22, 2025
70f4767
:pencil2: PR automation for metadata
HamedBabaei Aug 22, 2025
2863dea
:pencil2: skip auto-update branch
HamedBabaei Aug 22, 2025
4ac970c
:pencil2: add auto-update merge
HamedBabaei Aug 22, 2025
6f9ee67
:bug: fix peter-evans pull and merge request
HamedBabaei Aug 22, 2025
d69b833
:pencil2: merge only workflow PR
HamedBabaei Aug 22, 2025
14cd3c3
:pencil2:
HamedBabaei Aug 22, 2025
087224e
:pencil2:
HamedBabaei Aug 23, 2025
a4350b3
:pencil2:
HamedBabaei Aug 23, 2025
314e3f4
:pencil2: fix PR and add delete to auto-update branch
HamedBabaei Aug 24, 2025
8a167f9
:pencil2: minor fix
HamedBabaei Aug 24, 2025
f91ad2d
:bookmark: Update metadata after release (#269)
HamedBabaei Aug 24, 2025
301250b
:bookmark: Update metadata after release (#270)
HamedBabaei Aug 24, 2025
1aed3ab
:bookmark: Update metadata after release (#271)
HamedBabaei Aug 24, 2025
62716f6
:pencil2: fix auto delete
HamedBabaei Aug 24, 2025
c758419
Merge remote-tracking branch 'origin/main'
HamedBabaei Aug 24, 2025
0514a82
:pencil2: update maintenance plan
HamedBabaei Sep 1, 2025
86e2766
:sparkles: update requirements
HamedBabaei Sep 1, 2025
4e543ab
:bookmark: v1.4.2
HamedBabaei Sep 1, 2025
6658155
Merge remote-tracking branch 'origin/main'
HamedBabaei Sep 1, 2025
f0a4326
:pencil2: update requirements
HamedBabaei Sep 1, 2025
0445926
:bookmark: Update metadata after release (#272)
HamedBabaei Sep 1, 2025
e0e44d5
:sparkles: update library dependencies/ GPU&CPU installation
HamedBabaei Sep 7, 2025
832ae50
:sparkles: update library dependencies
HamedBabaei Sep 7, 2025
2dcd294
:sparkles: update library dependencies
HamedBabaei Sep 7, 2025
90e7de8
:bug: add explicit priority to torch
HamedBabaei Sep 7, 2025
2dc0cb6
:bug: add default priority to torch
HamedBabaei Sep 7, 2025
d4dd11b
:bug: fix typo
HamedBabaei Sep 7, 2025
a723d4f
:bug: revert back changes
HamedBabaei Sep 7, 2025
a6b708b
:bug: bug fix in learner
HamedBabaei Sep 7, 2025
919ad60
:bookmark: Update metadata after release (#273)
HamedBabaei Sep 7, 2025
6d0f49c
:memo: fix typo
HamedBabaei Sep 7, 2025
d0c37e6
:bookmark: v1.4.3
HamedBabaei Sep 7, 2025
e904a1e
:pencil2: add torch versioning to setups
HamedBabaei Sep 7, 2025
63de7e5
Merge remote-tracking branch 'origin/main'
HamedBabaei Sep 7, 2025
dc1c8d1
:bookmark: Update metadata after release (#274)
HamedBabaei Sep 9, 2025
4b6044a
:pencil2: add trust_remote_code=True for retrievers
HamedBabaei Sep 9, 2025
055b2e5
:bookmark: v1.4.4
HamedBabaei Sep 9, 2025
7f4f86e
:sparkles: add batch retriever
HamedBabaei Sep 16, 2025
7f17143
:memo: add short note on batch retriever
HamedBabaei Sep 16, 2025
ab98645
:pencil2: minor fix
HamedBabaei Sep 16, 2025
8b4919a
Merge remote-tracking branch 'origin/main'
HamedBabaei Sep 16, 2025
5a391d5
:bookmark: v1.4.5
HamedBabaei Sep 16, 2025
3ca7b46
:bookmark: Update metadata after release (#275)
HamedBabaei Sep 16, 2025
57 changes: 42 additions & 15 deletions .github/workflows/python-publish.yml
@@ -1,4 +1,7 @@
name: Publish Python Package
permissions:
contents: write
pull-requests: write

on:
push:
@@ -13,6 +16,8 @@ jobs:
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
persist-credentials: false # important to use PAT for pushing

- name: Set up Python
uses: actions/setup-python@v4
@@ -24,31 +29,53 @@ jobs:
curl -sSL https://install.python-poetry.org | python3 -
echo "export PATH=\"$HOME/.local/bin:$PATH\"" >> $GITHUB_ENV

- name: Install poetry-dynamic-versioning plugin
run: poetry self add "poetry-dynamic-versioning[plugin]"

- name: Install dependencies
run: poetry install --no-interaction --no-ansi

- name: Build the package
run: poetry build

- name: Configure Poetry for PyPI
# Generate metadata after publishing
- name: Generate Dublin Core metadata
run: |
poetry config pypi-token.pypi ${{ secrets.TWINE_API_TOKEN }}
mkdir -p metadata
poetry run python -c "from ontolearner import OntoLearnerMetadataExporter; OntoLearnerMetadataExporter().export('metadata/ontolearner-metadata.rdf')"

- name: Publish to PyPI
- name: Create and update Pull Request
id: cpr
uses: peter-evans/create-pull-request@v7
with:
token: ${{ secrets.REPO_PUSH_TOKEN }}
branch: auto-update
base: main
commit-message: ":bookmark: Update metadata after release"
title: "🤖 Automated metadata update"
body: "This PR updates the Dublin Core metadata after release."
add-paths: |
metadata/ontolearner-metadata.rdf

# Automatically merge the PR if possible
- name: Auto-merge PR
if: steps.cpr.outputs.pull-request-operation == 'created'
uses: peter-evans/enable-pull-request-automerge@v3
with:
token: ${{ secrets.REPO_PUSH_TOKEN }}
pull-request-number: ${{ steps.cpr.outputs.pull-request-number }}
merge-method: squash

- name: Delete auto-update branch
if: steps.cpr.outputs.pull-request-operation == 'created'
run: |
poetry publish --no-interaction --no-ansi
git remote set-url origin https://x-access-token:${{ secrets.REPO_PUSH_TOKEN }}@github.com/${{ github.repository }}
git push origin --delete auto-update

# 🔹 NEW STEP: Generate metadata after publishing
- name: Generate Dublin Core metadata
- name: Configure Poetry for PyPI
run: |
mkdir -p metadata
poetry run python -c "from ontolearner import OntoLearnerMetadataExporter; OntoLearnerMetadataExporter().export('metadata/ontolearner-metadata.rdf')"
poetry config pypi-token.pypi ${{ secrets.TWINE_API_TOKEN }}

# 🔹 Commit metadata back to repo
- name: Commit and push metadata
- name: Publish to PyPI
run: |
git config --global user.name "github-actions[bot]"
git config --global user.email "github-actions[bot]@users.noreply.github.com"
git add metadata/
git commit -m ":bookmark: Update metadata after release"
git push origin HEAD:main
poetry publish --no-interaction --no-ansi
13 changes: 11 additions & 2 deletions .github/workflows/test-package.yml
@@ -2,12 +2,17 @@ name: Test OntoLearner Package

on:
push:
branches: [main]
branches:
- main
- '!auto-update'
pull_request:
branches: [main]
branches:
- main

jobs:
build-and-test:
if: github.head_ref != 'auto-update'

runs-on: ubuntu-latest

strategy:
@@ -28,6 +33,10 @@ jobs:
curl -sSL https://install.python-poetry.org | python3 -
echo "$HOME/.local/bin" >> $GITHUB_PATH

- name: Install poetry-dynamic-versioning plugin
run: |
poetry self add "poetry-dynamic-versioning[plugin]"

- name: Configure Poetry and install dependencies
run: |
poetry config virtualenvs.create false
24 changes: 24 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,29 @@
## Changelog

### v1.4.5 (September 16, 2025)
- add batch retriever feature to `AutoRetrieverLearner`


### v1.4.4 (September 9, 2025)
- add `trust_remote_code=True` for retrievers like Nomic-AI

### v1.4.3 (September 7, 2025)
- Update dependencies
- fix bug in learner
- cosmetic fix to the docs

### v1.4.2 (September 1, 2025)
- fix dependency issue for torch and transformers.
- update maintenance plan

### v1.4.1 (August 22, 2025)
- added ontolearner-metadata CI/CD based build.

### v1.4.0 (August 22, 2025)
- added dublin core metadata exporter
- added ontolearner metadata documentation
- added `VERSION` file for versioning

### v1.3.1 (August 13, 2025)
- `Processor` module is operational. Fixed with ease of use principles.
- The huggingface readme files template are updated.
2 changes: 1 addition & 1 deletion CITATION.cff
@@ -31,5 +31,5 @@ keywords:
- Large Language Models
- Text-to-ontology
license: MIT
version: 1.3.1
version: 1.4.5
date-released: '2025'
21 changes: 11 additions & 10 deletions MAINTENANCE.md
@@ -25,16 +25,17 @@ A core team will be responsible for the ongoing maintenance of OntoLearner, incl

A roadmap for new features and improvements is presented below, ensuring the library evolves in response to user needs and feedback. This list will be updated regularly as we survey work across the ontology alignment field to keep the library's methods diverse.

| Category | Description | Status |
|:-----------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------:|
|Ontologizer| Adding more ontologies to the OntoLearner | InProgress|
| Reasoning | Integration of reasoning-oriented prompt evaluation tasks to test LLM capabilities in generating consistent and logically valid ontological structures (e.g., subclass chains, disjointness, transitivity). | TODO |
| Agentic | Support for agent-based extensions using platforms like [CrewAI](https://github.com/crewAIInc/crewAI) to enable autonomous, multi-step ontology engineering workflows coordinated through modular agents. | TODO |
|Documentation| Adding more documentation and tutorials | InProgress|
|Testing| Adding unittest to support different stages of modularization | InProgress|
|Learner| Incorporating more learner models. Including those from LLMs4OL challenge | InProgress|
|Reasoning| Adding reasoning techniques | To-Do|
|...| ... |...|
| Category | Description | Status |
|-----------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|
| Ontologizer | Adding more ontologies to the OntoLearner | In Progress |
| Reasoning | Integration of reasoning-oriented prompt evaluation tasks to test LLM capabilities in generating consistent and logically valid ontological structures (e.g., subclass chains, disjointness, transitivity). | TODO |
| Ontology Search | Enabling search across relations, individuals, and axioms for enhanced exploration and debugging of ontologies. | TODO |
| Agentic | Support for agent-based extensions using platforms like [CrewAI](https://github.com/crewAIInc/crewAI) to enable autonomous, multi-step ontology engineering workflows coordinated through modular agents. | TODO |
| Documentation | Adding more documentation and tutorials | In Progress |
| Testing | Adding unittest to support different stages of modularization | In Progress |
| Learner | Incorporating more learner models, including those from the LLMs4OL 2024 challenge (to be put into action) and 2025 challenge (to be integrated). | In Progress |
| UI / Visualization | Developing user interfaces for interactive exploration and visualization of ontologies. | TODO |
| ...| ....|...|

> **If you would like your ontology learning model or feature included in OntoLearner, don't hesitate to contact us via [GitHub Issues](https://github.com/sciknoworg/ontolearner/issues) or via email to [[email protected]](mailto:[email protected])**.

4 changes: 2 additions & 2 deletions docs/source/learners/rag.rst
@@ -32,7 +32,7 @@ We start by importing necessary components from the ontolearner package, loading
# Load the AgrO ontology (an agricultural domain ontology)
ontology = AgrO()
ontology.load()
ontological_data = ontology.extract(),
ontological_data = ontology.extract()

# Extract structured data from the ontology and split into train/test sets
train_data, test_data = train_test_split(
@@ -111,7 +111,7 @@ You initialize the ``LearnerPipeline`` by directly providing the ``retriever_id`
# Load the AgrO ontology, which contains concepts related to wines, their properties, and categories
ontology = AgrO()
ontology.load() # Load entities, types, and structured term annotations from the ontology
ontological_data = ontology.extract(),
ontological_data = ontology.extract()
# Extract term-typing instances and split into train and test sets
train_data, test_data = train_test_split(
ontological_data,
8 changes: 8 additions & 0 deletions docs/source/learners/retrieval.rst
@@ -70,6 +70,14 @@ You will see the evaluation results.
* T5 models (e.g., "google/flan-t5-base")
* Nomic-AI models

When working with large contexts, the retriever model may run into memory issues. To address this, OntoLearner’s ``AutoRetrieverLearner`` provides a ``batch_size`` argument. When set, the retriever computes similarities in smaller batches instead of computing the full cosine similarity against all stored knowledge embeddings at once, which reduces peak memory usage. To use it:

.. code-block:: python

ret_learner = AutoRetrieverLearner(top_k=5, batch_size=1024)
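To see what batching buys, the full similarity computation can be replaced by per-batch passes over the stored embeddings. The sketch below is illustrative only — ``batched_cosine_topk`` and its arguments are not part of the OntoLearner API, just a minimal NumPy rendering of the idea:

```python
import numpy as np

def batched_cosine_topk(query, knowledge, batch_size=1024, top_k=5):
    """Return indices of the top_k rows of `knowledge` most cosine-similar
    to `query`, computing similarities one batch at a time to cap memory."""
    q = query / np.linalg.norm(query)  # normalize the query once
    scores = []
    for start in range(0, len(knowledge), batch_size):
        chunk = knowledge[start:start + batch_size]
        # cosine similarity per row: dot product over row norms
        scores.append(chunk @ q / np.linalg.norm(chunk, axis=1))
    scores = np.concatenate(scores)
    return np.argsort(scores)[::-1][:top_k]

rng = np.random.default_rng(0)
kb = rng.normal(size=(10_000, 64))   # stand-in for stored knowledge embeddings
query = rng.normal(size=64)
print(batched_cosine_topk(query, kb))
```

The result is identical for any ``batch_size``; only the peak memory of the score computation changes, which is why a single knob suffices for large knowledge bases.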



Pipeline Usage
-----------------------

22 changes: 11 additions & 11 deletions docs/source/ontologizer/metadata.rst
@@ -3,7 +3,7 @@ Metadata

.. note::

OntoLearner Metadata will be created automatically at Github under `metadata/ <https://github.com/sciknoworg/OntoLearner/tree/main/metadata>`_ directory, and it is available for download after ``ontolearner > 1.3.1`` also at `Releases <https://github.com/sciknoworg/OntoLearner/releases>`_ per release.
OntoLearner Metadata will be created automatically at Github under `metadata/ <https://github.com/sciknoworg/OntoLearner/tree/main/metadata>`_ directory, and it is available for download after ``ontolearner > 1.4.0`` also at `Releases <https://github.com/sciknoworg/OntoLearner/releases>`_ per release.

.. hint::

@@ -31,7 +31,7 @@ The ``OntoLearnerMetadataExporter`` is a utility class for generating **Dublin C
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">

<!-- Top-level collection -->
<ontologizer:Collection rdf:about="https://ontolearner.readthedocs.io/benchmarking/">
<ontologizer:Collection rdf:about="https://ontolearner.readthedocs.io/benchmarking/benchmark.html">
<dc:title>OntoLearner Benchmark Ontologies</dc:title>
<dc:description>This Dublin Core metadata collection describes ontologies benchmarked in OntoLearner. It includes information such as title, creator, format, license, and version.</dc:description>
<dc:creator>OntoLearner Team</dc:creator>
@@ -74,7 +74,7 @@ The following table summarizes the key **Dublin Core metadata properties** captu
- NCI Thesaurus (NCIt)
- Ontology full name
* - ``dcterms:description``
- See above example RDF structure
- NCI Thesaurus (NCIt) is a reference terminology that includes broad coverage of the cancer domain...
- Detailed ontology description
* - ``dcterms:creator``
- NCI
@@ -89,7 +89,7 @@ The following table summarizes the key **Dublin Core metadata properties** captu
- Creative Commons 4.0
- License information
* - ``dcterms:source``
- URL
- `https://terminology.tib.eu/ts/ontologies/NCIT <https://terminology.tib.eu/ts/ontologies/NCIT>`_
- Download or reference URL
* - ``dcterms:subject``
- Medicine
@@ -102,13 +102,13 @@ The following represents the benchmark collection info. The `dcterms:hasVersion`

.. code-block:: xml

<ontologizer:Collection rdf:about="https://ontolearner.readthedocs.io/benchmarking/">
<dc:title>OntoLearner Benchmark Ontologies</dc:title>
<dc:description>This Dublin Core metadata collection describes ontologies benchmarked in OntoLearner. It includes information such as title, creator, format, license, and version.</dc:description>
<dc:creator>OntoLearner Team</dc:creator>
<dcterms:license>MIT License</dcterms:license>
<dcterms:hasVersion>1.4.0</dcterms:hasVersion>
</ontologizer:Collection>
<ontologizer:Collection rdf:about="https://ontolearner.readthedocs.io/benchmarking/benchmark.html">
<dc:title>OntoLearner Benchmark Ontologies</dc:title>
<dc:description>This Dublin Core metadata collection describes ontologies benchmarked in OntoLearner. It includes information such as title, creator, format, license, and version.</dc:description>
<dc:creator>OntoLearner Team</dc:creator>
<dcterms:license>MIT License</dcterms:license>
<dcterms:hasVersion>1.4.0</dcterms:hasVersion>
</ontologizer:Collection>
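For illustration, RDF/XML of this shape can be produced with a short standard-library sketch. This is not the actual ``OntoLearnerMetadataExporter`` implementation, and the ``ontologizer`` namespace URI below is an assumption for the example:

```python
import xml.etree.ElementTree as ET

# Namespace URIs: dc/dcterms/rdf are the standard ones; the ontologizer
# URI is a placeholder assumed for this sketch.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ontologizer": "https://ontolearner.readthedocs.io/ontologizer#",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

def build_collection(about, title, creator, license_, version):
    """Build an rdf:RDF tree holding one ontologizer:Collection element."""
    root = ET.Element(f"{{{NS['rdf']}}}RDF")
    coll = ET.SubElement(root, f"{{{NS['ontologizer']}}}Collection",
                         {f"{{{NS['rdf']}}}about": about})
    ET.SubElement(coll, f"{{{NS['dc']}}}title").text = title
    ET.SubElement(coll, f"{{{NS['dc']}}}creator").text = creator
    ET.SubElement(coll, f"{{{NS['dcterms']}}}license").text = license_
    ET.SubElement(coll, f"{{{NS['dcterms']}}}hasVersion").text = version
    return root

root = build_collection(
    "https://ontolearner.readthedocs.io/benchmarking/benchmark.html",
    "OntoLearner Benchmark Ontologies",
    "OntoLearner Team",
    "MIT License",
    "1.4.0",
)
print(ET.tostring(root, encoding="unicode"))
```

In practice a dedicated RDF library handles serialization details (datatypes, pretty-printing) more robustly; the point here is only the element/property structure of the collection record.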

Exporter
--------------------
1 change: 1 addition & 0 deletions examples/retriever_learner.py
@@ -9,6 +9,7 @@
train_data, test_data = train_test_split(ontology.extract(), test_size=0.2, random_state=42)

# Initialize a retriever-style learner for relation extraction tasks
# batch_size lets AutoRetrieverLearner process larger knowledge bases in chunks
ret_learner = AutoRetrieverLearner(top_k=5)

# Load a pre-trained retriever model using its identifier