Commit 750f5d3

Merge pull request #243 from sciknoworg/dev
add v1.1.1
2 parents fe3777d + 3036c3e commit 750f5d3

70 files changed, +1392 -401 lines changed

CHANGELOG.md

Lines changed: 7 additions & 1 deletion

@@ -1,7 +1,13 @@
 ## Changelog
 
+### v1.1.1 (May 27, 2025)
+- add HF documentation
+- add license headers
+- refactor documentations
+- improve hf layout
+- add examples
 
-### v1.1.0 (May 13, 2025)
+### v1.1.0 (May 21, 2025)
 - Version changes
 - Refactor documentations
 - Add Readme

CITATION.cff

Lines changed: 1 addition & 1 deletion

@@ -31,5 +31,5 @@ keywords:
 - Large Language Models
 - Text-to-ontology
 license: MIT
-version: 1.1.0
+version: 1.1.1
 date-released: '2025'
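
The release bumps `version` in CITATION.cff from 1.1.0 to 1.1.1, and the README prints `ontolearner.__version__`. A small pre-release check could confirm that the two agree. The sketch below is a hypothetical helper (`cff_version` and `check_release_version` are illustrative names, not part of OntoLearner). It uses a line-based regex rather than a YAML parser, so it only handles a plain, unindented `version:` key. Real CITATION.cff files also carry a `cff-version:` key, which the anchored pattern deliberately skips. The sample text is made up for the example.

```python
import re

# Matches an unindented `version:` key only, so `cff-version:` and nested keys are skipped.
_VERSION_LINE = re.compile(r"^version:[ \t]*['\"]?([^'\"\s]+)['\"]?[ \t]*$", re.MULTILINE)


def cff_version(cff_text: str) -> str:
    """Return the top-level version string from CITATION.cff text (hypothetical helper)."""
    match = _VERSION_LINE.search(cff_text)
    if match is None:
        raise ValueError("no top-level 'version:' key found")
    return match.group(1)


def check_release_version(cff_text: str, package_version: str) -> None:
    """Raise ValueError if CITATION.cff and the package disagree on the release version."""
    cited = cff_version(cff_text)
    if cited != package_version:
        raise ValueError(f"CITATION.cff says {cited}, package says {package_version}")


# Sample text for illustration; a real file would be read from disk.
cff_text = """cff-version: 1.2.0
license: MIT
version: 1.1.1
date-released: '2025'
"""
print(cff_version(cff_text))  # 1.1.1
check_release_version(cff_text, "1.1.1")  # passes silently when the versions match
```

In the repository, `package_version` would presumably be `ontolearner.__version__`, the value the README prints.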

README.md

Lines changed: 13 additions & 4 deletions

@@ -47,7 +47,7 @@ print(ontolearner.__version__)
 ## 🚀 Quick Tour
 Get started with OntoLearner in just a few lines of code. This guide demonstrates how to initialize ontologies, load datasets, and train an LLM-assisted learner for ontology engineering tasks.
 
-**Basic Usage**:
+**Basic Usage - Automatic Download from Hugging Face**:
 ```python
 from ontolearner.ontology import Wine
 
@@ -61,6 +61,17 @@ ontology.load()
 data = ontology.extract()
 ```
 
+**Basic Usage - Manual Download from Hugging Face**:
+```python
+from ontolearner.ontology import Wine
+
+# 1. Initialize an ontologizer from OntoLearner
+ontology = Wine()
+
+# 2. Download the ontology from Hugging Face
+file_path = ontology.from_huggingface()
+```
+
 **LLM-Based Learning Pipeline**:
 ```python
 from ontolearner import ontology, utils, learner
@@ -98,11 +109,9 @@ rag_learner.fit(train_data=train_data, task="term-typing")
 predicted = rag_learner.predict(test_data, task="term-typing")
 ```
 
-
-
 ## ⭐ Contribution
 
-We welcome contributions to enhance OntoLearner and make it even better! Please review our contribution guidelines in [CONTRIBUTING.md](CONTRIBUTING.md) before getting started.You are also welcome to assist with the ongoing maintenance by referring to [MAINTENANCE.md](MAINTENANCE.md). Your support is greatly appreciated.
+We welcome contributions to enhance OntoLearner and make it even better! Please review our contribution guidelines in [CONTRIBUTING.md](CONTRIBUTING.md) before getting started. You are also welcome to assist with the ongoing maintenance by referring to [MAINTENANCE.md](MAINTENANCE.md). Your support is greatly appreciated.
 
 
 If you encounter any issues or have questions, please submit them in the [GitHub issues tracker](https://github.com/sciknoworg/OntoLearner/issues).
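
The README now distinguishes between two routes. In the automatic route, `ontology.load()` fetches the data itself. In the manual route, `ontology.from_huggingface()` returns the downloaded file path. The sketch below models that difference with a hypothetical `LazyOntology` class; it is not OntoLearner's implementation. It assumes that `load()` downloads only when no local copy is known yet. It also injects a `fetch` function in place of the real Hugging Face download, so the example runs offline. The file name `wine.owl` is a placeholder.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Optional


class LazyOntology:
    """Hypothetical stand-in for an OntoLearner ontology class such as Wine.

    `fetch` replaces the real Hugging Face download and must return a local path.
    """

    def __init__(self, fetch: Callable[[], Path]) -> None:
        self._fetch = fetch
        self.path: Optional[Path] = None

    def from_huggingface(self) -> Path:
        # Manual route: download explicitly and return the local file path.
        self.path = self._fetch()
        return self.path

    def load(self) -> str:
        # Automatic route: download on first use, then reuse the known path.
        if self.path is None:
            self.from_huggingface()
        return self.path.read_text(encoding="utf-8")


downloads: List[str] = []


def fake_fetch() -> Path:
    # Simulates a download by writing a tiny placeholder file to a temporary directory.
    downloads.append("wine")
    target = Path(tempfile.mkdtemp()) / "wine.owl"
    target.write_text("<rdf:RDF/>", encoding="utf-8")
    return target


auto = LazyOntology(fake_fetch)
auto.load()  # first call triggers the (simulated) download
auto.load()  # second call reuses the cached path

manual = LazyOntology(fake_fetch)
file_path = manual.from_huggingface()
print(len(downloads))  # 2: one download per ontology instance
```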

docs/source/huggingface.rst

Lines changed: 95 additions & 3 deletions

@@ -1,10 +1,91 @@
-HuggingFace
+HuggingFace Integration
 ==========================
+OntoLearner provides seamless integration with Hugging Face,
+allowing you to easily download ontologies and use pre-trained models.
+
+Ontology Repositories
+--------------------
 OntoLearner maintains a set of default repositories for each domain under the `SciKnowOrg` organization.
 These repositories follow the naming pattern `SciKnowOrg/ontolearner-{domain}` and contain pre-processed ontology data.
 
-Basic Usage
------------
+Available domains include:
+
+.. list-table:: OntoLearner Domain Repositories
+   :header-rows: 1
+   :widths: 25 15 60
+
+   * - Domain
+     - Repository
+     - Description
+   * - Agriculture
+     - `ontolearner-agriculture <https://huggingface.co/datasets/SciKnowOrg/ontolearner-agriculture>`_
+     - Ontologies about farming systems, crops, food production, and agricultural vocabularies.
+   * - Arts and Humanities
+     - `ontolearner-arts_and_humanities <https://huggingface.co/datasets/SciKnowOrg/ontolearner-arts_and_humanities>`_
+     - Ontologies that describe music, iconography, cultural artifacts, and humanistic content.
+   * - Biology and Life Sciences
+     - `ontolearner-biology_and_life_sciences <https://huggingface.co/datasets/SciKnowOrg/ontolearner-biology_and_life_sciences>`_
+     - Ontologies about biological entities, systems, organisms, and molecular biology.
+   * - Chemistry
+     - `ontolearner-chemistry <https://huggingface.co/datasets/SciKnowOrg/ontolearner-chemistry>`_
+     - Ontologies describing chemical entities, reactions, methods, and computational chemistry models.
+   * - Ecology and Environment
+     - `ontolearner-ecology_and_environment <https://huggingface.co/datasets/SciKnowOrg/ontolearner-ecology_and_environment>`_
+     - Ontologies about ecological systems, environments, biomes, and sustainability science.
+   * - Education
+     - `ontolearner-education <https://huggingface.co/datasets/SciKnowOrg/ontolearner-education>`_
+     - Ontologies describing learning content, educational programs, competencies, and teaching resources.
+   * - Events
+     - `ontolearner-events <https://huggingface.co/datasets/SciKnowOrg/ontolearner-events>`_
+     - Ontologies for representing events, time, schedules, and calendar-based occurrences.
+   * - Finance
+     - `ontolearner-finance <https://huggingface.co/datasets/SciKnowOrg/ontolearner-finance>`_
+     - Ontologies describing economic indicators, e-commerce, trade, and financial instruments.
+   * - Food and Beverage
+     - `ontolearner-food_and_beverage <https://huggingface.co/datasets/SciKnowOrg/ontolearner-food_and_beverage>`_
+     - Ontologies related to food, beverages, ingredients, and culinary products.
+   * - General Knowledge
+     - `ontolearner-general_knowledge <https://huggingface.co/datasets/SciKnowOrg/ontolearner-general_knowledge>`_
+     - Broad-scope ontologies and upper vocabularies used across disciplines for general-purpose semantic modeling.
+   * - Geography
+     - `ontolearner-geography <https://huggingface.co/datasets/SciKnowOrg/ontolearner-geography>`_
+     - Ontologies for modeling spatial and geopolitical entities, locations, and place names.
+   * - Industry
+     - `ontolearner-industry <https://huggingface.co/datasets/SciKnowOrg/ontolearner-industry>`_
+     - Ontologies describing industrial processes, smart buildings, manufacturing systems, and equipment.
+   * - Law
+     - `ontolearner-law <https://huggingface.co/datasets/SciKnowOrg/ontolearner-law>`_
+     - Ontologies dealing with legal processes, regulations, and rights (e.g., copyright).
+   * - Library and Cultural Heritage
+     - `ontolearner-library_and_cultural_heritage <https://huggingface.co/datasets/SciKnowOrg/ontolearner-library_and_cultural_heritage>`_
+     - Ontologies used in cataloging, archiving, and authority control of cultural and scholarly resources.
+   * - Materials Science and Engineering
+     - `ontolearner-materials_science_and_engineering <https://huggingface.co/datasets/SciKnowOrg/ontolearner-materials_science_and_engineering>`_
+     - Ontologies related to materials, their structure, properties, processing, and engineering applications.
+   * - Medicine
+     - `ontolearner-medicine <https://huggingface.co/datasets/SciKnowOrg/ontolearner-medicine>`_
+     - Ontologies covering clinical knowledge, diseases, drugs, treatments, and biomedical data.
+   * - News and Media
+     - `ontolearner-news_and_media <https://huggingface.co/datasets/SciKnowOrg/ontolearner-news_and_media>`_
+     - Ontologies that model journalism, broadcasting, creative works, and media metadata.
+   * - Scholarly Knowledge
+     - `ontolearner-scholarly_knowledge <https://huggingface.co/datasets/SciKnowOrg/ontolearner-scholarly_knowledge>`_
+     - Ontologies modeling the structure, process, and administration of scholarly research, publications, and infrastructure.
+   * - Social Sciences
+     - `ontolearner-social_sciences <https://huggingface.co/datasets/SciKnowOrg/ontolearner-social_sciences>`_
+     - Ontologies for modeling societal structures, behavior, identity, and social interaction.
+   * - Units and Measurements
+     - `ontolearner-units_and_measurements <https://huggingface.co/datasets/SciKnowOrg/ontolearner-units_and_measurements>`_
+     - Ontologies defining scientific units, quantities, dimensions, and observational models.
+   * - Upper Ontology
+     - `ontolearner-upper_ontology <https://huggingface.co/datasets/SciKnowOrg/ontolearner-upper_ontology>`_
+     - Foundational ontologies that provide abstract concepts like objects, processes, and relations.
+   * - Web and Internet
+     - `ontolearner-web_and_internet <https://huggingface.co/datasets/SciKnowOrg/ontolearner-web_and_internet>`_
+     - Ontologies that model web semantics, linked data, APIs, and online communication standards.
+
+Loading Ontologies from Hugging Face
+-----------------------------------
 The simplest way to load an ontology from Hugging Face:
 
 .. code-block:: python
@@ -13,3 +94,14 @@ The simplest way to load an ontology from Hugging Face:
     ontology = Wine()
     ontology.load() # automatically downloads from HuggingFace
     data = ontology.extract()
+
+This will automatically download the ontology file and pre-processed datasets from the appropriate Hugging Face repository.
+
+.. hint::
+   Each ontology repository on Hugging Face includes comprehensive documentation:
+
+   * **README.md**: Contains information about the domain and available ontologies
+   * **Citation Information**: How to cite the ontologies in academic work
+   * **Usage Examples**: Code snippets showing how to use the ontologies
+
+   For example, see the `SciKnowOrg/ontolearner-agriculture <https://huggingface.co/datasets/SciKnowOrg/ontolearner-agriculture>`_ repository.
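
Every row of the domain table follows the same naming rule. The display name is lowercased, its spaces are replaced by underscores, and the result is appended to `SciKnowOrg/ontolearner-`. The helper below encodes that rule, which is inferred from the table rather than taken from OntoLearner's source. It may be handy for scripting over several domains. `repo_id_for` and `repo_url_for` are hypothetical names.

```python
HF_DATASETS_URL = "https://huggingface.co/datasets"


def repo_id_for(domain: str, org: str = "SciKnowOrg") -> str:
    """Map a display name such as 'Arts and Humanities' to its dataset repo id."""
    # Lowercase and join words with underscores, as in the table's repository names.
    slug = "_".join(domain.lower().split())
    return f"{org}/ontolearner-{slug}"


def repo_url_for(domain: str) -> str:
    """Full dataset URL, in the same form as the links in the table."""
    return f"{HF_DATASETS_URL}/{repo_id_for(domain)}"


for name in ["Agriculture", "Arts and Humanities", "Units and Measurements"]:
    print(repo_url_for(name))
```

The sketch makes no network calls. If the `huggingface_hub` package is installed, a resulting id could be passed to `huggingface_hub.snapshot_download(repo_id, repo_type="dataset")` to fetch a repository outside OntoLearner.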

docs/source/learners/learner.rst

Lines changed: 35 additions & 1 deletion

@@ -1,2 +1,36 @@
 Learners
-=======================================
+========
+This section presents **three minimal, runnable walk-throughs** that showcase each
+learner type supported by *OntoLearner*:
+
+Authentication
+--------------
+Some models on Hugging Face require authentication. You can provide your Hugging Face token in several ways:
+1. **Environment Variable**: Set the `HUGGINGFACE_ACCESS_TOKEN` environment variable
+2. **Direct Parameter**: Pass the token directly to the constructor:
+
+   .. code-block:: python
+
+      llm = AutoLearnerLLM(token="your_huggingface_token")
+
+3. **.env File**: Create a `.env` file with your token:
+
+   .. code-block:: text
+
+      HUGGINGFACE_ACCESS_TOKEN=your_huggingface_token
+
+   Then load it in your script:
+
+   .. code-block:: python
+
+      from dotenv import find_dotenv, load_dotenv
+      _ = load_dotenv(find_dotenv())
+
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Available tutorials
+
+   retrieval.rst
+   llm.rst
+   rag.rst
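
learner.rst lists three ways to supply a Hugging Face token but does not say which one wins if several are present. The sketch below assumes one plausible order: the explicit argument first, then the `HUGGINGFACE_ACCESS_TOKEN` environment variable, then a `.env` file. `resolve_hf_token` and `read_dotenv` are hypothetical helpers. The parser is a simplified stand-in for python-dotenv that only handles `KEY=VALUE` lines and `#` comments. The token values are placeholders.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict, Mapping, Optional

TOKEN_VAR = "HUGGINGFACE_ACCESS_TOKEN"


def read_dotenv(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_hf_token(
    token: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
    dotenv_path: Optional[Path] = None,
) -> Optional[str]:
    """Assumed precedence: explicit argument, then environment, then .env file."""
    if token:
        return token
    if env.get(TOKEN_VAR):
        return env[TOKEN_VAR]
    if dotenv_path is not None and dotenv_path.is_file():
        return read_dotenv(dotenv_path).get(TOKEN_VAR)
    return None


env_file = Path(tempfile.mkdtemp()) / ".env"
env_file.write_text(f"# local secrets\n{TOKEN_VAR}=hf_from_file\n", encoding="utf-8")

print(resolve_hf_token("hf_explicit", env={}, dotenv_path=env_file))  # hf_explicit
print(resolve_hf_token(env={TOKEN_VAR: "hf_from_env"}, dotenv_path=env_file))  # hf_from_env
print(resolve_hf_token(env={}, dotenv_path=env_file))  # hf_from_file
```

With python-dotenv itself, `load_dotenv()` copies `.env` entries into `os.environ` and, by default, does not override variables that are already set. Options 1 and 3 therefore end up in the same place.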

docs/source/learners/llm.rst

Lines changed: 73 additions & 0 deletions

@@ -1,2 +1,75 @@
 Large Language Models
 ========================
+LLM-only learners leverage the power of large language models to perform ontology learning tasks
+without using retrieval components. This approach is particularly useful when you want to rely
+on the model's inherent knowledge rather than specific examples from the training data.
+
+How LLM-only Learners Work
+--------------------------
+LLM-only learners operate by:
+1. **Prompting**: Formulating a task-specific prompt that describes the ontology learning task
+2. **Generation**: Using the LLM to generate a response based on the prompt and its pre-trained knowledge
+
+The methodology behind LLM-only learners relies on the model's ability to understand and interpret
+ontological concepts through prompt engineering. These prompts encode domain knowledge and task requirements,
+guiding the model to generate structured ontological elements such as taxonomies, relations,
+and concept classifications. The approach leverages the fact that pre-trained LLMs
+have internalized substantial background knowledge about various domains during their training,
+which can be accessed and systematically organized through appropriate prompting strategies
+without explicitly retrieving external knowledge.
+
+Setting Up an LLM-only Learner
+------------------------------
+Here's how to set up an LLM-only learner using the OntoLearner pipeline:
+
+.. code-block:: python
+
+    from ontolearner.learner_pipeline import LearnerPipeline
+    from ontolearner.learner import AutoLearnerLLM
+    from ontolearner.ontology import Wine
+    from ontolearner.utils.train_test_split import train_test_split
+
+    ontology = Wine()
+    ontology.load()
+    train_data, test_data = train_test_split(ontology.extract(), test_size=0.2)
+
+    pipeline = LearnerPipeline(
+        task="taxonomy-discovery",
+        llm=AutoLearnerLLM(token="your_huggingface_token"),
+        llm_id="mistralai/Mistral-7B-Instruct-v0.1"
+    )
+
+    results, metrics = pipeline.fit_predict_evaluate(
+        train_data=train_data,
+        test_data=test_data,
+        test_limit=10
+    )
+
+Supported Models
+----------------
+OntoLearner supports various LLM models, including:
+
+- Mistral models (e.g., "mistralai/Mistral-7B-Instruct-v0.1")
+- Llama models (e.g., "meta-llama/Llama-3.1-8B-Instruct")
+- Qwen models (e.g., "Qwen/Qwen3-0.6B")
+- DeepSeek models (e.g., "deepseek-ai/deepseek-llm-7b-base")
+
+Supported Tasks
+---------------
+LLM-only learners support all three main ontology learning tasks:
+
+1. **Term Typing**: Predicting the type(s) of a given term
+2. **Taxonomy Discovery**: Identifying hierarchical relationships
+3. **Non-Taxonomy Discovery**: Identifying non-hierarchical relationships
+
+Example
+-------
+For a complete example of using an LLM-only learner, see the example script:
+
+.. code-block:: bash
+
+    python scripts/examples/learner_example_llm.py
+
+.. note::
+
+   The code is available at `OntoLearner GitHub repository <https://github.com/sciknoworg/OntoLearner/blob/dev/scripts/examples/learner_example_llm.py>`_
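
The two steps listed under "How LLM-only Learners Work" are prompting and then generation. They can be sketched for taxonomy discovery as follows. The prompt template, the yes/no parsing rule and `toy_llm` are all illustrative assumptions, not OntoLearner's internals. `toy_llm` stands in for a model such as `mistralai/Mistral-7B-Instruct-v0.1`, so the example runs without downloading weights. The class names are illustrative.

```python
from typing import Callable

# Hypothetical template; OntoLearner's actual prompts may differ.
TAXONOMY_PROMPT = (
    "You are an ontology engineer.\n"
    "Question: Is '{parent}' a superclass of '{child}'?\n"
    "Answer with 'yes' or 'no'.\n"
    "Answer:"
)


def build_prompt(parent: str, child: str) -> str:
    """Step 1 (prompting): encode the task and the term pair as text."""
    return TAXONOMY_PROMPT.format(parent=parent, child=child)


def parse_yes_no(completion: str) -> bool:
    """Map a free-text completion to a label; anything not starting with 'yes' is False."""
    words = completion.strip().lower().split()
    return bool(words) and words[0].strip(".,!") == "yes"


def predict_is_a(parent: str, child: str, generate: Callable[[str], str]) -> bool:
    """Step 2 (generation): ask the model, then parse its answer."""
    return parse_yes_no(generate(build_prompt(parent, child)))


def toy_llm(prompt: str) -> str:
    # Stand-in for a real LLM: answers from a tiny hard-coded set of is-a pairs.
    known = {("Wine", "RedWine"), ("Wine", "WhiteWine")}
    for parent, child in known:
        if f"'{parent}' a superclass of '{child}'" in prompt:
            return " Yes."
    return " No."


print(predict_is_a("Wine", "RedWine", toy_llm))  # True
print(predict_is_a("RedWine", "Wine", toy_llm))  # False
```

Keeping `generate` as a plain callable means that the same prompting and parsing code can wrap any backend. Only that function would need to change when swapping models.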

docs/source/learners/rag.rst

Lines changed: 65 additions & 1 deletion

@@ -1,2 +1,66 @@
 Retrieval Augmented Generation
-=======================================
+==============================
+RAG (Retrieval Augmented Generation) learners combine the strengths of both retrieval models
+and large language models to perform ontology learning tasks.
+
+How RAG Learners Work
+---------------------
+RAG learners operate in two main steps:
+1. **Retrieval**: First, the retriever component finds the most relevant examples from the training data based on similarity to the input query.
+2. **Generation**: Then, the LLM component uses these retrieved examples as context to generate a response.
+
+The methodology behind RAG learners combines vector retrieval with generative language modeling
+to enhance ontology learning tasks. This hybrid approach addresses the limitations of using LLMs alone
+by grounding the model's responses in specific ontological examples from the training data.
+By encoding ontological elements into a vector space, the retriever can identify semantically similar concepts,
+relations, or taxonomic structures. These retrieved examples serve as few-shot demonstrations
+that provide the LLM with domain-specific context, enabling more accurate and consistent ontological inferences.
+This approach is particularly effective for specialized domains where the model's pre-trained knowledge
+may be insufficient or where precise ontological alignments are critical.
+
+Setting Up a RAG Learner
+------------------------
+Here's how to set up a RAG learner using the OntoLearner pipeline:
+
+.. code-block:: python
+
+    from ontolearner.learner_pipeline import LearnerPipeline
+    from ontolearner.ontology import Wine
+    from ontolearner.utils.train_test_split import train_test_split
+
+    ontology = Wine()
+    ontology.load()
+    train_data, test_data = train_test_split(ontology.extract(), test_size=0.2)
+
+    pipeline = LearnerPipeline(
+        task="term-typing",
+        retriever_id="sentence-transformers/all-MiniLM-L6-v2",
+        llm_id="mistralai/Mistral-7B-Instruct-v0.1",
+        hf_token="your_huggingface_token"
+    )
+
+    results, metrics = pipeline.fit_predict_evaluate(
+        train_data=train_data,
+        test_data=test_data,
+        top_k=3,
+        test_limit=10
+    )
+
+Supported Tasks
+---------------
+RAG learners support all three main ontology learning tasks:
+1. **Term Typing**: Predicting the type(s) of a given term
+2. **Taxonomy Discovery**: Identifying hierarchical relationships
+3. **Non-Taxonomy Discovery**: Identifying non-hierarchical relationships
+
+Example
+-------
+For a complete example of using a RAG learner, see the example script:
+
+.. code-block:: bash
+
+    python scripts/examples/learner_example_rag.py
+
+.. note::
+
+   The code is available at `OntoLearner GitHub repository <https://github.com/sciknoworg/OntoLearner/blob/dev/scripts/examples/learner_example_rag.py>`_
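
The retrieve-then-generate loop described under "How RAG Learners Work" is sketched below for term typing. To stay dependency-free, it replaces the dense `sentence-transformers/all-MiniLM-L6-v2` embeddings with bag-of-words cosine similarity. It also simplifies term-typing data to single `(term, type)` pairs, although real terms can have several types. The helper names, the prompt format and the toy training pairs are hypothetical. `top_k` is meant to mirror the `top_k=3` argument in the pipeline example, which presumably controls how many retrieved examples become few-shot demonstrations.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

Example = Tuple[str, str]  # (term, type): a simplified stand-in for term-typing data


def vectorize(text: str) -> Counter:
    """Bag-of-words vector; a crude substitute for a sentence-transformer embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, train: List[Example], top_k: int = 3) -> List[Example]:
    """Step 1 (retrieval): the top_k training examples most similar to the query term."""
    q = vectorize(query)
    ranked = sorted(train, key=lambda ex: cosine(q, vectorize(ex[0])), reverse=True)
    return ranked[:top_k]


def build_rag_prompt(query: str, train: List[Example], top_k: int = 3) -> str:
    """Input to step 2 (generation): retrieved pairs become few-shot demonstrations."""
    shots = "\n".join(f"Term: {term}\nType: {typ}" for term, typ in retrieve(query, train, top_k))
    return f"{shots}\nTerm: {query}\nType:"


train: List[Example] = [
    ("red wine", "Wine"),
    ("cheddar cheese", "Cheese"),
    ("white wine", "Wine"),
    ("blue cheese", "Cheese"),
    ("sparkling wine", "Wine"),
]
print(build_rag_prompt("rose wine", train, top_k=3))
```

For the query "rose wine", the three wine examples tie on similarity and the cheese examples score zero, so the prompt shows only wine demonstrations before the open `Type:` slot. Python's sort is stable, so tied examples keep their training-set order.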
