Commit a633aaf

📝 update readme
1 parent d96821d commit a633aaf

1 file changed: +69 -79 lines changed

README.md

Lines changed: 69 additions & 79 deletions
@@ -35,27 +35,32 @@ import ontolearner
3535
print(ontolearner.__version__)
3636
```
3737

38+
Please refer to the [Installation](https://ontolearner.readthedocs.io/installation.html) page for further options.
3839

3940
## 🔗 Essential Resources
4041

41-
| Resource | Info |
42-
|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----|
43-
| **[📚 OntoLearner Documentation](https://ontolearner.readthedocs.io/)** | Dive into OntoLearner's extensive documentation to explore its modular architecture, including Ontologizers, Learning Tasks, and Learner Models. The documentation provides detailed guides, references, and tutorials to help you get started and make the most of OntoLearner's capabilities. |
44-
| **[🤗 Datasets on Hugging Face](https://huggingface.co/collections/SciKnowOrg/ontolearner-benchmarking-6823bcd051300c210b7ef68a)** | You can access the curated colloctions of machine-readable ontologies across diverse domains such as agriculture, medicine, social sciences, and more. OntoLearner Benchmarking datasets are optimized for integration into generative AI pipelines, supporting versioning, streaming, and metadata inspection.|
45-
| **Quick Tour on OntoLearner** [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1DuElAyEFzd1vtqTjDEXWcc0zCbiV2Yee?usp=sharing) | Follow this hands-on Colab tutorial to explore the complete OntoLearner workflow—from loading ontologies and extracting structured data, to training RAG models and evaluating performance on benchmark tasks. Ideal for researchers, developers, and educators getting started with ontology-centric machine learning. |
46-
42+
| Resource | Info |
43+
|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
44+
| **[📚 OntoLearner Documentation](https://ontolearner.readthedocs.io/)** | OntoLearner's extensive documentation website. |
45+
| **[🤗 Datasets on Hugging Face](https://huggingface.co/collections/SciKnowOrg/ontolearner-benchmarking-6823bcd051300c210b7ef68a)** | Access curated, machine-readable ontologies. |
46+
| **Quick Tour on OntoLearner** [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1DuElAyEFzd1vtqTjDEXWcc0zCbiV2Yee?usp=sharing) ``version=1.2.1`` | OntoLearner hands-on Colab tutorials. |
47+
| **[🚀 Quickstart](https://ontolearner.readthedocs.io/quickstart.html)** | Get started quickly with OntoLearner’s main features and workflow. |
48+
| **[🕸️ Learning Tasks](https://ontolearner.readthedocs.io/learning_tasks/learning_tasks.html)** | Explore supported ontology learning tasks like LLMs4OL Paradigm tasks and Text2Onto. |
49+
| **[🧠 Learner Models](https://ontolearner.readthedocs.io/learners/llm.html)** | Browse and configure various learner models, including LLMs, Retrieval, or RAG approaches. |
50+
| **[📚 Ontologies Documentations](https://ontolearner.readthedocs.io/benchmarking/benchmark.html)** | Review benchmark ontologies and datasets used for evaluation and training. |
51+
| **[🧩 How to work with Ontologizer?](https://ontolearner.readthedocs.io/ontologizer/ontology_modularization.html)** | Learn how to modularize and preprocess ontologies using the Ontologizer module. |
4752

4853
## 🚀 Quick Tour
4954
Get started with OntoLearner in just a few lines of code. This guide demonstrates how to initialize ontologies, load datasets, and train an LLM-assisted learner for ontology engineering tasks.
5055

5156
**Basic Usage - Automatic Download from Hugging Face**:
5257
```python
53-
from ontolearner.ontology import Wine
58+
from ontolearner import Wine
5459

5560
# 1. Initialize an ontologizer from OntoLearner
5661
ontology = Wine()
5762

58-
# 2. Load the ontology automatically from Hugging Face
63+
# 2. Load the ontology automatically from HuggingFace
5964
ontology.load()
6065

6166
# 3. Extract the learning task dataset
@@ -67,99 +72,84 @@ To see the ontology metadata you can print the ontology:
6772
print(ontology)
6873
```
6974

70-
**Basic Usage - Manual Download from Hugging Face**:
71-
```python
72-
from ontolearner.ontology import Wine
73-
74-
# 1. Initialize an ontologizer from OntoLearner
75-
ontology = Wine()
75+
Now, explore [150+ ready-to-use ontologies](https://ontolearner.readthedocs.io/benchmarking/benchmark.html) or read on [how to work with ontologizers](https://ontolearner.readthedocs.io/ontologizer/ontology_modularization.html).
7676

77-
# 2. Download the ontology from Hugging Face
78-
ontology.from_huggingface()
79-
```
77+
**Learner Models**:
8078

81-
**LLM-Based Learning Pipeline**:
8279
```python
83-
from ontolearner import ontology, utils, learner
84-
from ontolearner.evaluation import calculate_term_typing_metrics
80+
from ontolearner import AutoRetrieverLearner, AgrO, train_test_split, evaluation_report
8581

86-
# 1. Load the ontology and extract training data
87-
onto = ontology.Wine()
88-
data = onto.extract()
82+
# 1. Programmatic import of an ontology
83+
ontology = AgrO()
84+
ontology.load()
8985

90-
# 2. Split into train and test sets
91-
train_data, test_data = utils.train_test_split(
92-
data, test_size=0.2, random_state=42
93-
)
86+
# 2. Load tasks datasets
87+
ontological_data = ontology.extract()
9488

95-
# 3. Initialize a Retrieval-Augmented Generation (RAG) learner
96-
retriever = learner.BERTRetrieverLearner()
97-
llm = learner.AutoLearnerLLM(token="...") # a token required for LLMs with an access
98-
prompt = learner.StandardizedPrompting(task="term-typing")
89+
# 3. Split into train and test sets
90+
train_data, test_data = train_test_split(ontological_data, test_size=0.2, random_state=42)
9991

100-
rag_learner = learner.AutoRAGLearner(
101-
learner_retriever=retriever,
102-
learner_llm=llm,
103-
prompting=prompt
104-
)
92+
# 4. Initialize Learner
93+
task = 'non-taxonomic-re'
94+
ret_learner = AutoRetrieverLearner(top_k=5)
95+
ret_learner.load(model_id='sentence-transformers/all-MiniLM-L6-v2')
10596

106-
# 4. Load pretrained components
107-
rag_learner.load(
108-
retriever_id="sentence-transformers/all-MiniLM-L6-v2",
109-
llm_id="Qwen/Qwen2.5-0.5B-Instruct"
110-
)
97+
# 5. Fit the model to the training data and predict on the test data
98+
ret_learner.fit(train_data, task=task)
99+
predicts = ret_learner.predict(test_data, task=task)
111100

112-
# 5. Fit the model to training data
113-
rag_learner.fit(train_data=train_data, task="term-typing")
114-
115-
# 6. Predict on test data
116-
results = []
117-
for typing in test_data.term_typings:
118-
term = typing.term
119-
ground_truth = typing.types
120-
predicted = rag_learner.predict(term, task="term-typing")
121-
metrics = calculate_term_typing_metrics(predicted, ground_truth)
122-
results.append({
123-
'term': term,
124-
'ground_truth': ground_truth,
125-
'predicted': predicted,
126-
**metrics
127-
})
101+
# 6. Evaluation
102+
truth = ret_learner.tasks_ground_truth_former(data=test_data, task=task)
103+
metrics = evaluation_report(y_true=truth, y_pred=predicts, task=task)
104+
print(metrics)
128105
```
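
To make the evaluation step more concrete, here is a minimal, library-independent sketch of how precision, recall, and F1 can be computed over predicted and ground-truth relation triples. It is not OntoLearner's `evaluation_report` implementation; the function name `precision_recall_f1` and the set-based exact-match comparison are assumptions made purely for illustration.

```python
def precision_recall_f1(y_true, y_pred):
    """Set-based exact-match scores (hypothetical helper, not part of OntoLearner)."""
    true_set, pred_set = set(y_true), set(y_pred)
    true_positives = len(true_set & pred_set)

    # Guard against empty inputs to avoid division by zero
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(true_set) if true_set else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy (head, relation, tail) triples, invented for this example
truth = [("a", "r", "b"), ("c", "r", "d"), ("e", "r", "f")]
predicted = [("a", "r", "b"), ("x", "r", "y")]
scores = precision_recall_f1(truth, predicted)
print(scores)
```

One of the two predictions matches one of the three ground-truth triples, so precision is 1/2 and recall is 1/3. The library's own report may match or aggregate differently.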
106+
Other learners:
107+
* [LLM-Based Learner](https://ontolearner.readthedocs.io/learners/llm.html)
108+
* [RAG-Based Learner](https://ontolearner.readthedocs.io/learners/rag.html)
109+
110+
**LearnerPipeline**: OntoLearner also offers a streamlined `LearnerPipeline` class that condenses initializing, training, predicting, and evaluating a RAG setup into a single call.
129111

130-
**LearnerPipeline**:
131-
```python
132-
from ontolearner import LearnerPipeline
133-
from ontolearner import ontology, utils
134112

135-
# 1. Load the ontology and extract training data
136-
onto = ontology.Wine()
137-
data = onto.extract()
138113

139-
# 2. Split into train and test sets
140-
train_data, test_data = utils.train_test_split(
141-
data, test_size=0.2, random_state=42
114+
```python
115+
# Import core components from the OntoLearner library
116+
from ontolearner import LearnerPipeline, AgrO, train_test_split
117+
118+
# Load the AgrO ontology, which includes structured agricultural knowledge
119+
ontology = AgrO()
120+
ontology.load() # Load ontology data (e.g., entities, relations, metadata)
121+
122+
# Extract relation instances from the ontology and split them into training and test sets
123+
train_data, test_data = train_test_split(
124+
ontology.extract(), # Extract annotated (head, tail, relation) triples
125+
test_size=0.2, # 20% for evaluation
126+
random_state=42 # Ensures reproducible splits
142127
)
143128

144-
# 3. Specify learner pipeline and models
129+
# Initialize the learning pipeline using a dense retriever
145130
pipeline = LearnerPipeline(
146-
task="term-typing",
147-
retriever_id="sentence-transformers/all-MiniLM-L6-v2",
148-
llm_id="Qwen/Qwen2.5-0.5B-Instruct",
149-
hf_token="your_huggingface_token"
131+
retriever_id='sentence-transformers/all-MiniLM-L6-v2', # Hugging Face model ID for retrieval
132+
batch_size=10, # Number of samples to process per batch (if batching is enabled internally)
133+
top_k=5 # Retrieve the top-5 most relevant support instances per query
150134
)
151135

152-
# 4. fit, predict, and evaluate
153-
results, metrics = pipeline.fit_predict_evaluate(
136+
# Run the pipeline on the training and test data
137+
# The pipeline performs: fit() → predict() → evaluate() in sequence
138+
outputs = pipeline(
154139
train_data=train_data,
155140
test_data=test_data,
156-
top_k=3, # Retrieve top-3 similar examples
157-
test_limit=-1 # on all samples
141+
evaluate=True, # If True, computes precision, recall, and F1-score
142+
task='non-taxonomic-re' # Specifies that we are doing non-taxonomic relation prediction
158143
)
159144

160-
# 5. printing the results
161-
print(f"RAG F1-Score: {metrics['avg_f1_score']:.3f}")
162-
print(f"RAG Exact Match: {metrics['avg_exact_match']:.3f}")
145+
# Print the evaluation metrics (precision, recall, F1)
146+
print("Metrics:", outputs['metrics'])
147+
148+
# Print the total elapsed time for training and evaluation
149+
print("Elapsed time:", outputs['elapsed_time'])
150+
151+
# Print the full output dictionary (includes predictions)
152+
print(outputs)
163153
```
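
The `random_state=42` argument above is described as ensuring reproducible splits. As a rough illustration of that idea only, the sketch below shows a seeded split over a plain Python list. The helper `seeded_split` is hypothetical and is not how OntoLearner's `train_test_split` is implemented; the real function operates on the extracted ontology data object rather than a list.

```python
import random


def seeded_split(items, test_size=0.2, random_state=42):
    """Shuffle with a fixed seed, then cut off a test fraction (illustrative only)."""
    shuffled = list(items)
    # A dedicated Random instance keeps the global random state untouched
    random.Random(random_state).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


samples = [f"sample-{i}" for i in range(10)]
train_a, test_a = seeded_split(samples, test_size=0.2, random_state=42)
train_b, test_b = seeded_split(samples, test_size=0.2, random_state=42)

# The same seed yields the same partition on every run
print(len(train_a), len(test_a), test_a == test_b)
```

Because the shuffle is driven by a fixed seed, repeated runs give identical train and test sets, which is what makes reported metrics comparable across runs.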
164154

165155
## ⭐ Contribution
