This tutorial walks you through the process of ontology matching using the OntoAligner library, leveraging retrieval-augmented generation (RAG) techniques. Starting with the necessary module imports, it defines a task and loads source and target ontologies along with reference matchings. The tutorial then encodes the ontologies using a specialized encoder, configures a retriever and an LLM, and generates predictions. Finally, it demonstrates two postprocessing techniques—heuristic and hybrid—followed by saving the matched alignments in XML format, ready for use or further analysis.

Usage
----------------

.. code-block:: python

    # full pipeline code elided in this excerpt

In this tutorial, we demonstrated:

* Defining a matching task and loading the source and target ontologies along with reference matchings
* Encoding the ontologies and configuring a retriever and an LLM
* Generating predicted matchings with the RAG aligner
* Refining results with heuristic and hybrid postprocessing
* Saving results in XML format

.. hint::

    You can customize the configurations and thresholds based on your specific dataset and use case. For more details, refer to the :doc:`../package_reference/postprocess` documentation.
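
The heuristic postprocessing idea, keeping only confidently scored candidate pairs and at most one best target per source concept, can be sketched in a few lines of plain Python. This is an illustration of the concept only, not OntoAligner's actual postprocessor:

.. code-block:: python

    # Illustrative threshold-based postprocessing; not the OntoAligner API.
    def filter_matchings(candidates, threshold=0.7):
        """Keep pairs scoring above `threshold`, one best target per source."""
        best = {}
        for source, target, score in candidates:
            if score >= threshold and (source not in best or score > best[source][1]):
                best[source] = (target, score)
        return [(s, t, sc) for s, (t, sc) in best.items()]

    candidates = [
        ("ont1#Car", "ont2#Automobile", 0.92),
        ("ont1#Car", "ont2#Vehicle", 0.75),   # weaker match for the same source
        ("ont1#Tree", "ont2#Plant", 0.41),    # below threshold, dropped
    ]
    matchings = filter_matchings(candidates)  # [("ont1#Car", "ont2#Automobile", 0.92)]

Lowering the threshold trades precision for recall, which is exactly the knob the hint above refers to.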

The FewShot-RAG aligner is an extension of the RAG aligner, designed for alignment with few-shot learning. The FewShot-RAG workflow is the same as RAG, with two differences:

1. You only need to use ``FewShotEncoder`` encoders, as shown below. Because a few-shot model consumes multiple examples, you can also supply specific pairs from the reference matchings (or examples from elsewhere) as the few-shot samples.

.. code-block:: python

    # encoder setup and model configuration elided in this excerpt
    model = MistralLLMBERTRetrieverFSRAG(positive_ratio=0.7, n_shots=5, **config)
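
One simple way to draw few-shot samples from the reference matchings is to take a small random labeled subset. The helper below is hypothetical, named here only for illustration; it is not part of the OntoAligner API:

.. code-block:: python

    import random

    # Hypothetical helper: draw a few labeled pairs to use as few-shot samples.
    def sample_fewshot_examples(reference, n_shots=5, seed=0):
        """Return up to `n_shots` (source, target, label) triples from the reference."""
        rng = random.Random(seed)
        return rng.sample(reference, min(n_shots, len(reference)))

    reference = [
        ("ont1#Car", "ont2#Automobile", "yes"),
        ("ont1#Car", "ont2#Plant", "no"),
        ("ont1#Leaf", "ont2#Leaf", "yes"),
        ("ont1#Root", "ont2#Wheel", "no"),
    ]
    fewshot_samples = sample_fewshot_examples(reference, n_shots=2)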

Embedded FewShot-RAG aligners within OntoAligner:

.. list-table::
   :widths: 30 60 10
   :header-rows: 1

   * - FewShot-RAG Aligner
     - Description
     - Link
   * - ``FalconLLMAdaRetrieverFSRAG``
     - Falcon LLM with Ada retriever and few-shot examples for enhanced alignment.
     -
[1] Liu, S., Ye, H., Xing, L., & Zou, J. (2023). `In-context vectors: Making in context learning more effective and controllable through latent space steering <https://arxiv.org/abs/2311.06668>`_. arXiv preprint arXiv:2311.06668.
244
+
245
+
100
246
This RAG variant performs ontology matching using ``ConceptRAGEncoder`` only. The In-Contect Vectors introduced by [1](https://github.com/shengliu66/ICV) tackle in-context learning as in-context vectors (ICV). We used LLMs in this perspective in the RAG module. The workflow is the same as RAG or FewShot RAG with the following differences:
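
The intuition behind ICVs can be caricatured with toy vectors: average the difference between hidden states of contrasting demonstrations to obtain a steering direction, then shift the model's hidden state along it. The snippet below is purely illustrative; real ICVs operate on transformer hidden states as described in [1]:

.. code-block:: python

    # Toy illustration of in-context vector steering (not real hidden states).
    def in_context_vector(pos_states, neg_states):
        """Mean difference between positive and negative demonstration states."""
        dim = len(pos_states[0])
        return [sum(p[i] - n[i] for p, n in zip(pos_states, neg_states)) / len(pos_states)
                for i in range(dim)]

    def steer(hidden, icv, alpha=0.1):
        """Shift a hidden state along the ICV direction, scaled by alpha."""
        return [h + alpha * v for h, v in zip(hidden, icv)]

    pos = [[1.0, 0.0], [0.8, 0.2]]
    neg = [[0.0, 1.0], [0.2, 0.8]]
    icv = in_context_vector(pos, neg)  # approximately [0.8, -0.8]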
[1] Liu, S., Ye, H., Xing, L., & Zou, J. (2023). `In-context vectors: Making in context learning more effective and controllable through latent space steering <https://arxiv.org/abs/2311.06668>`_. arXiv preprint arXiv:2311.06668.

Embedded ICV-RAG aligners within OntoAligner:

.. list-table::
   :widths: 30 60 10
   :header-rows: 1

   * - ICV-RAG Aligner
     - Description
     - Link
   * - ``FalconLLMAdaRetrieverICVRAG``
     - Falcon LLM with Ada retriever and in-context vectors (ICV) for alignment.
     -

You can use custom LLMs with RAG for alignment. Below, we define two classes, each combining a retrieval mechanism with an LLM to implement the RAG aligner functionality.

.. code-block:: python

    from ontoaligner.aligner import (
        TFIDFRetrieval,
        SBERTRetrieval,
        AutoModelDecoderRAGLLM,
        AutoModelDecoderRAGLLMV2,
        RAG
    )

    class QwenLLMTFIDFRetrieverRAG(RAG):
        Retrieval = TFIDFRetrieval
        LLM = AutoModelDecoderRAGLLMV2

    class MinistralLLMBERTRetrieverRAG(RAG):
        Retrieval = SBERTRetrieval
        LLM = AutoModelDecoderRAGLLM

As you can see, **QwenLLMTFIDFRetrieverRAG** uses ``TFIDFRetrieval`` as a lightweight retriever paired with the Qwen LLM, while **MinistralLLMBERTRetrieverRAG** employs ``SBERTRetrieval``, a sentence-transformers-based retriever, paired with the Ministral LLM.
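
To see why TF-IDF counts as the lightweight option, here is a toy TF-IDF retriever with cosine scoring over concept labels. It is unrelated to ``TFIDFRetrieval``'s actual implementation and shown only to illustrate the technique:

.. code-block:: python

    import math
    from collections import Counter

    # Toy TF-IDF retrieval over concept labels; not OntoAligner's TFIDFRetrieval.
    def tfidf(docs):
        """Return one sparse {token: weight} vector per document."""
        toks = [d.lower().split() for d in docs]
        df = Counter(t for ts in toks for t in set(ts))
        n = len(docs)
        return [{t: c / len(ts) * math.log((1 + n) / (1 + df[t]))
                 for t, c in Counter(ts).items()} for ts in toks]

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        norm = lambda x: math.sqrt(sum(w * w for w in x.values()))
        return dot / (norm(u) * norm(v)) if u and v else 0.0

    source = "electric car"
    targets = ["battery electric automobile", "oak tree", "electric car"]
    vecs = tfidf([source] + targets)
    best = max(range(len(targets)), key=lambda i: cosine(vecs[0], vecs[i + 1]))
    # best points at "electric car", the exact lexical match

Dense retrievers such as ``SBERTRetrieval`` would instead also rank "battery electric automobile" highly, since sentence embeddings capture semantic rather than purely lexical overlap.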

**AutoModelDecoderRAGLLMV2 and AutoModelDecoderRAGLLM differences:**

The primary distinction between ``AutoModelDecoderRAGLLMV2`` and ``AutoModelDecoderRAGLLM`` lies in the enhanced functionality of the former: ``AutoModelDecoderRAGLLMV2`` adds methods (shown below) for better classification and token validation. Overall, these classes enable seamless integration of retrieval mechanisms with LLM-based generation, making them powerful tools for ontology alignment and other domain-specific applications.

.. code-block:: python

    def get_probas_yes_no(self, outputs):
        """Retrieves the probabilities for the "yes" and "no" labels from model output."""
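
Conceptually, ``get_probas_yes_no`` reduces to a softmax over the logits of the "yes" and "no" tokens. A self-contained sketch of that computation (not the actual method body, which reads the logits from the model's output object):

.. code-block:: python

    import math

    # Sketch: convert raw "yes"/"no" token logits into probabilities.
    def probas_yes_no(yes_logit, no_logit):
        m = max(yes_logit, no_logit)       # subtract max for numerical stability
        e_yes = math.exp(yes_logit - m)
        e_no = math.exp(no_logit - m)
        return e_yes / (e_yes + e_no), e_no / (e_yes + e_no)

    p_yes, p_no = probas_yes_no(2.0, 0.0)  # p_yes ≈ 0.88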