Commit f52befd

Merge pull request #52 from sciknoworg/dev: documentation update

2 parents 6459fbf + 3d8b40b commit f52befd

File tree

24 files changed

+1379
-621
lines changed


docs/source/_static/custom.css

Lines changed: 12 additions & 4 deletions
@@ -5,9 +5,9 @@
     height: auto;
 }

-.nav-item:nth-child(-n+5) {
-    display: none; /* Hides the first four nav items */
-}
+/*.nav-item:nth-child(-n+4) {*/
+/*    display: none; !* Hides the first four nav items *!*/
+/*}*/

 .full-width {
     display: block;
@@ -61,7 +61,7 @@
 /* Medium screens (e.g. tablets) */
 @media (max-width: 1024px) {
     .content:not(.custom) {
-        max-width: 90%;
+        max-width: 100%;
     }
 }

@@ -71,3 +71,11 @@
         max-width: 100%;
     }
 }
+
+.project-vision {
+    background-color: #e6f7ff;
+    padding: 1em;
+    border-left: 5px solid #ffbe18;
+    margin-bottom: 1.2em;
+    font-size: 1.1em;
+}

docs/source/_static/custom.js

Lines changed: 70 additions & 29 deletions
Some generated files are not rendered by default.

docs/source/aligner/lightweight.rst

Lines changed: 203 additions & 178 deletions
Large diffs are not rendered by default.

docs/source/aligner/llm.rst

Lines changed: 231 additions & 125 deletions
Large diffs are not rendered by default.

docs/source/aligner/rag.rst

Lines changed: 250 additions & 10 deletions
@@ -1,7 +1,10 @@
-Retrieval Augmented Generation
+Retrieval-Augmented Generation
 ================================

-This tutorial walks you through the process of ontology matching using the OntoAligner library, leveraging retrieval-augmented generation (RAG) techniques. Starting with the necessary module imports, it defines a task and loads source and target ontologies along with reference matchings. The tutorial then encodes the ontologies using a specialized encoder, configures a retriever and an LLM, and generates predictions. Finally, it demonstrates two postprocessing techniques—heuristic and hybrid—followed by saving the matched alignments in XML format, ready for use or further analysis.
+Usage
+----------------
+
+This guide walks you through ontology matching with the OntoAligner library using **retrieval-augmented generation (RAG)** techniques. It begins with the necessary module imports, then defines a task and loads the source and target ontologies along with the reference matchings. Next, it encodes the ontologies with a specialized encoder, configures a retriever and an LLM, and generates predictions. Finally, it demonstrates two postprocessing techniques, heuristic and hybrid, and saves the matched alignments in XML format, ready for use or further analysis.

 .. code-block:: python

@@ -70,13 +73,84 @@ In this tutorial, we demonstrated:
 * Refining results with heuristic and hybrid postprocessing
 * Saving results in XML format

-You can customize the configurations and thresholds based on your specific dataset and use case. For more details, refer to the :doc:`../package_reference/postprocess`
+.. hint::
+
+   You can customize the configurations and thresholds based on your specific dataset and use case. For more details, refer to :doc:`../package_reference/postprocess`.
+
+Embedded RAG aligners within OntoAligner:
+
+.. list-table::
+   :widths: 30 60 10
+   :header-rows: 1
+
+   * - RAG Aligner
+     - Description
+     - Link
+
+   * - ``FalconLLMAdaRetrieverRAG``
+     - Uses Falcon LLM with Ada-based dense retrieval.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/rag/models.py#L85-L94>`__
+
+   * - ``FalconLLMBERTRetrieverRAG``
+     - Uses Falcon LLM with BERT-based retrieval for contextual matching.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/rag/models.py#L95-L102>`__
+
+   * - ``GPTOpenAILLMAdaRetrieverRAG``
+     - Uses OpenAI GPT (e.g., GPT-4) with an Ada-based retriever.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/rag/models.py#L65-L73>`__
+
+   * - ``GPTOpenAILLMBERTRetrieverRAG``
+     - Combines OpenAI GPT models with BERT-based retrieval.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/rag/models.py#L75-L83>`__
+
+   * - ``LLaMALLMAdaRetrieverRAG``
+     - Wraps LLaMA models with an Ada retriever for hybrid RAG-based alignment.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/rag/models.py#L25-L33>`__
+
+   * - ``LLaMALLMBERTRetrieverRAG``
+     - Uses LLaMA models with BERT for semantic retrieval.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/rag/models.py#L35-L43>`__
+
+   * - ``MPTLLMAdaRetrieverRAG``
+     - Utilizes MPT models with an Ada retriever for alignment generation.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/rag/models.py#L125-L132>`__
+
+   * - ``MPTLLMBERTRetrieverRAG``
+     - MPT model with BERT-based retrieval for enhanced context grounding.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/rag/models.py#L135-L142>`__
+
+   * - ``MambaLLMAdaRetrieverRAG``
+     - Uses Mamba LLM with an Ada retriever for token-efficient alignment.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/rag/models.py#L145-L152>`__
+
+   * - ``MambaLLMBERTRetrieverRAG``
+     - Mamba LLM paired with a BERT retriever for structured knowledge alignment.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/rag/models.py#L155-L162>`__
+
+   * - ``MistralLLMAdaRetrieverRAG``
+     - Mistral model with an Ada retriever for compact and fast RAG workflows.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/rag/models.py#L45-L52>`__
+
+   * - ``MistralLLMBERTRetrieverRAG``
+     - Mistral model enhanced with BERT-based retrieval.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/rag/models.py#L55-L63>`__
+
+   * - ``VicunaLLMAdaRetrieverRAG``
+     - Vicuna model using Ada retrieval for alignment generation.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/rag/models.py#L105-L112>`__
+
+   * - ``VicunaLLMBERTRetrieverRAG``
+     - Vicuna model with BERT retriever for high-accuracy RAG-based alignment.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/rag/models.py#L115-L122>`__
+
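To make the retrieve-then-judge loop described above concrete, here is a minimal, self-contained Python sketch of the RAG alignment pattern. It is not OntoAligner's API: the string-similarity retriever and the threshold-based ``judge_match`` are illustrative stand-ins for the dense (Ada/BERT) retriever and the LLM's yes/no decision.

```python
from difflib import SequenceMatcher

def retrieve_candidates(source_concept, target_concepts, top_k=3):
    """Stand-in retriever: rank target concepts by surface similarity.
    (OntoAligner's RAG aligners use dense retrievers such as Ada or BERT.)"""
    scored = [(t, SequenceMatcher(None, source_concept.lower(), t.lower()).ratio())
              for t in target_concepts]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:top_k]

def judge_match(source_concept, candidate, score, threshold=0.8):
    """Stand-in for the LLM's yes/no judgment over a retrieved candidate."""
    return score >= threshold

def align(source_concepts, target_concepts):
    """Retrieve-then-judge loop: the core RAG alignment pattern."""
    matches = []
    for s in source_concepts:
        for t, score in retrieve_candidates(s, target_concepts):
            if judge_match(s, t, score):
                matches.append((s, t, round(score, 2)))
    return matches

matches = align(["Conference", "Author"], ["conference", "writer", "paper"])
# e.g. [('Conference', 'conference', 1.0)]
```

In the real pipeline, the retriever narrows the candidate space so the LLM only judges a handful of plausible pairs per source concept, which is what keeps RAG-based alignment tractable on large ontologies.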

-FewShot RAG
+
+
+FewShot-RAG Aligner
 ------------------------
-This tutorial works based on FewShot RAG matching, an extension of the RAG model, designed for few-shot learning tasks. The FewShot RAG workflow is the same as RAG but with two differences:
+The FewShot-RAG aligner extends the RAG aligner for few-shot-learning-based alignment. The FewShot-RAG workflow is the same as RAG, with two differences:

-1. You only need to use FewShot encoders as follows, and since a fewshot model uses multiple examples you might also provide only specific examples from reference or other examples as a fewshot samples.
+1. You only need to use ``FewShotEncoder`` encoders as follows; since a few-shot model consumes multiple examples, you may also supply specific examples from the reference alignment (or from other sources) as few-shot samples.

 .. code-block:: python

@@ -95,8 +169,80 @@ This tutorial works based on FewShot RAG matching, an extension of the RAG model

     model = MistralLLMBERTRetrieverFSRAG(positive_ratio=0.7, n_shots=5, **config)
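The constructor above mixes positive and negative demonstration pairs into the prompt. The following is a hypothetical sketch of how ``positive_ratio`` and ``n_shots`` could interact; ``sample_fewshot``, ``reference``, and ``negatives`` are illustrative names, not OntoAligner API.

```python
import random

def sample_fewshot(reference, negatives, n_shots=5, positive_ratio=0.7, seed=0):
    """Mix positive (reference) and negative concept pairs into an n-shot sample.
    Hypothetical helper illustrating the positive_ratio / n_shots interaction."""
    rng = random.Random(seed)
    n_pos = round(n_shots * positive_ratio)   # demonstrations labeled "yes"
    n_neg = n_shots - n_pos                   # demonstrations labeled "no"
    shots = [(s, t, "yes") for s, t in rng.sample(reference, n_pos)]
    shots += [(s, t, "no") for s, t in rng.sample(negatives, n_neg)]
    rng.shuffle(shots)  # avoid ordering bias in the prompt
    return shots

shots = sample_fewshot(
    reference=[("Paper", "Article"), ("Author", "Writer"),
               ("Venue", "Location"), ("Review", "Assessment")],
    negatives=[("Paper", "Writer"), ("Author", "Location")],
    n_shots=5, positive_ratio=0.7)
```

With ``n_shots=5`` and ``positive_ratio=0.7`` this yields four "yes" and one "no" demonstration per prompt.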
-In-Context Vectors RAG
-------------------------
+Embedded FewShot-RAG aligners within OntoAligner:
+
+.. list-table::
+   :widths: 30 60 10
+   :header-rows: 1
+
+   * - FewShot-RAG Aligner
+     - Description
+     - Link
+
+   * - ``FalconLLMAdaRetrieverFSRAG``
+     - Falcon LLM with Ada retriever and few-shot examples for enhanced alignment.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/fewshot/models.py#L87-L95>`__
+
+   * - ``FalconLLMBERTRetrieverFSRAG``
+     - Falcon LLM with BERT-based retrieval in a few-shot setup.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/fewshot/models.py#L97-L105>`__
+
+   * - ``GPTOpenAILLMAdaRetrieverFSRAG``
+     - OpenAI GPT with Ada retriever for few-shot RAG alignment.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/fewshot/models.py#L67-L75>`__
+
+   * - ``GPTOpenAILLMBERTRetrieverFSRAG``
+     - Combines OpenAI GPT and a BERT retriever with few-shot prompting.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/fewshot/models.py#L77-L84>`__
+
+   * - ``LLaMALLMAdaRetrieverFSRAG``
+     - LLaMA model with Ada retriever for prompt-efficient few-shot alignment.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/fewshot/models.py#L27-L34>`__
+
+   * - ``LLaMALLMBERTRetrieverFSRAG``
+     - LLaMA with BERT retriever in a few-shot reasoning framework.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/fewshot/models.py#L37-L44>`__
+
+   * - ``MPTLLMAdaRetrieverFSRAG``
+     - MPT LLM with Ada-based retrieval for few-shot alignment generation.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/fewshot/models.py#L127-L134>`__
+
+   * - ``MPTLLMBERTRetrieverFSRAG``
+     - MPT model using BERT retriever and few-shot prompting for improved accuracy.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/fewshot/models.py#L137-L144>`__
+
+   * - ``MambaLLMAdaRetrieverFSRAG``
+     - Mamba LLM integrated with Ada retriever for low-latency few-shot alignment.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/fewshot/models.py#L147-L154>`__
+
+   * - ``MambaLLMBERTRetrieverFSRAG``
+     - Mamba model paired with BERT-based retrieval and few-shot capabilities.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/fewshot/models.py#L157-L164>`__
+
+   * - ``MistralLLMAdaRetrieverFSRAG``
+     - Mistral LLM with Ada retriever and few-shot support.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/fewshot/models.py#L47-L54>`__
+
+   * - ``MistralLLMBERTRetrieverFSRAG``
+     - Mistral model with BERT retrieval, enhanced by few-shot prompting.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/fewshot/models.py#L57-L64>`__
+
+   * - ``VicunaLLMAdaRetrieverFSRAG``
+     - Vicuna model with Ada retriever for fast few-shot alignment.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/fewshot/models.py#L107-L114>`__
+
+   * - ``VicunaLLMBERTRetrieverFSRAG``
+     - Vicuna with BERT retriever in a few-shot setting for high-precision alignment.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/fewshot/models.py#L117-L124>`__
+
+ICV-RAG Aligner
+---------------------------------
+
+.. sidebar:: Citation
+
+   [1] Liu, S., Ye, H., Xing, L., & Zou, J. (2023). `In-context vectors: Making in context learning more effective and controllable through latent space steering <https://arxiv.org/abs/2311.06668>`_. arXiv preprint arXiv:2311.06668.
+
+
 This RAG variant performs ontology matching using ``ConceptRAGEncoder`` only. The in-context vectors (ICV) approach introduced by [1] (https://github.com/shengliu66/ICV) recasts in-context learning as latent-space steering vectors. OntoAligner applies this perspective to LLMs within the RAG module. The workflow is the same as RAG or FewShot-RAG, with the following differences:


@@ -108,7 +254,7 @@ This RAG variant performs ontology matching using ``ConceptRAGEncoder`` only. Th
     encoder_model = ConceptRAGEncoder()
     encoded_ontology = encoder_model(source=dataset['source'], target=dataset['target'], reference=dataset['reference'])

-2. Next, import an ICVRAG model, here we use Falcon model:
+2. Next, import an ICV-RAG aligner; here we use the Falcon model:

 .. code-block:: python

@@ -118,4 +264,98 @@ This RAG variant performs ontology matching using ``ConceptRAGEncoder`` only. Th
     model.load(llm_path="tiiuae/falcon-7b", ir_path="all-MiniLM-L6-v2")

-[1] Liu, S., Ye, H., Xing, L., & Zou, J. (2023). `In-context vectors: Making in context learning more effective and controllable through latent space steering <https://arxiv.org/abs/2311.06668>`_. arXiv preprint arXiv:2311.06668.
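The in-context vectors of Liu et al. [1] steer the LLM by shifting its hidden activations instead of prepending demonstrations to the prompt. The following toy sketch illustrates only the steering operation, under the simplifying assumption that a hidden state is a plain list of floats; ``steer`` and ``alpha`` are illustrative names, not part of OntoAligner or the ICV reference implementation, which injects vectors into transformer layer activations.

```python
def steer(hidden_state, icv, alpha=0.1):
    """Shift a hidden state along an in-context vector (ICV).

    Toy sketch of latent-space steering: add a scaled task vector,
    then renormalize to preserve the original activation norm.
    """
    shifted = [h + alpha * v for h, v in zip(hidden_state, icv)]
    norm = sum(h * h for h in hidden_state) ** 0.5
    snorm = sum(s * s for s in shifted) ** 0.5
    return [s * norm / snorm for s in shifted]

# Steering a unit vector fully toward an orthogonal ICV direction
h = steer([1.0, 0.0], [0.0, 1.0], alpha=1.0)
```

Because the demonstrations are distilled into a single vector, ICV-RAG avoids spending prompt tokens on examples, which is why only the ``ConceptRAGEncoder`` is needed.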
+Embedded ICV-RAG aligners within OntoAligner:
+
+.. list-table::
+   :widths: 30 60 10
+   :header-rows: 1
+
+   * - ICV-RAG Aligner
+     - Description
+     - Link
+
+   * - ``FalconLLMAdaRetrieverICVRAG``
+     - Falcon LLM with Ada retriever for in-context-vector (ICV) alignment.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/icv/models.py#L47-L54>`__
+
+   * - ``FalconLLMBERTRetrieverICVRAG``
+     - Falcon LLM combined with a BERT-based retriever for ICV-guided alignment.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/icv/models.py#L57-L65>`__
+
+   * - ``LLaMALLMAdaRetrieverICVRAG``
+     - LLaMA model with Ada retriever optimized for ICV-based reasoning.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/icv/models.py#L15-L31>`__
+
+   * - ``LLaMALLMBERTRetrieverICVRAG``
+     - LLaMA model paired with a BERT retriever for ICV-driven alignment.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/icv/models.py#L27-L34>`__
+
+   * - ``MPTLLMAdaRetrieverICVRAG``
+     - MPT model with Ada retrieval for ICV-steered RAG alignment.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/icv/models.py#L87-L94>`__
+
+   * - ``MPTLLMBERTRetrieverICVRAG``
+     - MPT LLM with BERT retriever in an ICV pipeline for robust alignment.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/icv/models.py#L97-L104>`__
+
+   * - ``VicunaLLMAdaRetrieverICVRAG``
+     - Vicuna LLM with Ada retriever for ICV-RAG tasks.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/icv/models.py#L67-L74>`__
+
+   * - ``VicunaLLMBERTRetrieverICVRAG``
+     - Vicuna model paired with BERT-based retrieval for in-context-vector alignment.
+     - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/icv/models.py#L77-L84>`__
+
+
+Customized-RAG Aligner
+-----------------------
+
+.. sidebar:: Useful links:
+
+   * `OntoAlignerPipeline Experimentation <https://github.com/sciknoworg/OntoAligner/blob/main/examples/OntoAlignerPipeline-Exp.ipynb>`_
+
+You can use custom LLMs with RAG for alignment. Below, we define two classes, each combining a retrieval mechanism with an LLM to implement RAG aligner functionality.
+
+.. code-block:: python
+
+    from ontoaligner.aligner import (
+        TFIDFRetrieval,
+        SBERTRetrieval,
+        AutoModelDecoderRAGLLM,
+        AutoModelDecoderRAGLLMV2,
+        RAG
+    )
+
+    class QwenLLMTFIDFRetrieverRAG(RAG):
+        Retrieval = TFIDFRetrieval
+        LLM = AutoModelDecoderRAGLLMV2
+
+    class MinistralLLMBERTRetrieverRAG(RAG):
+        Retrieval = SBERTRetrieval
+        LLM = AutoModelDecoderRAGLLM
+
+As you can see, **QwenLLMTFIDFRetrieverRAG** utilizes ``TFIDFRetrieval`` as a lightweight retriever with a Qwen LLM, while **MinistralLLMBERTRetrieverRAG** employs ``SBERTRetrieval``, a sentence-transformers retriever, with a Ministral LLM.
+
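The subclassing pattern above works because the base ``RAG`` class instantiates whatever classes its ``Retrieval`` and ``LLM`` class attributes point to. Here is a minimal, self-contained sketch of that composition pattern; the class bodies are illustrative stubs, not OntoAligner's actual implementations.

```python
class Retrieval:
    """Stub retriever: returns candidate targets for a query."""
    def retrieve(self, query):
        return f"candidates-for-{query}"

class LLM:
    """Stub LLM: generates an answer from a prompt."""
    def generate(self, prompt):
        return f"answer({prompt})"

class RAG:
    """Base class wiring a retriever and an LLM via class attributes,
    mirroring the composition pattern used by OntoAligner's RAG aligners."""
    Retrieval = Retrieval
    LLM = LLM

    def __init__(self):
        # Instantiate whatever the subclass plugged in.
        self.retriever = self.Retrieval()
        self.llm = self.LLM()

    def run(self, query):
        return self.llm.generate(self.retriever.retrieve(query))

class KeywordRetrieval(Retrieval):
    def retrieve(self, query):
        return f"keyword-candidates-for-{query}"

class MyRAG(RAG):
    Retrieval = KeywordRetrieval  # swap the retriever by overriding the class attribute

print(MyRAG().run("Paper"))  # → answer(keyword-candidates-for-Paper)
```

Swapping a component is therefore a one-line change in the subclass, which is exactly how ``QwenLLMTFIDFRetrieverRAG`` and ``MinistralLLMBERTRetrieverRAG`` differ only in their ``Retrieval`` and ``LLM`` attributes.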
+
+**AutoModelDecoderRAGLLMV2 and AutoModelDecoderRAGLLM differences:**
+
+The primary distinction between ``AutoModelDecoderRAGLLMV2`` and ``AutoModelDecoderRAGLLM`` lies in the enhanced functionality of the former: ``AutoModelDecoderRAGLLMV2`` adds methods (shown below) for yes/no classification and token validation. Overall, these classes enable seamless integration of retrieval mechanisms with LLM-based generation, making them powerful tools for ontology alignment and other domain-specific applications.
+
+.. code-block:: python
+
+    def get_probas_yes_no(self, outputs):
+        """Retrieves the probabilities for the "yes" and "no" labels from model output."""
+        probas_yes_no = (outputs.scores[0][:, self.answer_sets_token_id["yes"] +
+                                              self.answer_sets_token_id["no"]].float().softmax(-1))
+        return probas_yes_no
+
+    def check_answer_set_tokenizer(self, answer: str) -> bool:
+        """Checks if the tokenizer produces a single token for a given answer string."""
+        return len(self.tokenizer(answer).input_ids) == 1
+
+
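Conceptually, ``get_probas_yes_no`` slices out the logits of the "yes" and "no" tokens and softmaxes over just those two. Stripped of the tensor machinery, the computation reduces to a two-way softmax; this pure-Python sketch (``yes_no_probability`` is an illustrative name, not OntoAligner API) shows the numerically stable form.

```python
import math

def yes_no_probability(yes_logit, no_logit):
    """Two-way softmax over the 'yes' and 'no' token logits,
    mirroring what get_probas_yes_no computes on tensors."""
    m = max(yes_logit, no_logit)  # subtract the max for numerical stability
    ey = math.exp(yes_logit - m)
    en = math.exp(no_logit - m)
    return ey / (ey + en), en / (ey + en)

p_yes, p_no = yes_no_probability(2.0, 0.0)
```

The two probabilities always sum to 1, so a single threshold on ``p_yes`` suffices to accept or reject a candidate mapping.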
+
+.. note::
+
+   Consider reading the following section next:
+
+   * `Package Reference > Aligners <../package_reference/aligners.html>`_
