
Commit dbcfcf7

Merge pull request #54 from sciknoworg/dev
add kge with docs
2 parents e1d8d02 + a20920f commit dbcfcf7


12 files changed (+395, -86 lines changed)


README.md

Lines changed: 9 additions & 8 deletions
@@ -39,14 +39,15 @@ Comprehensive documentation for OntoAligner, including detailed guides and examp
 
 
 
-| Example | Tutorial | Script |
-|:----------------------------------------|:--------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------:|
-| Lightweight | [📚 Fuzzy Matching](https://ontoaligner.readthedocs.io/aligner/lightweight.html) | [📝 Code](https://github.com/sciknoworg/OntoAligner/blob/main/examples/fuzzy_matching.py) |
-| Retrieval | [📚 Retrieval Aligner](https://ontoaligner.readthedocs.io/aligner/retriever.html) | [📝 Code](https://github.com/sciknoworg/OntoAligner/blob/main/examples/retriever_matching.py) |
-| Large Language Models | [📚 Large Language Models Aligner](https://ontoaligner.readthedocs.io/aligner/llm.html) | [📝 Code](https://github.com/sciknoworg/OntoAligner/blob/main/examples/llm_matching.py) |
-| Retrieval Augmented Generation | [📚 Retrieval Augmented Generation](https://ontoaligner.readthedocs.io/aligner/rag.html) | [📝 Code](https://github.com/sciknoworg/OntoAligner/blob/main/examples/rag_matching.py)|
-| FewShot | [📚 FewShot RAG](https://ontoaligner.readthedocs.io/aligner/rag.html#fewshot-rag) | [📝 Code](https://github.com/sciknoworg/OntoAligner/blob/main/examples/rag_matching.py)
-| In-Context Vectors Learning | [📚 In-Context Vectors RAG](https://ontoaligner.readthedocs.io/aligner/rag.html#in-context-vectors-rag) | [📝 Code](https://github.com/sciknoworg/OntoAligner/blob/main/examples/icv_rag_matching.py)
+| Example | Tutorial | Script |
+|:-------------------------------|:--------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------:|
+| Lightweight | [📚 Fuzzy Matching](https://ontoaligner.readthedocs.io/aligner/lightweight.html) | [📝 Code](https://github.com/sciknoworg/OntoAligner/blob/main/examples/fuzzy_matching.py) |
+| Retrieval | [📚 Retrieval Aligner](https://ontoaligner.readthedocs.io/aligner/retriever.html) | [📝 Code](https://github.com/sciknoworg/OntoAligner/blob/main/examples/retriever_matching.py) |
+| Large Language Models | [📚 LLM Aligner](https://ontoaligner.readthedocs.io/aligner/llm.html) | [📝 Code](https://github.com/sciknoworg/OntoAligner/blob/main/examples/llm_matching.py) |
+| Retrieval Augmented Generation | [📚 RAG Aligner](https://ontoaligner.readthedocs.io/aligner/rag.html) | [📝 Code](https://github.com/sciknoworg/OntoAligner/blob/main/examples/rag_matching.py)|
+| FewShot | [📚 FewShot-RAG Aligner](https://ontoaligner.readthedocs.io/aligner/rag.html#fewshot-rag) | [📝 Code](https://github.com/sciknoworg/OntoAligner/blob/main/examples/rag_matching.py)
+| In-Context Vectors Learning | [📚 In-Context Vectors RAG](https://ontoaligner.readthedocs.io/aligner/rag.html#in-context-vectors-rag) | [📝 Code](https://github.com/sciknoworg/OntoAligner/blob/main/examples/icv_rag_matching.py)
+| Knowledge Graph Embedding | [📚 KGE Aligner](https://ontoaligner.readthedocs.io/aligner/kge.html) | [📝 Code](https://github.com/sciknoworg/OntoAligner/blob/main/examples/kge.py)
 | eCommerce | [📚 Product Alignment in eCommerce](https://ontoaligner.readthedocs.io/usecases/ecommerce.html) | [📝 Code](https://github.com/sciknoworg/OntoAligner/blob/dev/examples/ecommerce_product_alignment.py)
 
 ## 🚀 Quick Tour

docs/source/aligner/kge.rst

Lines changed: 265 additions & 0 deletions
@@ -0,0 +1,265 @@
Knowledge Graph Embedding
================================

Graph Embeddings
---------------------------------

Ontology alignment involves finding correspondences between entities in different ontologies. OntoAligner addresses this challenge by leveraging **Knowledge Graph Embedding (KGE)** models. The core idea of KGE is to represent the entities (classes, properties, individuals) and relations of an ontology as **low-dimensional vectors** in a continuous vector space. These numerical representations (embeddings) are learned so that the semantic relationships of the original ontology are preserved geometrically in the embedding space.

.. hint::

    **Why KGE for Alignment?**

    1) *Semantic Preservation*: KGE models aim to capture the meaning and relationships of entities in their vector representations.
    2) *Scalability*: Comparing numerical vectors can be more efficient than symbolic matching for large-scale alignment.
    3) *Similarity Measurement*: Once entities are embedded, their semantic similarity can be measured directly (e.g., using cosine similarity).
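The similarity measurement in point 3 can be made concrete with a small plain-Python sketch of cosine similarity between two embedding vectors. The 4-dimensional vectors below are hypothetical stand-ins for learned embeddings (the usage example later trains 300-dimensional ones); this is an illustration, not OntoAligner's scoring code.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # similarity is undefined for zero vectors; treat as unrelated
    return dot / (norm_u * norm_v)


# Hypothetical embeddings for a source entity and two target candidates.
mouse_tonsil = [0.9, 0.1, 0.3, 0.0]
human_tonsil = [0.8, 0.2, 0.35, 0.05]
human_femur = [-0.1, 0.9, 0.0, 0.7]

print(round(cosine_similarity(mouse_tonsil, human_tonsil), 3))
print(round(cosine_similarity(mouse_tonsil, human_femur), 3))
```

A higher score marks the more likely correspondence, so the tonsil pair ranks above the tonsil/femur pair.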
OntoAligner's KGE-based alignment process involves several components that work in sequence. The following figure shows these components within ``GraphEmbeddingAligner``.

.. raw:: html

    <div align="center">
    <img src="https://raw.githubusercontent.com/sciknoworg/OntoAligner/refs/heads/dev/docs/source/img/kge.jpg" width="80%"/>
    </div>
Usage
------------

.. sidebar::

    Full code is available at the `OntoAligner Repository <https://github.com/sciknoworg/OntoAligner/blob/main/examples/kge.py>`_.

This module guides you step by step through ontology alignment with KGE models and the OntoAligner library. By the end, you will understand how to parse ontologies, encode them as triples, generate alignments, post-process and evaluate the results, and save the outputs in XML and JSON formats.
36+
37+
38+
.. tab:: ➡️ 1: Parser
39+
40+
The first step is to prepare the ontology data for the KGE model. The **Parser** transforms raw ontology information into a structured format suitable for KGE models.
41+
42+
.. code-block:: python
43+
44+
from ontoaligner.ontology import GraphTripleOMDataset
45+
46+
task = GraphTripleOMDataset()
47+
task.ontology_name = "Mouse-Human"
48+
print("task:", task)
49+
# >>> task: Track: GraphTriple, Source-Target sets: Mouse-Human
50+
51+
dataset = task.collect(
52+
source_ontology_path="assets/mouse-human/source.xml",
53+
target_ontology_path="assets/mouse-human/target.xml",
54+
reference_matching_path="assets/mouse-human/reference.xml"
55+
)
56+
print("dataset key-values:", dataset.keys())
57+
# >>> dataset key-values: dict_keys(['dataset-info', 'source', 'target', 'reference'])
58+
59+
print("Sample source ontology:", dataset['source'][0])
60+
61+
This will result in the sample source ontology with following metadata:
62+
63+
.. code-block:: javascript
64+
65+
[
66+
{
67+
'subject': ('http://mouse.owl#MA_0000143', 'tonsil'),
68+
'predicate': ('http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 'type'),
69+
'object': ('http://www.w3.org/2002/07/owl#Class', 'Class'),
70+
'subject_is_class': True,
71+
'object_is_class': False
72+
},
73+
...
74+
]
75+
::
.. tab:: ➡️ 2: Encoder

    Once the source and target ontologies are parsed, the ``GraphTripleEncoder`` converts them into triple representations of the form ``[(subject label, predicate label, object label), ...]``, which is the standard input format for KGE models.

    .. code-block:: python

        from ontoaligner.encoder import GraphTripleEncoder

        encoder = GraphTripleEncoder()
        encoded_dataset = encoder(**dataset)
.. tab:: ➡️ 3: Aligner

    After the triples are generated, they are fed into the KGE model, the core engine that learns low-dimensional embeddings for every entity and relation present in the triples. Here we use ``ConvEAligner``, OntoAligner's KGE-based aligner built on `ConvE <https://aaai.org/papers/11573-convolutional-2d-knowledge-graph-embeddings/>`_. It encapsulates the entire process, from data ingestion and embedding learning to alignment prediction.

    .. code-block:: python

        from ontoaligner.aligner import ConvEAligner

        kge_params = {
            'device': 'cpu',            # str: Device to use for training ('cpu' or 'cuda')
            'embedding_dim': 300,       # int: Dimensionality of learned embeddings
            'num_epochs': 50,           # int: Number of training epochs
            'train_batch_size': 128,    # int: Number of positive triples per training batch
            'eval_batch_size': 64,      # int: Number of triples per evaluation batch
            'num_negs_per_pos': 5,      # int: Number of negative samples per positive triple
            'random_seed': 42,          # int: Seed for reproducibility
        }

        aligner = ConvEAligner(**kge_params)

        matchings = aligner.generate(input_data=encoded_dataset)

    .. note::

        The ``.generate()`` method first trains the KGE model and then performs the matching.
.. tab:: ➡️ 4: Post-Process

    This step post-processes the predicted matchings, potentially using their similarity scores for filtering (here with ``threshold=0.5``) and applying cardinality-based processing. The next step evaluates the matchings against the reference alignment both before and after post-processing, so you can assess its effect.

    .. code-block:: python

        from ontoaligner.postprocess import graph_postprocessor

        processed_matchings = graph_postprocessor(predicts=matchings, threshold=0.5)
.. tab:: ➡️ 5: Evaluate and Export

    The following code compares the generated alignments with the reference matchings, before and after post-processing. The matchings can then be saved in XML or JSON format for further analysis or use; pick whichever format fits your workflow.

    .. code-block:: python

        from ontoaligner.utils import metrics

        evaluation = metrics.evaluation_report(predicts=matchings, references=dataset['reference'])
        print("Matching Evaluation Report:\n", evaluation)

        evaluation = metrics.evaluation_report(predicts=processed_matchings, references=dataset['reference'])
        print("Matching Evaluation Report -- after post-processing:\n", evaluation)

    .. tab:: 📄 <> Export matchings to XML

        .. code-block:: python

            from ontoaligner.utils import xmlify

            xml_str = xmlify.xml_alignment_generator(matchings=processed_matchings)
            with open("matchings.xml", "w", encoding="utf-8") as xml_file:
                xml_file.write(xml_str)

    .. tab:: 🧾 {} Export matchings to JSON

        .. code-block:: python

            import json

            with open("matchings.json", "w", encoding="utf-8") as json_file:
                json.dump(processed_matchings, json_file, indent=4, ensure_ascii=False)
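The encoder step turns parsed records into label triples. A minimal plain-Python sketch of that transformation is given here, together with the label-to-ID indexing that KGE libraries typically perform before training. It assumes records shaped like the sample printed in step 1; ``to_label_triples`` and ``index_triples`` are hypothetical helpers, not OntoAligner's ``GraphTripleEncoder`` implementation, and the second record is invented for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def to_label_triples(records: List[dict]) -> List[Triple]:
    """Keep the human-readable label of each (IRI, label) pair in a parsed record."""
    return [(rec["subject"][1], rec["predicate"][1], rec["object"][1]) for rec in records]


def index_triples(triples: List[Triple]):
    """Assign consecutive integer IDs to entities and relations in first-seen order."""
    entity_to_id: Dict[str, int] = {}
    relation_to_id: Dict[str, int] = {}
    id_triples: List[Tuple[int, int, int]] = []
    for head, rel, tail in triples:
        # setdefault only inserts (with the next free ID) if the label is new.
        h_id = entity_to_id.setdefault(head, len(entity_to_id))
        r_id = relation_to_id.setdefault(rel, len(relation_to_id))
        t_id = entity_to_id.setdefault(tail, len(entity_to_id))
        id_triples.append((h_id, r_id, t_id))
    return entity_to_id, relation_to_id, id_triples


# The first record mirrors the step 1 sample; the second is hypothetical.
records = [
    {
        "subject": ("http://mouse.owl#MA_0000143", "tonsil"),
        "predicate": ("http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "type"),
        "object": ("http://www.w3.org/2002/07/owl#Class", "Class"),
        "subject_is_class": True,
        "object_is_class": False,
    },
    {
        "subject": ("http://mouse.owl#MA_0000143", "tonsil"),
        "predicate": ("http://www.w3.org/2000/01/rdf-schema#subClassOf", "subClassOf"),
        "object": ("http://example.org/mouse#lymphoid_tissue", "lymphoid tissue"),
        "subject_is_class": True,
        "object_is_class": True,
    },
]

triples = to_label_triples(records)
entity_to_id, relation_to_id, id_triples = index_triples(triples)
print(triples)
print(id_triples)
```

Keying on labels rather than IRIs is a choice made for this sketch to match the label-triple format described in step 2.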
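The score threshold and cardinality-based processing from the post-processing step can be illustrated with a short sketch. It assumes each matching is a dict with ``source``, ``target``, and ``score`` keys and uses greedy one-to-one selection as the cardinality rule; the data layout, the helper names, and the rule are all illustrative assumptions, not a description of ``graph_postprocessor``.

```python
from typing import Dict, List


def threshold_filter(matchings: List[Dict], threshold: float) -> List[Dict]:
    """Drop candidate matchings whose similarity score is below the threshold."""
    return [m for m in matchings if m["score"] >= threshold]


def greedy_one_to_one(matchings: List[Dict]) -> List[Dict]:
    """Keep highest-scoring pairs so each source and each target is used at most once."""
    used_sources, used_targets, kept = set(), set(), []
    for m in sorted(matchings, key=lambda item: item["score"], reverse=True):
        if m["source"] in used_sources or m["target"] in used_targets:
            continue  # a better-scoring pair already claimed this entity
        kept.append(m)
        used_sources.add(m["source"])
        used_targets.add(m["target"])
    return kept


# Hypothetical candidate matchings produced by an aligner.
candidates = [
    {"source": "mouse#tonsil", "target": "human#tonsil", "score": 0.92},
    {"source": "mouse#tonsil", "target": "human#palatine_tonsil", "score": 0.81},
    {"source": "mouse#femur", "target": "human#femur", "score": 0.88},
    {"source": "mouse#tail", "target": "human#coccyx", "score": 0.31},
]

processed = greedy_one_to_one(threshold_filter(candidates, threshold=0.5))
for m in processed:
    print(m["source"], "->", m["target"], m["score"])
```

Greedy selection is simple but not globally optimal; an assignment algorithm such as the Hungarian method maximises the total score of the one-to-one matching instead.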
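``metrics.evaluation_report`` compares predictions with the reference alignment. The standard precision, recall, and F1 definitions behind such a comparison are sketched below, assuming matchings and references are dicts with ``source`` and ``target`` keys; the fields and the output format of ``evaluation_report`` itself may differ, and the entity names are hypothetical.

```python
from typing import Dict, Iterable


def precision_recall_f1(predicted: Iterable[Dict], reference: Iterable[Dict]) -> Dict[str, float]:
    """Compare (source, target) pairs as sets; duplicate pairs count once."""
    pred = {(m["source"], m["target"]) for m in predicted}
    ref = {(m["source"], m["target"]) for m in reference}
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical predictions: two correct pairs and one wrong pair.
predicted = [
    {"source": "mouse#tonsil", "target": "human#tonsil"},
    {"source": "mouse#femur", "target": "human#femur"},
    {"source": "mouse#tail", "target": "human#coccyx"},
]
reference = [
    {"source": "mouse#tonsil", "target": "human#tonsil"},
    {"source": "mouse#femur", "target": "human#femur"},
]
print(precision_recall_f1(predicted, reference))
```

Here two of three predictions are correct (precision 2/3) and both reference pairs are found (recall 1.0), giving an F1 of 0.8.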
KGE Aligners
----------------------

The ``ontoaligner.aligner.graph`` module provides a suite of graph-embedding-based aligners built on top of popular KGE models. These aligners learn low-dimensional vector representations of entities through a link prediction objective, which can support alignment even between ontologies with heterogeneous structures. Each aligner wraps a specific KGE model implemented in the PyKEEN framework, allowing plug-and-play integration and consistent similarity scoring across models. Some aligners define custom similarity functions to better capture semantic distance in embedding spaces that are not real-valued (e.g., complex numbers or quaternions).
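For complex-valued embeddings, ordinary cosine similarity does not apply directly. The ``ComplExAligner`` entry in the table of available aligners mentions a similarity based on the real parts of complex dot products; a minimal sketch of that idea, normalised like cosine similarity, is shown here. The normalisation and the example vectors are assumptions of this sketch and may differ from OntoAligner's implementation.

```python
import math
from typing import Sequence


def complex_similarity(u: Sequence[complex], v: Sequence[complex]) -> float:
    """Real part of the Hermitian inner product sum(u_i * conj(v_i)), divided by the norms."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    real_dot = sum((a * b.conjugate()).real for a, b in zip(u, v))
    norm_u = math.sqrt(sum(abs(a) ** 2 for a in u))
    norm_v = math.sqrt(sum(abs(b) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return real_dot / (norm_u * norm_v)


# Hypothetical 2-dimensional complex embeddings.
mouse_tonsil = [0.6 + 0.2j, -0.1 + 0.9j]
human_tonsil = [0.5 + 0.3j, -0.2 + 0.8j]
print(round(complex_similarity(mouse_tonsil, human_tonsil), 3))
```

Taking the real part of the Hermitian product and dividing by the norms gives the same value as ordinary cosine similarity on the real vectors obtained by concatenating each embedding's real and imaginary parts.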
The following table lists the available KGE aligners:
.. list-table::
    :widths: 20 70 10
    :header-rows: 1

    * - Aligner Name
      - Description
      - Link

    * - ``ConvEAligner``
      - Based on ConvE, which uses 2D convolutions over reshaped entity and relation embeddings to model complex interactions.
      - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/kge/models.py#L17-L18>`_
    * - ``TransDAligner``
      - Based on TransD, which constructs relation-specific projection matrices dynamically from both entity and relation vectors.
      - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/kge/models.py#L21-L22>`_
    * - ``TransEAligner``
      - Based on TransE, a translation-based model that learns embeddings where :math:`h + r \approx t`.
      - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/kge/models.py#L25-L26>`_
    * - ``TransFAligner``
      - Based on TransF, which enables flexible translations for complex relations without increasing model complexity.
      - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/kge/models.py#L29-L30>`_
    * - ``TransHAligner``
      - Based on TransH, which projects entities onto relation-specific hyperplanes before translation.
      - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/kge/models.py#L33-L34>`_
    * - ``TransRAligner``
      - Based on TransR, which embeds entities and relations in separate spaces using relation-specific projections.
      - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/kge/models.py#L37-L38>`_
    * - ``DistMultAligner``
      - Based on DistMult, a bilinear model that uses diagonal relation matrices for efficient relational modeling.
      - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/kge/models.py#L41-L42>`_
    * - ``ComplExAligner``
      - Based on ComplEx, which uses complex-valued embeddings to model symmetric and antisymmetric relations; includes a custom similarity function using the real parts of complex dot products.
      - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/kge/models.py#L45-L49>`_
    * - ``HolEAligner``
      - Based on HolE (holographic embeddings), which uses circular correlation to build compositional representations of entity pairs.
      - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/kge/models.py#L51-L52>`_
    * - ``RotatEAligner``
      - Based on RotatE, which models relations as rotations in complex space and supports rich relational patterns; includes a similarity override.
      - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/kge/models.py#L55-L60>`_
    * - ``SimplEAligner``
      - Based on SimplE, which learns dependent embeddings for each entity and supports fully expressive factorization.
      - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/kge/models.py#L62-L63>`_
    * - ``CrossEAligner``
      - Based on CrossE, which learns both general and triple-specific embeddings to capture bidirectional interactions.
      - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/kge/models.py#L66-L67>`_
    * - ``BoxEAligner``
      - Based on BoxE, which models relations as boxes in vector space to support hierarchies and logical rules.
      - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/kge/models.py#L70-L71>`_
    * - ``CompGCNAligner``
      - Based on CompGCN, a graph convolutional network designed for multi-relational graphs using composition operations.
      - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/kge/models.py#L74-L75>`_
    * - ``MuREAligner``
      - Based on MuRE, the Euclidean counterpart of the hyperbolic MuRP model, which applies relation-specific diagonal transformations and translations to entity embeddings.
      - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/kge/models.py#L78-L79>`_
    * - ``QuatEAligner``
      - Based on QuatE, which uses quaternion embeddings and custom similarity logic to model expressive 4D rotations and relational structure.
      - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/kge/models.py#L82-L133>`_
    * - ``SEAligner``
      - Based on SE (Structured Embedding), which projects head and tail entities with separate relation-specific matrices and scores a triple by the distance between the projections.
      - `Source <https://github.com/sciknoworg/OntoAligner/blob/main/ontoaligner/aligner/kge/models.py#L134-L135>`_
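The TransE relation :math:`h + r \approx t` is the simplest of these scoring ideas. The toy trainer below learns TransE embeddings with a margin ranking loss and tail corruption, with knobs analogous to ``embedding_dim``, ``num_epochs``, and ``num_negs_per_pos`` from the usage example. It is a plain-Python illustration under simplifying assumptions (per-triple SGD, uniform tail corruption, entities renormalised to unit length each epoch) and is not PyKEEN's ``TransE`` or OntoAligner's ``TransEAligner``; the triples and hyperparameter values are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Vector = List[float]


def transe_distance(h: Vector, r: Vector, t: Vector) -> float:
    """TransE distance ||h + r - t||_2; lower means a more plausible triple."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def _normalize(v: Vector) -> None:
    """Rescale v in place to unit L2 norm (TransE's constraint on entities)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    for i in range(len(v)):
        v[i] /= norm


def train_transe(triples: List[Triple], dim: int = 8, epochs: int = 200,
                 lr: float = 0.05, margin: float = 1.0,
                 num_negs_per_pos: int = 2, seed: int = 42
                 ) -> Tuple[Dict[str, Vector], Dict[str, Vector]]:
    """Toy TransE: hinge loss max(0, margin + d(pos) - d(neg)) with corrupted tails."""
    rng = random.Random(seed)
    entities = sorted({e for h, _, t in triples for e in (h, t)})
    relations = sorted({r for _, r, _ in triples})
    ent = {e: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for e in entities}
    rel = {r: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for r in relations}

    for _ in range(epochs):
        for vec in ent.values():
            _normalize(vec)
        for h, r, t in triples:
            for _ in range(num_negs_per_pos):
                t_neg = rng.choice(entities)  # negative sample: random tail
                if t_neg == t:
                    continue
                pos = transe_distance(ent[h], rel[r], ent[t])
                neg = transe_distance(ent[h], rel[r], ent[t_neg])
                if margin + pos - neg <= 0:
                    continue  # hinge loss is already zero for this pair
                # Gradient of ||x|| is x / ||x||, where x = h + r - t.
                g_pos = [(a + b - c) / max(pos, 1e-9)
                         for a, b, c in zip(ent[h], rel[r], ent[t])]
                g_neg = [(a + b - c) / max(neg, 1e-9)
                         for a, b, c in zip(ent[h], rel[r], ent[t_neg])]
                for i in range(dim):
                    ent[h][i] -= lr * (g_pos[i] - g_neg[i])
                    rel[r][i] -= lr * (g_pos[i] - g_neg[i])
                    ent[t][i] += lr * g_pos[i]        # pull true tail toward h + r
                    ent[t_neg][i] -= lr * g_neg[i]    # push corrupted tail away
    return ent, rel


# Hypothetical anatomy triples.
triples = [
    ("tonsil", "part_of", "pharynx"),
    ("femur", "part_of", "leg"),
    ("tonsil", "is_a", "lymphoid organ"),
    ("femur", "is_a", "bone"),
]
ent, rel = train_transe(triples)
print(transe_distance(ent["femur"], rel["is_a"], ent["bone"]))     # true triple
print(transe_distance(ent["femur"], rel["is_a"], ent["pharynx"]))  # corrupted triple
```

After training, a true triple should have a smaller distance than a corrupted one; the per-entity vectors in ``ent`` are what an aligner would then compare across ontologies.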
To use a KGE-based aligner:

.. code-block:: python

    from ontoaligner.aligner import TransEAligner

    aligner = TransEAligner()

    matchings = aligner.generate(input_data=...)
If the desired model is not available in OntoAligner, subclass ``GraphEmbeddingAligner`` and set its ``model`` attribute to the name of the KGE model:

.. code-block:: python

    from ontoaligner.aligner.graph import GraphEmbeddingAligner

    class CustomKGEAligner(GraphEmbeddingAligner):
        model = "RESCAL"

    aligner = CustomKGEAligner()
    matchings = aligner.generate(input_data=...)

Here, ``RESCAL`` is the custom KGE model.

.. note::

    For the available model names, see `PyKEEN > Models <https://pykeen.readthedocs.io/en/latest/reference/models.html#classes>`_.
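RESCAL, the custom model in the subclassing example, scores a triple with a full relation-specific matrix, :math:`f(h, r, t) = \mathbf{h}^\top M_r \mathbf{t}`. A plain-Python sketch of that bilinear score follows, together with DistMult, which restricts :math:`M_r` to a diagonal matrix. It illustrates the scoring functions only, with hypothetical vectors and matrices; PyKEEN's model classes handle training and other details.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def rescal_score(h: Vector, m_r: Matrix, t: Vector) -> float:
    """Bilinear RESCAL score h^T M_r t; higher means a more plausible triple."""
    if len(m_r) != len(h) or any(len(row) != len(t) for row in m_r):
        raise ValueError("M_r must be a len(h) x len(t) matrix")
    return sum(h[i] * m_r[i][j] * t[j] for i in range(len(h)) for j in range(len(t)))


def distmult_score(h: Vector, r_diag: Vector, t: Vector) -> float:
    """DistMult: RESCAL with M_r = diag(r_diag), i.e. a three-way elementwise product."""
    return sum(a * b * c for a, b, c in zip(h, r_diag, t))


h = [1.0, 2.0]
t = [3.0, 1.0]
diagonal = [[0.5, 0.0], [0.0, -1.0]]   # hypothetical diagonal relation matrix
swap = [[0.0, 1.0], [1.0, 0.0]]        # hypothetical off-diagonal relation matrix

print(rescal_score(h, diagonal, t), distmult_score(h, [0.5, -1.0], t))
print(rescal_score(h, swap, t))  # off-diagonal terms that DistMult cannot express
```

Because DistMult's product is symmetric in ``h`` and ``t``, it cannot distinguish a relation from its inverse, whereas a full RESCAL matrix such as ``swap`` mixes dimensions and can be asymmetric in general.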
