5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,10 @@
## Changelog

### v1.4.10 (December 8, 2025)
- add complexity score
- add documentation for metrics
- bug fixes in Ontologizer

### v1.4.9 (December 8, 2025)
- add retriever collection
- add documentation for retrievers
2 changes: 1 addition & 1 deletion CITATION.cff
@@ -31,5 +31,5 @@ keywords:
- Large Language Models
- Text-to-ontology
license: MIT
-version: 1.4.9
+version: 1.4.10
date-released: '2025'
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -186,6 +186,7 @@ or GitHub repository:
ontologizer/ontology_hosting
ontologizer/new_ontologies
ontologizer/metadata
ontologizer/metrics

.. toctree::
:maxdepth: 1
110 changes: 110 additions & 0 deletions docs/source/ontologizer/metrics.rst
@@ -0,0 +1,110 @@
Metrics
=================

.. sidebar:: Metric Space

    There is a dedicated Hugging Face Space for `OntoLearner Benchmark Metrics <https://huggingface.co/spaces/SciKnowOrg/OntoLearner-Benchmark-Metrics>`_ with analysis and live plots.

The ``Analyzer`` class in OntoLearner provides a unified interface for computing **ontology metrics**, which fall into two main categories: **Topology Metrics**, which capture the structural characteristics of the ontology graph, and **Dataset Metrics**, which assess the quality and distribution of the extracted learning datasets. Additionally, a **complexity score** derived from these metrics summarizes the ontology's overall richness and complexity.

Topology Metrics
----------------
Topology metrics describe the structure and organization of an ontology. The ``Analyzer`` computes the following key metrics (a toy illustration of the depth and breadth computations follows the list):

- **Total nodes** (``total_nodes``): Total number of nodes in the ontology graph.
- **Total edges** (``total_edges``): Total number of edges representing relations between nodes.
- **Root nodes** (``num_root_nodes``): Nodes with no incoming edges, representing top-level concepts.
- **Leaf nodes** (``num_leaf_nodes``): Nodes with no outgoing edges, representing bottom-level concepts.
- **Classes** (``num_classes``): Number of distinct ontology classes.
- **Properties** (``num_properties``): Number of distinct properties (object or datatype properties).
- **Individuals** (``num_individuals``): Number of instances associated with classes.
- **Depth metrics**:

  - ``max_depth``: Maximum hierarchical depth in the ontology.
  - ``min_depth``: Minimum hierarchical depth.
  - ``avg_depth``: Average hierarchical depth across all nodes.
  - ``depth_variance``: Variance of the depth distribution.

- **Breadth metrics**:

  - ``max_breadth``: Maximum number of nodes at any single hierarchy level.
  - ``min_breadth``: Minimum number of nodes at any hierarchy level.
  - ``avg_breadth``: Average number of nodes per hierarchy level.
  - ``breadth_variance``: Variance of the breadth distribution.
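
As a minimal sketch, here is how the depth and breadth statistics can be read off a small hierarchy. This is illustrative only; the toy graph and the use of ``networkx`` are assumptions, not the ``Analyzer``'s internal implementation:

.. code-block:: python

    import networkx as nx

    # Toy hierarchy; edges point from parent to child, so roots have no
    # incoming edges and leaves have no outgoing edges.
    g = nx.DiGraph()
    g.add_edges_from([
        ("Thing", "Wine"), ("Thing", "Region"),
        ("Wine", "RedWine"), ("Wine", "WhiteWine"),
    ])

    roots = [n for n in g.nodes if g.in_degree(n) == 0]    # num_root_nodes = 1
    leaves = [n for n in g.nodes if g.out_degree(n) == 0]  # num_leaf_nodes = 3

    # Depth of a node = shortest-path distance from the root
    depths = nx.single_source_shortest_path_length(g, roots[0])
    print("max_depth:", max(depths.values()))  # 2

    # Breadth of a level = number of nodes at that depth
    levels = {}
    for node, depth in depths.items():
        levels[depth] = levels.get(depth, 0) + 1
    print("max_breadth:", max(levels.values()))  # 2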

Dataset Metrics
---------------

Dataset metrics evaluate the characteristics of the machine-learning datasets extracted from the ontology; a toy illustration follows the list. These metrics include:

- **Number of term-type mappings** (``num_term_types``): Number of terms associated with types.
- **Number of taxonomic (is-a) relations** (``num_taxonomic_relations``): Count of hierarchical relations.
- **Number of non-taxonomic relations** (``num_non_taxonomic_relations``): Count of semantic associations not in the hierarchy.
- **Average terms per type** (``avg_terms``): Measures dataset balance across classes.
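
As a rough illustration of how these four counts relate, consider the following toy data; the variable names and data format below are hypothetical, not OntoLearner's actual extraction format:

.. code-block:: python

    # Hypothetical extracted data, purely for illustration
    term_types = {"Merlot": ["RedWine"], "Riesling": ["WhiteWine"], "Chianti": ["RedWine"]}
    taxonomic_relations = [("RedWine", "Wine"), ("WhiteWine", "Wine")]
    non_taxonomic_relations = [("Wine", "madeFromGrape", "Grape")]

    num_term_types = len(term_types)                            # 3 term-type mappings
    num_taxonomic_relations = len(taxonomic_relations)          # 2 is-a relations
    num_non_taxonomic_relations = len(non_taxonomic_relations)  # 1 semantic association
    types = {t for ts in term_types.values() for t in ts}       # {"RedWine", "WhiteWine"}
    avg_terms = num_term_types / len(types)                     # 1.5 terms per type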


Complexity Score
----------------

The **complexity score** combines topology and dataset metrics into a single normalized score in ``[0, 1]``. First, metrics are **log-normalized** and weighted by category:

.. list-table::
    :header-rows: 1
    :widths: 25 50 25

    * - Metric Category
      - Example Metrics
      - Weight
    * - Graph structure
      - ``total_nodes``, ``total_edges``, ``num_root_nodes``, ``num_leaf_nodes``
      - 0.30
    * - Knowledge coverage
      - ``num_classes``, ``num_properties``, ``num_individuals``
      - 0.25
    * - Hierarchy
      - ``max_depth``, ``min_depth``, ``avg_depth``, ``depth_variance``
      - 0.10
    * - Breadth
      - ``max_breadth``, ``min_breadth``, ``avg_breadth``, ``breadth_variance``
      - 0.20
    * - Dataset (LLMs4OL)
      - ``num_term_types``, ``num_taxonomic_relations``, ``num_non_taxonomic_relations``, ``avg_terms``
      - 0.15


Next, the weighted sum of metrics is passed through a **logistic function**, which normalizes the final complexity score to ``[0, 1]``.
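
Concretely, writing :math:`x_m` for a metric value and :math:`w_m` for its category weight from the table above, the score implemented by ``compute_complexity_score`` (with defaults :math:`a = 0.4` and :math:`b = 6.0`) is:

.. math::

    S = \frac{\sum_m w_m \, \log(1 + x_m)}{\sum_m w_m},
    \qquad
    \text{complexity} = \frac{1}{1 + e^{-a \, (S - b)}}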


Example Usage
-------------

Here is a simple example demonstrating how to compute metrics and complexity for an ontology:

.. code-block:: python

    from ontolearner.tools import Analyzer
    from ontolearner.ontology import Wine

    # Step 1 — Load ontology
    ontology = Wine()
    ontology.build_graph()

    # Step 2 — Create the analyzer
    analyzer = Analyzer()

    # Step 3 — Compute topology and dataset metrics
    topology_metrics = analyzer.compute_topology_metrics(ontology)
    dataset_metrics = analyzer.compute_dataset_metrics(ontology)

    # Step 4 — Compute overall complexity score
    complexity_score = analyzer.compute_complexity_score(
        topology_metrics=topology_metrics,
        dataset_metrics=dataset_metrics
    )

    # Step 5 — Display results
    print("Topology Metrics:", topology_metrics)
    print("Dataset Metrics:", dataset_metrics)
    print("Ontology Complexity Score:", complexity_score)


This workflow allows ontology engineers and researchers to **quantify structural quality, dataset richness, and overall complexity**, providing actionable insights for ontology evaluation, benchmarking, and improvement.
24 changes: 24 additions & 0 deletions examples/complexity_score.py
@@ -0,0 +1,24 @@
from ontolearner.tools import Analyzer
from ontolearner.ontology import Wine

# Step 1 — Load ontology
ontology = Wine()
ontology.build_graph()

# Step 2 — Create the analyzer
analyzer = Analyzer()

# Step 3 — Compute topology and dataset metrics
topology_metrics = analyzer.compute_topology_metrics(ontology)
dataset_metrics = analyzer.compute_dataset_metrics(ontology)

# Step 4 — Compute overall complexity score
complexity_score = analyzer.compute_complexity_score(
    topology_metrics=topology_metrics,
    dataset_metrics=dataset_metrics
)

# Step 5 — Display results
print("Topology Metrics:", topology_metrics)
print("Dataset Metrics:", dataset_metrics)
print("Ontology Complexity Score:", complexity_score)
2 changes: 1 addition & 1 deletion ontolearner/VERSION
@@ -1 +1 @@
-1.4.9
+1.4.10
4 changes: 2 additions & 2 deletions ontolearner/base/ontology.py
@@ -372,7 +372,7 @@ def _update_metrics_space(self, metrics_file_path: Path, metrics: OntologyMetric
        # Save updated metrics
        df.to_excel(metrics_file_path, index=False)

-    def is_valid_label(label: str) -> Any:
+    def is_valid_label(self, label: str) -> Any:
        invalids = ['root', 'thing']
        if label.lower() in invalids:
            return None
@@ -522,7 +522,7 @@ def check_if_class(self, entity):
            return True
        return False

-    def _is_anonymous_id(label: str) -> bool:
+    def _is_anonymous_id(self, label: str) -> bool:
        """Check if a label represents an anonymous class identifier."""
        if not label:
            return True
51 changes: 51 additions & 0 deletions ontolearner/tools/analyzer.py
@@ -14,6 +14,7 @@

import logging
import time
+import numpy as np
from abc import ABC
from rdflib import RDF, RDFS, OWL
from collections import defaultdict
@@ -186,6 +187,56 @@ def compute_topology_metrics(ontology: BaseOntology) -> TopologyMetrics:

        return metrics

    @staticmethod
    def compute_complexity_score(
            topology_metrics: TopologyMetrics,
            dataset_metrics: DatasetMetrics,
            a: float = 0.4,
            b: float = 6.0,
            eps: float = 1e-12
    ) -> float:
        """
        Compute a single normalized complexity score for an ontology.

        This function combines structural topology metrics and dataset quality metrics
        into a weighted aggregate score, then applies a logistic transformation to
        normalize it to the range [0, 1]. The score reflects overall ontology complexity,
        considering graph structure, hierarchy, breadth, coverage, and dataset richness.

        Args:
            topology_metrics (TopologyMetrics): Precomputed structural metrics of the ontology graph.
            dataset_metrics (DatasetMetrics): Precomputed metrics of the extracted learning datasets.
            a (float, optional): Steepness parameter of the logistic normalization function. Defaults to 0.4.
            b (float, optional): Centering parameter of the logistic function; should be tuned to match
                the scale of the aggregated metrics. Defaults to 6.0.
            eps (float, optional): Small epsilon guarding against numerical issues in the logistic
                computation. Defaults to 1e-12.

        Returns:
            float: Normalized complexity score in [0, 1], where higher values indicate more complex ontologies.

        Notes:
            - Weights are assigned to metric categories: graph structure, knowledge coverage, hierarchy,
              breadth, and dataset metrics (term-types, taxonomic, and non-taxonomic relations).
            - Metrics are log-normalized before weighting to reduce scale differences.
            - The logistic transformation keeps the final score bounded and interpretable.
        """
        # Map each category weight to its member metrics
        metric_categories = {
            0.3: ["total_nodes", "total_edges", "num_root_nodes", "num_leaf_nodes"],
            0.25: ["num_classes", "num_properties", "num_individuals"],
            0.10: ["max_depth", "min_depth", "avg_depth", "depth_variance"],
            0.20: ["max_breadth", "min_breadth", "avg_breadth", "breadth_variance"],
            0.15: ["num_term_types", "num_taxonomic_relations", "num_non_taxonomic_relations", "avg_terms"]
        }
        weights = {metric: weight for weight, metrics in metric_categories.items() for metric in metrics}
        metrics = [metric for metric_list in metric_categories.values() for metric in metric_list]
        # Merge topology and dataset metrics into a single lookup
        onto_metrics = {**topology_metrics.__dict__, **dataset_metrics.__dict__}
        # Log-normalize each available metric and weight it by its category
        norm_weighted_values = [np.log1p(onto_metrics[m]) * weights[m] for m in metrics if m in onto_metrics]
        total_weight = sum(weights[m] for m in metrics if m in onto_metrics)
        weighted_sum = sum(norm_weighted_values) / total_weight if total_weight > 0 else 0.0
        # Logistic squashing to [0, 1]; cast so the annotated return type holds
        complexity_score = 1.0 / (1.0 + np.exp(-a * (weighted_sum - b) + eps))
        return float(complexity_score)


    @staticmethod
    def compute_dataset_metrics(ontology: BaseOntology) -> DatasetMetrics:
        """