| 1 | +Metrics |
| 2 | +================= |
| 3 | + |
| 4 | +.. sidebar:: Metric Space: |
| 5 | + |
| 6 | + There is a dedicated Hugging Face Space for `OntoLearner Benchmark Metrics <https://huggingface.co/spaces/SciKnowOrg/OntoLearner-Benchmark-Metrics>`_ with analysis and live plots. |
| 7 | + |
| 8 | +The ``Analyzer`` class in OntoLearner provides a unified interface for computing **ontology metrics**, which fall into two main categories: **Topology Metrics**, which capture the structural characteristics of the ontology graph, and **Dataset Metrics**, which assess the quality and distribution of the extracted learning datasets. A **complexity score** can additionally be derived from these metrics to summarize the overall richness and complexity of the ontology. |
| 9 | + |
| 10 | +Topology Metrics |
| 11 | +---------------- |
| 12 | +Topology metrics describe the structure and organization of an ontology. The ``Analyzer`` computes the following key metrics: |
| 13 | + |
| 14 | +- **Total nodes** (``total_nodes``): Total number of nodes in the ontology graph. |
| 15 | +- **Total edges** (``total_edges``): Total number of edges representing relations between nodes. |
| 16 | +- **Root nodes** (``num_root_nodes``): Nodes with no incoming edges, representing top-level concepts. |
| 17 | +- **Leaf nodes** (``num_leaf_nodes``): Nodes with no outgoing edges, representing bottom-level concepts. |
| 18 | +- **Classes** (``num_classes``): Number of distinct ontology classes. |
| 19 | +- **Properties** (``num_properties``): Number of distinct properties (object or datatype properties). |
| 20 | +- **Individuals** (``num_individuals``): Number of instances associated with classes. |
| 21 | +- **Depth metrics**: |
| 22 | + |
| 23 | + - ``max_depth``: Maximum hierarchical depth in the ontology. |
| 24 | + - ``min_depth``: Minimum hierarchical depth. |
| 25 | + - ``avg_depth``: Average hierarchical depth across all nodes. |
| 26 | + - ``depth_variance``: Variance of depth distribution. |
| 27 | + |
| 28 | +- **Breadth metrics**: |
| 29 | + |
| 30 | + - ``max_breadth``: Maximum number of nodes at any single hierarchy level. |
| 31 | + - ``min_breadth``: Minimum number of nodes at any hierarchy level. |
| 32 | + - ``avg_breadth``: Average number of nodes per hierarchy level. |
| 33 | + - ``breadth_variance``: Variance of breadth distribution. |
| 34 | + |
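To make the depth and breadth metrics concrete, the following is a minimal, self-contained sketch on a toy hierarchy. It does not use OntoLearner: the function ``toy_topology_metrics``, the edge list, and the choice of "depth = shortest distance from a root" are all assumptions made for illustration, and the definitions used inside ``Analyzer`` may differ.

```python
from collections import Counter, deque
from statistics import mean, pvariance

# Hypothetical toy hierarchy: each edge points from a parent to a child.
edges = [
    ("Thing", "Wine"),
    ("Thing", "Region"),
    ("Wine", "RedWine"),
    ("Wine", "WhiteWine"),
    ("RedWine", "Merlot"),
]


def toy_topology_metrics(edges):
    """Compute topology-style metrics for a small directed acyclic graph.

    Illustrative only: this is not the OntoLearner implementation.
    """
    nodes = {node for edge in edges for node in edge}
    children = {node: [] for node in nodes}
    has_parent = set()
    for parent, child in edges:
        children[parent].append(child)
        has_parent.add(child)

    roots = sorted(node for node in nodes if node not in has_parent)
    leaves = [node for node in nodes if not children[node]]

    # Breadth-first search from the roots.
    # Assumption: a node's depth is its shortest distance from any root.
    depth = {root: 0 for root in roots}
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in depth:
                depth[child] = depth[node] + 1
                queue.append(child)

    depths = list(depth.values())
    breadths = list(Counter(depths).values())  # number of nodes per level

    return {
        "total_nodes": len(nodes),
        "total_edges": len(edges),
        "num_root_nodes": len(roots),
        "num_leaf_nodes": len(leaves),
        "max_depth": max(depths),
        "min_depth": min(depths),
        "avg_depth": mean(depths),
        "depth_variance": pvariance(depths),
        "max_breadth": max(breadths),
        "min_breadth": min(breadths),
        "avg_breadth": mean(breadths),
        "breadth_variance": pvariance(breadths),
    }


metrics = toy_topology_metrics(edges)
print(metrics)
```

In this toy graph the hierarchy has four levels (depths 0 to 3), so the breadth metrics are computed over four per-level node counts.
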
| 35 | +Dataset Metrics |
| 36 | +--------------- |
| 37 | + |
| 38 | +Dataset metrics evaluate the characteristics of machine-learning datasets extracted from the ontology. These metrics include: |
| 39 | + |
| 40 | +- **Number of term-type mappings** (``num_term_types``): Number of terms associated with types. |
| 41 | +- **Number of taxonomic (is-a) relations** (``num_taxonomic_relations``): Count of hierarchical relations. |
| 42 | +- **Number of non-taxonomic relations** (``num_non_taxonomic_relations``): Count of semantic associations not in the hierarchy. |
| 43 | +- **Average terms per type** (``avg_terms``): Average number of terms per type, an indicator of how evenly the dataset is balanced across classes. |
| 44 | + |
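As a rough illustration of how these counts relate to the extracted data, the sketch below derives them from three small hand-written lists. The data, the tuple layouts, and the helper ``toy_dataset_metrics`` are hypothetical and only mirror the metric names above. They are not the data structures OntoLearner uses internally.

```python
from collections import defaultdict

# Hypothetical extracted datasets (illustrative values only).
term_typings = [
    ("Merlot", "RedWine"),
    ("Pinot Noir", "RedWine"),
    ("Chardonnay", "WhiteWine"),
]
taxonomic_relations = [("RedWine", "Wine"), ("WhiteWine", "Wine")]
non_taxonomic_relations = [("Wine", "locatedIn", "Region")]


def toy_dataset_metrics(term_typings, taxonomic_relations, non_taxonomic_relations):
    """Count dataset-style metrics from plain Python lists (illustrative only)."""
    # Group the distinct terms under each type.
    terms_by_type = defaultdict(set)
    for term, type_name in term_typings:
        terms_by_type[type_name].add(term)

    # Assumption: avg_terms is the mean number of distinct terms per type.
    if terms_by_type:
        avg_terms = sum(len(terms) for terms in terms_by_type.values()) / len(terms_by_type)
    else:
        avg_terms = 0.0

    return {
        "num_term_types": len(term_typings),
        "num_taxonomic_relations": len(taxonomic_relations),
        "num_non_taxonomic_relations": len(non_taxonomic_relations),
        "avg_terms": avg_terms,
    }


dataset_metrics = toy_dataset_metrics(
    term_typings, taxonomic_relations, non_taxonomic_relations
)
print(dataset_metrics)
```
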
| 45 | + |
| 46 | +Complexity Score |
| 47 | +---------------- |
| 48 | + |
| 49 | +The **complexity score** combines topology and dataset metrics into a single normalized score in ``[0, 1]``. First, metrics are **log-normalized** and weighted by category: |
| 50 | + |
| 51 | +.. list-table:: |
| 52 | + :header-rows: 1 |
| 53 | + :widths: 25 50 25 |
| 54 | + |
| 55 | + * - Metric Category |
| 56 | + - Example Metrics |
| 57 | + - Weight |
| 58 | + * - Graph structure |
| 59 | + - ``total_nodes``, ``total_edges``, ``num_root_nodes``, ``num_leaf_nodes`` |
| 60 | + - 0.30 |
| 61 | + * - Knowledge coverage |
| 62 | + - ``num_classes``, ``num_properties``, ``num_individuals`` |
| 63 | + - 0.25 |
| 64 | + * - Hierarchy |
| 65 | + - ``max_depth``, ``min_depth``, ``avg_depth``, ``depth_variance`` |
| 66 | + - 0.10 |
| 67 | + * - Breadth |
| 68 | + - ``max_breadth``, ``min_breadth``, ``avg_breadth``, ``breadth_variance`` |
| 69 | + - 0.20 |
| 70 | + * - Dataset (LLMs4OL) |
| 71 | + - ``num_term_types``, ``num_taxonomic_relations``, ``num_non_taxonomic_relations``, ``avg_terms`` |
| 72 | + - 0.15 |
| 73 | + |
| 74 | + |
| 75 | +Next, the weighted sum of the normalized metrics is passed through a **logistic function**, which maps it into the unit interval and yields the final complexity score. |
| 76 | + |
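The two steps above (log-normalization with category weights, then a logistic squashing) can be sketched as follows. Only the category weights and metric groupings come from the table. The normalization cap, the averaging of metrics within a category, and the logistic ``steepness`` and ``midpoint`` are assumptions chosen for illustration, so the numbers this sketch produces will not match ``Analyzer.compute_complexity_score``.

```python
import math

# Category weights and metric groupings, as listed in the table above.
CATEGORY_WEIGHTS = {
    "graph_structure": 0.30,
    "knowledge_coverage": 0.25,
    "hierarchy": 0.10,
    "breadth": 0.20,
    "dataset": 0.15,
}
CATEGORY_METRICS = {
    "graph_structure": ["total_nodes", "total_edges", "num_root_nodes", "num_leaf_nodes"],
    "knowledge_coverage": ["num_classes", "num_properties", "num_individuals"],
    "hierarchy": ["max_depth", "min_depth", "avg_depth", "depth_variance"],
    "breadth": ["max_breadth", "min_breadth", "avg_breadth", "breadth_variance"],
    "dataset": [
        "num_term_types",
        "num_taxonomic_relations",
        "num_non_taxonomic_relations",
        "avg_terms",
    ],
}


def log_normalize(value, cap=10_000.0):
    """Map a non-negative metric into [0, 1] on a log scale.

    Assumption: values at or above `cap` saturate at 1.0.
    """
    return min(math.log1p(max(value, 0.0)) / math.log1p(cap), 1.0)


def toy_complexity_score(metrics, steepness=10.0, midpoint=0.5):
    """Weighted sum of log-normalized metrics passed through a logistic function.

    `steepness` and `midpoint` are hypothetical parameters, not library values.
    Metrics missing from `metrics` are treated as 0.
    """
    weighted_sum = 0.0
    for category, names in CATEGORY_METRICS.items():
        normalized = [log_normalize(metrics.get(name, 0.0)) for name in names]
        # Assumption: each category contributes the mean of its normalized metrics.
        weighted_sum += CATEGORY_WEIGHTS[category] * sum(normalized) / len(normalized)
    return 1.0 / (1.0 + math.exp(-steepness * (weighted_sum - midpoint)))


small = {"total_nodes": 10, "total_edges": 12, "num_classes": 8, "max_depth": 3}
large = {name: 5_000 for names in CATEGORY_METRICS.values() for name in names}
print(toy_complexity_score(small), toy_complexity_score(large))
```

Because the weights sum to 1 and every normalized metric lies in ``[0, 1]``, the weighted sum also lies in ``[0, 1]`` before the logistic step, and a larger ontology never receives a lower score than a smaller one under this sketch.
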
| 77 | + |
| 78 | +Example Usage |
| 79 | +------------- |
| 80 | + |
| 81 | +Here is a simple example demonstrating how to compute metrics and complexity for an ontology: |
| 82 | + |
| 83 | +.. code-block:: python |
| 84 | + |
| 85 | + from ontolearner.tools import Analyzer |
| 86 | + from ontolearner.ontology import Wine |
| 87 | + |
| 88 | + # Step 1 — Load ontology |
| 89 | + ontology = Wine() |
| 90 | + ontology.build_graph() |
| 91 | + |
| 92 | + # Step 2 — Create the analyzer |
| 93 | + analyzer = Analyzer() |
| 94 | + |
| 95 | + # Step 3 — Compute topology and dataset metrics |
| 96 | + topology_metrics = analyzer.compute_topology_metrics(ontology) |
| 97 | + dataset_metrics = analyzer.compute_dataset_metrics(ontology) |
| 98 | + |
| 99 | + # Step 4 — Compute overall complexity score |
| 100 | + complexity_score = analyzer.compute_complexity_score( |
| 101 | + topology_metrics=topology_metrics, |
| 102 | + dataset_metrics=dataset_metrics |
| 103 | + ) |
| 104 | + # Step 5 — Display results |
| 105 | + print("Topology Metrics:", topology_metrics) |
| 106 | + print("Dataset Metrics:", dataset_metrics) |
| 107 | + print("Ontology Complexity Score:", complexity_score) |
| 108 | + |
| 109 | + |
| 110 | +This workflow allows ontology engineers and researchers to **quantify structural quality, dataset richness, and overall complexity**, providing actionable insights for ontology evaluation, benchmarking, and improvement. |