
Commit 9d40db5

Add multi-label Classification Support #minor (#76)
1 parent 67d75af commit 9d40db5

27 files changed: +3516 −95 lines

.gitignore

Lines changed: 1 addition & 2 deletions
@@ -326,13 +326,12 @@ Thumbs.db
 
 ### VisualStudioCode ###
 .vscode/*
-!.vscode/tasks.json
-!.vscode/launch.json
 *.code-workspace
 
 ### VisualStudioCode Patch ###
 # Ignore all local history of files
 .history
 .ionide
 
+
 # End of https://www.toptal.com/developers/gitignore/api/python,pycharm,pycharm+all,visualstudiocode

docs/examples/plot_binary_policies.py

Lines changed: 1 addition & 1 deletion
@@ -66,7 +66,7 @@
 ]
 
 # Use random forest classifiers for every node
-# And exclusive siblings policy to select training examples for binary classifiers.
+# And inclusive policy to select training examples for binary classifiers.
 rf = RandomForestClassifier()
 classifier = LocalClassifierPerNode(local_classifier=rf, binary_policy="inclusive")
 

docs/examples/plot_multilabel.py

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
# -*- coding: utf-8 -*-
"""
==============================================
Using Hierarchical Multi-Label Classification
==============================================

A simple example to show how to use multi-label classification in HiClass.
Please have a look at the :ref:`Multi-Label Classification` section in the Algorithms Overview for the motivation and background behind the implementation.
"""
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from hiclass.MultiLabelLocalClassifierPerNode import MultiLabelLocalClassifierPerNode

# Define data
X_train = [[1, 2], [3, 4], [5, 6]]
X_test = [[1, 2], [3, 4], [5, 6]]

# Define labels
Y_train = np.array(
    [
        [["Mammal", "Human"], ["Fish"]],  # Mermaid
        [["Mammal", "Human"], ["Mammal", "Bovine"]],  # Minotaur
        [["Mammal", "Human"]],  # just a Human
    ],
    dtype=object,
)

# Use decision tree classifiers for every node
tree = DecisionTreeClassifier()
classifier = MultiLabelLocalClassifierPerNode(local_classifier=tree)

# Train local classifier per node
classifier.fit(X_train, Y_train)

# Predict
predictions = classifier.predict(X_test)
print(predictions)
5 binary image files added (68.5 KB, 483 KB, 49.5 KB, 78.9 KB, 260 KB)

docs/source/algorithms/index.rst

Lines changed: 3 additions & 0 deletions
@@ -1,3 +1,5 @@
+.. _algorithms:
+
 Algorithms Overview
 ===================
 
@@ -12,4 +14,5 @@ HiClass provides implementations for the most popular machine learning models fo
    local_classifier_per_node
    local_classifier_per_parent_node
    local_classifier_per_level
+   multi_label
    metrics
docs/source/algorithms/multi_label.rst

Lines changed: 242 additions & 0 deletions

@@ -0,0 +1,242 @@
.. _hierarchical-multi-label-Classification-Overview:

==========================
Multi-Label Classification
==========================

HiClass supports hierarchical multi-label classification.
This means a sample can belong to multiple classes at the same hierarchy level.

On this page, we motivate, explain, and demonstrate how hierarchical multi-label classification is implemented in HiClass.

++++++++++++++++++++++++++
Motivation
++++++++++++++++++++++++++

In numerous hierarchical classification problems, it is possible for a sample to be associated with multiple classes at the same level of the hierarchy.
This occurs when the classes are not mutually exclusive.
For instance, consider the problem of classifying dog breeds, where we aim to determine a dog's breed based on available data.
Without allowing for multiple paths through the dog breed hierarchy, we would have to assign a single label to each sample, i.e., choose a single path through the hierarchy and thereby assign each dog to a single breed.
However, this does not always reflect reality, since a dog can be a mix of multiple breeds.
For example, a dog can be a mix of a Dachshund and a Golden Retriever.
In such a scenario, we aim to assign both the Dachshund and the Golden Retriever labels to the sample, which requires at least two paths through the hierarchy.
The following figure illustrates this example.

.. _example_dog_breed_hierarchy:

.. figure:: ../algorithms/hc_dog_breed_hierarchy.png
   :align: center
   :width: 80%

   An example image of a dog that is a mix of a Dachshund and a Golden Retriever, thereby requiring multiple paths through the hierarchy for correct classification.

Another multi-label classification example is document classification, in which we aim to classify a document based on its content.
The categories are often hierarchical in nature, such as classifying documents into broad topics like "Technology", "Sports", and "Politics", which in turn have subcategories like "Artificial Intelligence", "Football", and "International Relations".
A document can belong to multiple categories, for example, a text that deals with the influence of advancements in AI on International Relations, which can only be correctly classified by multiple paths through the hierarchy.

++++++++++++++++++++++++++++++++++++++++
Background - Classification Terminology
++++++++++++++++++++++++++++++++++++++++

To explain what we mean by hierarchical multi-label classification, we first need to define some terminology.

.. figure:: ../algorithms/hc_background.png
   :align: left
   :figwidth: 30%

   The set of classification problems from most generic (multi-class) to most specific (hierarchical multi-label classification).

In a multi-class classification problem, a sample can be assigned to one class among several options.
In a multi-label classification problem, a sample can be associated with multiple classes simultaneously.
A hierarchical classification problem is a type of multi-label classification problem where classes are organized in a hierarchical structure represented as a graph, such as a tree or directed acyclic graph (DAG).
In this graph, the nodes correspond to the classes to be predicted.
If not specified, it is usually assumed that a sample can only belong to one class at each level of the hierarchy.
This means a sample can only be associated with a single path through the hierarchy, starting from the root node and ending at a leaf node.
In hierarchical multi-label classification, this restriction is lifted.
A sample can belong to multiple classes at any level of the hierarchy, i.e., a sample can be classified by multiple paths through the hierarchy.

|
|

++++++++++++++++++++++++++
Design - Target Format
++++++++++++++++++++++++++

HiClass is designed to be compatible with the scikit-learn API.
For the non-multi-label hierarchical classification case, the target array follows the sklearn format for a multi-label classification problem.
However, since there is no sklearn-specific multi-label hierarchical format, HiClass implements its own format extension.
The HiClass target format extends the non-multi-label hierarchical classification format by adding a new dimension to the 2-dimensional array, which captures the different paths through the hierarchy.

.. figure:: ../algorithms/hc_format.png
   :align: center
   :width: 80%

   HiClass hierarchical multi-label classification format extension for samples classified by the dog breed hierarchy.

This is implemented as a nested list of lists, in which the last dimension specifies a path through the hierarchy.

.. code-block:: python

    y = [
        [["Retriever", "Golden Retriever"], ["Hound", "Dachshund"]],  # sample 1
        [["Hound", "Beagle"]],  # sample 2
    ]

It is important to note that we specify the whole list of nodes from the root to the most specific node for each path.
Even in cases where only the leaf nodes differ, we still need to specify the whole path.
For example, if sample 1 belonged to the Labrador class instead of the Dachshund class, we would still need to specify the whole paths from the root to the Golden Retriever and Labrador nodes, which would be :code:`[["Retriever", "Golden Retriever"], ["Retriever", "Labrador"]]`.
This is a consequence of using NumPy arrays for the implementation, which require fixed dimensions for the target array.
Furthermore, by explicitly specifying the whole path from the root to the leaf node, the target format is readable and easy to comprehend, and it also works well for hierarchies that are not trees but DAGs.
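
Written out in full, the modified target array from this example would be:

.. code-block:: python

    y = [
        [["Retriever", "Golden Retriever"], ["Retriever", "Labrador"]],  # sample 1
        [["Hound", "Beagle"]],  # sample 2
    ]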

++++++++++++++++++++++++++
Fitting the Classifiers
++++++++++++++++++++++++++

In this section, we outline how fitting of the local classifiers is implemented in HiClass for hierarchical multi-label classification.
Here, we only focus on the hierarchical multi-label classification case for the :class:`hiclass.MultiLabelLocalClassifierPerNode` and :class:`hiclass.MultiLabelLocalClassifierPerParentNode` classifiers.
For a recap on how the strategies work, visit the :ref:`Algorithms<algorithms>` section.

.. _hierarchical-multi-label-local-classifier-per-node:

Local Classifier Per Node
---------------------------

The :class:`hiclass.MultiLabelLocalClassifierPerNode` strategy fits a binary local classifier for each node in the hierarchy.
:class:`hiclass.BinaryPolicy` defines which samples belong to the positive and which ones to the negative class for a given local classifier.
In HiClass, the positive and negative samples for a local classifier are mutually exclusive, i.e., a sample can only belong to either the positive or the negative class of a local classifier.
In the hierarchical multi-label case, a sample belongs to the positive class if any of its paths through the hierarchy is associated with the local classifier's node.

For instance, the :ref:`example image <example_dog_breed_hierarchy>` is assigned to the positive class for the Retriever classifier since it belongs to the Golden Retriever class, which is a child of the Retriever node.
It is also assigned to the positive class for the Hound classifier since it belongs to the Dachshund class, which is a child of the Hound node.
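
This membership rule can be illustrated with a minimal sketch; :code:`positive_for_node` is a hypothetical helper for illustration, not the actual :class:`hiclass.BinaryPolicy` implementation, which operates on the hierarchy graph:

.. code-block:: python

    # Minimal sketch of the multi-label positive/negative split.
    # A sample is positive for a node's classifier if any of its
    # label paths contains that node; all other samples are negative.
    def positive_for_node(sample_paths, node):
        return any(node in path for path in sample_paths)

    sample = [["Retriever", "Golden Retriever"], ["Hound", "Dachshund"]]
    print(positive_for_node(sample, "Retriever"))  # True
    print(positive_for_node(sample, "Hound"))      # True
    print(positive_for_node(sample, "Beagle"))     # False -> negative class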

.. _hierarchical-multi-label-local-classifier-per-parent-node:

Local Classifier Per Parent Node
---------------------------------

The :class:`hiclass.MultiLabelLocalClassifierPerParentNode` trains a multi-class classifier for each non-leaf (parent) node, i.e., each node with children in the hierarchy.
The classes to be predicted are the children of the node.
In the multi-label case, a sample can belong to multiple children of a node.
Internally, this is implemented by duplicating the sample and assigning each duplicate to one of the node's children.
The classifier therefore does not need to support the sklearn multi-label format and can be a standard sklearn classifier.
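
The duplication step can be sketched as follows; :code:`expand_for_parent` is a hypothetical helper shown for illustration, not the HiClass internals:

.. code-block:: python

    # Minimal sketch: duplicate a sample once per distinct child of the
    # parent node that is reached by any of its label paths.
    def expand_for_parent(x, sample_paths, level):
        children = {path[level] for path in sample_paths if len(path) > level}
        return [(x, child) for child in children]

    x = [1, 2]
    paths = [["Retriever", "Golden Retriever"], ["Hound", "Dachshund"]]
    # The root-level classifier sees the sample twice, once per child:
    print(expand_for_parent(x, paths, level=0))
    # e.g. [([1, 2], 'Retriever'), ([1, 2], 'Hound')] (set order may vary)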

++++++++++++++++++++++++++
Prediction
++++++++++++++++++++++++++

So far, we have only discussed the fitting of the classifiers; in this section, we outline how prediction is implemented in HiClass for multiple paths.
HiClass follows a top-down prediction strategy, in which a data sample is classified by the nodes in the hierarchy, starting from the root and going down to the leaf nodes.
In the single-path case, the data sample is assigned the label with the highest probability at each level.
This leads to only a single path through the hierarchy for each data sample.

.. figure:: ../algorithms/hc_prediction.png
   :align: center
   :width: 80%

   Predicting the labels for a sample using the top-down prediction strategy. Numeric values in red are the predicted probabilities for each node.

In the example given above, the sample would be assigned the label :code:`["Retriever", "Golden Retriever"]`, since this is the path with the highest probability starting at the root node.
In contrast, when we want to allow for multiple paths through the hierarchy, we need a criterion other than taking the highest probability to assign labels to data samples.
HiClass implements two strategies for this: Threshold and Tolerance.

Threshold
-------------------------

The Threshold strategy assigns a label to a data sample if the probability of the label is above a given threshold.
The threshold :math:`\lambda \in [0, 1]` is a parameter passed to the predict function and specifies an absolute probability value.

.. math::

    Predictions(Node) = \{c \in Children(Node) : \mathbb{P}(c) \geq \lambda\}

In the example given above, if we set :math:`\lambda = 0.6`, we would assign the label :code:`[["Retriever", "Golden Retriever"], ["Hound", "Dachshund"]]` to the sample, since the probabilities of the assigned nodes are greater than 0.6.
While this strategy is simple to implement and understand, it has the disadvantage that it is impossible to specify a different threshold for each node in the hierarchy, requiring a global threshold for all nodes.
Furthermore, with the top-down prediction strategy, if the predicted probability is below the threshold for a node, the prediction stops, regardless of the probabilities of the nodes further down the hierarchy.
For example, if :math:`\lambda = 0.85`, no label is assigned to the sample, since the probabilities for the Retriever and Hound classes are below the threshold value and traversing the hierarchy stops.
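
At a single node, the rule can be sketched as follows; :code:`select_children_threshold` is a hypothetical helper, and the root probabilities are illustrative values consistent with the example above:

.. code-block:: python

    # Minimal sketch of the Threshold rule at one node.
    def select_children_threshold(child_probs, lam):
        # Keep every child whose predicted probability reaches the
        # absolute threshold lambda.
        return [c for c, p in child_probs.items() if p >= lam]

    root_probs = {"Retriever": 0.7, "Hound": 0.8}  # illustrative values
    print(select_children_threshold(root_probs, lam=0.6))   # ['Retriever', 'Hound']
    print(select_children_threshold(root_probs, lam=0.85))  # [] -> traversal stops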

Tolerance
-------------------------

The Tolerance strategy mitigates the problem arising from the absolute probability value in the Threshold strategy by assigning a label to a data sample if its probability is within a given tolerance of the highest probability among the neighboring nodes.
The tolerance :math:`\gamma \in [0, 1]` is a parameter that is passed to the predict function and specifies a relative probability value.

.. math::

    Predictions(Node) = \{c \in Children(Node) : \mathbb{P}(c) \geq \max(\mathbb{P}(Children(Node))) - \gamma\}

This strategy has the advantage of always predicting at least one class at each level, since the tolerance is relative to the highest probability.
For example, with :math:`\gamma = 0.3` we would predict the labels :code:`[["Retriever", "Golden Retriever"], ["Hound", "Dachshund"], ["Hound", "Beagle"]]`.
Note that the Beagle label is assigned at the second level because its probability of 0.5 is within the tolerance of 0.3 of the highest probability, 0.8, of the neighboring Dachshund class.
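
The corresponding sketch for a single node (again with :code:`select_children_tolerance` as a hypothetical helper) shows why at least one child is always kept:

.. code-block:: python

    # Minimal sketch of the Tolerance rule at one node.
    def select_children_tolerance(child_probs, gamma):
        # Keep every child within gamma of the best-scoring sibling;
        # the best child always qualifies, so the set is never empty.
        best = max(child_probs.values())
        return [c for c, p in child_probs.items() if p >= best - gamma]

    hound_probs = {"Dachshund": 0.8, "Beagle": 0.5}  # values from the text
    print(select_children_tolerance(hound_probs, gamma=0.3))
    # ['Dachshund', 'Beagle']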

.. _hierarchical-multi-label-metrics:

++++++++++++++++++++++++++
Metrics
++++++++++++++++++++++++++

We extend the hierarchical precision, recall, and F-Score metrics to evaluate the performance of the hierarchical multi-label classifiers.
The hierarchical precision, recall, and F-Score are defined in the :ref:`Metrics <metrics-overview>` section.
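
For convenience, we restate the (micro-averaged) definitions here, where :math:`Y_i` denotes the set of true nodes and :math:`\hat{Y}_i` the set of predicted nodes for sample :math:`i`:

.. math::

    hP = \frac{\sum_i |\hat{Y}_i \cap Y_i|}{\sum_i |\hat{Y}_i|}, \qquad
    hR = \frac{\sum_i |\hat{Y}_i \cap Y_i|}{\sum_i |Y_i|}, \qquad
    hF = \frac{2 \cdot hP \cdot hR}{hP + hR}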

Here, we give an example of the hierarchical precision and recall for the multi-label case.

.. figure:: ../algorithms/hc_metrics.png
   :align: center
   :width: 100%

Note that we can define micro and macro averages when calculating the hierarchical precision and recall for multiple samples.
In the micro-precision/recall, all predictions are considered together, regardless of the sample they belong to.
In contrast, in the macro-precision/recall, we first calculate each sample's hierarchical precision/recall and then aggregate the results.
Since samples can have differing numbers of labels assigned to them, micro and macro averages can lead to different results.
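
The difference can be sketched in a few lines; :code:`paths_to_nodes` and the two precision helpers are illustrative, not the :mod:`hiclass.metrics` implementation:

.. code-block:: python

    # Minimal sketch of micro vs. macro hierarchical precision for the
    # multi-label target format; recall is analogous, with the set of
    # true nodes in the denominator instead of the predicted ones.
    def paths_to_nodes(sample_paths):
        # Every (level, label) pair reached by any path of the sample.
        return {(lvl, lbl) for path in sample_paths for lvl, lbl in enumerate(path)}

    def micro_precision(y_true, y_pred):
        hits = total = 0
        for t, p in zip(y_true, y_pred):
            pred = paths_to_nodes(p)
            hits += len(pred & paths_to_nodes(t))
            total += len(pred)
        return hits / total

    def macro_precision(y_true, y_pred):
        scores = [
            len(paths_to_nodes(p) & paths_to_nodes(t)) / len(paths_to_nodes(p))
            for t, p in zip(y_true, y_pred)
        ]
        return sum(scores) / len(scores)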

++++++++++++++++++++++++++++++++++++++++
Code example - Putting it all together
++++++++++++++++++++++++++++++++++++++++

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    [[['Retriever' 'Golden Retriever']
      ['Hound' 'Dachshund']]

     [['Retriever' 'Golden Retriever']
      ['' '']]

     [['Hound' 'Dachshund']
      ['Hound' 'Beagle']]]

|

.. code-block:: default

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    from hiclass.MultiLabelLocalClassifierPerNode import MultiLabelLocalClassifierPerNode

    # Define data
    X_train = [[1, 2], [3, 4], [5, 6]]
    X_test = [[1, 2], [3, 4], [5, 6]]

    # Define labels
    Y_train = np.array([
        [["Retriever", "Golden Retriever"], ["Hound", "Dachshund"]],
        [["Retriever", "Labrador"]],
        [["Hound", "Dachshund"], ["Hound", "Beagle"]],
    ], dtype=object)

    # Use decision tree classifiers for every node
    tree = DecisionTreeClassifier()
    classifier = MultiLabelLocalClassifierPerNode(local_classifier=tree)

    # Train local classifier per node
    classifier.fit(X_train, Y_train)

    # Predict
    predictions = classifier.predict(X_test)
    print(predictions)

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 0.047 seconds)
