Commit b507eea

Added more detailed docs about interest measures
1 parent 498fc38 commit b507eea

2 files changed, +155 -18 lines changed

docs/getting_started.rst

Lines changed: 25 additions & 0 deletions
@@ -193,6 +193,31 @@ The same example as above, using :class:`~niaarm.mine.get_rules`:
     Run Time: 6.9498 seconds
     Rules exported to output.csv
 
+Interest Measures
+-----------------
+
+The framework currently implements the following interest measures (metrics):
+
+- Support
+- Confidence
+- Lift [#fn]_
+- Coverage
+- RHS Support
+- Conviction [#fn]_
+- Inclusion
+- Amplitude
+- Interestingness
+- Comprehensibility
+- Netconf [#fn]_
+- Yule's Q [#fn]_
+
+More information about these interest measures can be found in the API reference
+of the :class:`~niaarm.rule.Rule` class.
+
+.. rubric:: Footnotes
+
+.. [#fn] Not available as fitness metrics.
+
 
 Examples
 --------

niaarm/rule.py

Lines changed: 130 additions & 18 deletions
@@ -9,30 +9,142 @@ class Rule:
     Args:
         antecedent (list[Feature]): A list of antecedents of the association rule.
         consequent (list[Feature]): A list of consequents of the association rule.
-        fitness (Optional[float]): Value of the fitness function.
+        fitness (Optional[float]): Fitness value of the association rule.
         transactions (Optional[pandas.DataFrame]): Transactional database.
 
     Attributes:
-        cls.metrics (tuple[str]): List of all available metrics.
-        support (float): Support of the rule i.e. proportion of transactions containing
-            both the antecedent and the consequent.
-        confidence (float): Confidence of the rule, defined as the proportion of transactions that contain
-            the consequent in the set of transactions that contain the antecedent.
-        lift (float): Lift of the rule. Lift measures how many times more often the antecedent and the consequent Y
+        cls.metrics (tuple[str]): List of all available interest measures.
+        support: Support is defined on an itemset as the proportion of transactions that contain the attribute :math:`X`.
+
+            :math:`supp(X) = \frac{n_{X}}{|D|},`
+
+            where :math:`|D|` is the number of records in the transactional database.
+
+            For an association rule, support is defined as the support of all the attributes in the rule.
+
+            :math:`supp(X \implies Y) = \frac{n_{XY}}{|D|}`
+
+            **Range:** :math:`[0, 1]`
+
+            **Reference:** Michael Hahsler, A Probabilistic Comparison of Commonly Used Interest Measures for Association Rules,
+            2015, URL: https://mhahsler.github.io/arules/docs/measures
+        confidence: Confidence of the rule, defined as the proportion of transactions that contain
+            the consequent in the set of transactions that contain the antecedent. This proportion is an estimate
+            of the probability of seeing the consequent, if the antecedent is present in the transaction.
+
+            :math:`conf(X \implies Y) = \frac{supp(X \implies Y)}{supp(X)}`
+
+            **Range:** :math:`[0, 1]`
+
+            **Reference:** Michael Hahsler, A Probabilistic Comparison of Commonly Used Interest Measures for Association Rules,
+            2015, URL: https://mhahsler.github.io/arules/docs/measures
+        lift: Lift measures how many times more often the antecedent X and the consequent Y
             occur together than expected if they were statistically independent.
-        coverage (float): Coverage of the rule, also known as antecedent support. It measures the probability that
-            the rule applies to a randomly selected transaction.
-        rhs_support (float): Support of the consequent.
-        conviction (float): Conviction of the rule.
-        inclusion (float): Inclusion of the rule is defined as the ratio between the number of attributes of the rule
-            and all attributes in the dataset.
-        amplitude (float): Amplitude of the rule.
-        interestingness (float): Interestingness of the rule.
-        comprehensibility (float): Comprehensibility of the rule.
-        netconf (float): The netconf metric evaluates the interestingness of
+
+            :math:`lift(X \implies Y) = \frac{conf(X \implies Y)}{supp(Y)}`
+
+            **Range:** :math:`[0, \infty]` (1 means independence)
+
+            **Reference:** Michael Hahsler, A Probabilistic Comparison of Commonly Used Interest Measures for Association Rules,
+            2015, URL: https://mhahsler.github.io/arules/docs/measures
+        coverage: Coverage, also known as antecedent support, is an estimate of the probability that
+            the rule applies to a randomly selected transaction. It is the proportion of transactions
+            that contain the antecedent.
+
+            :math:`cover(X \implies Y) = supp(X)`
+
+            **Range:** :math:`[0, 1]`
+
+            **Reference:** Michael Hahsler, A Probabilistic Comparison of Commonly Used Interest Measures for Association Rules,
+            2015, URL: https://mhahsler.github.io/arules/docs/measures
+        rhs_support: Support of the consequent.
+
+            :math:`RHSsupp(X \implies Y) = supp(Y)`
+
+            **Range:** :math:`[0, 1]`
+
+            **Reference:** Michael Hahsler, A Probabilistic Comparison of Commonly Used Interest Measures for Association Rules,
+            2015, URL: https://mhahsler.github.io/arules/docs/measures
+        conviction: Conviction can be interpreted as the ratio of the expected frequency that the antecedent occurs without
+            the consequent, if the two were independent, to the observed frequency of incorrect predictions.
+
+            :math:`conv(X \implies Y) = \frac{1 - supp(Y)}{1 - conf(X \implies Y)}`
+
+            **Range:** :math:`[0, \infty]` (1 means independence, :math:`\infty` means the rule always holds)
+
+            **Reference:** Michael Hahsler, A Probabilistic Comparison of Commonly Used Interest Measures for Association Rules,
+            2015, URL: https://mhahsler.github.io/arules/docs/measures
+        inclusion: Inclusion is defined as the ratio between the number of attributes of the rule
+            and all attributes in the database.
+
+            :math:`inclusion(X \implies Y) = \frac{|X \cup Y|}{m},`
+
+            where :math:`m` is the total number of attributes in the transactional database.
+
+
+            **Range:** :math:`[0, 1]`
+
+            **Reference:** I. Fister Jr., V. Podgorelec, I. Fister. Improved Nature-Inspired Algorithms for Numeric Association
+            Rule Mining. In: Vasant P., Zelinka I., Weber GW. (eds) Intelligent Computing and Optimization. ICO 2020. Advances in
+            Intelligent Systems and Computing, vol 1324. Springer, Cham.
+        amplitude: Amplitude measures the quality of a rule, preferring attributes with smaller intervals.
+
+            :math:`ampl(X \implies Y) = 1 - \frac{1}{n}\sum_{k = 1}^{n}{\frac{Ub_k - Lb_k}{max(o_k) - min(o_k)}},`
+
+            where :math:`n` is the total number of attributes in the rule, :math:`Ub_k` and :math:`Lb_k` are upper and lower
+            bounds of the selected attribute, and :math:`max(o_k)` and :math:`min(o_k)` are the maximum and minimum
+            feasible values of the attribute :math:`o_k` in the transactional database.
+
+            **Range:** :math:`[0, 1]`
+
+            **Reference:** I. Fister Jr., I. Fister. A brief overview of swarm intelligence-based algorithms for numerical
+            association rule mining. arXiv preprint arXiv:2010.15524 (2020).
+        interestingness: Interestingness of the rule, defined as:
+
+            :math:`interest(X \implies Y) = \frac{supp(X \implies Y)}{supp(X)} \cdot \frac{supp(X \implies Y)}{supp(Y)}
+            \cdot (1 - \frac{supp(X \implies Y)}{|D|})`
+
+            Here, the first part gives us the probability of generating the rule based on the antecedent, the second part
+            gives us the probability of generating the rule based on the consequent and the third part is the probability
+            that the rule won't be generated. Thus, rules with very high support will be deemed uninteresting.
+
+            **Range:** :math:`[0, 1]`
+
+            **Reference:** I. Fister Jr., I. Fister. A brief overview of swarm intelligence-based algorithms for numerical
+            association rule mining. arXiv preprint arXiv:2010.15524 (2020).
+        comprehensibility: Comprehensibility of the rule. Rules with fewer attributes in the consequent are more
+            comprehensible.
+
+            :math:`comp(X \implies Y) = \frac{log(1 + |Y|)}{log(1 + |X \cup Y|)}`
+
+            **Range:** :math:`[0, 1]`
+
+            **Reference:** I. Fister Jr., I. Fister. A brief overview of swarm intelligence-based algorithms for numerical
+            association rule mining. arXiv preprint arXiv:2010.15524 (2020).
+        netconf: The netconf metric evaluates the interestingness of
             association rules depending on the support of the rule and the
             support of the antecedent and consequent of the rule.
-        yulesq (float): Yule's Q metric.
+
+            :math:`netconf(X \implies Y) = \frac{supp(X \implies Y) - supp(X)supp(Y)}{supp(X)(1 - supp(X))}`
+
+            **Range:** :math:`[-1, 1]` (Negative values represent negative dependence, positive values represent positive
+            dependence and 0 represents independence)
+
+            **Reference:** E. V. Altay and B. Alatas, "Sensitivity Analysis of MODENAR Method for Mining of Numeric Association
+            Rules," 2019 1st International Informatics and Software Engineering Conference (UBMYK), 2019, pp. 1-6,
+            doi: 10.1109/UBMYK48245.2019.8965539.
+        yulesq: The Yule's Q metric represents the correlation between two possibly related dichotomous events.
+
+            :math:`yulesq(X \implies Y) =
+            \frac{supp(X \implies Y)supp(\neg X \implies \neg Y) - supp(X \implies \neg Y)supp(\neg X \implies Y)}
+            {supp(X \implies Y)supp(\neg X \implies \neg Y) + supp(X \implies \neg Y)supp(\neg X \implies Y)}`
+
+            **Range:** :math:`[-1, 1]` (-1 reflects total negative association, 1 reflects perfect positive association
+            and 0 reflects independence)
+
+            **Reference:** E. V. Altay and B. Alatas, "Sensitivity Analysis of MODENAR Method for Mining of Numeric Association
+            Rules," 2019 1st International Informatics and Software Engineering Conference (UBMYK), 2019, pp. 1-6,
+            doi: 10.1109/UBMYK48245.2019.8965539.
 
     """
 
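
Note on the formulas in the docstring above: most of them can be checked by hand from a few counts. The sketch below is not part of this commit and is not NiaARM's implementation; the function names and arguments are hypothetical, chosen only to mirror the symbols in the docstring. It assumes 0 < supp(X) < 1, supp(Y) > 0 and a non-zero Yule's Q denominator, and it leaves out interestingness.

```python
import math


def probabilistic_measures(n_xy, n_x, n_y, n):
    """Measures defined on transaction counts (hypothetical helper).

    n_xy: transactions containing both X and Y
    n_x:  transactions containing X
    n_y:  transactions containing Y
    n:    total number of transactions, |D|
    """
    supp = n_xy / n                      # supp(X => Y)
    cover = n_x / n                      # supp(X)
    rhs = n_y / n                        # supp(Y)
    conf = supp / cover
    lift = conf / rhs
    # Conviction is infinite when the rule always holds (conf == 1).
    conviction = math.inf if conf == 1 else (1 - rhs) / (1 - conf)
    netconf = (supp - cover * rhs) / (cover * (1 - cover))
    # Supports of the remaining three cells of the 2x2 contingency table.
    x_not_y = cover - supp               # supp(X => not Y)
    not_x_y = rhs - supp                 # supp(not X => Y)
    not_x_not_y = 1 - cover - rhs + supp # supp(not X => not Y)
    ad = supp * not_x_not_y
    bc = x_not_y * not_x_y
    yulesq = (ad - bc) / (ad + bc)
    return {
        "support": supp,
        "confidence": conf,
        "lift": lift,
        "coverage": cover,
        "rhs_support": rhs,
        "conviction": conviction,
        "netconf": netconf,
        "yulesq": yulesq,
    }


def inclusion(num_rule_attrs, num_dataset_attrs):
    """|X u Y| / m."""
    return num_rule_attrs / num_dataset_attrs


def amplitude(bounds, feasible):
    """bounds: (Lb_k, Ub_k) per rule attribute; feasible: (min(o_k), max(o_k))."""
    total = sum(
        (ub - lb) / (hi - lo) for (lb, ub), (lo, hi) in zip(bounds, feasible)
    )
    return 1 - total / len(bounds)


def comprehensibility(num_consequent_attrs, num_rule_attrs):
    """log(1 + |Y|) / log(1 + |X u Y|)."""
    return math.log(1 + num_consequent_attrs) / math.log(1 + num_rule_attrs)
```

For example, with 10 transactions of which 5 contain X, 4 contain Y and 3 contain both, this gives support 0.3, confidence 0.6, lift 1.5, conviction 1.5, netconf 0.4 and a Yule's Q of roughly 0.714 (up to floating-point rounding).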
