@@ -36,14 +39,52 @@ As with word error rate, a score of zero indicates perfect performance and
 higher scores (which may exceed 100) indicate poorer performance. For more
 details, consult section 6.1 of the [NIST RT-09 evaluation plan](https://web.archive.org/web/20100606041157if_/http://www.itl.nist.gov/iad/mig/tests/rt/2009/docs/rt09-meeting-eval-plan-v2.pdf).

+Jaccard error rate
+------------------
+We also report Jaccard error rate (JER), a metric introduced for [DIHARD II](https://coml.lscp.ens.fr/dihard/index.html) that is based on the [Jaccard index](https://en.wikipedia.org/wiki/Jaccard_index). The Jaccard index is a similarity
+measure typically used to evaluate the output of image segmentation systems and
+is defined as the ratio between the intersection and union of two segmentations.
+To compute Jaccard error rate, an optimal mapping between reference and system
+speakers is determined and for each pair the Jaccard index of their
+segmentations is computed. The Jaccard error rate is then 1 minus the average
+of these scores.
+
+More concretely, assume we have ``N`` reference speakers and ``M`` system
+speakers. An optimal mapping between speakers is determined using the
+Hungarian algorithm so that each reference speaker is paired with at most one
+system speaker and each system speaker with at most one reference speaker. Then,
+for each reference speaker ``ref`` the speaker-specific Jaccard error rate is
+``(FA + MISS)/TOTAL``, where:
+
+- ``TOTAL`` is the duration of the union of reference and system speaker
+  segments; if the reference speaker was not paired with a system speaker, it is
+  the duration of all reference speaker segments
+- ``FA`` is the total system speaker time not attributed to the reference
+  speaker; if the reference speaker was not paired with a system speaker, it is
+  0
+- ``MISS`` is the total reference speaker time not attributed to the system
+  speaker; if the reference speaker was not paired with a system speaker, it is
+  equal to ``TOTAL``
+
+The Jaccard error rate then is the average of the speaker-specific Jaccard error
+rates.
+
+JER and DER are highly correlated with JER typically being higher, especially in
+recordings where one or more speakers is particularly dominant. Where it tends
+to track DER is in outliers where the diarization is especially bad, resulting
+in one or more unmapped system speakers whose speech is not then penalized. In
+these cases, where DER can easily exceed 500%, JER will never exceed 100% and
+may be far lower if the reference speakers are handled correctly. For this
+reason, it may be useful to pair JER with another metric evaluating speech
+detection and/or speaker overlap detection.

 Clustering metrics
 ---------------------------------
-An alternate approach to system evaluation is convert both the reference and
-system outputs to frame-level labels, then evaluate using one of many
-well-known approaches for evaluating clustering performance. Each recording
-is converted to a sequence of 10 ms frames, each of which is assigned a single
-label corresponding to one of the following cases:
+A third approach to system evaluation is to convert both the reference and system
+outputs to frame-level labels, then evaluate using one of many well-known
+approaches for evaluating clustering performance. Each recording is converted to
+a sequence of 10 ms frames, each of which is assigned a single label
+corresponding to one of the following cases:

 - the frame contains no speech
 - the frame contains speech from a single speaker (one label per speaker
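
The Jaccard error rate definition added in the hunk above can be sketched in a few lines of Python. This is an illustrative sketch only, not the scoring code from this repository. It rests on several assumptions that are not in the README text: speakers are given as dictionaries mapping a speaker id to a list of ``(start, end)`` segments, segments belonging to one speaker do not overlap each other, there is at least one reference speaker, and the "optimal" mapping is taken to be the one that minimizes the summed speaker-specific error. For brevity the sketch searches all one-to-one mappings by brute force rather than using the Hungarian algorithm, so it is only practical for a handful of speakers. The function names are hypothetical.

```python
from itertools import permutations


def duration(segs):
    """Total duration of a list of non-overlapping (start, end) segments."""
    return sum(end - start for start, end in segs)


def overlap(segs_a, segs_b):
    """Duration shared by two segment lists (each internally non-overlapping)."""
    return sum(
        max(0.0, min(e1, e2) - max(s1, s2))
        for s1, e1 in segs_a
        for s2, e2 in segs_b
    )


def speaker_jer(ref_segs, sys_segs=None):
    """Speaker-specific JER, (FA + MISS) / TOTAL, for one reference speaker."""
    if sys_segs is None:
        # Unpaired reference speaker: FA = 0 and MISS = TOTAL, so the rate is 1.
        return 1.0
    inter = overlap(ref_segs, sys_segs)
    total = duration(ref_segs) + duration(sys_segs) - inter  # union duration
    fa = duration(sys_segs) - inter    # system time outside the reference
    miss = duration(ref_segs) - inter  # reference time outside the system
    return (fa + miss) / total


def jaccard_error_rate(ref_spk, sys_spk):
    """Average speaker-specific JER under the best one-to-one mapping.

    Brute-force search stands in for the Hungarian algorithm here.
    """
    ref_ids, sys_ids = list(ref_spk), list(sys_spk)
    n, m = len(ref_ids), len(sys_ids)
    if n <= m:
        # Every reference speaker gets a distinct system speaker.
        mappings = (dict(zip(range(n), p)) for p in permutations(range(m), n))
    else:
        # Every system speaker gets a distinct reference speaker;
        # the remaining reference speakers stay unpaired.
        mappings = (dict(zip(p, range(m))) for p in permutations(range(n), m))
    best = float("inf")
    for mapping in mappings:
        total = 0.0
        for i, ref_id in enumerate(ref_ids):
            j = mapping.get(i)
            paired = None if j is None else sys_spk[sys_ids[j]]
            total += speaker_jer(ref_spk[ref_id], paired)
        best = min(best, total)
    return best / n


if __name__ == "__main__":
    ref = {"A": [(0, 10)], "B": [(10, 20)]}
    hyp = {"x": [(0, 8)], "y": [(8, 20)]}
    print(jaccard_error_rate(ref, hyp))
```

Because ``FA + MISS`` equals the union minus the intersection for a paired speaker, each speaker-specific rate is 1 minus the Jaccard index of the pair, which is why the average can never exceed 1 (100%) no matter how much unmapped system speech there is.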
@@ -56,7 +97,7 @@ These frame-level labelings are then scored with the following metrics:
 ### Goodman-Kruskal tau
 Goodman-Kruskal tau is an asymmetric association measure dating back to work
 by Leo Goodman and William Kruskal in the 1950s (Goodman and Kruskal, 1954).
-For a reference labeling ``ref`` and a system labeling ``ref``,
+For a reference labeling ``ref`` and a system labeling ``sys``,
 ``GKT(ref, sys)`` corresponds to the fraction of variability in ``sys`` that
 can be explained by ``ref``. Consequently, ``GKT(ref, sys)`` is 1 when ``ref``
 is perfectly predictive of ``sys`` and 0 when it is not predictive at all.
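
To make the asymmetry of ``GKT(ref, sys)`` concrete, here is a minimal sketch of the standard Goodman-Kruskal tau computed from two equal-length sequences of frame labels. It is not taken from this repository. The function name is hypothetical, "variability" is taken to be the Gini variation of the label distribution, and returning ``nan`` when ``sys`` has a single label (so there is no variability to explain) is an assumption of this sketch.

```python
from collections import Counter


def goodman_kruskal_tau(ref, sys_labels):
    """Fraction of the variability in sys_labels explained by ref."""
    n = len(ref)
    joint = Counter(zip(ref, sys_labels))
    ref_counts = Counter(ref)
    sys_counts = Counter(sys_labels)
    # Gini variation of the system labels on their own.
    v_sys = 1.0 - sum((c / n) ** 2 for c in sys_counts.values())
    if v_sys == 0.0:
        return float("nan")  # assumption: undefined when sys never varies
    # Expected variation of the system labels once the reference label is known.
    v_cond = 1.0 - sum(c ** 2 / ref_counts[r] for (r, _), c in joint.items()) / n
    return (v_sys - v_cond) / v_sys


if __name__ == "__main__":
    ref = [0, 1, 2, 3]
    hyp = ["a", "a", "b", "b"]
    print(goodman_kruskal_tau(ref, hyp))  # ref fully determines hyp
    print(goodman_kruskal_tau(hyp, ref))  # hyp only partly determines ref
```

Swapping the arguments changes which labeling is treated as the predictor, so the two calls in the usage example generally return different values.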