[ENH] ROC analysis: show thresholds#3172
Conversation
ales-erjavec
left a comment
There was a problem hiding this comment.
If I enable "Show convex ROC curves" and hover over the graph I get:
-------------------------- AttributeError Exception ---------------------------
Traceback (most recent call last):
File "/Users/aleserjavec/workspace/orange3/Orange/widgets/evaluate/owrocanalysis.py", line 614, in _on_mouse_moved
pts = sp.pointsAt(act_pos)
AttributeError: 'PlotCurveItem' object has no attribute 'pointsAt'
-------------------------------------------------------------------------------
| graphics = curve.avg_threshold() | ||
| self.plot.addItem(graphics.curve_item) | ||
| self.plot.addItem(graphics.confint_item) | ||
| graphics.curve_item.scene().sigMouseMoved.connect(self._on_mouse_moved) |
There was a problem hiding this comment.
Connect the signal once in the __init__ method:
self.plotview.scene().sigMouseMoved.connect(self._on_mouse_moved)
|
|
||
| thresh = curve_pts.thresholds[idx_closest] | ||
| if not numpy.isnan(thresh): | ||
| item = pg.TextItem(text="{:.3f}".format(thresh), color=(0, 0, 0)) |
There was a problem hiding this comment.
Why not QToolTip.showText?
There was a problem hiding this comment.
Mainly because I am new to QT and was just following a few resources online (I've changed it now).
I've fixed most of the things that you've mentioned and some other bugs.
Thank you for the helpful comments!
Codecov Report
@@ Coverage Diff @@
## master #3172 +/- ##
==========================================
+ Coverage 82.74% 82.76% +0.01%
==========================================
Files 344 344
Lines 59304 59397 +93
==========================================
+ Hits 49073 49158 +85
- Misses 10231 10239 +8 |
| def _on_mouse_moved(self, pos): | ||
| curves = [crv for crv in self.plot.curves if isinstance(crv, pg.PlotCurveItem)] | ||
| for i in range(len(self.selected_classifiers)): | ||
| sp = curves[i].childItems()[0] |
There was a problem hiding this comment.
This assumes implicitly that the order of the curves in the self.plot.curves corresponds to the self.selected_classifiers. This seems fragile.
You can use:
curves = [(clsf_idx, self.plot_curves(self.target_index, clsf_idx), self.curve_data(self.target_index, clsf_idx))
for clsf_idx in self.selected_classifiers] # type: List[Tuple[int, plot_curves, ROCData]]to retrieve triples of (classifier_index, plot_curves, curve_data). Use plot_curves similarly as curve_data in ROC averaging dispatch below, but use
plot_curves.merge().curve_item
plot_curves.avg_vertical().curve_item
plot_curves.avg_threshold().curve_itemto get the pg.PlotCurveItem instances
There was a problem hiding this comment.
My bad - should be fixed now. I've also fixed some other bugs:
- if mouse was hovered over a point on curve for some classifier and then the same classifier's curve would be hidden and mouse hovered over the same point as before, the cache would not reset and therefore an invalid threshold would be shown,
- if tooltip was not being shown and the cache point was "valid" (i. e. same point as previously shown), a tooltip with message "" would be created, which resulted in hiding a tooltip.
I have also encountered the following exception (just once), which I can not seem to be able to reproduce anymore:
--------------------------- RuntimeError Exception ----------------------------
Traceback (most recent call last):
File "/home/matej/.local/lib/python3.5/site-packages/pyqtgraph/GraphicsScene/GraphicsScene.py", line 162, in mouseMoveEvent
self.sendHoverEvents(ev)
File "/home/matej/.local/lib/python3.5/site-packages/pyqtgraph/GraphicsScene/GraphicsScene.py", line 228, in sendHoverEvents
items = self.itemsNearEvent(event, hoverable=True)
File "/home/matej/.local/lib/python3.5/site-packages/pyqtgraph/GraphicsScene/GraphicsScene.py", line 424, in itemsNearEvent
if hoverable and not hasattr(item, 'hoverEvent'):
RuntimeError: wrapped C/C++ object of type ScatterPlotItem has been deleted
When browsing through old issues, I've found out that there's been some instances of similar issues (#1316), though I can't say I completely understand your fix for this.
| # Find closest point on curve and display it | ||
| idx_closest = numpy.argmin( | ||
| [numpy.linalg.norm(mouse_pt - [curve_pts.fpr[idx], curve_pts.tpr[idx]]) | ||
| for idx in range(len(curve_pts.thresholds))]) |
There was a problem hiding this comment.
This loop could be vectorized
roc_points = numpy.column_stack((curve_pts.fpr, curve_pts.tpr))
query_pt = numpy.array([[mouse_pt.x(), mouse_pt.y()]])
diff = roc_points - query_pt
idx_closest = numpy.argmin(numpy.linalg.norm(diff, axis=1))(... although a slightly moot point as the sp.pointsAt call above already runs through all the points in a python loop)
There was a problem hiding this comment.
True - fixed.
I've also made another minor change: now QCursor.pos() is directly used everywhere instead of using it directly in some cases and in the other cases saving it and then using the variable (just a consistency fix).
I think the hit radius is sufficient as it is. However it might be better to collect points from all the curves under the mouse and display a list of all thresholds with classifier names. Right now if two or more curves are close to each other it is not clear to which does the shown threshold belong to.
I would use QTest.mouseMove targeting the QGraphicsView's viewport ( |
|
I've improved the tooltips so that it is clear, which classifier a threshold belongs to. Also, the tooltip now shows thresholds for all overlapping points. I'll see what I can do about the tests. Edit: I'm having a bit of trouble creating the tests - so far this is what I've got: # setup
data_in = Orange.data.Table("titanic")
res = Orange.evaluation.TestOnTrainingData(
data=data_in,
learners=[Orange.classification.KNNLearner(),
Orange.classification.LogisticRegressionLearner()],
store_data=True
)
self.send_signal(self.widget.Inputs.evaluation_results, res)
self.widget.roc_averaging = OWROCAnalysis.Merge
self.widget.target_index = 0
self.widget.selected_classifiers = [0, 1]
self.widget._replot()
# curve data for classifier 0
curve = self.widget.plot_curves(self.widget.target_index, 0)
curve_merge = curve.merge()
QTest.mouseMove(self.widget.plotview.viewport(), curve_merge.curve_item.mapToScene(0, 0))
# example assertion
self.assertTrue(QToolTip.isVisible())The |
Maybe QTBUG-5232. I swear I saw this when writing https://github.com/biolab/orange3/pull/3014/files#diff-da80849ceff100f955c9ec05920ba089R138. If I use that function and modify your test like this: ...
# curve data for classifier 0
curve = self.widget.plot_curves(self.widget.target_index, 0)
curve_merge = curve.merge()
view = self.widget.plotview
item = curve_merge.curve_item # type: pg.PlotCurveItem
xs, ys = item.getData()
x, y = xs[len(xs) // 2], ys[len(ys) // 2]
pos = item.mapToScene(x, y)
pos = view.mapFromScene(pos)
mouseMove(view.viewport(), pos)
# example assertion
self.assertTrue(QToolTip.isVisible())then the test passes. Can you rebase on current master and use the mouseMove function from |
* fix crash when 'Show convex ROC curves' is enabled * register mouse event only once * use QToolTip instead of TextItem for displaying thresholds * fix newly caused pylint warning
* removed dangerous retrieval of ROC curve data * changed tooltip caching so that it works smoother * fixed bug where cache value was not correct and would not reset properly
…inor consistency fix
|
Thank you, that does fix the issue. I've stumbled upon another issue though - the points which I'm trying to move the mouse to are not being mapped correctly. For example, if I try to simulate a mouse move to (x=0.0, y=0.0) and (x=1.0, y=1.0), it produces the same threshold/tooltip. I've noticed that these two points get mapped to very similar positions inside the test ( |
|
Try adding somewhere before mapping the points. |
7438477 to
099ee74
Compare
|
Thanks once again. 😃 |
| self._plot_curves = {} | ||
| self._rocch = None | ||
| self._perf_line = None | ||
| self._tooltip_cache = None |
There was a problem hiding this comment.
I do not understand why the tooltip cache is necessary. Can you add a comment explaining the reasoning for its use?
There was a problem hiding this comment.
My reasoning behind it was that instead of checking all points of a curve to find the closest one, it would first check the tooltip cache, which contains at most 1 threshold (point) per (selected) classifier.
It's hard to say if it's worth the additional 20-ish lines of code though.
There was a problem hiding this comment.
But it already checks all points in pts = sp.pointsAt(act_pos)
There was a problem hiding this comment.
Right, but it also does it again later (although it is vectorized): idx_closest = numpy.argmin(numpy.linalg.norm(diff, axis=1)).
Considering your point, it does seem that the performance gain from tooltip cache might not be as big as I thought. Should I remove it and make the code clearer that way?
Edit: also, the tooltip cache provides a convenient way to test the contents of tooltips - the first alternative way I can think of would be to test the entire tooltip text, e.g. assertEqual("Tooltips:\n(#1) 0.400", QTooltip.text()).
It does not matter really, you can leave it. |
Issue
Implements #3057.
Description of changes
When mouse is put over a point on ROC curve, the threshold that was used for calculation now appears. If threshold value is nan, the value does not get shown. Also, the displaying of thresholds is currently not implemented when showing individual curves.
My questions:
General comments are also welcome.
Includes