### Single line functions for detailed visualizations
### The quickest and easiest way to go from analysis...
### ...to this.
Say we use Naive Bayes in multi-class classification and decide we want to visualize the results of a common classification metric, the Area under the Receiver Operating Characteristic curve. Since the ROC is only valid in binary classification, we want to show the respective ROC of each class if it were the positive class. As an added bonus, let's show the micro-averaged and macro-averaged curve in the plot as well.
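For intuition about what "the respective ROC of each class if it were the positive class" means, here's a rough, dependency-free sketch. The helper below is hypothetical (not part of scikit-plot): it treats each class in turn as the positive class and computes a one-vs-rest ROC AUC from the predicted probabilities alone.

```python
# Hypothetical helper (NOT part of scikit-plot) illustrating per-class ROC
# in a multi-class problem: each class is treated in turn as the "positive"
# class, and a one-vs-rest ROC AUC is computed from the predicted
# probabilities. Pure Python, no dependencies.

def auc_one_vs_rest(y_true, probas, positive_class):
    """ROC AUC via the rank statistic: the probability that a random
    positive sample scores higher than a random negative one."""
    pos = [p[positive_class] for yt, p in zip(y_true, probas) if yt == positive_class]
    neg = [p[positive_class] for yt, p in zip(y_true, probas) if yt != positive_class]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))

# Toy 3-class example: each row of probas is a probability vector.
y_true = [0, 0, 1, 1, 2, 2]
probas = [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1], [0.2, 0.7, 0.1],
          [0.3, 0.5, 0.2], [0.1, 0.2, 0.7], [0.2, 0.2, 0.6]]
per_class_auc = {c: auc_one_vs_rest(y_true, probas, c) for c in (0, 1, 2)}
print(per_class_auc)
```

The micro-averaged curve pools all (sample, class) decisions into one binary problem, while the macro-averaged curve simply averages the per-class curves.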
Let's use scikit-plot with the sample digits dataset from scikit-learn.
```python
# The usual train-test split mumbo-jumbo
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
nb = GaussianNB()
nb.fit(X_train, y_train)
predicted_probas = nb.predict_proba(X_test)

# The magic happens here
import matplotlib.pyplot as plt
import scikitplot as skplt
skplt.metrics.plot_roc_curve(y_test, predicted_probas)
plt.show()
```
And... that's it. Captured in that small example is the entire philosophy of Scikit-plot: **single line functions for detailed visualization**. You simply browse the plots available in the documentation, and call the function with the necessary arguments. Scikit-plot tries to stay out of your way as much as possible. No unnecessary bells and whistles. And when you *do* need the bells and whistles, each function offers a myriad of parameters for customizing various elements in your plots.
Finally, compare this with [the non-scikit-plot way of plotting the multi-class ROC curve](http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html). Which one would you rather do?
## Maximum flexibility. Compatibility with non-scikit-learn objects.
Although Scikit-plot is loosely based around the scikit-learn interface, you don't actually need Scikit-learn objects to use the available functions. As long as you provide the functions what they're asking for, they'll happily draw the plots for you.
Here's a quick example to generate the precision-recall curves of a Keras classifier on a sample dataset.
```python
# Import what's needed for the Functions API
import matplotlib.pyplot as plt
import scikitplot as skplt
# This is a Keras classifier. We'll generate probabilities on the test set.
probas = keras_clf.predict_proba(X_test)

# Now plot.
skplt.metrics.plot_precision_recall_curve(y_test, probas)
plt.show()
```
You can see clearly here that `skplt.metrics.plot_precision_recall_curve` needs only the ground truth y-values and the predicted probabilities to generate the plot. This lets you use *anything* you want as the classifier, from Keras NNs to NLTK Naive Bayes to that groundbreaking classifier algorithm you just wrote.
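To make that point concrete, here's a dependency-free sketch (illustrative only, not scikit-plot's internals) showing why ground truth labels and probabilities are all it takes: every point on a binary precision-recall curve is just (precision, recall) at some probability threshold, so the model that produced the probabilities never enters the computation.

```python
# Illustrative sketch (NOT scikit-plot internals): every point on a binary
# precision-recall curve is (precision, recall) at some probability
# threshold, so only y_true and the positive-class probabilities are needed.

def precision_recall_points(y_true, proba_pos):
    """Precision/recall at each distinct threshold, highest threshold first."""
    points = []
    for t in sorted(set(proba_pos), reverse=True):
        pred = [1 if p >= t else 0 for p in proba_pos]
        tp = sum(1 for yt, yp in zip(y_true, pred) if yt == 1 and yp == 1)
        fp = sum(1 for yt, yp in zip(y_true, pred) if yt == 0 and yp == 1)
        fn = sum(1 for yt, yp in zip(y_true, pred) if yt == 1 and yp == 0)
        points.append((tp / (tp + fp), tp / (tp + fn)))
    return points

# These probabilities could have come from Keras, NLTK, or anything else.
pr_points = precision_recall_points([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
print(pr_points)
```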
The possibilities are endless.
Then just run:

```
pip install scikit-plot
```
Or if you want the latest development version, clone this repo and run
docs/Quickstart.rst
Before we begin plotting, we'll need to import the following for Scikit-plot::

    >>> import matplotlib.pyplot as plt
:mod:`matplotlib.pyplot` is used by Matplotlib to make plotting work like it does in MATLAB and deals with things like axes, figures, and subplots. But don't worry. Unless you're an advanced user, you won't need to understand any of that while using Scikit-plot. All you need to remember is that we use the :func:`matplotlib.pyplot.show` function to show any plots generated by Scikit-plot.
Let's begin by generating our sample digits dataset::
We'll proceed by creating an instance of a RandomForestClassifier object from Scikit-learn::

    >>> from sklearn.ensemble import RandomForestClassifier
For those not familiar with what :func:`cross_val_predict` does, it generates cross-validated estimates for each sample point in our dataset. Comparing the cross-validated estimates with the true labels, we'll be able to get evaluation metrics such as accuracy, precision, recall, and in our case, the confusion matrix.
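As a rough conceptual sketch (a hypothetical stand-in, not scikit-learn's implementation), cross-validated predictions can be pictured like this: the data is split into folds, each fold is predicted by a model fitted on the remaining folds, and so every sample receives exactly one out-of-fold prediction.

```python
# Conceptual sketch of cross-validated predictions (hypothetical stand-in,
# NOT scikit-learn's implementation): each fold is predicted by a model
# fitted on the remaining folds, so every sample gets exactly one
# out-of-fold prediction.

def cross_val_predict_sketch(X, y, fit, predict, n_folds=3):
    preds = [None] * len(X)
    folds = [list(range(i, len(X), n_folds)) for i in range(n_folds)]
    for fold in folds:
        train_idx = [i for i in range(len(X)) if i not in fold]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in fold:
            preds[i] = predict(model, X[i])
    return preds

# Toy "model": always predict the majority label seen during fitting.
def fit(X_train, y_train):
    return max(set(y_train), key=y_train.count)

def predict(model, x):
    return model

X = list(range(6))
y = [1, 1, 1, 1, 0, 1]
oof_preds = cross_val_predict_sketch(X, y, fit, predict)
print(oof_preds)  # one out-of-fold prediction per sample
```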
To plot and show our confusion matrix, we'll use the function :func:`~scikitplot.metrics.plot_confusion_matrix`, passing it both the true labels and predicted labels. We'll also set the optional argument ``normalize=True`` so the values displayed in our confusion matrix plot will be from the range [0, 1]. Finally, to show our plot, we'll call ``plt.show()``.
>>> skplt.metrics.plot_confusion_matrix(y, predictions, normalize=True)
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe967d64490>
>>> plt.show()
And that's it! A quick glance at our confusion matrix shows that our classifier isn't doing so well with identifying the digits 1, 8, and 9. Hmm. Perhaps a bit more tweaking of our Random Forest's hyperparameters is in order.
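For intuition about what the normalized matrix actually displays, here is a dependency-free sketch (illustrative only, not scikit-plot's code): cell [i][j] holds the fraction of samples with true label i that were predicted as label j, so each row sums to 1.

```python
# Dependency-free sketch (illustrative only, NOT scikit-plot's code) of a
# normalized confusion matrix: cell [i][j] is the fraction of samples with
# true label i that were predicted as label j, so each row sums to 1
# (assuming every label occurs at least once in y_true).

def normalized_confusion_matrix(y_true, y_pred, labels):
    index = {label: k for k, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for yt, yp in zip(y_true, y_pred):
        counts[index[yt]][index[yp]] += 1
    return [[c / sum(row) for c in row] for row in counts]

y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2, 2]
cm = normalized_confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
for row in cm:
    print(row)
```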
One more example
----------------
Finally, let's show an example wherein we *don't* use Scikit-learn.
Here's a quick example to generate the precision-recall curves of a Keras classifier on a sample dataset.
>>> # Import what's needed for the Functions API
>>> import matplotlib.pyplot as plt
>>> import scikitplot as skplt
>>> # This is a Keras classifier. We'll generate probabilities on the test set.
>>> probas = keras_clf.predict_proba(X_test)
>>> skplt.metrics.plot_precision_recall_curve(y_test, probas)
>>> plt.show()
And again, that's it! As in the example above, all we needed to do was pass the ground truth labels and predicted probabilities to :func:`~scikitplot.metrics.plot_precision_recall_curve` to generate the precision-recall curves. This means you can use literally any classifier you want to generate the precision-recall curves, from Keras classifiers to NLTK Naive Bayes to XGBoost, as long as you pass in the predicted probabilities in the correct format.
Now what?
---------
The recommended way to start using Scikit-plot is to just go through the documentation for the various modules and choose which plots you think would be useful for your work.
Scikit-plot is the result of an unartistic data scientist's dreadful realization that *visualization is one of the most crucial components in the data science process, not just a mere afterthought*.
Gaining insights is simply a lot easier when you're looking at a colored heatmap of a confusion matrix complete with class labels rather than a single-line dump of numbers enclosed in brackets. Besides, if you ever need to present your results to someone (virtually any time anybody hires you to do data science), you show them visualizations, not a bunch of numbers in Excel.
That said, there are a number of visualizations that frequently pop up in machine learning. Scikit-plot is a humble attempt to provide aesthetically-challenged programmers (such as myself) the opportunity to generate quick and beautiful graphs and plots with as little boilerplate as possible.
0 commit comments