@@ -594,9 +594,10 @@ knn = KNeighborsClassifier(n_neighbors=3)
 X = cancer_train[["Smoothness", "Concavity"]]
 y = cancer_train["Class"]

-knn_fit = make_pipeline(cancer_preprocessor, knn).fit(X, y)
+knn_pipeline = make_pipeline(cancer_preprocessor, knn)
+knn_pipeline.fit(X, y)

-knn_fit
+knn_pipeline
 ```

 ### Predict the labels in the test set
@@ -614,7 +615,7 @@ variables in the output data frame.

 ```{code-cell} ipython3
 cancer_test_predictions = cancer_test.assign(
-    predicted=knn_fit.predict(cancer_test[["Smoothness", "Concavity"]])
+    predicted=knn_pipeline.predict(cancer_test[["Smoothness", "Concavity"]])
 )
 cancer_test_predictions[["ID", "Class", "predicted"]]
 ```
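The two hunks above split pipeline construction from fitting and rename `knn_fit` to `knn_pipeline`, so the same fitted object is later reused for `predict`. A minimal, self-contained sketch of that pattern follows; it uses synthetic data (not the book's cancer dataset), with the column names reused purely for illustration:

```python
# Sketch of the build-then-fit pipeline pattern, on synthetic data.
import numpy as np
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
train = pd.DataFrame({
    "Smoothness": rng.normal(size=40),
    "Concavity": rng.normal(size=40),
})
train["Class"] = np.where(
    train["Smoothness"] + train["Concavity"] > 0, "Malignant", "Benign"
)

# Build the pipeline first, then fit it as a separate step.
knn_pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
knn_pipeline.fit(train[["Smoothness", "Concavity"]], train["Class"])

# The fitted pipeline predicts labels for new observations...
predicted = knn_pipeline.predict(train[["Smoothness", "Concavity"]])
# ...and score() computes accuracy against the true labels.
accuracy = knn_pipeline.score(train[["Smoothness", "Concavity"]], train["Class"])
```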
@@ -645,7 +646,7 @@ for the predictors that we originally passed into `predict` when making predicti
 and we provide the actual labels via the `cancer_test["Class"]` series.

 ```{code-cell} ipython3
-cancer_acc_1 = knn_fit.score(
+cancer_acc_1 = knn_pipeline.score(
     cancer_test[["Smoothness", "Concavity"]],
     cancer_test["Class"]
 )
@@ -662,11 +663,9 @@ glue("cancer_acc_1", "{:0.0f}".format(100*cancer_acc_1))

 The output shows that the estimated accuracy of the classifier on the test data
 was {glue:text}`cancer_acc_1`%.
-We can also look at the *confusion matrix* for the classifier
-using the `crosstab` function from `pandas`. A confusion matrix shows how many
-observations of each (actual) label were classified as each (predicted) label.
-The `crosstab` function
-takes two arguments: the actual labels first, then the predicted labels second.
+We can also look at the *confusion matrix* for the classifier
+using the `crosstab` function from `pandas`. The `crosstab` function takes two
+arguments: the actual labels first, then the predicted labels second.

 ```{code-cell} ipython3
 pd.crosstab(
@@ -884,10 +883,11 @@ cancer_subtrain, cancer_validation = train_test_split(
 knn = KNeighborsClassifier(n_neighbors=3)
 X = cancer_subtrain[["Smoothness", "Concavity"]]
 y = cancer_subtrain["Class"]
-knn_fit = make_pipeline(cancer_preprocessor, knn).fit(X, y)
+knn_pipeline = make_pipeline(cancer_preprocessor, knn)
+knn_pipeline.fit(X, y)

 # compute the score on validation data
-acc = knn_fit.score(
+acc = knn_pipeline.score(
     cancer_validation[["Smoothness", "Concavity"]],
     cancer_validation["Class"]
 )
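The hunk above follows a split-fit-score pattern: hold out a validation set, fit the pipeline on the sub-training set, and score on the held-out data. A self-contained sketch of the same pattern, on synthetic data rather than the book's cancer dataset:

```python
# Sketch of validation-split scoring, on synthetic data.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(3)
data = pd.DataFrame({
    "Smoothness": rng.normal(size=50),
    "Concavity": rng.normal(size=50),
})
data["Class"] = np.where(data["Smoothness"] > 0, "Malignant", "Benign")

# Hold out 25% of the rows as a validation set.
subtrain, validation = train_test_split(data, test_size=0.25, random_state=0)

# Fit on the sub-training set only.
knn_pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
knn_pipeline.fit(subtrain[["Smoothness", "Concavity"]], subtrain["Class"])

# Score on the held-out validation set.
acc = knn_pipeline.score(
    validation[["Smoothness", "Concavity"]],
    validation["Class"]
)
```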
@@ -908,10 +908,10 @@ for i in range(1, 5):
     knn = KNeighborsClassifier(n_neighbors=3)
     X = cancer_subtrain[["Smoothness", "Concavity"]]
     y = cancer_subtrain["Class"]
-    knn_fit = make_pipeline(cancer_preprocessor, knn).fit(X, y)
+    knn_pipeline = make_pipeline(cancer_preprocessor, knn).fit(X, y)

     # compute the score on validation data
-    accuracies.append(knn_fit.score(
+    accuracies.append(knn_pipeline.score(
         cancer_validation[["Smoothness", "Concavity"]],
         cancer_validation["Class"]
     ))
@@ -979,7 +979,6 @@ Since the `cross_validate` function outputs a dictionary, we use `pd.DataFrame`
 dataframe for better visualization.
 Note that the `cross_validate` function handles stratifying the classes in
 each train and validate fold automatically.
-We begin by importing the `cross_validate` function from `sklearn`.

 ```{code-cell} ipython3
 from sklearn.model_selection import cross_validate
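The surrounding text notes that `cross_validate` returns a dictionary that can be wrapped in `pd.DataFrame`, and that class stratification across folds is automatic for classifiers. A self-contained sketch of that usage, on synthetic data rather than the book's cancer dataset:

```python
# Sketch of cross_validate with a pipeline, on synthetic data.
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(4)
X = pd.DataFrame({
    "Smoothness": rng.normal(size=60),
    "Concavity": rng.normal(size=60),
})
y = np.where(X["Smoothness"] + X["Concavity"] > 0, "Malignant", "Benign")

knn_pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))

# cross_validate returns a dict of arrays (fit_time, score_time, test_score);
# wrapping it in a DataFrame gives one row per fold. With an integer cv and a
# classifier, folds are stratified by class automatically.
cv_results = pd.DataFrame(cross_validate(knn_pipeline, X, y, cv=5))
```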
@@ -1183,17 +1182,14 @@ format. We will wrap it in a `pd.DataFrame` to make it easier to understand,
 and print the `info` of the result.

 ```{code-cell} ipython3
-accuracies_grid = pd.DataFrame(
-    cancer_tune_grid.fit(
-        cancer_train[["Smoothness", "Concavity"]],
-        cancer_train["Class"]
-    ).cv_results_
+cancer_tune_grid.fit(
+    cancer_train[["Smoothness", "Concavity"]],
+    cancer_train["Class"]
 )
-```
-
-```{code-cell} ipython3
+accuracies_grid = pd.DataFrame(cancer_tune_grid.cv_results_)
 accuracies_grid.info()
 ```
+

 There is a lot of information to look at here, but we are most interested
 in three quantities: the number of neighbors (`param_kneighborsclassifier__n_neighbors`),
 the cross-validation accuracy estimate (`mean_test_score`),
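The hunk above reads `cv_results_` off the fitted `GridSearchCV` object rather than off the return value of `fit`. A self-contained sketch of that pattern, including the two columns named in the text; it uses synthetic data, not the book's cancer dataset:

```python
# Sketch of inspecting GridSearchCV.cv_results_, on synthetic data.
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X = pd.DataFrame({
    "Smoothness": rng.normal(size=60),
    "Concavity": rng.normal(size=60),
})
y = np.where(X["Smoothness"] + X["Concavity"] > 0, "Malignant", "Benign")

# make_pipeline names the KNN step "kneighborsclassifier", which is why the
# parameter key is "kneighborsclassifier__n_neighbors".
param_grid = {"kneighborsclassifier__n_neighbors": range(1, 10, 2)}
tune_grid = GridSearchCV(
    make_pipeline(StandardScaler(), KNeighborsClassifier()),
    param_grid,
    cv=5,
)
tune_grid.fit(X, y)

# fit() stores its results on the object; cv_results_ becomes a DataFrame
# with one row per candidate number of neighbors.
accuracies_grid = pd.DataFrame(tune_grid.cv_results_)
```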
@@ -1303,13 +1299,13 @@ large_cancer_tune_grid = GridSearchCV(
     cv=10
 )

-large_accuracies_grid = pd.DataFrame(
-    large_cancer_tune_grid.fit(
-        cancer_train[["Smoothness", "Concavity"]],
-        cancer_train["Class"]
-    ).cv_results_
+large_cancer_tune_grid.fit(
+    cancer_train[["Smoothness", "Concavity"]],
+    cancer_train["Class"]
 )

+large_accuracies_grid = pd.DataFrame(large_cancer_tune_grid.cv_results_)
+
 large_accuracy_vs_k = alt.Chart(large_accuracies_grid).mark_line(point=True).encode(
     x=alt.X("param_kneighborsclassifier__n_neighbors").title("Neighbors"),
     y=alt.Y("mean_test_score")
@@ -1903,7 +1899,6 @@ n_total = len(names)
 # start with an empty list of selected predictors
 selected = []

-
 # create the pipeline and CV grid search objects
 param_grid = {
     "kneighborsclassifier__n_neighbors": range(1, 61, 5),
@@ -1929,8 +1924,8 @@ for i in range(1, n_total + 1):
         y = cancer_subset["Class"]

         # Find the best K for this set of predictors
-        cancer_model_grid = cancer_tune_grid.fit(X, y)
-        accuracies_grid = pd.DataFrame(cancer_model_grid.cv_results_)
+        cancer_tune_grid.fit(X, y)
+        accuracies_grid = pd.DataFrame(cancer_tune_grid.cv_results_)

         # Store the tuned accuracy for this set of predictors
         accs[j] = accuracies_grid["mean_test_score"].max()
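The hunk above works because a `GridSearchCV` object can be refit inside a loop: each `fit` overwrites `cv_results_`, and the tuned accuracy for the current predictor set is the maximum of `mean_test_score` (which matches `best_score_`). A self-contained sketch on synthetic data, with illustrative column names `a` and `b` rather than the book's predictors:

```python
# Sketch of refitting one GridSearchCV over candidate predictor sets,
# on synthetic data.
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)
X = pd.DataFrame({"a": rng.normal(size=50), "b": rng.normal(size=50)})
y = np.where(X["a"] > 0, "Malignant", "Benign")

tune_grid = GridSearchCV(
    make_pipeline(StandardScaler(), KNeighborsClassifier()),
    {"kneighborsclassifier__n_neighbors": range(1, 8, 2)},
    cv=5,
)

accs = []
for cols in [["a"], ["a", "b"]]:   # candidate predictor sets
    tune_grid.fit(X[cols], y)      # each fit overwrites cv_results_
    grid = pd.DataFrame(tune_grid.cv_results_)
    accs.append(grid["mean_test_score"].max())
```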