from `scikit-learn`; but it is difficult to combine this approach with parameter tuning to find a good number of neighbors
for each set of features. Instead we will code the forward selection algorithm manually.
In particular, we need code that tries adding each available predictor to a model, finding the best, and iterating.
If you recall the end of the wrangling chapter, we mentioned
that sometimes one needs more flexible forms of iteration than what
we have used earlier, and in these cases one typically resorts to
a *for loop*; see
the [control flow section](https://wesmckinney.com/book/python-basics.html#control_for) in
*Python for Data Analysis* {cite:p}`mckinney2012python`.
Here we will use two for loops: one over increasing predictor set sizes
(where you see `for i in range(1, n_total + 1):` below),
and another to check which predictor to add in each round (where you see `for j in range(len(names))` below).
For each set of predictors to try, we extract the subset of predictors,
pass it into a preprocessor, build a `Pipeline` that tunes
a K-NN classifier using 10-fold cross-validation,
and finally record the estimated accuracy.
```{code-cell} ipython3
from sklearn.compose import make_column_selector
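
# NOTE: the rest of this cell is a sketch of the forward selection loop
# described above; the original cell is not shown here in full. The data
# frame name `cancer_subset`, its "Class" response column, and the grid of
# K values below are illustrative assumptions; adapt them to your data.
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# candidate predictors: every column except the response
names = list(cancer_subset.drop(columns=["Class"]).columns)
n_total = len(names)

# start with no predictors selected, and a dictionary to record results
selected = []
accuracy_dict = {"size": [], "selected_predictors": [], "accuracy": []}

# pipeline that standardizes the numeric predictors and feeds them to K-NN;
# GridSearchCV tunes the number of neighbors with 10-fold cross-validation
preprocessor = make_column_transformer(
    (StandardScaler(), make_column_selector(dtype_include="number")),
)
tune_pipe = make_pipeline(preprocessor, KNeighborsClassifier())
param_grid = {"kneighborsclassifier__n_neighbors": range(1, 61, 5)}
tune_grid = GridSearchCV(estimator=tune_pipe, param_grid=param_grid, cv=10)

# outer loop: one iteration per predictor set size
for i in range(1, n_total + 1):
    accs = np.zeros(len(names))
    # inner loop: try adding each predictor not yet in the model
    for j in range(len(names)):
        X = cancer_subset[selected + [names[j]]]
        y = cancer_subset["Class"]
        # tune K for this candidate predictor set and store its accuracy
        tune_grid.fit(X, y)
        accs[j] = tune_grid.best_score_
    # keep whichever added predictor gave the highest estimated accuracy
    best_j = int(accs.argmax())
    selected = selected + [names[best_j]]
    accuracy_dict["size"].append(i)
    accuracy_dict["selected_predictors"].append(", ".join(selected))
    accuracy_dict["accuracy"].append(accs.max())
    del names[best_j]

accuracies = pd.DataFrame(accuracy_dict)
accuracies
```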