Commit 65b465b

fix inconsistency in train/test split in reg1 and reg2
1 parent beed8a1 commit 65b465b

2 files changed: 12 additions and 5 deletions

source/regression1.md

Lines changed: 9 additions & 2 deletions
@@ -408,6 +408,13 @@ the `train_test_split` function cannot stratify based on a
 quantitative variable.
 ```

+```{code-cell} ipython3
+:tags: [remove-cell]
+# fix seed right before train/test split for reproducibility with next chapter
+# make sure this seed is always the same as the one used before the split in Regression 2
+np.random.seed(1)
+```
+
 ```{code-cell} ipython3
 sacramento_train, sacramento_test = train_test_split(
     sacramento, train_size=0.75
@@ -698,7 +705,7 @@ to be too small or too large, we cause the RMSPE to increase, as shown in

 {numref}`fig:07-howK` visualizes the effect of different settings of $K$ on the
 regression model. Each plot shows the predicted values for house sale price from
-our KNN regression model for 6 different values for $K$: 1, 3, {glue:text}`best_k_sacr`, 41, 250, and 699 (i.e., all of the training data).
+our KNN regression model for 6 different values for $K$: 1, 3, 25, {glue:text}`best_k_sacr`, 250, and 699 (i.e., all of the training data).
 For each model, we predict prices for the range of possible home sizes we
 observed in the data set (here 500 to 5,000 square feet) and we plot the
 predicted prices as a orange line.
@@ -709,8 +716,8 @@ predicted prices as a orange line.
 gridvals = [
     1,
     3,
+    25,
     best_k_sacr,
-    41,
     250,
     len(sacramento_train),
 ]

source/regression2.md

Lines changed: 3 additions & 3 deletions
@@ -371,7 +371,7 @@ np.random.seed(1)
 sacramento = pd.read_csv("data/sacramento.csv")

 sacramento_train, sacramento_test = train_test_split(
-    sacramento, train_size=0.6
+    sacramento, train_size=0.75
 )
 ```
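
Review note: when no `random_state` is passed, `train_test_split` draws from NumPy's global random state, so seeding with `np.random.seed(1)` in both chapters only reproduces the same split if `train_size` also matches. The sketch below illustrates that behavior under stated assumptions: the toy DataFrame stands in for `data/sacramento.csv`, and `split_with_seed` is a hypothetical helper, not something in the repository.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the Sacramento data (100 rows).
toy = pd.DataFrame({"sqft": np.arange(100), "price": np.arange(100) * 1000})


def split_with_seed(df, seed, train_size):
    """Seed NumPy's global RNG immediately before splitting (hypothetical helper)."""
    np.random.seed(seed)
    return train_test_split(df, train_size=train_size)


# Same seed and same train_size: the two splits contain the same rows.
train_a, test_a = split_with_seed(toy, seed=1, train_size=0.75)
train_b, test_b = split_with_seed(toy, seed=1, train_size=0.75)
same_split = train_a.index.equals(train_b.index)

# Same seed but a different train_size (the old Regression 2 value): the split differs.
train_c, test_c = split_with_seed(toy, seed=1, train_size=0.6)
different_rows = set(train_a.index) != set(train_c.index)

print(same_split, different_rows)
```

Any code that consumes random numbers between the seed call and the split would also change the result, which is presumably why the commit places the seed directly before `train_test_split`.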

@@ -533,8 +533,8 @@ from sklearn.preprocessing import StandardScaler
 # preprocess the data, make the pipeline
 sacr_preprocessor = make_column_transformer((StandardScaler(), ["sqft"]))
 sacr_pipeline_knn = make_pipeline(
-    sacr_preprocessor, KNeighborsRegressor(n_neighbors=25)
-) # 25 is the best parameter obtained through cross validation in regression1 chapter
+    sacr_preprocessor, KNeighborsRegressor(n_neighbors=55)
+) # 55 is the best parameter obtained through cross validation in regression1 chapter

 sacr_pipeline_knn.fit(sacramento_train[["sqft"]], sacramento_train[["price"]])
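
Review note: the hardcoded `n_neighbors=55` has to be updated by hand whenever the split or seed changes, as this commit shows. One possible alternative is to recompute the value with cross-validation. The sketch below assumes `GridSearchCV` and an RMSE-based score. The tuning code in the regression1 chapter is not shown in this diff, so this may not match it. The toy data, the grid of candidate values, and all variable names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy training data standing in for sacramento_train.
rng = np.random.default_rng(1)
sqft = rng.uniform(500, 5000, size=200)
toy_train = pd.DataFrame(
    {"sqft": sqft, "price": 100 * sqft + rng.normal(0, 20000, size=200)}
)

# Same pipeline shape as in the diff: standardize sqft, then KNN regression.
preprocessor = make_column_transformer((StandardScaler(), ["sqft"]))
tuning_pipeline = make_pipeline(preprocessor, KNeighborsRegressor())

# Candidate K values (an assumed grid, not the one used in the book).
k_values = list(range(1, 101, 5))
grid = GridSearchCV(
    tuning_pipeline,
    param_grid={"kneighborsregressor__n_neighbors": k_values},
    cv=5,
    scoring="neg_root_mean_squared_error",
)
grid.fit(toy_train[["sqft"]], toy_train["price"])
best_k = grid.best_params_["kneighborsregressor__n_neighbors"]

# Build the final model from the tuned value instead of a literal such as 55.
final_pipeline = make_pipeline(preprocessor, KNeighborsRegressor(n_neighbors=best_k))
final_pipeline.fit(toy_train[["sqft"]], toy_train["price"])
print(best_k)
```

The trade-off is extra computation in the second chapter in exchange for not having to keep a literal in sync across files.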
