
Commit cc115e0

improvements to reg1 error vert lines plot; minor clarification reg2
1 parent f73ced7 commit cc115e0

2 files changed: 9 additions and 9 deletions


source/regression1.md

Lines changed: 8 additions & 8 deletions
@@ -476,8 +476,8 @@ us the smallest RMSPE.
 from sklearn.neighbors import KNeighborsRegressor
 
 # (synthetic) new prediction points
-pts = pd.DataFrame({"sqft": [1250, 1850, 2250], "price": [250000, 200000, 500000]})
-finegrid = pd.DataFrame({"sqft": np.arange(900, 3901, 10)})
+pts = pd.DataFrame({"sqft": [1200, 1850, 2250], "price": [300000, 200000, 500000]})
+finegrid = pd.DataFrame({"sqft": np.arange(600, 3901, 10)})
 
 # preprocess the data, make the pipeline
 sacr_preprocessor = make_column_transformer((StandardScaler(), ["sqft"]))
@@ -495,22 +495,22 @@ sacr_full_preds_hid = pd.concat(
 )
 
 sacr_new_preds_hid = pd.concat(
-    (pts, pd.DataFrame(sacr_pipeline.predict(pts), columns=["predicted"])),
+    (small_sacramento[["sqft", "price"]].reset_index(), pd.DataFrame(sacr_pipeline.predict(small_sacramento[["sqft", "price"]]), columns=["predicted"])),
     axis=1,
-)
+).drop(columns=["index"])
 
 # to make altair mark_line works, need to create separate dataframes for each vertical error line
-sacr_new_preds_melted_df = sacr_new_preds_hid.melt(id_vars=["sqft"])
 errors_plot = (
     small_plot
     + alt.Chart(sacr_full_preds_hid).mark_line(color="#ff7f0e").encode(x="sqft", y="predicted")
     + alt.Chart(sacr_new_preds_hid)
     .mark_circle(opacity=1)
     .encode(x="sqft", y="price")
 )
+sacr_new_preds_melted_df = sacr_new_preds_hid.melt(id_vars=["sqft"])
 v_lines = []
-for i in pts["sqft"]:
-    line_df = sacr_new_preds_melted_df.query("sqft == @i")
+for i in sacr_new_preds_hid["sqft"]:
+    line_df = sacr_new_preds_melted_df.query(f"sqft == {i}")
     v_lines.append(alt.Chart(line_df).mark_line(color="black").encode(x="sqft", y="value"))
 
 errors_plot = alt.layer(*v_lines, errors_plot)
@@ -526,7 +526,7 @@ glue("fig:07-verticalerrors", errors_plot, display=False)
 :::{glue:figure} fig:07-verticalerrors
 :name: fig:07-verticalerrors
 
-Scatter plot of price (USD) versus house size (square feet) with example predictions (orange line) and the error in those predictions compared with true response values for three selected observations (vertical lines).
+Scatter plot of price (USD) versus house size (square feet) with example predictions (orange line) and the error in those predictions compared with true response values (vertical lines).
 :::
 
 +++
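
Note on the change above: the rewritten block builds one small dataframe per observation so that Altair's `mark_line` draws each vertical error segment separately, and it now iterates over the observed points in `sacr_new_preds_hid` rather than the synthetic `pts`. Below is a minimal, self-contained sketch of that pattern; the toy `sqft`/`price`/`predicted` values are assumed for illustration only, whereas the book derives them from `small_sacramento` and the fitted `sacr_pipeline`.

```python
import pandas as pd
import altair as alt

# assumed toy data: observed prices and hypothetical predictions at three house sizes
preds = pd.DataFrame({
    "sqft": [1200, 1850, 2250],
    "price": [300000, 200000, 500000],
    "predicted": [280000, 260000, 420000],
})

# melt so each sqft value has two rows (observed price and predicted value);
# a line through those two rows is the vertical error segment for that observation
melted = preds.melt(id_vars=["sqft"])

# observed points
points = alt.Chart(preds).mark_circle(opacity=1).encode(x="sqft", y="price")

# one chart per observation, then layer them all together
v_lines = [
    alt.Chart(melted.query(f"sqft == {s}")).mark_line(color="black").encode(x="sqft", y="value")
    for s in preds["sqft"]
]
errors_plot = alt.layer(*v_lines, points)
```

Layering one chart per observation keeps each error segment as its own line rather than a single connected path, which is why the melted dataframe is filtered by `sqft` inside the loop.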

source/regression2.md

Lines changed: 1 addition & 1 deletion
@@ -313,7 +313,7 @@ Scatter plot of sale price versus size with many possible lines that could be dr
 
 Simple linear regression chooses the straight line of best fit by choosing
 the line that minimizes the **average squared vertical distance** between itself and
-each of the observed data points in the training data. {numref}`fig:08-verticalDistToMin` illustrates
+each of the observed data points in the training data (equivalent to minimizing the RMSE). {numref}`fig:08-verticalDistToMin` illustrates
 these vertical distances as lines. Finally, to assess the predictive
 accuracy of a simple linear regression model,
 we use RMSPE—the same measure of predictive performance we used with K-NN regression.
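
Note on the added parenthetical: it relies on the square root being monotonic, so the line that minimizes the average squared vertical distance also minimizes the RMSE on the training data. A sketch of the equivalence, with the intercept and slope written here as $\beta_0$ and $\beta_1$ (notation assumed rather than taken from the diff):

$$
\min_{\beta_0,\beta_1}\ \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - (\beta_0 + \beta_1 x_i)\bigr)^2
\quad\Longleftrightarrow\quad
\min_{\beta_0,\beta_1}\ \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - (\beta_0 + \beta_1 x_i)\bigr)^2} = \min_{\beta_0,\beta_1}\ \mathrm{RMSE}.
$$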
