Commit 4259461

dataset to data set
1 parent 1b15a5f commit 4259461

4 files changed: +10 -10 lines changed

source/classification1.md

Lines changed: 3 additions & 3 deletions

@@ -332,7 +332,7 @@ points_df = pd.DataFrame(
 )
 perim_concav_with_new_point_df = pd.concat((cancer, points_df), ignore_index=True)
 # Find the euclidean distances from the new point to each of the points
-# in the orginal dataset
+# in the orginal data set
 my_distances = euclidean_distances(perim_concav_with_new_point_df[attrs])[
     len(cancer)
 ][:-1]
@@ -430,7 +430,7 @@ points_df2 = pd.DataFrame(
 )
 perim_concav_with_new_point_df2 = pd.concat((cancer, points_df2), ignore_index=True)
 # Find the euclidean distances from the new point to each of the points
-# in the orginal dataset
+# in the orginal data set
 my_distances2 = euclidean_distances(perim_concav_with_new_point_df2[attrs])[
     len(cancer)
 ][:-1]
@@ -783,7 +783,7 @@ points_df4 = pd.DataFrame(
 )
 perim_concav_with_new_point_df4 = pd.concat((cancer, points_df4), ignore_index=True)
 # Find the euclidean distances from the new point to each of the points
-# in the orginal dataset
+# in the orginal data set
 my_distances4 = euclidean_distances(perim_concav_with_new_point_df4[attrs])[
     len(cancer)
 ][:-1]
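For context, all three hunks edit the same comment over an identical pattern: append a new observation to the existing data, compute all pairwise euclidean distances, and keep the distances from the new point to each original point. A minimal runnable sketch of that pattern, assuming `euclidean_distances` comes from `sklearn.metrics.pairwise` and using made-up stand-in values for the book's `cancer` data frame and `attrs` column list:

```python
import pandas as pd
from sklearn.metrics.pairwise import euclidean_distances

# Toy stand-in for the book's cancer data frame (values are made up).
cancer = pd.DataFrame({
    "Perimeter": [0.2, 1.1, -0.5, 0.8],
    "Concavity": [1.0, -0.3, 0.6, -1.2],
})
attrs = ["Perimeter", "Concavity"]

# A single new point whose neighbors we want to find.
points_df = pd.DataFrame({"Perimeter": [0.5], "Concavity": [0.1]})

# Append the new point, compute all pairwise distances, take the row for
# the new point (index len(cancer)), and drop its zero distance to itself.
perim_concav_with_new_point_df = pd.concat((cancer, points_df), ignore_index=True)
my_distances = euclidean_distances(perim_concav_with_new_point_df[attrs])[
    len(cancer)
][:-1]
print(my_distances)  # one distance per original observation
```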

source/clustering.md

Lines changed: 2 additions & 2 deletions

@@ -792,7 +792,7 @@ Total WSSD for K clusters ranging from 1 to 9.
 We can perform K-means in Python using a workflow similar to those
 in the earlier classification and regression chapters. We will begin
 by reading the original (i.e., unstandardized) subset of 18 observations
-from the penguins dataset.
+from the penguins data set.
 
 ```{code-cell} ipython3
 :tags: [remove-cell]
@@ -1056,7 +1056,7 @@ and guidance that the worksheets provide will function as intended.
 clustering for when you expect there to be subgroups, and then subgroups within
 subgroups, etc., in your data. In the realm of more general unsupervised
 learning, it covers *principal components analysis (PCA)*, which is a very
-popular technique for reducing the number of predictors in a dataset.
+popular technique for reducing the number of predictors in a data set.
 
 ## References
 
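The first hunk's context mentions the K-means workflow and the total WSSD from the figure caption. As a hedged illustration (not the book's exact code; the penguin measurements below are invented, whereas the chapter reads 18 real observations from a file), `scikit-learn`'s K-means exposes that total within-cluster sum of squared distances as `inertia_`:

```python
import pandas as pd
from sklearn.cluster import KMeans

# Invented stand-in for the book's penguins subset (values are made up).
penguins = pd.DataFrame({
    "bill_length_mm": [39.1, 46.5, 50.0, 38.6, 47.6, 49.9],
    "flipper_length_mm": [181.0, 217.0, 196.0, 190.0, 215.0, 203.0],
})

# Fit K-means with 3 clusters; random_state makes the run reproducible.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=1)
clusters = kmeans.fit_predict(penguins)

print(clusters)         # cluster label for each observation
print(kmeans.inertia_)  # total within-cluster sum of squared distances (WSSD)
```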

source/regression1.md

Lines changed: 3 additions & 3 deletions

@@ -54,7 +54,7 @@ By the end of the chapter, readers will be able to do the following:
 * Recognize situations where a simple regression analysis would be appropriate for making predictions.
 * Explain the K-nearest neighbor (KNN) regression algorithm and describe how it differs from KNN classification.
 * Interpret the output of a KNN regression.
-* In a dataset with two or more variables, perform K-nearest neighbor regression in Python using a `scikit-learn` workflow.
+* In a data set with two or more variables, perform K-nearest neighbor regression in Python using a `scikit-learn` workflow.
 * Execute cross-validation in Python to choose the number of neighbors.
 * Evaluate KNN regression prediction accuracy in Python using a test data set and the root mean squared prediction error (RMSPE).
 * In the context of KNN regression, compare and contrast goodness of fit and prediction properties (namely RMSE vs RMSPE).
@@ -795,8 +795,8 @@ In this case the orange line becomes extremely smooth, and actually becomes flat
 once $K$ is equal to the number of datapoints in the entire data set.
 This happens because our predicted values for a given x value (here, home
 size), depend on many neighboring observations; in the case where $K$ is equal
-to the size of the dataset, the prediction is just the mean of the house prices
-in the dataset (completely ignoring the house size).
+to the size of the data set, the prediction is just the mean of the house prices
+in the data set (completely ignoring the house size).
 In contrast to the $K=1$ example,
 the smooth, inflexible orange line does not follow the training observations very closely.
 In other words, the model is *not influenced enough* by the training data.
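The second hunk's claim, that setting $K$ equal to the number of observations collapses KNN regression to a constant mean prediction, is easy to verify. A small sketch with invented home-size/price values (not the book's data):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Invented home sizes (square feet) and prices.
X = np.array([[800], [1200], [1500], [2000], [2600]])
y = np.array([250_000, 310_000, 340_000, 420_000, 500_000])

# With K equal to the number of observations, every neighborhood is the
# whole data set, so every prediction collapses to the mean price.
knn = KNeighborsRegressor(n_neighbors=len(X)).fit(X, y)
print(knn.predict([[900], [2400]]))  # [364000. 364000.]
print(y.mean())                      # 364000.0, regardless of home size
```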

source/viz.md

Lines changed: 2 additions & 2 deletions

@@ -183,7 +183,7 @@ than 5,000 rows. The simplest way to plot larger data sets is to enable the
 `vegafusion` data transformer right after you import the `altair` package:
 `alt.data_transformers.enable("vegafusion")`. This will allow you to plot up to
 100,000 graphical objects (e.g., a scatter plot with 100,000 points). To
-visualize *even larger* datasets, see [the `altair` documentation](https://altair-viz.github.io/user_guide/large_datasets).
+visualize *even larger* data sets, see [the `altair` documentation](https://altair-viz.github.io/user_guide/large_datasets).
 ```
 
 ### Scatter plots and line plots: the Mauna Loa CO$_{\text{2}}$ data set
@@ -277,7 +277,7 @@ There are a few basic aspects of a plot that we need to specify:
 - Here, we use the `mark_point` function to visualize our data as a scatter plot.
 - The **encoding channels**, which tells `altair` how the columns in the data frame map to visual properties in the chart.
 - To create an encoding, we use the `encode` function.
-- The `encode` method builds a key-value mapping between encoding channels (such as x, y) to fields in the dataset, accessed by field name (column names)
+- The `encode` method builds a key-value mapping between encoding channels (such as x, y) to fields in the data set, accessed by field name (column names)
 - Here, we set the `x` axis of the plot to the `date_measured` variable,
   and on the `y` axis, we plot the `ppm` variable.
 - For the y-axis, we also provided the method
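The surrounding context in both hunks describes enabling the `vegafusion` data transformer and mapping columns to channels with `encode`. A minimal hedged sketch putting the two together, assuming the `vegafusion` package is installed and using invented values in place of the Mauna Loa CO$_{\text{2}}$ measurements (the `date_measured` and `ppm` column names come from the diff context above):

```python
import altair as alt
import pandas as pd

# Raise altair's 5,000-row limit (requires the vegafusion package).
alt.data_transformers.enable("vegafusion")

# Invented stand-in for the Mauna Loa CO2 measurements.
co2_df = pd.DataFrame({
    "date_measured": pd.to_datetime(["1980-01-01", "1980-02-01", "1980-03-01"]),
    "ppm": [338.0, 339.5, 339.9],
})

# mark_point draws a scatter plot; encode maps columns to the x/y channels.
co2_scatter = alt.Chart(co2_df).mark_point().encode(
    x="date_measured",
    y="ppm",
)
```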
