Commit a954d0d

author
xhlulu
committed
ML Docs: Update regression page
1 parent 38ef59d commit a954d0d

1 file changed

+32
-6
lines changed

doc/python/ml-regression.md

Lines changed: 32 additions & 6 deletions
@@ -20,7 +20,7 @@ jupyter:
2020
name: python
2121
nbconvert_exporter: python
2222
pygments_lexer: ipython3
23-
version: 3.7.6
23+
version: 3.7.7
2424
plotly:
2525
description: Visualize regression in scikit-learn with Plotly.
2626
display_as: ai_ml
@@ -33,14 +33,29 @@ jupyter:
3333
thumbnail: thumbnail/ml-regression.png
3434
---
3535

36+
<!-- #region -->
37+
This page shows how to use Plotly charts for displaying various types of regression models, starting from simple models like [Linear Regression](https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html) and progressively moving towards models like [Decision Tree][tree] and [Polynomial Features][poly]. We highlight various capabilities of Plotly, such as comparative analysis of the same model with different parameters, displaying LaTeX, [surface plots](https://plotly.com/python/3d-surface-plots/) for 3D data, and enhanced prediction error analysis with [Plotly Express](https://plotly.com/python/plotly-express/).
38+
39+
We will use [Scikit-learn](https://scikit-learn.org/) to split and preprocess our data and train various regression models. Scikit-learn is a popular Machine Learning (ML) library that offers various tools for creating and training ML algorithms, feature engineering, data cleaning, and evaluating and testing models. It was designed to be accessible, and to work seamlessly with popular libraries like NumPy and Pandas.
40+
41+
42+
[lasso]: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html
43+
[tree]: https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html
44+
[poly]: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html
45+
<!-- #endregion -->
46+
3647
## Basic linear regression plots
3748

49+
In this section, we show you how to apply a simple regression model for predicting tips a server will receive based on various client attributes (such as sex, time of the week, and whether they are a smoker).
3850

39-
### Ordinary Least Square (OLS) with `plotly.express`
51+
We will be using [Linear Regression][lr], a simple model that fits an intercept (the baseline tip predicted when every feature is zero) and adds a slope for each feature we use, such as the value of the total bill. We show you how to do that with both Plotly Express and Scikit-learn.
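To make the intercept and slope concrete, here is a minimal from-scratch sketch of a one-feature least-squares fit. It is not the code this page uses (the page relies on Plotly Express and scikit-learn); the helper name `fit_simple_ols` and the bill and tip values are made up for illustration.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # Intercept: chosen so the line passes through the point of means.
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data, not taken from the tips dataset.
total_bill = [10.0, 20.0, 30.0, 40.0]
tip = [2.0, 3.5, 5.0, 6.5]

intercept, slope = fit_simple_ols(total_bill, tip)
print(f"tip = {intercept:.2f} + {slope:.2f} * total_bill")
```

With more than one feature, the same idea generalizes to one slope per feature, which is what scikit-learn's `LinearRegression` estimates.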
4052

53+
### Ordinary Least Squares (OLS) with `plotly.express`
4154

4255
This example shows how to use `plotly.express`'s `trendline` parameter to fit a simple Ordinary Least Squares (OLS) model for predicting the tips waiters will receive based on the value of the total bill.
4356

57+
[lr]: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
58+
4459
```python
4560
import plotly.express as px
4661

@@ -78,7 +93,7 @@ fig.show()
7893

7994
## Model generalization on unseen data
8095

81-
Easily color your plot based on a predefined data split.
96+
With `go.Scatter`, you can easily color your plot based on a predefined data split. By giving the training and testing data points different colors, you can see whether the model generalizes well to the test data.
8297

8398
```python
8499
import numpy as np
@@ -108,7 +123,11 @@ fig.show()
108123

109124
## Comparing different kNN models parameters
110125

111-
Compare the performance of two different models on the same dataset. This can be easily combined with discrete color legends from `px`, such as coloring by the assigned `sex`.
126+
In addition to linear regression, it's possible to fit the same data using [k-Nearest Neighbors][knn]. When you perform a prediction on a new sample, this model takes either the weighted or the unweighted average of its neighbors' target values. To see the difference between those two averaging options, we train a kNN model with each setting and plot the results in the same way as the previous graph.
127+
128+
Notice how we can combine scatter points with lines using Plotly.py. You can learn more about [multiple chart types](https://plotly.com/python/graphing-multiple-chart-types/).
129+
130+
[knn]: https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsRegressor.html
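The two averaging options can be sketched without any library. The snippet below is an illustrative one-dimensional re-implementation, not scikit-learn's code: the function `knn_predict` and the sample values are hypothetical, and the option names only mirror the `weights="uniform"` and `weights="distance"` settings of `KNeighborsRegressor`.

```python
def knn_predict(x_train, y_train, x_new, k=3, weights="uniform"):
    """Predict y at x_new from the k nearest 1-D training points."""
    # Keep the k training points closest to x_new.
    neighbors = sorted(zip(x_train, y_train), key=lambda p: abs(p[0] - x_new))[:k]
    if weights == "uniform":
        # Unweighted: every neighbor counts equally.
        return sum(y for _, y in neighbors) / len(neighbors)
    # Distance-weighted: closer neighbors get larger weights (1 / distance).
    dists = [abs(x - x_new) for x, _ in neighbors]
    exact = [y for (_, y), d in zip(neighbors, dists) if d == 0]
    if exact:
        # Avoid dividing by zero: average the targets that sit exactly on x_new.
        return sum(exact) / len(exact)
    w = [1 / d for d in dists]
    return sum(wi * y for wi, (_, y) in zip(w, neighbors)) / sum(w)


# Hypothetical training data.
x_train = [1.0, 2.0, 4.0, 8.0]
y_train = [1.0, 2.0, 4.0, 8.0]

uniform_pred = knn_predict(x_train, y_train, 3.0, k=3, weights="uniform")
distance_pred = knn_predict(x_train, y_train, 3.0, k=3, weights="distance")
print(uniform_pred, distance_pred)
```

The distance-weighted prediction leans towards the two closest points, while the uniform prediction is pulled further by the third, more distant neighbor.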
112131

113132
```python
114133
import numpy as np
@@ -136,9 +155,14 @@ fig.add_traces(go.Scatter(x=x_range, y=y_dist, name='Weights: Distance'))
136155
fig.show()
137156
```
138157

158+
<!-- #region -->
139159
## Displaying `PolynomialFeatures` using $\LaTeX$
140160

141-
It's easy to diplay latex equations in legend and titles by simply adding `$` before and after your equation.
161+
Notice how linear regression fits a straight line, but kNN can take non-linear shapes. Moreover, it is possible to extend linear regression to polynomial regression by using scikit-learn's `PolynomialFeatures`, which lets you fit a slope for your features raised to the power of `n`, where `n=1,2,3,4` in our example.
162+
163+
164+
With Plotly, it's easy to display LaTeX equations in legends and titles by simply adding `$` before and after your equation. This way, you can see the coefficients that our polynomial regression fitted.
165+
<!-- #endregion -->
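As a rough sketch of the `$...$` wrapping described above, the helper below turns fitted coefficients into a LaTeX string that could be used as a trace name or title. The name `format_polynomial` and the coefficient values are assumptions for illustration; the page's own example may format its equations differently.

```python
def format_polynomial(coefs, intercept):
    """Build a LaTeX equation string from polynomial coefficients.

    coefs[0] multiplies x, coefs[1] multiplies x^2, and so on.
    """
    terms = []
    for power, coef in enumerate(coefs, start=1):
        var = "x" if power == 1 else f"x^{{{power}}}"
        terms.append(f"{coef:.2f}{var}")
    terms.append(f"{intercept:.2f}")
    # Join the terms, turning "+ -1.25" into "- 1.25" for readability.
    body = " + ".join(terms).replace("+ -", "- ")
    # The surrounding dollar signs mark the string as LaTeX.
    return f"${body}$"


# Hypothetical coefficients for a degree-2 fit.
label = format_polynomial([0.5, -1.25], 3)
print(label)
```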
142166

143167
```python
144168
import numpy as np
@@ -220,7 +244,9 @@ fig.show()
220244

221245
## Visualizing coefficients for multiple linear regression (MLR)
222246

223-
When you are fitting a linear regression, you want to often know what feature matters the most in your regression's output.
247+
Visualizing regression with one or two variables is straightforward, since we can plot them with scatter plots and 3D scatter plots, respectively. However, if you have more than two features, you will need to find alternative ways to visualize your data.
248+
249+
One way is to use [bar charts](https://plotly.com/python/bar-charts/). In our example, each bar shows the coefficient that our linear regression model assigned to one input feature. Our model was trained on the [Iris dataset](https://archive.ics.uci.edu/ml/datasets/iris).
224250

225251
```python
226252
import pandas as pd
