Commit 63fc85a

Merge pull request #214 from NelGson/master
R Services Getting Started update
2 parents 81c3de8 + c120944 commit 63fc85a

39 files changed: +2058 -4 lines changed
Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
# Build a predictive model with SQL Server Python

This sample shows how to create a predictive model in Python and operationalize it with SQL Server vNext.

### Contents

[About this sample](#about-this-sample)<br/>
[Before you begin](#before-you-begin)<br/>
[Sample details](#sample-details)<br/>
[Related links](#related-links)<br/>


<a name=about-this-sample></a>

## About this sample

Predictive modeling is a powerful way to add intelligence to your application. It enables applications to predict outcomes against new data.
Incorporating predictive analytics into your applications involves two major phases:
model training and model operationalization.

In this sample, you will learn how to create a predictive model in Python and operationalize it with SQL Server vNext.
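
The two phases can be sketched in a few lines of plain Python. This is a minimal illustration rather than the sample's code: `train_linear_model` and `predict` are hypothetical helpers that stand in for scikit-learn's `LinearRegression`, and the data values are made up. Training produces a model object; operationalization serializes it with `pickle` so that it can be stored and later reloaded to score new data.

```python
import pickle


def train_linear_model(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def predict(model, xs):
    """Apply the fitted line to new inputs."""
    return [model["slope"] * x + model["intercept"] for x in xs]


# Phase 1: model training on historical data (made-up values).
model = train_linear_model([1, 2, 3, 4], [12, 22, 32, 42])

# Phase 2: operationalization. Serialize the model to bytes, store it,
# then load it wherever predictions are needed.
model_bytes = pickle.dumps(model)
restored = pickle.loads(model_bytes)
predictions = predict(restored, [5, 6])
print(predictions)
```

In the sample itself, the serialized bytes are what gets saved in a SQL Server table, and the prediction stored procedure plays the role of the "load and predict" step.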


<!-- Delete the ones that don't apply -->
- **Applies to:** SQL Server vNext
- **Key features:** SQL Server Machine Learning Services
- **Workload:** SQL Server Machine Learning Services
- **Programming Language:** T-SQL, Python
- **Authors:** Nellie Gustafsson
- **Update history:** Getting started tutorial for SQL Server ML Services - Python

<a name=before-you-begin></a>

## Before you begin

To run this sample, you need the following prerequisites: <br/>
Download a DB backup file and restore it using Setup.sql. [Download DB](https://deve2e.azureedge.net/sqlchoice/static/TutorialDB.bak)

**Software prerequisites:**

<!-- Examples -->
1. SQL Server vNext CTP2.0 (or higher) with Machine Learning Services (Python) installed
2. SQL Server Management Studio
3. Python Tools for Visual Studio

## Run this sample
1. From SQL Server Management Studio or SQL Server Data Tools, connect to your SQL Server vNext database and execute setup.sql to restore the sample DB that you downloaded.
2. From SQL Server Management Studio or SQL Server Data Tools, open the Predictive Model Python.sql script. This script:
   - Sets up the necessary tables
   - Creates a stored procedure to train a model
   - Creates a stored procedure to predict using that model
   - Saves the predicted results to a DB table
3. You can also try the Python script on its own. Just remember to point the Python environment to the corresponding path: "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES" if you run the in-database Python services, or
"C:\Program Files\Microsoft SQL Server\140\PYTHON_SERVER" if you have the standalone Machine Learning Server installed.

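
A small helper can make the choice between the two interpreter locations in step 3 explicit. The sketch below is only an illustration under stated assumptions: `find_python_home` is a hypothetical name, and it merely checks which of the two directories exists on disk; it does not configure Visual Studio or any other tool.

```python
import os

# The two install locations named in step 3.
IN_DB_PATH = r"C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES"
STANDALONE_PATH = r"C:\Program Files\Microsoft SQL Server\140\PYTHON_SERVER"


def find_python_home(candidates=(IN_DB_PATH, STANDALONE_PATH)):
    """Return the first candidate directory that exists, or None."""
    for path in candidates:
        if os.path.isdir(path):
            return path
    return None


python_home = find_python_home()
if python_home is None:
    print("No SQL Server Python installation found at the expected paths.")
else:
    print("Point your Python environment to:", python_home)
```

If both products are installed, this picks the in-database location first; reorder the candidates if you prefer the standalone server.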
<a name=sample-details></a>

## Sample details

This sample shows how to create a predictive model with Python, generate predictions with the model, and deploy it in SQL Server with SQL Server Machine Learning Services.

### rental_prediction.py
The Python script that generates a predictive model and uses it to predict rental counts.
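
The script holds out part of the data for testing and scores the model with the mean squared error. As a rough sketch of those two ideas in plain Python (the script itself relies on pandas `DataFrame.sample` and scikit-learn's `mean_squared_error`), with hypothetical helper names and made-up numbers:

```python
import random


def train_test_split(rows, frac=0.8, seed=1):
    """Randomly assign a fraction of the rows to training, the rest to testing."""
    rng = random.Random(seed)  # fixed seed so the split can be replicated
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * frac)
    return shuffled[:cut], shuffled[cut:]


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


rows = list(range(10))
train, test = train_test_split(rows)
print("Training rows:", len(train), "Testing rows:", len(test))

print("MSE:", mean_squared_error([10, 20, 30], [12, 18, 33]))
```

A lower mean squared error on the held-out rows indicates predictions that are closer to the actual rental counts.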

### rental_prediction.sql
Takes the Python code in rental_prediction.py and deploys it inside SQL Server by creating tables for storing models and stored procedures for training and prediction.
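
The model storage pattern can be tried without a SQL Server instance. The following sketch uses Python's built-in `sqlite3` module purely as a stand-in: a BLOB column plays the role of the `VARBINARY(MAX)` model column, and the dictionary is a placeholder for a trained model. It is not how the sample talks to SQL Server; there, the stored procedures pass the pickled bytes in and out of `sp_execute_external_script`.

```python
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rental_py_models (model_name TEXT PRIMARY KEY, model BLOB NOT NULL)"
)

# Placeholder for a trained model object.
trained_model = {"slope": 10.0, "intercept": 2.0}

# Training side: serialize the model and save it under a name.
conn.execute(
    "INSERT INTO rental_py_models (model_name, model) VALUES (?, ?)",
    ("linear_model", pickle.dumps(trained_model)),
)

# Prediction side: look the model up by name and deserialize it.
row = conn.execute(
    "SELECT model FROM rental_py_models WHERE model_name = ?", ("linear_model",)
).fetchone()
loaded_model = pickle.loads(row[0])
print(loaded_model)
```

Keying the table by model name is what lets the prediction procedure pick a specific model at call time.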



Service uses Tedious library for data access and built-in JSON functionalities that are available in SQL Server 2016 and Azure SQL Database.

<a name=disclaimers></a>

## Disclaimers
The code included in this sample is not intended to demonstrate general guidance and architectural patterns for web development.
It contains minimal code required to create a REST API.
You can easily modify this code to fit the architecture of your application.


<a name=related-links></a>

## Related Links
<!-- Links to more articles. Remember to delete "en-us" from the link path. -->

For additional content, see these articles:

[SQL Server R Services - Upgrade and Installation FAQ](https://msdn.microsoft.com/en-us/library/mt653951.aspx)
[Other SQL Server R Services Tutorials](https://msdn.microsoft.com/en-us/library/mt591993.aspx)
[Watch a presentation about predictive modeling in SQL Server that also goes through this sample](https://www.youtube.com/watch?v=YCyj9cdi4Nk&feature=youtu.be)
Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

from revoscalepy.computecontext.RxInSqlServer import RxInSqlServer
from revoscalepy.computecontext.RxInSqlServer import RxSqlServerData
from revoscalepy.etl.RxImport import rx_import_datasource


def get_rental_predictions():
    conn_str = 'Driver=SQL Server;Server=MYSQLSERVER;Database=TutorialDB;Trusted_Connection=True;'
    column_info = {
        "Year" : { "type" : "integer" },
        "Month" : { "type" : "integer" },
        "Day" : { "type" : "integer" },
        "RentalCount" : { "type" : "integer" },
        "WeekDay" : {
            "type" : "factor",
            "levels" : ["1", "2", "3", "4", "5", "6", "7"]
        },
        "Holiday" : {
            "type" : "factor",
            "levels" : ["1", "0"]
        },
        "Snow" : {
            "type" : "factor",
            "levels" : ["1", "0"]
        }
    }

    data_source = RxSqlServerData(table="dbo.rental_data",
                                  connectionString=conn_str, colInfo=column_info)
    computeContext = RxInSqlServer(
        connectionString = conn_str,
        numTasks = 1,
        autoCleanup = False
    )

    # Import the data source and convert it to a pandas dataframe.
    df = pd.DataFrame(rx_import_datasource(data_source))
    print("Data frame:", df)
    # Get all the columns from the dataframe.
    columns = df.columns.tolist()
    # Store the variable we'll be predicting on.
    target = "RentalCount"
    # Filter the columns to remove ones we don't want.
    # The target must not be used as an input feature.
    columns = [c for c in columns if c not in ["Year", target]]
    # Generate the training set. Set random_state to be able to replicate results.
    train = df.sample(frac=0.8, random_state=1)
    # Select anything not in the training set and put it in the testing set.
    test = df.loc[~df.index.isin(train.index)]
    # Print the shapes of both sets.
    print("Training set shape:", train.shape)
    print("Testing set shape:", test.shape)
    # Initialize the model class.
    lin_model = LinearRegression()
    # Fit the model to the training data.
    lin_model.fit(train[columns], train[target])
    # Generate our predictions for the test set.
    lin_predictions = lin_model.predict(test[columns])
    print("Predictions:", lin_predictions)
    # Compute error between the actual values and our test predictions.
    lin_mse = mean_squared_error(test[target], lin_predictions)
    print("Computed error:", lin_mse)


get_rental_predictions()
Lines changed: 147 additions & 0 deletions
@@ -0,0 +1,147 @@

USE TutorialDB;

-- Table containing ski rental data
SELECT * FROM [dbo].[rental_data];



-------------------------- STEP 1 - Setup model table ----------------------------------------
DROP TABLE IF EXISTS rental_py_models;
GO
CREATE TABLE rental_py_models (
    model_name VARCHAR(30) NOT NULL DEFAULT('default model') PRIMARY KEY,
    model VARBINARY(MAX) NOT NULL
);
GO


-------------------------- STEP 2 - Train model ----------------------------------------
-- Stored procedure that trains a Python linear regression model using the rental_data
DROP PROCEDURE IF EXISTS generate_rental_py_model;
GO
CREATE PROCEDURE generate_rental_py_model (@trained_model varbinary(max) OUTPUT)
AS
BEGIN
    EXECUTE sp_execute_external_script
      @language = N'Python'
    , @script = N'

df = rental_train_data

# Get all the columns from the dataframe.
columns = df.columns.tolist()

# Store the variable we will be predicting on.
target = "RentalCount"

# Keep only the feature columns; the target must not be a model input.
columns = [c for c in columns if c != target]

from sklearn.linear_model import LinearRegression

# Initialize the model class.
lin_model = LinearRegression()
# Fit the model to the training data.
lin_model.fit(df[columns], df[target])

import pickle
# Before saving the model to the DB table, we need to convert it to a binary object
trained_model = pickle.dumps(lin_model)
'

    , @input_data_1 = N'select "RentalCount", "Year", "Month", "Day", "WeekDay", "Snow", "Holiday" from dbo.rental_data where Year < 2015'
    , @input_data_1_name = N'rental_train_data'
    , @params = N'@trained_model varbinary(max) OUTPUT'
    , @trained_model = @trained_model OUTPUT;
END;
GO

------------------- STEP 3 - Save model to table -------------------------------------
TRUNCATE TABLE rental_py_models;

DECLARE @model VARBINARY(MAX);
EXEC generate_rental_py_model @model OUTPUT;

INSERT INTO rental_py_models (model_name, model) VALUES('linear_model', @model);

SELECT * FROM rental_py_models;



------------------ STEP 4 - Use the model to predict number of rentals --------------------------
DROP PROCEDURE IF EXISTS py_predict_rentalcount;
GO
CREATE PROCEDURE py_predict_rentalcount (@model varchar(100))
AS
BEGIN
    DECLARE @py_model varbinary(max) = (select model from rental_py_models where model_name = @model);

    EXEC sp_execute_external_script
      @language = N'Python'
    , @script = N'

import pickle
rental_model = pickle.loads(py_model)


df = rental_score_data
#print(df)

# Store the variable we will be predicting on.
target = "RentalCount"

# Get all the columns from the dataframe.
columns = df.columns.tolist()
# Keep only the feature columns; they must match the columns used for training.
columns = [c for c in columns if c != target]

# Generate our predictions for the test set.
lin_predictions = rental_model.predict(df[columns])
print(lin_predictions)

# Import the scikit-learn function to compute error.
from sklearn.metrics import mean_squared_error
# Compute error between the actual values and our test predictions.
lin_mse = mean_squared_error(df[target], lin_predictions)
#print(lin_mse)

import pandas as pd
predictions_df = pd.DataFrame(lin_predictions)
OutputDataSet = pd.concat([predictions_df, df["RentalCount"], df["Month"], df["Day"], df["WeekDay"], df["Snow"], df["Holiday"], df["Year"]], axis=1)
'
    , @input_data_1 = N'Select "RentalCount", "Year" ,"Month", "Day", "WeekDay", "Snow", "Holiday" from rental_data where Year = 2015'
    , @input_data_1_name = N'rental_score_data'
    , @params = N'@py_model varbinary(max)'
    , @py_model = @py_model
    WITH RESULT SETS (("RentalCount_Predicted" float, "RentalCount" float, "Month" float, "Day" float, "WeekDay" float, "Snow" float, "Holiday" float, "Year" float));

END;
GO


---------------- STEP 5 - Create DB table to store predictions -----------------------
DROP TABLE IF EXISTS [dbo].[py_rental_predictions];
GO
-- Create a table to store the predictions in
CREATE TABLE [dbo].[py_rental_predictions](
    [RentalCount_Predicted] [int] NULL,
    [RentalCount_Actual] [int] NULL,
    [Month] [int] NULL,
    [Day] [int] NULL,
    [WeekDay] [int] NULL,
    [Snow] [int] NULL,
    [Holiday] [int] NULL,
    [Year] [int] NULL
) ON [PRIMARY]
GO


---------------- STEP 6 - Save the predictions in a DB table -----------------------
TRUNCATE TABLE py_rental_predictions;
-- Insert the results of the predictions for test set into a table
INSERT INTO py_rental_predictions
EXEC py_predict_rentalcount 'linear_model';

-- Select contents of the table
SELECT * FROM py_rental_predictions;

samples/features/r-services/README.md

Lines changed: 2 additions & 4 deletions
@@ -1,12 +1,10 @@
-# Samples for SQL Server R Services
+# Samples for SQL Server Machine Learning Services
 
-Go to the getting started tutorials to learn more about:
 
-[Predictive Modeling with R Services](https://www.microsoft.com/en-us/sql-server/developer-get-started/rprediction)
+Go to the getting started tutorials to learn more about:
 
 [Customer Clustering with R Services](https://www.microsoft.com/en-us/sql-server/developer-get-started/rclustering)
 
-
 [Telco Customer Churn](Telco Customer Churn)
 
 Telco Customer Churn sample using SQL Server R Services.
Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@

##################### STEP1 - Connect to DB and read data ####################

#Connection string to connect to SQL Server named instance
connStr <- paste("Driver=SQL Server; Server=", "MYSQLSERVER",
                 ";Database=", "Tutorialdb", ";Trusted_Connection=true;", sep = "");

#Get the data from SQL Server Table
SQL_rentaldata <- RxSqlServerData(table = "dbo.rental_data",
                                  connectionString = connStr, returnDataFrame = TRUE);

#Import the data into a data frame
rentaldata <- rxImport(SQL_rentaldata);

#Let's look at the top rows of the data
#Ski rental data, giving the number of ski rentals on a given date
head(rentaldata);


##################### STEP2 - Clean and prepare the data ####################

#Changing the three categorical columns to factor types
#This helps when building the model because we are explicitly saying that these values are categorical
rentaldata$Holiday <- factor(rentaldata$Holiday);
rentaldata$Snow <- factor(rentaldata$Snow);
rentaldata$WeekDay <- factor(rentaldata$WeekDay);

#Inspect the structure of the dataset after the change
str(rentaldata);

##################### STEP3 - Train model ####################

#Now let's split the dataset into 2 different sets
#One set for training the model and the other for validating it
train_data <- rentaldata[rentaldata$Year < 2015,];
test_data <- rentaldata[rentaldata$Year == 2015,];

#Use this column to check the quality of the prediction against actual values
actual_counts <- test_data$RentalCount;

#Model 1: Use rxLinMod to create a linear regression model. We are training the model using the training data set
model_linmod <- rxLinMod(RentalCount ~ Month + Day + WeekDay + Snow + Holiday, data = train_data);

#Model 2: Use rxDTree to create a decision tree model. We are training the model using the training data set
model_dtree <- rxDTree(RentalCount ~ Month + Day + WeekDay + Snow + Holiday, data = train_data);


#################### STEP4 - Predict using the models ########################

#Use the models we just created to predict using the test data set.
#That enables us to compare the predicted values of RentalCount from the two models with the actual values in the test data set
predict_linmod <- rxPredict(model_linmod, test_data, writeModelVars = TRUE, extraVarsToWrite = c("Year"));

predict_dtree <- rxPredict(model_dtree, test_data, writeModelVars = TRUE, extraVarsToWrite = c("Year"));

#Look at the top rows of the two prediction data sets.
head(predict_linmod);
head(predict_dtree);

#################### STEP5 - Compare models ########################
#Now we will use the plotting functionality in R to visualize the results from the predictions
#We are plotting the difference between actual and predicted values for both models to compare accuracy
par(mfrow = c(2, 1));
plot(predict_linmod$RentalCount_Pred - predict_linmod$RentalCount, main = "Difference between actual and predicted. rxLinMod");
plot(predict_dtree$RentalCount_Pred - predict_dtree$RentalCount, main = "Difference between actual and predicted. rxDTree");
