Commit 0109c0b

Merge remote-tracking branch 'refs/remotes/Microsoft/master'
2 parents c1c274c + 8d42e22 commit 0109c0b

51 files changed, 2796 additions and 4 deletions
Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
1+
# Build a predictive model with Python using SQL Server 2017 Machine Learning Services
2+
3+
This sample shows how to create a predictive model in Python and operationalize it with SQL Server 2017.
4+
5+
### Contents
6+
7+
[About this sample](#about-this-sample)<br/>
8+
[Before you begin](#before-you-begin)<br/>
9+
[Sample details](#sample-details)<br/>
10+
11+
12+
13+
<a name=about-this-sample></a>
14+
15+
## About this sample
16+
17+
Predictive modeling is a powerful way to add intelligence to your application. It enables applications to predict outcomes against new data.
18+
The act of incorporating predictive analytics into your applications involves two major phases:
19+
model training and model operationalization.
20+
21+
In this sample, you will learn how to create a predictive model in Python and operationalize it with SQL Server 2017.
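
As a rough illustration of the two phases, the sketch below trains a tiny one-variable linear model on made-up data and then applies it to new inputs. The `train` and `predict` helpers and the numbers are hypothetical and use only plain Python; the sample itself relies on scikit-learn and SQL Server.

```python
def train(xs, ys):
    """Phase 1 (training): fit y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def predict(model, xs):
    """Phase 2 (operationalization): apply the trained model to new data."""
    return [model["slope"] * x + model["intercept"] for x in xs]


# Hypothetical historical data that follows y = 2x + 1.
model = train([1, 2, 3, 4], [3, 5, 7, 9])

# New, unseen inputs.
new_predictions = predict(model, [5, 6])
print(new_predictions)
```

In the sample, the training phase runs inside a stored procedure and the prediction phase inside a second one, so an application only needs to call T-SQL.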
22+
23+
24+
<!-- Delete the ones that don't apply -->
25+
- **Applies to:** SQL Server 2017 CTP2.0 or higher
26+
- **Key features:** SQL Server Machine Learning Services
27+
- **Workload:** SQL Server Machine Learning Services
28+
- **Programming Language:** T-SQL, Python
29+
- **Authors:** Nellie Gustafsson
30+
- **Update history:** Getting started tutorial for SQL Server ML Services - Python
31+
32+
<a name=before-you-begin></a>
33+
34+
## Before you begin
35+
36+
To run this sample, you need the following prerequisites: <br/>
37+
Download the DB backup file and restore it using setup.sql. [Download DB](https://deve2e.azureedge.net/sqlchoice/static/TutorialDB.bak)
38+
39+
**Software prerequisites:**
40+
41+
<!-- Examples -->
42+
1. SQL Server 2017 CTP2.0 (or higher) with Machine Learning Services (Python) installed
43+
2. SQL Server Management Studio
44+
3. Python Tools for Visual Studio or another Python IDE
45+
46+
## Run this sample
47+
1. From SQL Server Management Studio or SQL Server Data Tools, connect to your SQL Server 2017 instance and execute setup.sql to restore the sample DB you downloaded. <br/>
48+
2. From SQL Server Management Studio or SQL Server Data Tools, open the rental_prediction.sql script. <br/>
49+
This script: <br/>
50+
Creates the necessary tables <br/>
51+
Creates a stored procedure to train a model <br/>
52+
Creates a stored procedure to predict using that model <br/>
53+
Saves the predicted results to a DB table <br/>
54+
3. You can also try the Python script on its own, connecting to SQL Server and getting data with the revoscalepy functions. Point your Python environment to "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES" if you use the in-database Python services, or
55+
"C:\Program Files\Microsoft SQL Server\140\PYTHON_SERVER" if you have the standalone Machine Learning Server installed.
56+
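
To check which of the two installation paths from step 3 exists on a machine, a small helper along these lines may be useful. This is only a sketch: `find_python_home` is a hypothetical name, and the paths are the defaults quoted above, which may differ on your installation.

```python
import os

# Default paths quoted in step 3; adjust them to your installation.
IN_DB_PYTHON = r"C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES"
STANDALONE_PYTHON = r"C:\Program Files\Microsoft SQL Server\140\PYTHON_SERVER"


def find_python_home(candidates=(IN_DB_PYTHON, STANDALONE_PYTHON)):
    """Return the first candidate directory that exists, or None."""
    for path in candidates:
        if os.path.isdir(path):
            return path
    return None


python_home = find_python_home()
print(python_home or "No SQL Server Python installation found at the default paths")
```

The returned directory is the one to select as the interpreter location in your Python IDE.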
57+
<a name=sample-details></a>
58+
59+
## Sample details
60+
61+
This sample shows how to create a predictive model with Python, generate predictions with it, and deploy it in SQL Server using SQL Server Machine Learning Services.
62+
63+
### rental_prediction.py
64+
The Python script that generates a predictive model and uses it to predict rental counts.
65+
66+
### rental_prediction.sql
67+
Takes the Python code in rental_prediction.py and deploys it inside SQL Server. It creates the tables and stored procedures used for training, storing models, and prediction.
68+
69+
### setup.sql
70+
Restores the sample DB. (Make sure to update the path to the .bak file.)
71+
72+
73+
74+
75+
76+
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
1+
-- Before we start, we need to restore the DB for this tutorial.
2+
-- Step 1: Download the compressed backup file (https://deve2e.azureedge.net/sqlchoice/static/TutorialDB.bak)
3+
-- Save the file in a location where SQL Server can access it. For example: C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Backup\
4+
-- In a new query window in SSMS, execute the following restore statement, but REMEMBER TO CHANGE THE FILE PATHS
5+
-- to match the directories of your installation!
6+
USE master;
7+
GO
8+
RESTORE DATABASE TutorialDB
9+
FROM DISK = 'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Backup\TutorialDB.bak'
10+
WITH
11+
MOVE 'TutorialDB' TO 'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\TutorialDB.mdf'
12+
,MOVE 'TutorialDB_log' TO 'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\TutorialDB.ldf';
13+
GO
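
Because the file paths in the restore statement have to match your installation, it can help to generate the statement from two directory names. The sketch below is one way to do that; `build_restore_sql` is a hypothetical helper, the example directories are made up, and the logical file names `TutorialDB` and `TutorialDB_log` are taken from the script above.

```python
import ntpath


def build_restore_sql(backup_dir, data_dir, db_name="TutorialDB"):
    """Build the RESTORE statement for the given backup and data directories."""

    def quote(path):
        # T-SQL string literal: embedded single quotes are escaped by doubling.
        return "'" + path.replace("'", "''") + "'"

    # ntpath joins with backslashes regardless of the OS running this script.
    backup_file = ntpath.join(backup_dir, db_name + ".bak")
    data_file = ntpath.join(data_dir, db_name + ".mdf")
    log_file = ntpath.join(data_dir, db_name + ".ldf")
    return (
        f"RESTORE DATABASE {db_name}\n"
        f"FROM DISK = {quote(backup_file)}\n"
        "WITH\n"
        f"MOVE '{db_name}' TO {quote(data_file)}\n"
        f",MOVE '{db_name}_log' TO {quote(log_file)};"
    )


# Hypothetical directories; replace them with the ones on your server.
restore_sql = build_restore_sql(r"D:\SQL\Backup", r"D:\SQL\Data")
print(restore_sql)
```

The printed statement can be pasted into a query window after `USE master;`.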
Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

from revoscalepy.computecontext.RxInSqlServer import RxInSqlServer
from revoscalepy.computecontext.RxInSqlServer import RxSqlServerData
from revoscalepy.etl.RxImport import rx_import_datasource


def get_rental_predictions():
    conn_str = 'Driver=SQL Server;Server=MYSQLSERVER;Database=TutorialDB;Trusted_Connection=True;'
    column_info = {
        "Year" : { "type" : "integer" },
        "Month" : { "type" : "integer" },
        "Day" : { "type" : "integer" },
        "RentalCount" : { "type" : "integer" },
        "WeekDay" : {
            "type" : "factor",
            "levels" : ["1", "2", "3", "4", "5", "6", "7"]
        },
        "Holiday" : {
            "type" : "factor",
            "levels" : ["1", "0"]
        },
        "Snow" : {
            "type" : "factor",
            "levels" : ["1", "0"]
        }
    }

    data_source = RxSqlServerData(table="dbo.rental_data",
                                  connectionString=conn_str, colInfo=column_info)
    computeContext = RxInSqlServer(
        connectionString = conn_str,
        numTasks = 1,
        autoCleanup = False
    )

    # Import the data source and convert it to a pandas dataframe.
    df = pd.DataFrame(rx_import_datasource(data_source))
    print("Data frame:", df)
    # Get all the columns from the dataframe.
    columns = df.columns.tolist()
    # Remove the columns we don't want as features: Year and the target itself.
    columns = [c for c in columns if c not in ["Year", "RentalCount"]]
    # Store the variable we'll be predicting.
    target = "RentalCount"
    # Generate the training set. Set random_state to be able to replicate results.
    train = df.sample(frac=0.8, random_state=1)
    # Select anything not in the training set and put it in the testing set.
    test = df.loc[~df.index.isin(train.index)]
    # Print the shapes of both sets.
    print("Training set shape:", train.shape)
    print("Testing set shape:", test.shape)
    # Initialize the model class.
    lin_model = LinearRegression()
    # Fit the model to the training data.
    lin_model.fit(train[columns], train[target])
    # Generate our predictions for the test set.
    lin_predictions = lin_model.predict(test[columns])
    print("Predictions:", lin_predictions)
    # Compute the error between the actual values and our test predictions.
    lin_mse = mean_squared_error(test[target], lin_predictions)
    print("Computed error:", lin_mse)


get_rental_predictions()
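
The error printed by the script is a mean squared error. As a minimal sketch of what that number means, the plain-Python function below computes it by hand; `mean_squared_error_manual` is a hypothetical name and the values are made up, not taken from the rental data.

```python
def mean_squared_error_manual(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical rental counts and predictions.
actual = [10, 20, 30]
predicted = [12, 18, 33]

# Squared errors are 4, 4 and 9, so the mean is 17 / 3.
mse = mean_squared_error_manual(actual, predicted)
print(mse)
```

A lower value means the predictions sit closer to the actual rental counts; because errors are squared, a few large misses weigh more than many small ones.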
Lines changed: 147 additions & 0 deletions
@@ -0,0 +1,147 @@
USE TutorialDB;

-- Table containing ski rental data
SELECT * FROM [dbo].[rental_data];


-------------------------- STEP 1 - Setup model table ----------------------------------------
DROP TABLE IF EXISTS rental_py_models;
GO
CREATE TABLE rental_py_models (
    model_name VARCHAR(30) NOT NULL DEFAULT('default model') PRIMARY KEY,
    model VARBINARY(MAX) NOT NULL
);
GO


-------------------------- STEP 2 - Train model ----------------------------------------
-- Stored procedure that trains a Python model on rental_data using a linear regression algorithm
DROP PROCEDURE IF EXISTS generate_rental_py_model;
GO
CREATE PROCEDURE generate_rental_py_model (@trained_model varbinary(max) OUTPUT)
AS
BEGIN
    EXECUTE sp_execute_external_script
      @language = N'Python'
    , @script = N'

df = rental_train_data

# Get all the columns from the dataframe.
columns = df.columns.tolist()
# Remove the columns we do not want as features: Year and the target itself.
columns = [c for c in columns if c not in ["Year", "RentalCount"]]

# Store the variable we will be predicting.
target = "RentalCount"

from sklearn.linear_model import LinearRegression

# Initialize the model class.
lin_model = LinearRegression()
# Fit the model to the training data.
lin_model.fit(df[columns], df[target])

import pickle
# Before saving the model to the DB table, we need to convert it to a binary object
trained_model = pickle.dumps(lin_model)
'

    , @input_data_1 = N'select "RentalCount", "Year", "Month", "Day", "WeekDay", "Snow", "Holiday" from dbo.rental_data where Year < 2015'
    , @input_data_1_name = N'rental_train_data'
    , @params = N'@trained_model varbinary(max) OUTPUT'
    , @trained_model = @trained_model OUTPUT;
END;
GO

------------------- STEP 3 - Save model to table -------------------------------------
TRUNCATE TABLE rental_py_models;

DECLARE @model VARBINARY(MAX);
EXEC generate_rental_py_model @model OUTPUT;

INSERT INTO rental_py_models (model_name, model) VALUES('linear_model', @model);

SELECT * FROM rental_py_models;


------------------ STEP 4 - Use the model to predict number of rentals --------------------------
DROP PROCEDURE IF EXISTS py_predict_rentalcount;
GO
CREATE PROCEDURE py_predict_rentalcount (@model varchar(100))
AS
BEGIN
    DECLARE @py_model varbinary(max) = (select model from rental_py_models where model_name = @model);

    EXEC sp_execute_external_script
      @language = N'Python'
    , @script = N'

import pickle
rental_model = pickle.loads(py_model)

df = rental_score_data
#print(df)

# Get all the columns from the dataframe.
columns = df.columns.tolist()
# Keep the same feature columns the model was trained on.
columns = [c for c in columns if c not in ["Year", "RentalCount"]]

# Store the variable we will be predicting.
target = "RentalCount"

# Generate our predictions for the test set.
lin_predictions = rental_model.predict(df[columns])
print(lin_predictions)

# Import the scikit-learn function to compute error.
from sklearn.metrics import mean_squared_error
# Compute error between the actual values and our test predictions.
lin_mse = mean_squared_error(df[target], lin_predictions)
#print(lin_mse)

import pandas as pd
predictions_df = pd.DataFrame(lin_predictions)
OutputDataSet = pd.concat([predictions_df, df["RentalCount"], df["Month"], df["Day"], df["WeekDay"], df["Snow"], df["Holiday"], df["Year"]], axis=1)
'
    , @input_data_1 = N'Select "RentalCount", "Year" ,"Month", "Day", "WeekDay", "Snow", "Holiday" from rental_data where Year = 2015'
    , @input_data_1_name = N'rental_score_data'
    , @params = N'@py_model varbinary(max)'
    , @py_model = @py_model
    with result sets (("RentalCount_Predicted" float, "RentalCount" float, "Month" float,"Day" float,"WeekDay" float,"Snow" float,"Holiday" float, "Year" float));

END;
GO


---------------- STEP 5 - Create DB table to store predictions -----------------------
DROP TABLE IF EXISTS [dbo].[py_rental_predictions];
GO
-- Create a table to store the predictions in
CREATE TABLE [dbo].[py_rental_predictions](
    [RentalCount_Predicted] [int] NULL,
    [RentalCount_Actual] [int] NULL,
    [Month] [int] NULL,
    [Day] [int] NULL,
    [WeekDay] [int] NULL,
    [Snow] [int] NULL,
    [Holiday] [int] NULL,
    [Year] [int] NULL
) ON [PRIMARY]
GO


---------------- STEP 6 - Save the predictions in a DB table -----------------------
TRUNCATE TABLE py_rental_predictions;
-- Insert the results of the predictions for the test set into a table
INSERT INTO py_rental_predictions
EXEC py_predict_rentalcount 'linear_model';

-- Select contents of the table
SELECT * FROM py_rental_predictions;
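
The script stores the trained model as bytes in a `VARBINARY(MAX)` column and later reads it back by name. The sketch below mimics that round trip with Python's built-in `sqlite3` module, purely as a stand-in for SQL Server so the example runs anywhere; the dictionary is a hypothetical placeholder for a fitted model, and the `BLOB` column plays the role of `VARBINARY(MAX)`.

```python
import pickle
import sqlite3

# In-memory SQLite database standing in for TutorialDB (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rental_py_models ("
    "model_name TEXT NOT NULL DEFAULT 'default model' PRIMARY KEY, "
    "model BLOB NOT NULL)"
)

# Placeholder for a fitted model; any picklable Python object works the same way.
trained = {"coef": [1.5, -0.25], "intercept": 4.0}

# Serialize the model to bytes and save it under a name.
conn.execute(
    "INSERT INTO rental_py_models (model_name, model) VALUES (?, ?)",
    ("linear_model", pickle.dumps(trained)),
)

# Load the bytes back by name and rebuild the Python object.
(blob,) = conn.execute(
    "SELECT model FROM rental_py_models WHERE model_name = ?", ("linear_model",)
).fetchone()
restored = pickle.loads(blob)
print(restored)
```

Keeping the model in a table keyed by name is what lets the prediction procedure pick a model with a single parameter. Only unpickle data that comes from a source you trust.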

samples/features/r-services/README.md

Lines changed: 2 additions & 4 deletions
@@ -1,12 +1,10 @@
1-
# Samples for SQL Server R Services
1+
# Samples for SQL Server Machine Learning Services
22

3-
Go to the getting started tutorials to learn more about:
43

5-
[Predictive Modeling with R Services](https://www.microsoft.com/en-us/sql-server/developer-get-started/rprediction)
4+
Go to the getting started tutorials to learn more about:
65

76
[Customer Clustering with R Services](https://www.microsoft.com/en-us/sql-server/developer-get-started/rclustering)
87

9-
108
[Telco Customer Churn](Telco Customer Churn)
119

1210
Telco Customer Churn sample using SQL Server R Services.
Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
1+
2+
##################### STEP1 - Connect to DB and read data ####################
3+
4+
#Connection string to connect to SQL Server named instance
5+
connStr <- paste("Driver=SQL Server; Server=", "MYSQLSERVER",
6+
";Database=", "Tutorialdb", ";Trusted_Connection=true;", sep = "");
7+
8+
#Get the data from SQL Server Table
9+
SQL_rentaldata <- RxSqlServerData(table = "dbo.rental_data",
10+
connectionString = connStr, returnDataFrame = TRUE);
11+
12+
#Import the data into a data frame
13+
rentaldata <- rxImport(SQL_rentaldata);
14+
15+
#Let's see the structure of the data and the top rows
16+
# Ski rental data, giving the number of ski rentals on a given date
17+
head(rentaldata);
18+
19+
20+
##################### STEP2 - Clean and prepare the data ####################
21+
22+
#Changing the three factor columns to factor types
23+
#This helps when building the model because we are explicitly saying that these values are categorical
24+
rentaldata$Holiday <- factor(rentaldata$Holiday);
25+
rentaldata$Snow <- factor(rentaldata$Snow);
26+
rentaldata$WeekDay <- factor(rentaldata$WeekDay);
27+
28+
#Visualize the dataset after the change
29+
str(rentaldata);
30+
31+
##################### STEP3 - train model ####################
32+
33+
#Now let's split the dataset into 2 different sets
34+
#One set for training the model and the other for validating it
35+
train_data = rentaldata[rentaldata$Year < 2015,];
36+
test_data = rentaldata[rentaldata$Year == 2015,];
37+
38+
#Use this column to check the quality of the prediction against actual values
39+
actual_counts <- test_data$RentalCount;
40+
41+
#Model 1: Use rxLinMod to create a linear regression model. We are training the model using the training data set
42+
model_linmod <- rxLinMod(RentalCount ~ Month + Day + WeekDay + Snow + Holiday, data = train_data);
43+
44+
#Model 2: Use rxDTree to create a decision tree model. We are training the model using the training data set
45+
model_dtree <- rxDTree(RentalCount ~ Month + Day + WeekDay + Snow + Holiday, data = train_data);
46+
47+
48+
#################### STEP4 - Predict using the models ########################
49+
50+
#Use the models we just created to predict using the test data set.
51+
#That enables us to compare the predicted values of RentalCount from the two models to the actual values in the test data set
52+
predict_linmod <- rxPredict(model_linmod, test_data, writeModelVars = TRUE, extraVarsToWrite = c("Year"));
53+
54+
predict_dtree <- rxPredict(model_dtree, test_data, writeModelVars = TRUE, extraVarsToWrite = c("Year"));
55+
56+
#Look at the top rows of the two prediction data sets.
57+
head(predict_linmod);
58+
head(predict_dtree);
59+
60+
#################### STEP5 - Compare models ########################
61+
#Now we will use the plotting functionality in R to visualize the results from the predictions
62+
#We are plotting the difference between actual and predicted values for both models to compare accuracy
63+
par(mfrow = c(2, 1));
64+
plot(predict_linmod$RentalCount_Pred - predict_linmod$RentalCount, main = "Difference between actual and predicted. rxLinMod");
65+
plot(predict_dtree$RentalCount_Pred - predict_dtree$RentalCount, main = "Difference between actual and predicted. rxDTree");
66+
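
The plots in STEP5 compare the two models visually through their residuals (predicted minus actual). A numeric version of the same comparison can be sketched in Python as follows; the helper names and all values are hypothetical, not output from the R models.

```python
def residuals(predicted, actual):
    """Predicted minus actual for each row, as plotted in STEP5."""
    return [p - a for p, a in zip(predicted, actual)]


def mean_absolute_error(predicted, actual):
    """Average size of the residuals, ignoring their sign."""
    diffs = residuals(predicted, actual)
    return sum(abs(d) for d in diffs) / len(diffs)


# Made-up actual rental counts and predictions from two models.
actual = [100, 150, 200]
linmod_pred = [110, 140, 215]
dtree_pred = [105, 150, 190]

linmod_mae = mean_absolute_error(linmod_pred, actual)
dtree_mae = mean_absolute_error(dtree_pred, actual)
better = "rxDTree" if dtree_mae < linmod_mae else "rxLinMod"
print(linmod_mae, dtree_mae, better)
```

A single summary number per model complements the plots, which are better at showing where the large errors occur.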
