Commit 3448c2a

Added machine learning services folder and python tutorial
1 parent 91451d6 commit 3448c2a

3 files changed

+312
-0
lines changed

Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
# Build a predictive model with SQL Server Python

This sample shows how to create a predictive model in Python and operationalize it with SQL Server vNext.

### Contents

[About this sample](#about-this-sample)<br/>
[Before you begin](#before-you-begin)<br/>
[Sample details](#sample-details)<br/>
[Related links](#related-links)<br/>

<a name=about-this-sample></a>

## About this sample

Predictive modeling is a powerful way to add intelligence to your application. It enables applications to predict outcomes against new data.
Incorporating predictive analytics into your applications involves two major phases:
model training and model operationalization.

In this sample, you will learn how to create a predictive model in Python and operationalize it with SQL Server vNext.

<!-- Delete the ones that don't apply -->
- **Applies to:** SQL Server vNext
- **Key features:** SQL Server Machine Learning Services
- **Workload:** SQL Server Machine Learning Services
- **Programming Language:** T-SQL, Python
- **Authors:** Nellie Gustafsson
- **Update history:** Getting started tutorial for SQL Server ML Services - Python

<a name=before-you-begin></a>

## Before you begin

To run this sample, you need the following prerequisites: <br/>
Download a DB backup file and restore it using Setup.sql. [Download DB](https://deve2e.azureedge.net/sqlchoice/static/TutorialDB.bak)

**Software prerequisites:**

<!-- Examples -->
1. SQL Server vNext CTP2.0 (or higher) with Machine Learning Services (Python) installed
2. SQL Server Management Studio
3. Python Tools for Visual Studio

## Run this sample

1. From SQL Server Management Studio or SQL Server Data Tools, connect to your SQL Server vNext database and execute setup.sql to restore the sample DB you downloaded.
2. From SQL Server Management Studio or SQL Server Data Tools, open the Predictive Model Python.sql script. This script:
    - Sets up the necessary tables
    - Creates a stored procedure to train a model
    - Creates a stored procedure to predict using that model
    - Saves the predicted results to a DB table
3. You can also try the Python script on its own. Point the Python environment to "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES" if you run the in-database Python services, or to "C:\Program Files\Microsoft SQL Server\140\PYTHON_SERVER" if you have the standalone Machine Learning Server installed.

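As a rough aid for step 3 above, the following sketch checks which of the two interpreter folders exists on the current machine. The helper name `find_sql_python_dirs` is hypothetical, and the paths are simply the two default locations quoted in step 3; they are assumptions that will differ if your instance name or install location is not the default.

```python
import os

# The two default folders quoted in step 3 (assumed install locations).
CANDIDATE_DIRS = [
    r"C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES",  # in-database
    r"C:\Program Files\Microsoft SQL Server\140\PYTHON_SERVER",  # standalone
]


def find_sql_python_dirs(candidates=CANDIDATE_DIRS):
    """Return the candidate folders that actually exist on this machine."""
    return [path for path in candidates if os.path.isdir(path)]


if __name__ == "__main__":
    found = find_sql_python_dirs()
    if found:
        for path in found:
            print("Found Python environment:", path)
    else:
        print("No SQL Server Python environment found in the default locations.")
```

Whichever folder is reported is the one to select as the interpreter location in your Python tooling.
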
<a name=sample-details></a>

## Sample details

This sample shows how to create a predictive model with Python, generate predictions with it, and deploy it in SQL Server using SQL Server Machine Learning Services.

### Predictive Model.py
The Python script that generates a predictive model and uses it to predict rental counts.
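
The script holds out part of the data for testing and reports a mean squared error. The sketch below illustrates those two steps using only the standard library, so it runs without SQL Server, pandas, or scikit-learn. The helper names `split_rows` and `mse` are hypothetical, and the numbers are placeholders rather than real rental data; the actual script uses `DataFrame.sample` and `sklearn.metrics.mean_squared_error` instead.

```python
import random


def split_rows(rows, frac=0.8, seed=1):
    """Randomly pick `frac` of the rows for training; the rest form the test set."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    indices = list(range(len(rows)))
    train_idx = set(rng.sample(indices, int(round(frac * len(rows)))))
    train = [rows[i] for i in indices if i in train_idx]
    test = [rows[i] for i in indices if i not in train_idx]
    return train, test


def mse(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    rows = list(range(10))  # placeholder rows, not real rental data
    train, test = split_rows(rows)
    print(len(train), "training rows,", len(test), "test rows")
    print("MSE:", mse([3, 5, 7], [2, 5, 9]))
```

A lower error on the held-out rows suggests the model generalizes better to data it was not trained on.
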

### Predictive Model.SQL
Takes the Python code in Predictive Model.py and deploys it inside SQL Server. It creates a table for storing models, stored procedures for training and prediction, and a table for the predicted results.
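
The model can be stored in a `VARBINARY(MAX)` column because `pickle.dumps` returns a `bytes` object, and `pickle.loads` rebuilds the object from those bytes at prediction time. The following sketch shows that round trip without a database. The dictionary of coefficients and the `predict` helper are hypothetical stand-ins for the scikit-learn `LinearRegression` object that the sample actually pickles.

```python
import pickle

# Stand-in for a fitted model (hypothetical values): y = intercept + sum(coef_i * x_i)
model = {"coef": [2.0, -1.0], "intercept": 10.0}


def predict(model, rows):
    """Apply the linear model to each row of feature values."""
    return [
        model["intercept"] + sum(c * x for c, x in zip(model["coef"], row))
        for row in rows
    ]


# Training side: convert the model to bytes, the form a VARBINARY(MAX) column can hold.
model_bytes = pickle.dumps(model)

# Prediction side: rebuild the model from the stored bytes and use it.
restored = pickle.loads(model_bytes)

print(type(model_bytes).__name__)
print(predict(restored, [[1.0, 2.0], [3.0, 0.0]]))
```

Only unpickle bytes that come from a source you trust, since loading a pickle can execute arbitrary code.
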

### app.js
File that contains startup code.
### db.js
File that contains functions that wrap the Tedious library.
### predictions.js
File that contains the action that will be called to get the predictions.

The service uses the Tedious library for data access and the built-in JSON functionality available in SQL Server 2016 and Azure SQL Database.

<a name=disclaimers></a>

## Disclaimers
The code included in this sample is not intended to demonstrate general guidance or architectural patterns for web development.
It contains the minimal code required to create a REST API.
You can easily modify this code to fit the architecture of your application.


<a name=related-links></a>

## Related Links
<!-- Links to more articles. Remember to delete "en-us" from the link path. -->

For additional content, see these articles:

[SQL Server R Services - Upgrade and Installation FAQ](https://msdn.microsoft.com/en-us/library/mt653951.aspx)<br/>
[Other SQL Server R Services Tutorials](https://msdn.microsoft.com/en-us/library/mt591993.aspx)<br/>
[Watch a presentation about predictive modeling in SQL Server that also goes through this sample](https://www.youtube.com/watch?v=YCyj9cdi4Nk&feature=youtu.be)
Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

from revoscalepy.computecontext.RxInSqlServer import RxInSqlServer
from revoscalepy.computecontext.RxInSqlServer import RxSqlServerData
from revoscalepy.etl.RxImport import rx_import_datasource


def get_rental_predictions():
    # Connection string for TutorialDB; replace the server name with your own instance.
    conn_str = 'Driver=SQL Server;Server=NELLIELAPTOP\\SQLSERVER20;Database=TutorialDB;Trusted_Connection=True;'
    # Column types for the rental_data table.
    column_info = {
        "Year" : { "type" : "integer" },
        "Month" : { "type" : "integer" },
        "Day" : { "type" : "integer" },
        "RentalCount" : { "type" : "integer" },
        "WeekDay" : {
            "type" : "factor",
            "levels" : ["1", "2", "3", "4", "5", "6", "7"]
        },
        "Holiday" : {
            "type" : "factor",
            "levels" : ["1", "0"]
        },
        "Snow" : {
            "type" : "factor",
            "levels" : ["1", "0"]
        }
    }

    data_source = RxSqlServerData(table="dbo.rental_data",
                                  connectionString=conn_str, colInfo=column_info)
    # SQL Server compute context. It is created here but not referenced by the rest of the script.
    computeContext = RxInSqlServer(
        connectionString = conn_str,
        numTasks = 1,
        autoCleanup = False
    )

    # Import the data source and convert it to a pandas dataframe.
    df = pd.DataFrame(rx_import_datasource(data_source))
    print("Data frame:", df)
    # Store the variable we'll be predicting on.
    target = "RentalCount"
    # Get all the columns from the dataframe.
    columns = df.columns.tolist()
    # Filter the columns to remove ones we don't want, including the target itself,
    # so the model is not trained on the value it is supposed to predict.
    columns = [c for c in columns if c not in ["Year", target]]
    # Generate the training set. Set random_state to be able to replicate results.
    train = df.sample(frac=0.8, random_state=1)
    # Select anything not in the training set and put it in the testing set.
    test = df.loc[~df.index.isin(train.index)]
    # Print the shapes of both sets.
    print("Training set shape:", train.shape)
    print("Testing set shape:", test.shape)
    # Initialize the model class.
    lin_model = LinearRegression()
    # Fit the model to the training data.
    lin_model.fit(train[columns], train[target])
    # Generate our predictions for the test set.
    lin_predictions = lin_model.predict(test[columns])
    print("Predictions:", lin_predictions)
    # Compute error between the actual values and our test predictions.
    lin_mse = mean_squared_error(test[target], lin_predictions)
    print("Computed error:", lin_mse)


get_rental_predictions()
Lines changed: 148 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,148 @@
USE TutorialDB;

-- Table containing ski rental data
SELECT * FROM [dbo].[rental_data];



-------------------------- STEP 1 - Setup model table ----------------------------------------
DROP TABLE IF EXISTS rental_py_models;
GO
CREATE TABLE rental_py_models (
    model_name VARCHAR(30) NOT NULL DEFAULT('default model') PRIMARY KEY,
    model VARBINARY(MAX) NOT NULL
);
GO


-------------------------- STEP 2 - Train model ----------------------------------------
-- Stored procedure that trains a Python model on rental_data using a linear regression algorithm
DROP PROCEDURE IF EXISTS generate_rental_py_model;
GO
CREATE PROCEDURE generate_rental_py_model (@trained_model varbinary(max) OUTPUT)
AS
BEGIN
    EXECUTE sp_execute_external_script
      @language = N'Python'
    , @script = N'
import pandas as pd
df = pd.DataFrame(rental_train_data)
print(df)

# Store the variable we will be predicting on.
target = "RentalCount"
# Use every column except the target as a feature,
# so the model is not trained on the value it is supposed to predict.
columns = [c for c in df.columns.tolist() if c != target]

from sklearn.linear_model import LinearRegression

# Initialize the model class.
lin_model = LinearRegression()
# Fit the model to the training data.
lin_model.fit(df[columns], df[target])

import pickle
# Before saving the model to the DB table, we need to convert it to a binary object.
trained_model = pickle.dumps(lin_model)
'

    , @input_data_1 = N'select "RentalCount", "Year", "Month", "Day", "WeekDay", "Snow", "Holiday" from dbo.rental_data where Year < 2015'
    , @input_data_1_name = N'rental_train_data'
    , @params = N'@trained_model varbinary(max) OUTPUT'
    , @trained_model = @trained_model OUTPUT;
END;
GO

------------------- STEP 3 - Save model to table -------------------------------------
TRUNCATE TABLE rental_py_models;

DECLARE @model VARBINARY(MAX);
EXEC generate_rental_py_model @model OUTPUT;

INSERT INTO rental_py_models (model_name, model) VALUES('linear_model', @model);

SELECT * FROM rental_py_models;



------------------ STEP 4 - Use the model to predict number of rentals --------------------------
DROP PROCEDURE IF EXISTS py_predict_rentalcount;
GO
CREATE PROCEDURE py_predict_rentalcount (@model varchar(100))
AS
BEGIN
    DECLARE @py_model varbinary(max) = (select model from rental_py_models where model_name = @model);

    EXEC sp_execute_external_script
      @language = N'Python'
    , @script = N'

import pickle
rental_model = pickle.loads(py_model)

import pandas as pd
df = pd.DataFrame(rental_score_data)
#print(df)

# Store the variable we will be predicting on.
target = "RentalCount"
# Use the same feature columns as in training: every column except the target.
columns = [c for c in df.columns.tolist() if c != target]

# Generate our predictions for the test set.
lin_predictions = rental_model.predict(df[columns])
print(lin_predictions)

# Import the scikit-learn function to compute error.
from sklearn.metrics import mean_squared_error
# Compute error between the actual values and our test predictions.
lin_mse = mean_squared_error(df[target], lin_predictions)
#print(lin_mse)

predictions_df = pd.DataFrame(lin_predictions)
OutputDataSet = pd.concat([predictions_df, df["RentalCount"], df["Month"], df["Day"], df["WeekDay"], df["Snow"], df["Holiday"], df["Year"]], axis=1)
'
    , @input_data_1 = N'Select "RentalCount", "Year" ,"Month", "Day", "WeekDay", "Snow", "Holiday" from rental_data where Year = 2015'
    , @input_data_1_name = N'rental_score_data'
    , @params = N'@py_model varbinary(max)'
    , @py_model = @py_model
    with result sets (("RentalCount_Predicted" float, "RentalCount" float, "Month" float,"Day" float,"WeekDay" float,"Snow" float,"Holiday" float, "Year" float));

END;
GO


---------------- STEP 5 - Create DB table to store predictions -----------------------
DROP TABLE IF EXISTS [dbo].[py_rental_predictions];
GO
-- Create a table to store the predictions in
CREATE TABLE [dbo].[py_rental_predictions](
    [RentalCount_Predicted] [int] NULL,
    [RentalCount_Actual] [int] NULL,
    [Month] [int] NULL,
    [Day] [int] NULL,
    [WeekDay] [int] NULL,
    [Snow] [int] NULL,
    [Holiday] [int] NULL,
    [Year] [int] NULL
) ON [PRIMARY]
GO


---------------- STEP 6 - Save the predictions in a DB table -----------------------
TRUNCATE TABLE py_rental_predictions;
-- Insert the results of the predictions for test set into a table
INSERT INTO py_rental_predictions
EXEC py_predict_rentalcount 'linear_model';

-- Select contents of the table
SELECT * FROM py_rental_predictions;
