Vous êtes sur la page 1sur 7

Step 6: Operationalize the Model InDatabase

Advanced Analytics Tutorial


SQL Server 2016 and later

Updated: April 19, 2016


Applies To: SQL Server 2016
In this step, you'll learn to operationalize the model using a stored procedure. This stored procedure can be called directly by
other applications, to make predictions on new observations.
In this step, you'll learn two methods for calling an R model from a stored procedure:
Batch scoring mode: Use a SELECT query to provide multiple rows of data. The stored procedure returns a table of
observations corresponding to the input cases.
Individual scoring mode: Pass a set of individual parameter values as input. The stored procedure returns a single row or
value.
First, let's see how scoring works in general.

Basic Scoring
The stored procedure PredictTip illustrates the basic syntax for wrapping a prediction call in a stored procedure.

CREATEPROCEDURE[dbo].[PredictTip]@inquerynvarchar(max)
AS
BEGIN

DECLARE@lmodel2varbinary(max)=(SELECTTOP1model
FROMnyc_taxi_models);
EXECsp_execute_external_script@language=N'R',
@script=N'
mod<unserialize(as.raw(model));
print(summary(mod))
OutputDataSet<rxPredict(modelObject=mod,data=InputDataSet,outData=NULL,
predVarNames="Score",type="response",writeModelVars=FALSE,overwrite=TRUE);
str(OutputDataSet)
print(OutputDataSet)
',
@input_data_1=@inquery,
@params=N'@modelvarbinary(max)',
@model=@lmodel2
WITHRESULTSETS((Scorefloat));


END

GO

The SELECT statement gets the serialized model from the database, and stores the model in the R variable mod for further
processing using R.
The new cases that will be scored are obtained from the TransactSQL query specified in @inquery, the first parameter to
the stored procedure. As the query data is read, the rows are saved in the default data frame, InputDataSet. This data
frame is passed to the rxPredict function in R, which generates the scores.
OutputDataSet<rxPredict(modelObject=mod,data=InputDataSet,outData=NULL,predVarNames=
"Score",type="response",writeModelVars=FALSE,overwrite=TRUE);
Because a data.frame can contain a single row, you can use the same code for batch or single scoring.
The value returned by the rxPredict function is a float that represents the probability that a tip of any amount will be
given.

Batch scoring using a SELECT query


Now let's see how batch scoring works.
1. Let's start by getting a smaller set of input data to work with.

selecttop10a.passenger_countaspassenger_count,
a.trip_time_in_secsastrip_time_in_secs,
a.trip_distanceastrip_distance,
a.dropoff_datetimeasdropoff_datetime,
dbo.fnCalculateDistance(pickup_latitude,pickup_longitude,
dropoff_latitude,dropoff_longitude)asdirect_distance
from
(
selectmedallion,hack_license,pickup_datetime,
passenger_count,trip_time_in_secs,trip_distance,
dropoff_datetime,pickup_latitude,pickup_longitude,dropoff_latitude,
dropoff_longitude
fromnyctaxi_sample
)a
leftouterjoin
(
selectmedallion,hack_license,pickup_datetime
fromnyctaxi_sample
tablesample(70percent)repeatable(98052)
)b
ona.medallion=b.medallionanda.hack_license=b.hack_licenseand

a.pickup_datetime=b.pickup_datetime
whereb.medallionisnull

This query creates a "top 10" list of trips with passenger count and other features needed to make a prediction.
Results
passenger_count trip_time_in_secs trip_distance dropoff_datetime direct_distance
1 283 0.7 20130327 14:54:50.000 0.5427964547
1 289 0.7 20130224 12:55:29.000 0.3797099614
1 214 0.7 20130626 13:28:10.000 0.6970098661
1 276 0.7 20130627 06:53:04.000 0.4478814362
1 282 0.7 20130221 07:59:54.000 0.5340645749
1 260 0.7 20130327 15:40:49.000 0.5513900727
1 230 0.7 20130205 09:47:59.000 0.5161578519
1 250 0.7 20130508 14:35:51.000 0.5626440915
1 280 0.7 20130511 12:22:01.000 0.5598517959
1 308 0.7 20130410 08:06:00.000 0.4478814362
You'll use this query as input to the stored procedure, PredictTipBatchMode, which was provided as part of the download.
2. First, take a minute to review the code of the stored procedure PredictTipBatchMode in Management Studio.

/******Object:StoredProcedure[dbo].[PredictTipBatchMode]******/
SETANSI_NULLSON
GO
SETQUOTED_IDENTIFIERON
GO
CREATEPROCEDURE[dbo].[PredictTipBatchMode]@inquerynvarchar(max)
AS
BEGIN
DECLARE@lmodel2varbinary(max)=(SELECTTOP1model
FROMnyc_taxi_models);
EXECsp_execute_external_script@language=N'R',
@script=N'
mod<unserialize(as.raw(model));
print(summary(mod))
OutputDataSet<rxPredict(modelObject=mod,data=InputDataSet,outData=NULL,
predVarNames="Score",type="response",writeModelVars=FALSE,overwrite=
TRUE);
str(OutputDataSet)
print(OutputDataSet)
',
@input_data_1=@inquery,
@params=N'@modelvarbinary(max)',
@model=@lmodel2
WITHRESULTSETS((Scorefloat));

END

3. To create predictions, you'll provide the query text in a variable and pass it as a parameter to the stored procedure, using
a TransactSQL statement like this.

Specifyinputquery
DECLARE@query_stringnvarchar(max)
SET@query_string='
selecttop10a.passenger_countaspassenger_count,
a.trip_time_in_secsastrip_time_in_secs,
a.trip_distanceastrip_distance,
a.dropoff_datetimeasdropoff_datetime,
dbo.fnCalculateDistance(pickup_latitude,pickup_longitude,
dropoff_latitude,dropoff_longitude)asdirect_distance
from
selectmedallion,hack_license,pickup_datetime,
passenger_count,trip_time_in_secs,trip_distance,
dropoff_datetime,pickup_latitude,pickup_longitude,dropoff_latitude,
dropoff_longitude
fromnyctaxi_sample
)a
leftouterjoin
(
selectmedallion,hack_license,pickup_datetime
fromnyctaxi_sample
tablesample(70percent)repeatable(98052)
)b
ona.medallion=b.medallionanda.hack_license=b.hack_licenseand
a.pickup_datetime=b.pickup_datetime
whereb.medallionisnull'
Callstoredprocedureforscoring
EXEC[dbo].[PredictTip]@inquery=@query_string;

4. The stored procedure returns a series of values representing the prediction for each of the "top 10 trips". Looking back at
the input values, all of the "top 10 trips" are singlepassenger trips with a relatively short trip distance. Based on the data,
the driver is very unlikely to get a tip on such trips.
Rather than returning just the yestip/notip results, you could also return the probability score for the prediction, and
then apply a WHERE clause to the Score column values to categorize the score as "likely to tip" or "unlikely to tip", using a
threshold value such as 0.5 or 0.7. This step is not included in the stored procedure but it would be easy to implement.

Score Individual Rows


Sometimes you want to pass in individual values from an application and get a single result based on those values. For example,
you could set up an Excel worksheet, web application, or Reporting Services report to call the stored procedure and provide

inputs typed or selected by users.


In this section, you'll learn how to create single predictions using a stored procedure.
1. Take a minute to review the code of the stored procedure PredictTipSingleMode, which was included as part of the
download.

CREATEPROCEDURE[dbo].[PredictTipSingleMode]@passenger_countint=0,
@trip_distancefloat=0,
@trip_time_in_secsint=0,
@pickup_latitudefloat=0,
@pickup_longitudefloat=0,
@dropoff_latitudefloat=0,
@dropoff_longitudefloat=0
AS
BEGIN
DECLARE@inquerynvarchar(max)=N'
SELECT*FROM[dbo].[fnEngineerFeatures](
@passenger_count,
@trip_distance,
@trip_time_in_secs,
@pickup_latitude,
@pickup_longitude,
@dropoff_latitude,
@dropoff_longitude)
'
DECLARE@lmodel2varbinary(max)=(SELECTTOP1model
FROMnyc_taxi_models);
EXECsp_execute_external_script@language=N'R',
@script=N'
mod<unserialize(as.raw(model));
print(summary(mod))
OutputDataSet<rxPredict(modelObject=mod,data=InputDataSet,outData=NULL,
predVarNames="Score",type="response",writeModelVars=FALSE,overwrite=
TRUE);
str(OutputDataSet)
print(OutputDataSet)
',
@input_data_1=@inquery,
@params=N'@modelvarbinary(max),@passenger_countint,@trip_distance
float,@trip_time_in_secsint,
@pickup_latitudefloat,@pickup_longitudefloat,@dropoff_latitudefloat
,@dropoff_longitudefloat',
@model=@lmodel2,
@passenger_count=@passenger_count,
@trip_distance=@trip_distance,
@trip_time_in_secs=@trip_time_in_secs,
@pickup_latitude=@pickup_latitude,
@pickup_longitude=@pickup_longitude,
@dropoff_latitude=@dropoff_latitude,
@dropoff_longitude=@dropoff_longitude
WITHRESULTSETS((Scorefloat));
END

This stored procedure takes multiple single values as input, such as passenger count, trip distance, and so forth.
If you call the stored procedure from an external application, make sure that the data matches the requirements of
the R model. This might include ensuring that the input data can be cast or converted to an R data type, or
validating data type and data length. For more information, see Working with R Data Types.
The stored procedure creates a score based on the stored R model.
2. Try it out, by providing the values manually.
Open a new Query window, and call the stored procedure, typing parameters for each of the feature columns.

EXEC[dbo].[PredictTipSingleMode]1,2.5,631,40.763958,73.973373,40.782139,73.977303

The values are for these feature columns , in order:


trip_time_in_secs
pickup_latitude
pickup_longitude
dropoff_latitude
dropoff_longitude
3. The results indicate that the probability of getting a tip is very low on these top 10 trips, all of which are singlepassenger
trips over a relatively short distance.

Conclusions
In this tutorial, you've learned how to work with R code embedded in stored procedures. The integration with TransactSQL
makes it much easier to deploy R models for prediction and to incorporate model retraining as part of an enterprise data
workflow.

Previous Step
Step 4: Create Data Features using TSQL

See Also
InDatabase Advanced Analytics for SQL Developers Tutorial
SQL Server R Services Tutorials

2016 Microsoft

Vous aimerez peut-être aussi