Basic Scoring
The stored procedure PredictTip illustrates the basic syntax for wrapping a prediction call in a stored procedure.
CREATE PROCEDURE [dbo].[PredictTip] @inquery nvarchar(max)
AS
BEGIN
    DECLARE @lmodel2 varbinary(max) = (SELECT TOP 1 model
        FROM nyc_taxi_models);
    EXEC sp_execute_external_script @language = N'R',
        @script = N'
mod <- unserialize(as.raw(model));
print(summary(mod))
OutputDataSet <- rxPredict(modelObject = mod, data = InputDataSet, outData = NULL,
    predVarNames = "Score", type = "response", writeModelVars = FALSE, overwrite = TRUE);
str(OutputDataSet)
print(OutputDataSet)
',
        @input_data_1 = @inquery,
        @params = N'@model varbinary(max)',
        @model = @lmodel2
    WITH RESULT SETS ((Score float));
END
GO
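The SELECT statement gets the serialized model from the database and stores it in the variable @lmodel2, which is passed
to the R script as the parameter model. The script deserializes it into the R variable mod for further processing using R.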
The new cases that will be scored are obtained from the Transact-SQL query specified in @inquery, the first parameter to
the stored procedure. As the query data is read, the rows are saved in the default data frame, InputDataSet. This data
frame is passed to the rxPredict function in R, which generates the scores.
OutputDataSet <- rxPredict(modelObject = mod, data = InputDataSet, outData = NULL,
    predVarNames = "Score", type = "response", writeModelVars = FALSE, overwrite = TRUE);
Because a data.frame can contain a single row, you can use the same code for batch or single scoring.
The value returned by the rxPredict function is a float that represents the probability that a tip of any amount will be
given.
select top 10 a.passenger_count as passenger_count,
    a.trip_time_in_secs as trip_time_in_secs,
    a.trip_distance as trip_distance,
    a.dropoff_datetime as dropoff_datetime,
    dbo.fnCalculateDistance(pickup_latitude, pickup_longitude,
        dropoff_latitude, dropoff_longitude) as direct_distance
from
(
    select medallion, hack_license, pickup_datetime,
        passenger_count, trip_time_in_secs, trip_distance,
        dropoff_datetime, pickup_latitude, pickup_longitude, dropoff_latitude,
        dropoff_longitude
    from nyctaxi_sample
) a
left outer join
(
    select medallion, hack_license, pickup_datetime
    from nyctaxi_sample
    tablesample (70 percent) repeatable (98052)
) b
on a.medallion = b.medallion and a.hack_license = b.hack_license and
    a.pickup_datetime = b.pickup_datetime
where b.medallion is null
This query creates a "top 10" list of trips with passenger count and other features needed to make a prediction.
Results
passenger_count  trip_time_in_secs  trip_distance  dropoff_datetime         direct_distance
1                283                0.7            2013-03-27 14:54:50.000  0.5427964547
1                289                0.7            2013-02-24 12:55:29.000  0.3797099614
1                214                0.7            2013-06-26 13:28:10.000  0.6970098661
1                276                0.7            2013-06-27 06:53:04.000  0.4478814362
1                282                0.7            2013-02-21 07:59:54.000  0.5340645749
1                260                0.7            2013-03-27 15:40:49.000  0.5513900727
1                230                0.7            2013-02-05 09:47:59.000  0.5161578519
1                250                0.7            2013-05-08 14:35:51.000  0.5626440915
1                280                0.7            2013-05-11 12:22:01.000  0.5598517959
1                308                0.7            2013-04-10 08:06:00.000  0.4478814362
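The query keeps only the trips that are absent from the 70 percent sample: the left outer join pairs every trip with its match in the sample, and the "where b.medallion is null" filter retains the trips that found no match. A minimal sketch of that pattern in isolation follows. The table variables @all_trips and @sampled, the trip_id key, and the values are hypothetical stand-ins, not tutorial data.

```sql
-- Hypothetical stand-ins for nyctaxi_sample and its 70 percent sample.
DECLARE @all_trips TABLE (trip_id int PRIMARY KEY, trip_distance float);
DECLARE @sampled   TABLE (trip_id int PRIMARY KEY);

INSERT INTO @all_trips (trip_id, trip_distance)
VALUES (1, 0.7), (2, 1.4), (3, 2.5), (4, 0.9);

-- Pretend that trips 1 and 3 were picked by the sample.
INSERT INTO @sampled (trip_id) VALUES (1), (3);

-- Anti-join: keep rows from @all_trips that have no match in @sampled.
SELECT a.trip_id, a.trip_distance
FROM @all_trips AS a
LEFT OUTER JOIN @sampled AS b
    ON a.trip_id = b.trip_id
WHERE b.trip_id IS NULL
ORDER BY a.trip_id;
-- Returns trips 2 and 4.
```

In the tutorial query the match is made on three columns (medallion, hack_license, pickup_datetime) instead of a single key, and the repeatable (98052) seed is what lets the same sample be drawn again on an unchanged table.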
You'll use this query as input to the stored procedure, PredictTipBatchMode, which was provided as part of the download.
2. First, take a minute to review the code of the stored procedure PredictTipBatchMode in Management Studio.
/****** Object: StoredProcedure [dbo].[PredictTipBatchMode] ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[PredictTipBatchMode] @inquery nvarchar(max)
AS
BEGIN
    DECLARE @lmodel2 varbinary(max) = (SELECT TOP 1 model
        FROM nyc_taxi_models);
    EXEC sp_execute_external_script @language = N'R',
        @script = N'
mod <- unserialize(as.raw(model));
print(summary(mod))
OutputDataSet <- rxPredict(modelObject = mod, data = InputDataSet, outData = NULL,
    predVarNames = "Score", type = "response", writeModelVars = FALSE, overwrite = TRUE);
str(OutputDataSet)
print(OutputDataSet)
',
        @input_data_1 = @inquery,
        @params = N'@model varbinary(max)',
        @model = @lmodel2
    WITH RESULT SETS ((Score float));
END
3. To create predictions, you'll provide the query text in a variable and pass it as a parameter to the stored procedure, using
a Transact-SQL statement like this.
-- Specify input query
DECLARE @query_string nvarchar(max)
SET @query_string = '
select top 10 a.passenger_count as passenger_count,
    a.trip_time_in_secs as trip_time_in_secs,
    a.trip_distance as trip_distance,
    a.dropoff_datetime as dropoff_datetime,
    dbo.fnCalculateDistance(pickup_latitude, pickup_longitude,
        dropoff_latitude, dropoff_longitude) as direct_distance
from
(
    select medallion, hack_license, pickup_datetime,
        passenger_count, trip_time_in_secs, trip_distance,
        dropoff_datetime, pickup_latitude, pickup_longitude, dropoff_latitude,
        dropoff_longitude
    from nyctaxi_sample
) a
left outer join
(
    select medallion, hack_license, pickup_datetime
    from nyctaxi_sample
    tablesample (70 percent) repeatable (98052)
) b
on a.medallion = b.medallion and a.hack_license = b.hack_license and
    a.pickup_datetime = b.pickup_datetime
where b.medallion is null'
-- Call stored procedure for scoring
EXEC [dbo].[PredictTipBatchMode] @inquery = @query_string;
4. The stored procedure returns a series of values representing the prediction for each of the "top 10 trips". Looking back at
the input values, all of the "top 10 trips" are single-passenger trips with a relatively short trip distance. Based on the data,
the driver is very unlikely to get a tip on such trips.
The procedure returns the probability score for each prediction rather than just a yes-tip/no-tip result. To categorize a
score as "likely to tip" or "unlikely to tip", you could apply a WHERE clause or a CASE expression to the Score column values,
using a threshold value such as 0.5 or 0.7. This step is not included in the stored procedure but it would be easy to implement.
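One way the categorization might look is sketched here. This is an illustration under assumptions, not part of the tutorial's procedures: the table variable @scores, its trip_id column, the score values, and the 0.5 cutoff are all hypothetical stand-ins for captured procedure output.

```sql
-- Hypothetical scores standing in for the Score column returned by the procedure.
DECLARE @scores TABLE (trip_id int PRIMARY KEY, Score float);
INSERT INTO @scores (trip_id, Score)
VALUES (1, 0.12), (2, 0.48), (3, 0.50), (4, 0.91);

-- Assumed cutoff; the text suggests values such as 0.5 or 0.7.
DECLARE @threshold float = 0.5;

-- Label every row with a category.
SELECT trip_id,
       Score,
       CASE WHEN Score >= @threshold THEN 'likely to tip'
            ELSE 'unlikely to tip'
       END AS tip_category
FROM @scores
ORDER BY trip_id;

-- Or keep only the likely tippers with a WHERE clause.
SELECT trip_id, Score
FROM @scores
WHERE Score >= @threshold
ORDER BY trip_id;
```

The CASE form keeps every row and adds a label, while the WHERE form filters rows out; which one fits depends on whether the caller still needs the low-scoring trips.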
CREATE PROCEDURE [dbo].[PredictTipSingleMode] @passenger_count int = 0,
    @trip_distance float = 0,
    @trip_time_in_secs int = 0,
    @pickup_latitude float = 0,
    @pickup_longitude float = 0,
    @dropoff_latitude float = 0,
    @dropoff_longitude float = 0
AS
BEGIN
    DECLARE @inquery nvarchar(max) = N'
SELECT * FROM [dbo].[fnEngineerFeatures](
    @passenger_count,
    @trip_distance,
    @trip_time_in_secs,
    @pickup_latitude,
    @pickup_longitude,
    @dropoff_latitude,
    @dropoff_longitude)
'
    DECLARE @lmodel2 varbinary(max) = (SELECT TOP 1 model
        FROM nyc_taxi_models);
    EXEC sp_execute_external_script @language = N'R',
        @script = N'
mod <- unserialize(as.raw(model));
print(summary(mod))
OutputDataSet <- rxPredict(modelObject = mod, data = InputDataSet, outData = NULL,
    predVarNames = "Score", type = "response", writeModelVars = FALSE, overwrite = TRUE);
str(OutputDataSet)
print(OutputDataSet)
',
        @input_data_1 = @inquery,
        @params = N'@model varbinary(max), @passenger_count int, @trip_distance float,
            @trip_time_in_secs int,
            @pickup_latitude float, @pickup_longitude float,
            @dropoff_latitude float, @dropoff_longitude float',
        @model = @lmodel2,
        @passenger_count = @passenger_count,
        @trip_distance = @trip_distance,
        @trip_time_in_secs = @trip_time_in_secs,
        @pickup_latitude = @pickup_latitude,
        @pickup_longitude = @pickup_longitude,
        @dropoff_latitude = @dropoff_latitude,
        @dropoff_longitude = @dropoff_longitude
    WITH RESULT SETS ((Score float));
END
This stored procedure takes multiple single values as input, such as passenger count, trip distance, and so forth.
If you call the stored procedure from an external application, make sure that the data matches the requirements of
the R model. This might include ensuring that the input data can be cast or converted to an R data type, or
validating data type and data length. For more information, see Working with R Data Types.
The stored procedure creates a score based on the stored R model.
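The kind of input validation mentioned above could also be done on the SQL side before scoring. The following is a minimal sketch, assuming the application passes raw text values. The @raw_* variable names, the error messages, and the range checks are illustrative assumptions, not requirements stated by the tutorial. TRY_CONVERT returns NULL when a value cannot be converted.

```sql
-- Hypothetical raw inputs as an application might send them.
DECLARE @raw_passenger_count nvarchar(20) = N'1',
        @raw_trip_distance   nvarchar(20) = N'2.5',
        @raw_pickup_latitude nvarchar(20) = N'40.763958';

-- TRY_CONVERT yields NULL instead of raising an error on bad input.
DECLARE @passenger_count int   = TRY_CONVERT(int,   @raw_passenger_count),
        @trip_distance   float = TRY_CONVERT(float, @raw_trip_distance),
        @pickup_latitude float = TRY_CONVERT(float, @raw_pickup_latitude);

-- Reject values that could not be converted to the expected type.
IF @passenger_count IS NULL OR @trip_distance IS NULL OR @pickup_latitude IS NULL
BEGIN
    RAISERROR(N'An input value could not be converted to the expected type.', 16, 1);
    RETURN;
END

-- Assumed sanity ranges; adjust them to the data the model was trained on.
IF @passenger_count < 0 OR @trip_distance < 0
   OR @pickup_latitude NOT BETWEEN -90 AND 90
BEGIN
    RAISERROR(N'An input value is outside the expected range.', 16, 1);
    RETURN;
END

-- At this point the typed values are safe to pass on for scoring.
SELECT @passenger_count AS passenger_count,
       @trip_distance   AS trip_distance,
       @pickup_latitude AS pickup_latitude;
```

Only three of the seven inputs are shown to keep the sketch short; the remaining parameters would follow the same convert-then-check pattern.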
2. Try it out by providing the values manually.
Open a new Query window, and call the stored procedure, typing parameters for each of the feature columns.
EXEC [dbo].[PredictTipSingleMode] 1, 2.5, 631, 40.763958, -73.973373, 40.782139, -73.977303
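A positional call depends on the argument order matching the procedure definition. A call with named parameters is sketched below as an alternative. It assumes the PredictTipSingleMode procedure shown earlier exists in the current database, and it writes the New York longitudes as negative (west) values, which is an assumption about the intended coordinates.

```sql
-- Each argument is bound by name, so the order no longer matters.
EXEC [dbo].[PredictTipSingleMode]
    @passenger_count   = 1,
    @trip_distance     = 2.5,
    @trip_time_in_secs = 631,
    @pickup_latitude   = 40.763958,
    @pickup_longitude  = -73.973373,  -- assumed west longitude
    @dropoff_latitude  = 40.782139,
    @dropoff_longitude = -73.977303;  -- assumed west longitude
```

Named parameters make it harder to swap trip_distance and trip_time_in_secs by accident, and any parameter left out falls back to the default of 0 declared in the procedure.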
Conclusions
In this tutorial, you've learned how to work with R code embedded in stored procedures. The integration with Transact-SQL
makes it much easier to deploy R models for prediction and to incorporate model retraining as part of an enterprise data
workflow.
Previous Step
Step 4: Create Data Features using T-SQL
See Also
In-Database Advanced Analytics for SQL Developers Tutorial
SQL Server R Services Tutorials
2016 Microsoft