Vous êtes sur la page 1sur 10

SQL Server 2005 Data Mining Page 1

asp:feature
LANGUAGES: VB.NET
Subscription Services ASP.NET VERSIONS: ALL
Subscribe

SQL
Renew Subscription
asp.netNOW Newsletter
Change of Address

Server
Pay An Invoice
Subscription Packages

2005
asp.netPRO
Articles
Affiliate Spotlight

Data
411asp.net Directory
New Products

Mining
Book Reviews
Product Reviews
Opinion
Back Issues
Reprints/E-prints Create a Web Cross-
Search
Downloads sell Application
Premium Downloads

Informant By Raman Iyer and Jesper Lind


Contact Us
Advertise with Us The concept of cross-sell is
Write For Us familiar to most of us. W hat
your friendly neighborhood
McDonald’s salesperson does Latest Features
when you order a cheeseburger • NUnitASP
is exactly what Amazon.com or
• Under the Hood
buy.com are doing online when
you add items to your shopping • Dynamic Dropdowns
cart and you get a list of other • Validate User-entered
items you might also like. Data

You can add this functionality to • Total Recall


your ASP.NET page by
Article Rating
employing the power of data
mining, using simple SQL-like Rate this article on a
queries to produce high-quality scale from 0 to 5
recommendations. Microsoft j 5 Best
k
l
m
n
SQL Server 2005, currently
j 4
k
l
m
n
available to over 200,000
MSDN subscribers in Beta 2, j 3
k
l
m
n
includes advanced data mining j 2
k
l
m
n
capabilities that are available
j 1
k
l
m
n
programmatically via standard
interfaces like ADO.NET. This j 0 Worst
k
l
m
n
will allow you to integrate cross-
sell into your W eb store Submit
application with minimal effort.
Email
Before developing a W eb
cross-sell application, we need
to build the server-side
intelligence that will enable the Tell a friend
application to come up with about this article!
smart product
recommendations. This
process involves:
Preparing the data you already have about past customers
for mining;
Designing a mining model for the purpose of making
recommendations to new customers;
Deploying the model to Analysis Server and training it with
the data prepared earlier; and
Setting up security to allow ASP.NET to query the trained
model.

http://www.aspnetpro.com/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp 09/03/2005 17:34:27


SQL Server 2005 Data Mining Page 2

The first part of this article will explain key concepts and
walk you through the above process using the SQL
Server 2005 Data Mining tools. In the latter half we’ll
demonstrate the ease with which you can integrate the
predictive power of the mining model into your ASP.NET
W eb application using straightforward database access
code.

Before You Start


Install Microsoft SQL Server 2005 Analysis Services Beta
2. This will set up the server components as well as the
design and management tools that we’ll use in this article:
Business Intelligence Studio and SQL Server
Management Studio, respectively. W e also recommend
you go over the data mining tutorial included with Beta 2.
Note: W e’ll refer to your running instance of SQL Server
2005 Analysis Services as “Analysis Server” in the rest of
the article. The service shows up as “Analysis Services
(MSSQLSERVER)” under Services in Administrative
Tools.

Mining Your Customer Purchase Data


Here we outline the process of building the back-end
framework that mines your historical customer movie
purchase transactions and extracts the knowledge needed
to make cross-sell recommendations to new customers.
This knowledge is embedded in the mining model that
we’ll design.

Data preparation is an important aspect of any data


mining process. W ith SQL Server Data Mining, however,
it is also possible to mine the transaction data in your
relational database directly. For simplicity, we’ll take this
approach and assume that a single transaction table
contains your customer purchase information, as shown
in Figure 1.

Figure 1: Movie purchase data used by sample application.

The Analysis Services project in SQL Server 2005


Business Intelligence Development Studio provides the
framework for modeling data and building a mining model
that learns customer buying patterns from existing data
gathered from prior purchases. W e then use the trained
model to generate recommendations for new customers.

The first step is to identify the entity whose behavior we


are interested in analyzing for the purpose of our cross-
sell application. A case represents all information (also
referred to as “attributes”) known about this entity. In this
scenario, each distinct customer in the CustomerMovies
table and the set of movies they purchased forms a case.
SQL 2005 Data Mining uses the concept of a nested
table to represent a variable-length collection of attributes
of the same kind associated with a case. For each
customer there is a set of rows containing the list of
movies purchased, which can be represented as a nested
table (as shown in Figure 2).

Figure 2: The mining model’s view of the customer movie purchase data.

http://www.aspnetpro.com/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp 09/03/2005 17:34:27


SQL Server 2005 Data Mining Page 3

The definition of a case and its associated attributes is


known as a mining structure in SQL 2005 Data Mining.

The next step is to build a mining model by selecting a


mining algorithm and specifying how the columns in the
mining structure will be used by the algorithm to process
the input data and extract useful knowledge from it. The
Association Rules algorithm is a good fit for our scenario.
It learns which items are likely to be bought together and
utilizes that information to predict other items given the
items the customer has selected. Marking the Movies
nested table as both Input and Predictable enables the
model to make predictions using the movies provided as
input. Note that the mining model in SQL Server 2005
Data Mining is a database object that holds both the
definition of the input to the knowledge extraction process
and the output, which consists of patterns or rules
learned by applying the selected data mining algorithm to
process the input data.

As we’ll see, the Mining Model W izard in SQL Server


2005 Business Intelligence Development Studio builds a
mining structure and a model using the algorithm selected
on the first page.

Next, the mining model definition is sent as part of a


deployment package to the server where it is trained.
Training cases consist of information we have collected
from past purchases. The deployment package is
generated from the Analysis Services project that we
design the model in, and it includes bindings to the data
source that Analysis Server uses for obtaining the training
cases.

To improve the quality of recommendations made for new


customers the model can be periodically re-trained as
more customer data is added to the transaction database.
For large datasets this will typically be carried out during
off-peak hours or against a replica of the transaction
database. SQL Server Data Transformation Services
(DTS) can be used to set up a package to perform such
periodic updates.

Finally, we must set up permissions so our application


can query the trained model.

Building a Cross-sell Mining Model in SQL Server


2005
See the end of this article for information about
downloading the complete Analysis Server project built
using the steps outlined here for producing the mining
model we’ll utilize in the ASP.NET code sample later in
this article:
1) Create a new Analysis Services project named
MovieRecommendations in Business Intelligence
Development Studio.
2) Add a new DataSource pointing to
MovieData.mdb, the Access database included with the
sample project (available for download; see end of article
for details).
3) Add a DataSource View based on the
DataSource. Select the only table, CustomerMovies,
present in the DataSource.
4) Right-click on the Mining Models collection and
select New Mining Model to launch the Mining Model
W izard.
5) Pick From existing relational database or data
warehouse on the Select Definition Method page and click
Next.
6) Pick Microsoft Association Rules as the data
mining technique to use on the next page.
7) Select the DataSource View created in step 3.

http://www.aspnetpro.com/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp 09/03/2005 17:34:27


SQL Server 2005 Data Mining Page 4

8) On the next page, the single CustomerMovies


table present in our DataSource View is shown. Mark it as
both Case and Nested.
9) Click Next to go to the Training Data page where
we need to specify the columns we are interested in
including from each selected table (see Figure 3). In our
scenario, the CustomerMovies table serves as the source
for our cases as well as the nested table data associated
with each case (this is why it is listed twice). From the
first instance pick CustomerId as the key since it uniquely
identifies each customer. From the second instance pick
Movie as Key, Input, and Predictable.
10) Click Finish to complete the wizard and build the
Movie Recommendations mining structure and model (see
Figure 4).

Figure 3: Selecting columns we are interested in modeling for cross-sell in the Data
Mining Wizard.

Figure 4: The cross-sell mining model in Business Intelligence Development Studio.

The above steps create a definition of our cross-sell


mining model and associated objects in the development
environment. There are two steps to deploy the mining
model to the Analysis Server and train it:
1) Right-click on the MovieRecommendations project
in the Solution Explorer and select Properties. Verify that
the Server property in the Deployment section of
Configuration Properties points to the server hosting your
Analysis Services instance. Close the dialog box.
2) Right-click again on the MovieRecommendations
project in the Solution Explorer and select Deploy. This
sends the client-side definitions to the server and initiates
training of the mining model.

W e must set up access permissions in Analysis Server


for IIS using the SQL management tool for our ASP.NET
application to use the trained mining model:
1) Open SQL Server Management Studio.

http://www.aspnetpro.com/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp 09/03/2005 17:34:27


SQL Server 2005 Data Mining Page 5

2) Click Connect in the Object Explorer, select


Analysis Server, and connect to the Analysis Server
hosting your model.
3) Locate the MovieRecommendations database in
the Databases collection, right click on Roles, and select
New Role. This brings up the Create Role dialog box.
4) On the General page enter Internet_User as the
role name. Check the Read Definition checkbox under
Set the database permissions for this role.
5) Click Membership in the left pane and select a
page. Add your IIS user (the default is
IUSR_machinename) to the role by clicking Add.
6) Now click Mining Structure in the left pane. This
shows the Customer Movies mining structure and the
Customer Movies mining model owned by the mining
structure. Drop down the permission list under Access
and select Read for both objects. Also check the Read
Definition checkbox for both.
7) Click OK to add the Role with the above
permission set.

Recommending Products Based on the User’s


Shopping Basket
Now we’re ready to produce movie recommendations in
our W eb application by running a SQL-like query against
the Analysis Server that holds our trained mining model.

W e’ve put together a minimal application (shown in Figure


5) that demonstrates the ideas behind a real deployment,
focusing on the generation of the prediction query for
getting recommendations. The W eb customer is assumed
to have one or more items in the shopping basket, and for
simplicity we have a text box where items can be entered
manually (separated by semicolons). Clicking Add Items
to Cart displays the items in the shopping basket and also
shows a list of recommendations.

Figure 5: A simple shopping basket application.

The code behind the button click is shown in Figure 6;


you can see that it’s quite simple.

Private Sub Button1_Click(ByVal sender As


Object, _
ByVal e As System.EventArgs) ' Handles
Me.Button1.Click

' Parse the input into an ArrayList of strings.


Dim alInputItems As New ArrayList()
Dim splitchar As Char() = {";"c}
Dim szInputItems As String() =
Me.TextBox1.Text.Split(splitchar, 20)
Dim i As Integer
For i = 0 To szInputItems.Length - 1
alInputItems.Add(szInputItems(i).Trim())
Next i

' Add items to the shopping basket.


dgShoppingBasket.DataSource = alInputItems
dgShoppingBasket.DataBind()

' Get top 5 recommendations.


Dim alRecommendedItems As New ArrayList(5)

http://www.aspnetpro.com/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp 09/03/2005 17:34:27


SQL Server 2005 Data Mining Page 6

GetRecommendations(alInputItems,
alRecommendedItems)

' Display recommendations.


dgRecommendations.DataSource =
alRecommendedItems
dgRecommendations.DataBind()
End Sub 'Button1_Click
Figure 6: Populate shopping basket and recommendations.

The method builds an ArrayList of strings from the items


in the shopping basket and passes it to the
GetRecommendations subroutine, requesting the top five
recommendations based on the input items. W e use two
DataGrid objects: dgShoppingBasket, to hold the items in
the user’s shopping basket; and dgRecommendations, to
display the generated recommendations. The real
workhorse is the GetRecommendations subroutine.

The core of the GetRecommendations subroutine is the


construction of the prediction join query (see Figure 7)
that gets sent to Analysis Server and returns the list of
five recommendations.

SELECT FLATTENED
TopCount(Predict([Customer Movies],
INCLUDE_STATISTICS),
$AdjustedProbability,
5)
FROM [Movie Recommendations]
NATURAL PREDICTION JOIN
( SELECT ( SELECT 'Star Wars' AS [Movie]
UNION
SELECT 'The Matrix' AS
[Movie] )
AS [Customer Movies] ) AS t
Figure 7: Obtain recommendations using DMX prediction join against mining model.

The SQL-like query language supported by Analysis


Server for querying mining models is called DMX. The
DMX query in Figure 7 performs a prediction join that
takes the two movies from the user’s shopping basket,
forms a case, and joins it with the mining model to
produce an output rowset containing a list of predicted
recommendations.

The GetRecommendations subroutine, shown in Figure 8,


stores most of the query in compile-time string templates.
W hat needs to be filled in is the input data.

Private Shared Sub GetRecommendations( _


ByVal vInputItems As ArrayList, _
ByRef vRecommendedItems As ArrayList)

' Templates for generating DMX prediction join


statement.
Dim strDMX1 As String = _
"SELECT FLATTENED TopCount(" + _
"Predict([Customer Movies],
INCLUDE_STATISTICS)," + _
"$AdjustedProbability, 5) From [Movie
Recommendations] " + _
"NATURAL PREDICTION JOIN (SELECT ("
Dim strDMX2 As String = ") AS [Customer
Movies]) AS t"

' Iterate shopping basket and produce input


case.
Dim cItems As Integer = vInputItems.Count
Dim strDMX As String = ""
Dim i As Integer
For i = 0 To cItems - 1
Dim item As String = vInputItems(i).ToString()
item = item.Replace("'", "''")
strDMX += "SELECT " + "'" + item + "' AS "+
"[Movie]"
If i < cItems - 1 Then
strDMX += " UNION "
End If
Next i

http://www.aspnetpro.com/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp 09/03/2005 17:34:27


SQL Server 2005 Data Mining Page 7

' Put together DMX prediction query to


' get 5 recommendations.
strDMX = strDMX1 + strDMX + strDMX2

' Connect to Analysis Server and execute query.


Dim asSession As New AnalysisServerSession
asSession.Connect()
If False =
asSession.ExecuteAndFetchResult(strDMX) Then
Return
End If

' Read prediction results and build list


' of recommendations.
vRecommendedItems.Clear()
While asSession.asDataReader.Read()
Dim type As String =
asSession.asDataReader.GetDataTypeName(0)

If type = "DBTYPE_WVARCHAR" Or type = "String"


Then
Try
Dim val As String =
asSession.asDataReader.GetString(0)
vRecommendedItems.Add(val)
Catch e As Exception
Console.WriteLine(e.Message)
End Try
End If
End While

' Disconnect from Analysis Server.


asSession.DisConnect()

End Sub ' GetRecommendations.


Figure 8: Convert shopping basket to input case for mining model prediction and get
recommendations.

As explained briefly during the mining model design


phase, data is passed to mining models in the form of
cases. Each case represents all the information we have
about a single customer. W hen we make a prediction for
a new customer, the known information about the
customer is passed as an input case to the mining model.
In this scenario the input case is the list of movies in the
customer’s shopping basket.

Note that the list of movies was represented as a nested


table in the mining model definition; therefore, we need to
supply the input case in the same form. This is
accomplished by the nested sub-select after the
NATURAL PREDICTION JOIN clause in Figure 7. The
UNION operation is used to add each movie as a
separate row in the nested table. W hen we supply cases
inline like this, we don’t need to provide a value for the
case key, CustomerId, since that is not used for the
prediction operation.

W ith the above input case data format in mind it’s now
easy to walk through the GetRecommendations code and
understand what it’s doing.

W e start the query generation process by declaring


templates for the DMX statement. W e then iterate over
the input items, the contents of the user’s shopping
basket, to produce an input case with a nested table
format as described above. The input case is appended to
the rest of the template to produce the query that will be
sent to the server.

Next, an instance of the AnalysisServerSession class is


instantiated to connect to Analysis Server and execute the
query against the trained mining model. The code for this
class is included in the download. The
AnalysisServerSession class uses ADOMD.NET, the
managed provider for Analysis Server, to connect to the
server. ADOMD.NET implements the same connection,
command, and data reader interfaces as standard
ADO.NET (System.Data.Oledb), so the rest of the code

http://www.aspnetpro.com/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp 09/03/2005 17:34:27


SQL Server 2005 Data Mining Page 8

looks exactly like ADO.NET code — but offers better


performance. (To use ADOMD.NET you’ll need to add a
reference to Microsoft.AnalysisServices.AdomdClient.dll,
installed by the SQL Server 2005 Beta 2 Client
Components Setup. If it is not listed under .NET in the
Add Reference dialog box in Visual Studio, browse to
Program Files\Microsoft.NET\ADOMD.NET\90 and add it
from that location.)

The query results are fetched, again using a standard


data reader interface exposed by ADOMD.NET and the
array list of output items populated with the
recommendations returned by the prediction join. Note
that the DMX query we generate uses the FLATTENED
keyword to avoid having to perform hierarchical result
navigation to fetch the results.

Under the Hood


In this section we explain what happens on the server
when it receives a prediction query and how the
knowledge acquired by the mining model may be explored
further.

During the training process, the Association Rules model


learns a set of rules that are used to generate product
recommendations. If a rule such as “[Camera, Film] ->
Batteries” was discovered and the customer’s shopping
basket contains Camera and Film, then this rule fires. Of
course, there may be other rules that predict Batteries as
well, in which case the rule with the highest score is
used. The score assigned to a rule, also known as its
Importance, takes into account and compensates for the
fact that the probability for a rule may be high just
because the target item is popular in the dataset. Assume
that 10% of customers buy Star Wars irrespective of
other purchases, and Blade Runner is bought by just 3%.
If two rules predict Star Wars and Blade Runner with the
same probability, the score for Star Wars will be lower
since it’s so popular. However, the advanced user can
fine tune the score using algorithm parameters.

Finally, the top n highest scoring rules are used to


generate the recommendations. The last parameter in the
TopCount function (again, see Figure 7) sets an upper
limit on the number of items returned by the prediction
algorithm.

The rules are organized by the items predicted and sorted


in descending order based on the score. The prediction
algorithm can avoid looking at very large sets of rules to
achieve good prediction performance.

Additional insight into a customer’s purchasing behavior


may be gained by using the viewers supplied as part of
the Business Intelligence Development Studio. These
include the Association Rules Viewer (shown in Figure 9)
for browsing the rules and the Dependency Net Viewer
that graphically shows the relationship between items.
The graph layout algorithm illustrates how the strongest
correlated products are clustered.

http://www.aspnetpro.com/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp 09/03/2005 17:34:27


SQL Server 2005 Data Mining Page 9

Figure 9: Viewing model content.

The Association Rules viewer displays the importance for


each rule and ranks the rules by this score. As explained
earlier, importance serves as a better measure than
probability for finding interesting rules in your data.

Conclusion
This article merely scratches the surface of the
possibilities that SQL Server 2005 Data Mining presents
for your ASP.NET applications. One idea for extending
this sample would be to add other customer attributes
such as demographics as inputs to potentially improve the
quality of the recommendations. Other possible W eb
applications include:
Targeted ads based on analysis of browsing behavior using
the Sequence_Clustering algorithm.
An online Help desk that finds the most appropriate answer
for a user’s query by using the Neural_Network algorithm in
conjunction with text mining technologies available in SQL
Server 2005 Data Transformation Services.

The data mining tutorial for SQL Server 2005 Analysis


Services is available on the Readiness Kit CD included
with the Beta 2 package. Visit the Microsoft beta
newsgroup at
microsoft.beta.yukon.analysisservices.datamining or the
Data Mining forum at http://www.sqljunkies.com/Forums/
ShowForum.aspx?ForumID=38 if you have questions about
SQL Server 2005 Data Mining. This article offers an
outline of the simplified mining process used for this
specific scenario. For a generalized version of this
methodology, refer to CRISP-DM at http://www.crisp-
dm.org.

The sample code in this article is available for download .

Raman Iyer is a Software Design Engineer at Microsoft


Corp. and a founding member of the SQL Server Data
Mining development team there. He can be reached at
mailto:ramaniy@microsoft.com.

Jesper Lind is a Research Software Design Engineer at


Microsoft Research and a member of the Machine
Learning and Statistics team. He can be reached at
mailto:jesperl@microsoft.com .

http://www.aspnetpro.com/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp 09/03/2005 17:34:27


SQL Server 2005 Data Mining Page 10

Informant Communications Group, Inc.


5105 Florin Perkins Road
Sacramento, CA 95826
Phone: (916) 379-0609 • Fax: (916) 379-0610

Copyright © 2005 Informant Communications Group. All Rights Reserved. • Site Use Agreement • Send feedback
to the Webmaster • Important information about privacy

http://www.aspnetpro.com/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp 09/03/2005 17:34:27

Vous aimerez peut-être aussi