
PREDICTIVE ANALYTICS AND SAP HANA

RDP267
Hands-on Exercises
SAP TechEd 2013
Getting started with your session
Login credentials and group numbers can be found in the My Reservation tab on the SAP TechEd Virtual
Hands-On Workshops website (https://saptechedhandson.sap.com/).
Important: Some of the sessions use placeholders for users (e.g. CD300_XX) or objects (e.g.
ZCD400_Exercise_##). The placeholders XX or ## must be replaced with your assigned group number,
which you can find in the My Reservation tab on the above-mentioned website.
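
The placeholder rule above can be sketched in a few lines of Python. This is only an illustration, not part of the workshop material: the function names are hypothetical, and it assumes that a placeholder always follows an underscore, as in the two examples given.

```python
import re

# Placeholder tokens from the handout (XX or ##). Assumption: a placeholder
# always follows an underscore, so other text containing "XX" is left alone.
PLACEHOLDER = re.compile(r"(?<=_)(?:XX|##)(?![A-Za-z0-9#])")


def fill_placeholders(text, group_number):
    """Replace every XX / ## placeholder with the assigned group number."""
    return PLACEHOLDER.sub(str(group_number), text)


def leftover_placeholders(text):
    """Return the placeholders that were not replaced yet."""
    return [match.group(0) for match in PLACEHOLDER.finditer(text)]


print(fill_placeholders("CD300_XX", 7))
print(fill_placeholders("ZCD400_Exercise_##", 12))
print(leftover_placeholders("RDP267_XX"))
```

Running a check like `leftover_placeholders` on a script before executing it is one way to catch a forgotten substitution.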

INITIAL SETUP
CHAPTER 1
  Use HANA Studio and SQL Script to create a PAL procedure for C4.5 algorithm
  Use HANA Studio and SQL Script to run the trained C4.5 Model
  Use HANA Studio and AFM to create a PAL procedure for C4.5 algorithm
CHAPTER 2
  Use HANA Studio and SQL Script to create a PAL procedure for Outlier algorithm
  Use HANA Studio and AFM to create a PAL procedure for Outlier algorithm
CHAPTER 3
  Use R Studio to develop a Generalized Linear Model

HANA and Predictive

BEFORE YOU START


In the hands-on session RDP267 you can choose your exercises according to your
personal area of interest. The solutions to all exercises are provided as a reference, so you can
also see the solutions to the exercises you did not finish.
Due to time constraints during the hands-on session, we recommend that you first look through the
different exercises and then decide which ones you want to work through first.
Chapter 1: 45 minutes (4 exercises)
Chapter 2: 30 minutes (2 exercises)
Chapter 3: 15 minutes (2 exercises)

INITIAL SETUP
During the exercises, you will work on an SAP HANA system with the following system properties:
Host name: coe-he-084.wdf.sap.corp
Instance number: 10
SAP System ID (SID): M31
Database user name: RDP267_# (# = your assigned student ID, which may be 1 or 2 digits)
Password: Initial1
Database schema: RDP267
Student exercise package: RDP267.sessionX.# (X = your assigned session number, # = student ID)
Solution package: RDP267.solution

As a preparatory step, make sure a connection to the backend SAP HANA database system is defined with
your assigned user (RDP267_#).

1. Start the SAP HANA Studio by clicking its desktop icon.

2. Open the Development perspective from the SAP HANA Studio start screen.
Either choose Open Development on the overview screen, or select from the studio menu:
Window > Open Perspective > Other > SAP HANA Development

For the virtual hands-on workshops, the user and password are unique and were changed before you got
access to the system. Therefore the password of the secure store needs to be recovered/unlocked.

You need to recover the password first.
Choose Window > Preferences.
Then open General > Security > Secure Storage.
Click Recover Password.

Answer the questions:
Question 1: 1972
Question 2: Hoffenheim
Click OK, then click OK again in each of the next two dialogs.

Unlock the secure store by clicking Unlock.
Now you can continue with the exercises.

3. Connect to the HANA system.
On the left, select the SAP HANA System view from the available views
(Workspace, SAP HANA Repository Browser, SAP HANA System).
In the SAP HANA System view, right-click the white background area, then select Add System from the context menu.

4. Specify the connection details.
Enter the connection details:
Host Name: coe-he-084.wdf.sap.corp
Instance Number: 10
Click Next.
Enter your assigned user and password credentials:
User Name: RDP267_XX (replace XX with your assigned student ID)
Password: Initial1
Click Finish.
The SAP HANA System view will show the new connection.

5. Explore the SAP HANA database catalog and repository content structure for the workshop.
The database catalog for this workshop, i.e. the SAP HANA database schema containing the workshop tables, is the schema RDP267.
To explore the schema, expand the Catalog folder > RDP267 > Tables.
Note: To browse a table, right-click the table and select Open Content from the context menu.

CHAPTER 1
In this chapter we will perform hands-on exercises in HANA Studio using the HANA PAL library.
Estimated time: 45 minutes

Objective
Use both SQL Script and the new Application Function Modeler (AFM) within HANA Studio to create
and execute PAL procedures.

What you will learn
How to create and execute PAL algorithms via SQL Script
How to run the trained model via SQL Script
How to create and execute PAL algorithms via AFM

Exercise description
Use HANA Studio and SQL Script to create a PAL procedure for C4.5 algorithm
Use HANA Studio and SQL Script to run the trained C4.5 Model
Use HANA Studio and AFM to create a PAL procedure for C4.5 algorithm

Use HANA Studio and SQL Script to create a PAL procedure for C4.5 algorithm

1. Click the system M31 (RDP267_XX) to open the connection (where XX represents your group number).

2. Click the SQL Console icon.

3. Right-click in the middle of the blank area and choose the Open File... menu item.
You can also press o.
Open the template script file PAL C4.5 CREATEDT RDP267XX version1.sql from the Student (Local) directory:
D:\Files\Session\RDP267
(type this share directory into the top line of the Open File dialog box)

4. Click the system M31 (RDP267_XX, where XX represents your group number) to select it, so that you can open another SQL Console.

5. Click the SQL Console icon to launch another SQL Console window (separate tab).
This opens a new SQL Console window in which to create your own script for your user and group number. You can either type the text yourself or paste in text from the template. In either case, remember to replace XX with your group number.
Note: If you choose to copy and paste, be sure to stop and take the time to understand the copied code!

6. Click the tab M31 - PAL C4.5 CREATEDT RDP267XX version1.sql to select it.
Here you see the text that goes into the other SQL Console tab. You can select the relevant script text and copy it with Ctrl+C, or simply read the text and type exactly the same thing into the SQL Console in the other tab.

7. Click the tab M31 - SQL Console to select it.
Type the text, or paste the copied text with Ctrl+V; in either case, be sure to change XX to your group number.

8. Click the tab M31 - PAL C4.5 CREATEDT RDP267XX version1.sql to select it.
The corresponding block of code in the template script should be reproduced in the other SQL Console tab. If typing manually, reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.

9. Click the tab *M31 - SQL Console to select it.

Either type the block of code exactly as it appears in the script in the other tab, or paste the copied text with Ctrl+V.
Here we define the set of columns that you will train the decision tree on. By convention, the final column (FIVEYEARSURVIVAL) is assumed to be the "dependent column" of this algorithm, and all other columns are assumed to be "independent".
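
The last-column convention described above can be illustrated with a short Python sketch. Only FIVEYEARSURVIVAL comes from this handout; the other column names and the helper function are hypothetical, chosen purely for illustration.

```python
def split_columns(columns):
    """Split a column list into (independent columns, dependent column).

    Follows the convention that the final column is the dependent one.
    """
    if len(columns) < 2:
        raise ValueError("need at least one independent and one dependent column")
    return list(columns[:-1]), columns[-1]


# Hypothetical training columns; only FIVEYEARSURVIVAL is taken from the handout.
columns = ["AGE", "TUMOR_STAGE", "BIOMARKER_A", "FIVEYEARSURVIVAL"]
independent, dependent = split_columns(columns)
print("independent:", independent)
print("dependent:", dependent)
```

The practical consequence is that the column order of the input table type matters: moving the target column away from the last position would change what the algorithm tries to predict.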

10. Click the tab M31 - PAL C4.5 CREATEDT RDP267XX version1.sql to select it.
The corresponding block of code in the template script should be reproduced in the other SQL Console tab. If typing manually, reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.

11. Click the *M31 - SQL Console tab to select it.

Either type the block of code exactly as it appears in the script in the other tab, or paste the copied text with Ctrl+V.
Here we define the columns of the two output tables that will be populated by this CREATEDT algorithm. One output will be the decision tree in JSON format. The other will be the same tree in PMML format.

12. Click the M31 - PAL C4.5 CREATEDT RDP267XX version1.sql tab to select it.
The corresponding block of code in the template script should be reproduced in the other SQL Console tab. If typing manually, reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.

13. Click the *M31 - SQL Console tab to select it.

Either type the block of code exactly as it appears in the script in the other tab, or paste the copied text with Ctrl+V.
Finally, here we define the columns of the generic "Input Control Parameter" table that is used by every PAL algorithm.

14. Click the M31 - PAL C4.5 CREATEDT RDP267XX version1.sql tab to select it.
The corresponding block of code in the template script should be reproduced in the other SQL Console tab. If typing manually, reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.

15. Click the *M31 - SQL Console tab to select it.

Either type the block of code exactly as it appears in the script in the other tab, or paste the copied text with Ctrl+V, and make sure you change XX to your group number.
Here we define and populate the "signature table" for this algorithm. You define the 2 input tables and 2 output tables that this particular CREATEDT PAL algorithm expects. These are the table types you created in the script above.
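
To make the idea of a signature table concrete, the following Python sketch models one as a list of rows and checks that it declares 2 inputs and 2 outputs. It is a conceptual illustration only: the row layout (position, type name, direction) and all type names are assumptions, not taken from the template script.

```python
from collections import Counter

# Each row: (position, table type name, direction).
# Layout and type names are hypothetical stand-ins for the real signature table.
signature = [
    (1, "T_TRAINING_DATA", "in"),
    (2, "T_CONTROL", "in"),
    (3, "T_JSON_MODEL", "out"),
    (4, "T_PMML_MODEL", "out"),
]


def check_signature(rows, expected_in, expected_out):
    """Return True if positions are 1..n and the in/out counts match."""
    positions = [position for position, _, _ in rows]
    if positions != list(range(1, len(rows) + 1)):
        raise ValueError("positions must be consecutive and start at 1")
    counts = Counter(direction for _, _, direction in rows)
    return counts["in"] == expected_in and counts["out"] == expected_out


print(check_signature(signature, expected_in=2, expected_out=2))
```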
16. Click the M31 - PAL C4.5 CREATEDT RDP267XX version1.sql tab to select it.
The corresponding block of code in the template script should be reproduced in the other SQL Console tab. If typing manually, reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.

17. Click the *M31 - SQL Console tab to select it.

Either type the block of code exactly as it appears in the script in the other tab, or paste the copied text with Ctrl+V, and change XX to your group number.
Here we ensure that the SYSTEM user has SELECT rights on your signature table. This is necessary because the AFL wrapper generator procedure is owned by SYSTEM and runs with definer's rights.

18. Click the M31 - PAL C4.5 CREATEDT RDP267XX version1.sql tab to select it.
The corresponding block of code in the template script should be reproduced in the other SQL Console tab. If typing manually, reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.

19. Click the *M31 - SQL Console tab to select it.

Either type the block of code exactly as it appears in the script in the other tab, or paste the copied text with Ctrl+V, and change XX to your group number.
This part of the script calls the wrapper procedure to create your own PAL CREATEDT procedure.

20. Click the M31 - PAL C4.5 CREATEDT RDP267XX version1.sql tab to select it.
The corresponding block of code in the template script should be reproduced in the other SQL Console tab. If typing manually, reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.

21. Click the *M31 - SQL Console tab to select it.

Either type the block of code exactly as it appears in the script in the other tab, or paste the copied text with Ctrl+V.
Here we create a temporary table for the control parameters to be used during training of this CREATEDT model. The definition of the temporary table is in turn based on the table type definition you established earlier. See the PAL Development Guide for a full explanation of all parameters:
http://help.sap.com/hana/SAP_HANA_Predictive_Analysis_Library_PAL_en.pdf
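
As a rough illustration of a generic control parameter table, the Python sketch below uses the standard-library sqlite3 module as a stand-in for HANA. The four-column layout (NAME plus one integer, one double, and one string column) and its column names are an assumption about the table type, since the handout does not list them. MIN_NUMS_RECORDS with the value 100 appears later in this chapter; the EXAMPLE_* parameters are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed layout: one name column plus one column per value type.
conn.execute(
    "CREATE TEMP TABLE pal_control ("
    "NAME TEXT, INTARGS INTEGER, DOUBLEARGS REAL, STRINGARGS TEXT)"
)

rows = [
    ("MIN_NUMS_RECORDS", 100, None, None),  # value used later in this chapter
    ("EXAMPLE_RATIO", None, 0.7, None),     # hypothetical parameter
    ("EXAMPLE_MODE", None, None, "demo"),   # hypothetical parameter
]
conn.executemany("INSERT INTO pal_control VALUES (?, ?, ?, ?)", rows)


def parameter_value(connection, name):
    """Return the single non-NULL value stored for a parameter."""
    row = connection.execute(
        "SELECT INTARGS, DOUBLEARGS, STRINGARGS FROM pal_control WHERE NAME = ?",
        (name,),
    ).fetchone()
    if row is None:
        raise KeyError(name)
    return next((value for value in row if value is not None), None)


print(parameter_value(conn, "MIN_NUMS_RECORDS"))
```

The point of the generic layout is that every algorithm can share one table type: each row fills only the column that matches the parameter's type and leaves the others empty.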
22. Click the M31 - PAL C4.5 CREATEDT RDP267XX version1.sql tab to select it.
The corresponding block of code in the template script should be reproduced in the other SQL Console tab. If typing manually, reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.

23. Click the *M31 - SQL Console tab to select it.

Either type the block of code exactly as it appears in the script in the other tab, or paste the copied text with Ctrl+V.
Here we create the two physical output tables (based on the table type definitions you established earlier in the script).

24. Click the M31 - PAL C4.5 CREATEDT RDP267XX version1.sql tab to select it.
The corresponding block of code in the template script should be reproduced in the other SQL Console tab. If typing manually, reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.

25. Click the *M31 - SQL Console tab to select it.

Either type the block of code exactly as it appears in the script in the other tab, or paste the copied text with Ctrl+V, and change XX to your group number.
Here we define a new DB view that is a subset of all columns available from the column view sap.hhp.fnd/CA_INTERACTIONS_PRED and that matches the input data table type you defined earlier. Then we call your PAL CREATEDT procedure, passing in the two input tables/views it expects.
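
The idea of a view that exposes only the columns the input table type expects can be tried out with sqlite3 as a stand-in for HANA. The table, view, and column names below are hypothetical (apart from FIVEYEARSURVIVAL), and the sketch does not call any PAL procedure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical wide source table standing in for the column view.
conn.execute(
    "CREATE TABLE interactions ("
    "PATIENT_ID INTEGER, AGE INTEGER, NOTES TEXT, FIVEYEARSURVIVAL TEXT)"
)
conn.execute("INSERT INTO interactions VALUES (1, 54, 'free text', 'YES')")

# The view keeps only the training columns, in the order the algorithm
# expects, with the dependent column last.
conn.execute(
    "CREATE VIEW training_input AS "
    "SELECT AGE, FIVEYEARSURVIVAL FROM interactions"
)

cursor = conn.execute("SELECT * FROM training_input")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
print(view_columns)
print(view_rows)
```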
26. Click the M31 - PAL C4.5 CREATEDT RDP267XX version1.sql tab to select it.
The corresponding block of code in the template script should be reproduced in the other SQL Console tab. If typing manually, reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.

27. Click the *M31 - SQL Console tab to select it.

Either type the block of code exactly as it appears in the script in the other tab, or paste the copied text with Ctrl+V.
This part of the script reviews both output tables.

28. Click the Execute icon.
Note: Upon execution, you may see numerous error messages for the DROP TYPE, DROP TABLE, and DROP VIEW statements. This is expected: we drop each object first simply as best practice, and the statement fails when the object does not exist yet. These errors are not a problem.
If you see other types of errors, review your code and look for discrepancies. In particular, look for places where you forgot to replace XX with your group number.
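
The "drop first" pattern and its harmless errors can be reproduced in miniature with sqlite3 standing in for HANA. The sketch below is an illustration under that assumption: the runner function and the table name are hypothetical, and it tolerates failures of DROP statements only, while any other error still stops the script.

```python
import sqlite3


def run_script(connection, statements):
    """Run statements in order, ignoring errors raised by DROP statements."""
    ignored = []
    for sql in statements:
        try:
            connection.execute(sql)
        except sqlite3.OperationalError as error:
            if sql.lstrip().upper().startswith("DROP "):
                ignored.append((sql, str(error)))  # expected on a first run
            else:
                raise
    return ignored


conn = sqlite3.connect(":memory:")
script = [
    "DROP TABLE results",
    "CREATE TABLE results (ID INTEGER)",
    "INSERT INTO results VALUES (1)",
]

first_run = run_script(conn, script)   # DROP fails: the table does not exist yet
second_run = run_script(conn, script)  # DROP succeeds: the table is recreated
print(len(first_run), len(second_run))
```

Because every object is dropped before it is created, the same script can be executed repeatedly and always ends in the same state.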

29. There are several Result tabs. Click the first Result tab to select it. Your content should look similar to the illustration here.

30. Click the next Result tab to select it.
Review the output table. While the JSON and PMML formats are not easily human-readable, there are visualization options on top of this trained model. We can also use this JSON format to predict the outcome for a new patient via a SQL Script procedure, and that is what we plan to do later in this TechEd hands-on workshop for SAP HANA and Predictive.
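
To give a feel for how a tree stored as JSON can drive a prediction, here is a small Python sketch. The JSON layout, attribute names, and labels are invented for illustration; the JSON that PAL actually writes has its own structure, which this sketch does not attempt to reproduce.

```python
import json

# Hypothetical tree: inner nodes test one attribute, leaves carry a label.
MODEL_JSON = """
{"attribute": "STAGE",
 "children": {
   "EARLY": {"label": "YES"},
   "LATE": {"attribute": "PROTOCOL",
            "children": {"CAV": {"label": "YES"},
                         "OTHER": {"label": "NO"}}}}}
"""


def predict(node, record):
    """Walk from the root to a leaf, following the record's attribute values."""
    while "label" not in node:
        value = record[node["attribute"]]
        node = node["children"][value]
    return node["label"]


tree = json.loads(MODEL_JSON)
print(predict(tree, {"STAGE": "LATE", "PROTOCOL": "CAV"}))
```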

31. Click the third Result tab to select it.

32. Review the content of this Result tab; it should look similar to the depiction shown here.
This concludes this exercise. Close the open SQL Console tabs to clean up your work area: simply click the X on each tab. You can select No when asked Save Changes?.

Use HANA Studio and SQL Script to run the trained C4.5 Model

1. Click the M31 (RDP267_XX) system to select it.

2. Click the SQL Console icon to open a SQL Console again.

3. Right-click in the blank space in the SQL Console, and then choose the Open File... menu item.
Alternatively, you can press o.
Open the PAL C4.5 PREDICTWITHDT XX version1.sql script from the file share.

4. Click the system M31 (RDP267_XX), where XX represents your group number.

5. Click the SQL Console icon to open another SQL Console.
This opens a new SQL Console window in which you create your own code for your user and group number. You can type the code manually by looking at the provided script and reproducing each section precisely. Alternatively, you can copy and paste text from the template, but be sure to stop and understand the steps you are taking! In either case, remember to replace XX with your group number.

6. Click the tab M31 - PAL C4.5 PREDICTWITHDT RDP267XX version1.sql to select it.
The corresponding block of code in the template script should be reproduced in the other SQL Console tab. If typing manually, reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.

7. Click the M31 - SQL Console tab to select it.
Either type the block of code exactly as it appears in the script in the other tab, or paste the copied text with Ctrl+V, and change XX to your group number.
This step sets the schema for your user. Here you also create the input table type for the data to be predicted.

8. Click the M31 - PAL C4.5 PREDICTWITHDT RDP267XX version1.sql tab to select it.
Note the comments in the template script. For the JSON model input table to the PREDICTWITHDT PAL algorithm, we can simply use the JSON model output table from the CREATEDT PAL algorithm that you created in the previous exercise in Chapter 1. This is included for informational purposes; no action needs to be taken regarding these comments.

The corresponding block of code in the template script should be reproduced in the other SQL Console tab. If typing manually, reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.

9. Click the *M31 - SQL Console tab to select it.
Either type the block of code exactly as it appears in the script in the other tab, or paste the copied text with Ctrl+V.
In this step, we create the table types for the Input Control Parameters table and the Result table.

10. Click the M31 - PAL C4.5 PREDICTWITHDT RDP267XX version1.sql tab to select it.
The corresponding block of code in the template script should be reproduced in the other SQL Console tab. If typing manually, reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.

11. Click the *M31 - SQL Console tab to select it.

Either type the block of code exactly as it appears in the script in the other tab, or paste the copied text with Ctrl+V. In either case, change XX to your group number.
Here we create and populate the signature table and grant the SYSTEM user SELECT access on it.

12. Click the M31 - PAL C4.5 PREDICTWITHDT RDP267XX version1.sql tab to select it.
The corresponding block of code in the template script should be reproduced in the other SQL Console tab. If typing manually, reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.

13. Click the M31 - SQL Console tab to select it.
Either type the block of code exactly as it appears in the script in the other tab, or paste the copied text with Ctrl+V. In either case, change XX to your group number.
Here we create the PAL PREDICTWITHDT procedure for your user group number.

14. Click the M31 - PAL C4.5 PREDICTWITHDT RDP267XX version1.sql tab to select it.

The corresponding block of code in the template script should be reproduced in the other SQL Console tab. If typing manually, reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.

15. Click the M31 - SQL Console tab to select it.
Either type the block of code exactly as it appears in the script in the other tab, or paste the copied text with Ctrl+V.
Here we create and populate a temporary table as the Input Control Parameter table.

16. Click the M31 - PAL C4.5 PREDICTWITHDT RDP267XX version1.sql tab to select it.
The corresponding block of code in the template script should be reproduced in the other SQL Console tab. If typing manually, reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.

17. Click the M31 - SQL Console tab to select it.

Either type the block of code exactly as it appears in the script in the other tab, or paste the copied text with Ctrl+V.
This creates the physical output table for the results of the prediction.

18. Click the M31 - PAL C4.5 PREDICTWITHDT RDP267XX version1.sql tab to select it.
The corresponding block of code in the template script should be reproduced in the other SQL Console tab. If typing manually, reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.

19. Click the *M31 - SQL Console tab to select it.
Either type the block of code exactly as it appears in the script in the other tab, or paste the copied text with Ctrl+V. In either case, change XX to your group number.
This calls your PAL procedure, passing in some diagnosis and genomic biomarker information from a newly diagnosed patient, and then reviews the predicted results.

20. Click Execute (F8).
Note: Upon execution, you may see numerous error messages for the DROP TYPE, DROP TABLE, and DROP VIEW statements. This is expected: we drop each object first simply as best practice, and the statement fails when the object does not exist yet. These errors are not a problem.
If you see other types of errors, review your code and look for discrepancies. In particular, look for places where you forgot to replace XX with your group number.

21. Click the first Result tab to select it.
The contents should resemble the depiction shown here.

22. Click the second Result tab to select it.

The contents of your Result tab should resemble the depiction shown here.
This prediction suggests that a drug chemotherapy protocol of CAV and a protocol timing of neoadjuvant (before surgery) would give this patient the best chance of 5-year survival, given their diagnosis and biomarker information.
This concludes this exercise. Close the open SQL Console tabs to clean up your work area: simply click the X on each tab. You can select No when asked Save Changes?.

Use HANA Studio and AFM to create a PAL procedure for C4.5 algorithm

1. Click the SAP HANA Development button.

2. Click the Project Explorer tab to select it.
Right-click in the blank part of the Project Explorer area to invoke the context menu.

3. Click the Project... menu item.
You can also press r.

4. Click Project.

5. Click Next.

6. Enter PROJ_RDP267_XX, where XX represents your student number, in the Project name: field.

7. Click Next.
You can also press Alt+N.

8. Do not select any referenced projects; instead simply click Finish.

9. Click the Window menu item.

10. Click the Preferences menu item.

11. Click SAP HANA Development.

12. Click Repository Access.

13. Check that the regi location in your preferences matches the illustration shown here. If it does, simply click the Cancel button. If it does not, enter C:\Program Files\sap\hdbclient\regi.exe in the Location: box and click the OK button.

14.

15. Right-click PROJ_RDP267_XX, where XX refers to your student number.

16. From the menu, choose Team, then click the Share Project menu item.
In the first dialog box for Share Project, choose SAP HANA Repository, then click the Next button.

17. Click Add Workspace...

18. Click M31 (RDP267_XX) Tech Ed 2013, where XX represents your student number.

19. Enter WS_RDP267_XX, where XX represents your student number, in the Workspace Name: box. Leave the value for Workspace Root as the default.

20. Click Finish.

21. Select the entry WS_RDP267_XX [M31 (RDP267_XX), coe-he-084.wdf.sap.corp, 10], where XX represents your student number, by clicking it. Do not click the Finish button yet, though; you still have to select the Repository Package in the next step.

22. Click the Browse... button next to the Repository Package field.

23. Expand WS_RDP267_XX [M31 (RDP267_XX), coe-he-084.wdf.sap.corp, 10], where XX represents your student number.

24. Click RDP267.

25. Click OK.

26. Click Finish.
You can also press Alt+F.

27. Expand your project by clicking the arrow icon next to it.

28. Right-click PROJ_RDP267_XX (where XX represents your student number).

29. Choose New, then click the Other... menu item (Ctrl+N).

30. Click AFL Connector File.

31. Click Next.
You can also press Alt+N.

33. Select your project, and enter MY_AFM_CDT in the File name: box. Next, click Finish.

34. Double-click MY_AFM_CDT.aflpmml.

35. Click the arrow icon next to the Classification functio.

Select the function and drag and drop it onto the main design panel.

36. Click the icon.

37. Expand the hierarchy tree as shown under your project: Catalog > SAP_HHP > Tables. Select the CREATEDT_TECHED table.

Drag the table CREATEDT_TECHED to the main area.

38. On the object for this table, click the icon for Open Data Preview.

39. Click in the area to the right of the scroll bar to scroll to the right.

40. Click the *MY_AFM_CDT tab to select it.

41. Drag a connecting line from CREATEDT... and release it onto the Training space.

42. Next, click the object for JsonModel (upper right). This will launch the Properties tab.

43. Use the plus icon (+) on the right-hand side of the Properties tab to add two entries to the Output. Enter these values as shown here.

44. Click the button.
Again make the entries as shown here.

Next, click anywhere in the white space of your model. This will launch the Procedure Properties dialog box.

45. Click Open.
You can also press Alt+Down Arrow.
Select your user's schema from the dropdown list.

46. Click the object.

47. Enter 100 in the (INTEGER) box to set the MIN_NUMS_RECORDS parameter to 100.

48. Click Save (Ctrl+S).

49. Right-click MY_AFM_CDT.aflmodel.

50. Click the Activate menu item.
You can also press a.

51. Click RDP267.PROJ_RDP267_XX::MY_AFM_CDT.model, where XX represents your student number.
Select the Call button at the top right of the AFM screen.

52. Click OK.

53. Click the object.

You should see data resembling the content shown in the example here.

Chapter Summary:
In this chapter you learned, through hands-on exercises, how to create and execute PAL algorithms via both
SQL Script and AFM.

CHAPTER 2
In this chapter we will perform hands-on exercises in HANA Studio using the HANA PAL library.
Estimated time: 30 minutes

Objective
Use both SQL Script and the new Application Function Modeler (AFM) within HANA Studio to create and
execute PAL procedures.

What you will learn
How to create and execute PAL algorithms via SQL Script
How to create and execute PAL algorithms via AFM

Exercise description
Use HANA Studio and SQL Script to create a PAL procedure for Outlier algorithm
Use HANA Studio and AFM to create a PAL procedure for Outlier algorithm

Use HANA Studio and SQL Script to create a PAL procedure for Outlier algorithm

1. Click M31 (RDP267_XX), where XX represents your student number, to open another SQL Console.

2. Click the icon for the SQL Console.

3. Right-click the blank space and, from the menu, select the Open File... menu item.
Open the template script PAL Anomaly Detection RDP267XX version1.sql from the file share.
Enter the share location D:\Files\Session\RDP267 in the top field of the Open File dialog box.

4. Click M31 (RDP267_XX), where XX represents your group number, to open another SQL Console.

5. Click the icon for the SQL Console.
This opens a new SQL Console window in which to create your own script for your user and group number. You can either type the text yourself or paste in text from the template. In either case, remember to replace XX with your group number.
Note: If you choose to copy and paste, be sure to stop and take the time to understand the copied code!

6. Click the M31 - PAL Anomaly Detection RDP267XX version1.sql tab to select it.
The corresponding block of code in the template script should be reproduced in the other SQL Console tab. If typing manually, reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.

7. Click the M31 - SQL Console tab to select it.

Either type the block of code exactly as it appears in the script in the other tab, or paste the copied text with Ctrl+V, and change XX to your group number.
Here we set the schema to your user's schema and create the table types that define the input table and the output table for this PAL Anomaly algorithm.
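
The handout does not describe how the PAL algorithm scores rows. Purely to illustrate what an outlier detection step takes in and returns, the Python sketch below flags the points that lie farthest from the centroid of the data. This is a deliberately simple stand-in technique, not the PAL implementation, and the function name, the `fraction` parameter, and the sample points are all hypothetical.

```python
import math


def flag_outliers(points, fraction=0.1):
    """Return the indices of the points farthest from the centroid.

    At least one point is flagged when the input is not empty.
    """
    if not points:
        return []
    dimensions = len(points[0])
    centroid = [
        sum(point[d] for point in points) / len(points)
        for d in range(dimensions)
    ]
    distances = [math.dist(point, centroid) for point in points]
    count = max(1, round(len(points) * fraction))
    ranked = sorted(range(len(points)), key=lambda i: distances[i], reverse=True)
    return sorted(ranked[:count])


# Four points close together and one far away (index 4).
points = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.0, 0.8), (9.0, 9.0)]
print(flag_outliers(points))
```

In the exercise, the input table type plays the role of `points` and the output table type holds the rows that were flagged.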
8. Click the M31 - PAL Anomaly Detection RDP267XX version1.sql tab to select it.
The corresponding block of code in the template script should be reproduced in the other SQL Console tab. If typing manually, reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.

9. Click the *M31 - SQL Console tab to select it.
Either type the block of code exactly as it appears in the script in the other tab, or paste the copied text with Ctrl+V.
Here we create the generic PAL table type for the input control parameters.

60

Explanation

Screenshot

10. Click the M31 - PAL Anomaly Detection RDP267XX version1.sql tab to select it.

The corresponding block of code shown here should be reproduced in the other SQL Console tab. If typing manually, be sure to reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.

11. Click the *M31 - SQL Console tab to select it.

Either type the aforementioned block of code precisely by looking at the script in the other tab and reproducing it here, or simply paste the copied text with Ctrl+V and change XX to your group number.
Here we create and populate the signature table for this algorithm (which in this case contains two input table types and one output table type). We also grant the SYSTEM user SELECT access on your signature table.
12. Click the M31 - PAL Anomaly Detection RDP267XX version1.sql tab to select it.

The corresponding block of code shown here should be reproduced in the other SQL Console tab. If typing manually, be sure to reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.


13. Click the *M31 - SQL Console tab to select it.

Either type the aforementioned block of code precisely by looking at the script in the other tab and reproducing it here, or simply paste the copied text with Ctrl+V and change XX to your group number.
Here we call the AFL wrapper procedure to create your new PAL procedure.
14. Click the M31 - PAL Anomaly Detection RDP267XX version1.sql tab to select it.

The next block of code shown here should be reproduced in the other SQL Console tab. If typing manually, be sure to reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.
15. Click the *M31 - SQL Console tab to select it.

Either type the aforementioned block of code precisely by looking at the script in the other tab and reproducing it here, or simply paste the copied text with Ctrl+V.
Here we create and populate the Input Control Parameter table for this algorithm. See the PAL Development Guide for more details.


16. Click the M31 - PAL Anomaly Detection RDP267XX version1.sql tab to select it.

The corresponding block of code shown here should be reproduced in the other SQL Console tab. If typing manually, be sure to reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.
17. Click the *M31 - SQL Console tab to select it.

Either type the aforementioned block of code precisely by looking at the script in the other tab and reproducing it here, or simply paste the copied text with Ctrl+V.
This step creates the physical output table based on the table type definition established earlier in this script.

18. Click the M31 - PAL Anomaly Detection RDP267XX version1.sql tab to select it.

The corresponding block of code shown here should be reproduced in the other SQL Console tab. If typing manually, be sure to reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.
19. Click the *M31 - SQL Console tab to select it.

Either type the aforementioned block of code precisely by looking at the script in the other tab and reproducing it here, or simply paste the copied text with Ctrl+V and change XX to your group number.
This calls your PAL procedure, passing in data from the mentioned column view.
20. Click the M31 - PAL Anomaly Detection RDP267XX version1.sql tab to select it.

The corresponding block of code shown here should be reproduced in the other SQL Console tab. If typing manually, be sure to reproduce it precisely. If using copy/paste, select the relevant script text and copy it with Ctrl+C.
21. Click the *M31 - SQL Console tab to select it.

Paste the copied text with Ctrl+V.

Review the Output table to see which patients are statistical outliers based on the 4 clusters that were defined in the Input Control Parameters.

22. Click Execute (F8). You may see some error messages about the Drop Type and Drop Table statements, but you can ignore those errors.


23. Click the first Result tab to select it.

Your output should look similar to the depiction shown here.

24. Click the second Result tab to select it.

These patients are the statistical outliers based on our model parameters. This may lead to insight and further analysis, e.g. why do some patients live longer than others after diagnosis, and is their longevity related only to their age at diagnosis?
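
The idea of flagging outliers from clusters can also be sketched outside HANA. The following is a minimal illustration in Python, not the PAL implementation: it assumes that outliers are the points lying farthest from their nearest k-means cluster center, it uses made-up (AGE_DIAG, DAYS_DIAG_DEATH) pairs rather than workshop data, and it uses two clusters instead of the four set in the exercise to keep the example small. The function names kmeans and flag_outliers are hypothetical.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points serve as initial centers (an arbitrary choice)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        for c, members in enumerate(groups):
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return centers

def flag_outliers(points, centers, count):
    """Return the `count` points that lie farthest from their nearest center."""
    def distance_to_nearest(p):
        return min(math.dist(p, c) for c in centers)
    return sorted(points, key=distance_to_nearest, reverse=True)[:count]

# Made-up (AGE_DIAG, DAYS_DIAG_DEATH) pairs, not workshop data.
patients = [(50, 100), (80, 900), (52, 110), (78, 880),
            (51, 105), (79, 890), (60, 3000)]
centers = kmeans(patients, k=2)
outliers = flag_outliers(patients, centers, count=1)
print(outliers)
```

No scaling is applied in this sketch, so the column with the larger numeric range (days) dominates the distance; in practice the inputs would typically be normalized first.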


Use HANA Studio and AFM to create a PAL procedure for Outlier algorithm

1. Click the SAP HANA Development button.

2. Click the Project Explorer tab to select it.

3. Right-click PROJ_RDP267_XX (where XX represents your group number). This is the project you created in a previous exercise.

4. From the menu, select New, and then choose the Other... menu item.


5. Click AFL Connector File.

6. Click Next


7. Enter MY_AFM_AD in the File name: box.

8. Click Finish. You can also press Alt+F.

9. Double-click MY_AFM_AD.aflpmml.

10. Click the icon shown in the screenshot.
Open the Clustering group of PAL Algorithms by clicking the arrow next to the Clustering icon.


Drag Anomaly Detection and drop it in the main space.

11. Click the plus sign icon.

12. In the left-hand Catalog hierarchy under your project, in the schema SAP_HHP, under Tables, choose the ANOMALIES table and drag it into the main space.

Drag an arrow from the ANOMALIES icon and drop it on the Data icon.


13. On the ANOMALIES icon, click the icon for Open Data Preview.

14. Click the Refresh icon. Change the max rows to 47000 and click Refresh again.

15. Click the Analysis tab.

Drag PATIENT_ID and drop it on the Labels axis space.

Drag DAYS_DIAG_DEATH and drop it on the value axis space.

Drag AGE_DIAG and drop it on the value axis space.


16. Click the button for Scatter charts. Review the scatter plot. Notice some outliers.

17. Click the *MY_AFM_AD tab to select it.

18. Click the Result icon.

19. Select the entry shown in the screenshot by clicking it.


Select anywhere in the whitespace of your model.

20. Click Open. You can also press Alt+Down Arrow.

21. Select the entry RDP267_XX (where XX represents your group number) by clicking it.

22. Click Save


23. In the right-hand hierarchy, select your project and right-click to open the context menu. Choose Team, then the Activate menu item.

24. Click the procedure RDP267.PROJ_RDP267_XX::MY_AFM_AD.model (where XX represents your group number).

25. Click the SQL tab to select it.


26. Click the Overview tab to select it.

Select the Call button at the top right of the AFM screen.

27. In the Call Procedure Success dialog box, click the OK button.

28. On the Result icon, select the Open Data Preview icon.


In the result set, look for the outliers.

Chapter Summary
In this chapter you learned, through hands-on exercises, how to create and execute PAL algorithms via both SQL Script and AFM.

CHAPTER 3
In this chapter we will perform Hands-on Exercises to run our trained predictive models
Estimated time: 20 minutes

Objective
The objective of this chapter is to give you an understanding of the fundamentals of the HANA/R connectivity through a real-life example and application of a widely used statistical method.
What you will learn

How to use R Studio for the Generalized Linear Model (GLM)


How to create a HANA SQL Script that calls an R GLM Algorithm

Exercise description

Use R Studio to develop a Generalized Linear Model (GLM)


Use R Studio to develop a Generalized Linear Model


1. Click Start

2. From the All Programs menu, expand the RStudio folder and click the RStudio menu item to execute it.

3. Click the Open File icon to open the R script.


4. Enter the share location D:\Files\Session\RDP267 in the top field of the Open File dialog and click the green arrow button next to it.

Select the script named "R GLM GROUP XX template scrtipt v1.R", and then click the Open button.

5. Click the Maximize icon in the upper right of the window showing the script.

6. Locate the uid (user id) parameter in the script. Replace the XX with your student number.


7. For the pwd parameter, change the value to Initial1.

We now want to extract from HANA the data we need for developing statistical models. This requires setting the parameters and creating the necessary connections. Please use the access data that has been provided to you in this workshop.
8. Select the region of the script as shown (only items #1. through #5.). Next, click the Run button.

9. In the Workspace tab of the right-hand window, click the GLM_Analysis entry.

This opens the GLM_Analysis dataset so you can view its contents.


10. In the upper right part of the window containing the result set, click the icon to maximize the window and display the data.

11. Scroll to the right.

Here you can see the variables that were read from SAP HANA. The dataset contains the demographic information of the patient and also the type of cancer with which they have been diagnosed.


13. Close the tab containing the result set by clicking the X at the top right of the tab.

14. Select the region shown here (#6. only) and click the Run button.

15. In the Workspace tab of the right-hand window, click the entry shown in the screenshot. This displays the result set from the command previously executed.


16. In the upper right part of the window containing the result set, click the icon to maximize the result set window.

17. Scroll to the right.


19. Close the tab after reviewing the data.

20. Select the region depicted here (a specific part of #7. only) and click the Run button.


21. Maximize the Console window by clicking the icon in the upper right.

Generalized Linear Models can be viewed as an extension of regression models in that they allow two fundamental additions:

They allow the error model to be extended beyond the normality assumption.
They allow for a generic use of categorical variables (as opposed to continuous variables).

In the current example we model a binary categorical variable (ONEYEARSURVIVAL) as a function of AGE_DIAG, the age at which a patient was diagnosed with cancer. In other words, we are trying to measure the effect that the age at diagnosis has on the probability of someone surviving a year.
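
Written out as a formula, and assuming the script fits a binomial GLM with a logit link (this text does not show the glm call, so the family and link are assumptions), the model for patient i is:

```latex
\log\frac{p_i}{1 - p_i} = \beta_0 + \beta_1 \,\mathrm{AGE\_DIAG}_i ,
\qquad
p_i = \Pr(\mathrm{ONEYEARSURVIVAL}_i = 1)
    = \frac{1}{1 + e^{-(\beta_0 + \beta_1 \mathrm{AGE\_DIAG}_i)}}
```

Under that assumption, a negative estimate of \beta_1 would mean that the estimated probability of surviving one year decreases as the age at diagnosis increases.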
1.) The first section of the output we see here summarizes the main characteristics of the distribution of the residuals.
2.) Following it, the parameter estimates are listed, each with an estimated standard error, a z value and an associated p-value.
3.) Then the deviance section and the AIC (Akaike Information Criterion) are given. In theory, the deviance approximately follows a chi-squared distribution. The smaller the deviance, the better the model fits. Similarly, the AIC is a goodness-of-fit statistic that also penalizes the number of parameters; it is used to compare candidate models, with lower values preferred.
4.) The number of Fisher Scoring iterations is the number of iterations the fitting method needed before its convergence criterion was met and the numerical result was obtained.
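
To make the four parts of that output more concrete, here is a minimal sketch in Python of how such a summary can be computed for a one-predictor logistic model. It is an illustration under stated assumptions, not the workshop's R script: the binomial family with logit link is assumed, the data are made up, and fit_logistic is a hypothetical helper written for this example.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_logistic(x, y, tol=1e-8, max_iter=25):
    """Fit logit(p) = b0 + b1*x by Fisher scoring and summarize the fit."""
    b0 = b1 = 0.0
    for iterations in range(1, max_iter + 1):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]  # variance weights
        # Score vector: gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Fisher information [[a, b], [b, c]] and its determinant.
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        c = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * c - b * b
        # Scoring step: coefficients += inverse(information) * score.
        d0 = (c * g0 - b * g1) / det
        d1 = (a * g1 - b * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    p = [sigmoid(b0 + b1 * xi) for xi in x]
    # For 0/1 responses the residual deviance is -2 * log-likelihood.
    deviance = -2.0 * sum(math.log(pi if yi == 1 else 1.0 - pi)
                          for yi, pi in zip(y, p))
    # Standard errors from the inverse information of the last iteration.
    se0, se1 = math.sqrt(c / det), math.sqrt(a / det)
    return {
        "intercept": b0, "slope": b1,
        "z_intercept": b0 / se0, "z_slope": b1 / se1,
        "deviance": deviance,
        "aic": deviance + 2 * 2,  # two estimated parameters
        "iterations": iterations,
    }

# Made-up ages at diagnosis and one-year survival flags (not workshop data).
age_diag = [35, 40, 45, 50, 55, 60, 65, 70, 75, 80]
one_year_survival = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0]
fit = fit_logistic(age_diag, one_year_survival)
for name, value in fit.items():
    print(f"{name}: {value}")
```

One difference worth noting: R's glm judges convergence by the relative change in deviance between iterations, whereas this sketch stops when the coefficient update becomes very small, so iteration counts are not directly comparable.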


22. To generate predictions, the function "predict" is used; its argument indicates the model from which the values are to be created.
In the window showing the script, select the region depicted here (just that one line) and click "Run".

23. In the Console window, type View(predicted1) and press the Enter key.
This allows you to view the predicted values.

24. The window in the top left shows the result set. Maximize it by clicking the Maximize icon.


25. Use the vertical scroll bar to display the desired screen area.

26. Click the icon shown in the screenshot.

After viewing the values, please close the tab.
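
When reading predicted values from a GLM, it helps to know which scale they are on. In R, predict applied to a glm object returns values on the link scale (the linear predictor) by default, and probabilities only when type = "response" is given; whether the workshop script passes that argument is not shown in this text. The sketch below illustrates the difference in Python, assuming a logit link and using hypothetical coefficients chosen purely for easy arithmetic.

```python
import math

def predict(intercept, slope, ages, kind="link"):
    """Return predictions on the link scale or as probabilities."""
    eta = [intercept + slope * age for age in ages]  # linear predictor
    if kind == "link":
        return eta
    if kind == "response":
        return [1.0 / (1.0 + math.exp(-e)) for e in eta]
    raise ValueError("kind must be 'link' or 'response'")

# Hypothetical coefficients, not estimates from the workshop data.
intercept, slope = 5.0, -0.0625

print(predict(intercept, slope, [80]))                   # [0.0]
print(predict(intercept, slope, [80], kind="response"))  # [0.5]
```

A link-scale value of 0 corresponds to a probability of 0.5; positive values map to probabilities above 0.5 and negative values to probabilities below it.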


There are four more models provided for you to test (model2 through model5). You are welcome to test these using the techniques described previously, and are invited to attempt to interpret the results.

Now that we have developed the model in RStudio, we want to use it in HANA and are going to develop an R procedure that is directly embedded and executable in HANA. For that purpose:
29. Go back into the SAP HANA Studio. Click the button in the upper left for the Modeler perspective. In the SAP HANA Systems area in the upper left, right-click your system connection M31 (RDP267_XX) TechEd, where XX represents your student number.


30. From the menu, choose the SQL Console menu item.

31. Click the File menu.

32. Click the Open File... menu item.


Enter the share location in the field at the top of the dialog box, and click the green arrow button:
D:\Files\Session\RDP267

Select the file "GLM scoring function calling R from HANA template XX.sql".

33. Click Open

34. This is the code you will see.

35. Now copy the first section into the SQL console you opened before.

36. Replace all XX with your user ID. Run the code.
Here you create the table from which the models will be built.


37. Go back to the code, select the next part, and copy it. Go back to your SQL console.

38. Paste it. Replace all XX with your user ID and run the code.
Here you create the table of patients that are going to be scored with the created model. The new table will also contain the prediction.

39. Go back to the code and copy the part where the R procedure is created. Go back to your SQL console.

40. Paste the code and replace all XX with your user ID. Now run it.
Here you are creating the R procedure. Note that it reuses the code we developed in RStudio.


41. Finally, go back to the code and copy the last part. Go back to your SQL console and paste it.

42. Replace the XX with your user ID and run it.

43. Scroll to the right of the results to see the predicted values!

Chapter Summary:
In this chapter you learned the fundamentals of the HANA/R connectivity through a real-life HANDS-ON
example and application of Generalized Linear Models, a widely used statistical method.
Thank you for participating in this SAP TechEd Virtual Hands-On Workshop!
Please, take a few minutes to answer a couple of feedback questions concerning your session.

Find a shortcut to the survey on the desktop of your virtual laptop image or visit
https://www.sapsurvey.com/cgi-bin/qwebcorporate.dll?idx=FSQCZ7


© 2013 by SAP AG. All rights reserved.


SAP and the SAP logo are registered trademarks of SAP AG in Germany and other countries. Business
Objects and the Business Objects logo are trademarks or registered trademarks of Business Objects
Software Ltd. Business Objects is an SAP company. Sybase and the Sybase logo are registered trademarks
of Sybase Inc. Sybase is an SAP company. Crossgate is a registered trademark of Crossgate AG in
Germany and other countries. Crossgate is an SAP company.
