
Reference Implementation for Application Development on HANA
By TIP D&NA Data Management
June 10th, 2011
© 2011 SAP AG. All rights reserved.
Agenda

Motivations
Languages available for application development
Demo CarShop, V1
CarShop, V2
Resource

We would like to run this session as an open discussion rather than a
presentation; your feedback is very welcome.



Motivations
What is this?
It is a sample application built on HANA. It leverages HANA to provide
features such as analysis, forecasting, what-if planning, and sales
promotion (for example, cross-selling).
Why do we do this?
HANA has great features that other DBMSs do not have, such as
column-based modeling, in-memory computing, a built-in business library,
a built-in predictive library, and R integration. The official HANA
documentation cannot cover all the details, especially sample code. We
built this app so that other developers can use it as a reference and
quickly develop new apps on HANA.
What are the benefits for SAP?
Other HANA content/app developers can quickly master advanced HANA
features such as column-based modeling, in-memory computing, the built-in
business library, the predictive library, and R integration. From the
sample code, other teams and developers can learn how to build 2-tier and
3-tier applications on HANA.

[Figure: SQL, BFL, and R supporting Analysis, What-if, Cross-selling, and Forecast]
Motivations
Project Definition
Just as Sun provides petStore for the Java EE platform, the HANA Reference Implementation
Application, named carShop, is designed to illustrate how HANA can be used to develop an amazing
application. By studying this project, the learner can become familiar with the following:
SQLscript
SQLscript V2
L
IMSL
R
BFL /PAL
.net/java frontend
Target Audience: Everyone who is interested in how to develop new applications on HANA
Virtual Business Scenario
This project is based on a virtual car sales scenario. A company has many salesmen in different cities selling
cars. The company uses the carShop system to analyze historical sales data, forecast and plan future sales
data, set KPIs for each salesman based on the plan data, calculate the volume-driven bonus, analyze potential
customer information, cluster the customers, and find selling opportunities.





Languages
The following languages can be used to access HANA functionality:
IMSL (International Math & Statistics Lib)
R
BFL (Business Function Lib) / PAL (Predictive Analysis Lib)
L
SQL Script V1 and V2
CalcEngine
A few others (e.g. logic/inference)





[Figure: HANA with BFL / PAL, SQL Script, IMSL, L, and R]
IMSL
The IMSL Numerical Libraries have been the cornerstone of high-performance and
desktop computing applications in science, technical and business environments for
well over three decades.
It is developed by Visual Numerics, which has reached an OEM agreement with
SAP to embed the IMSL C Numerical Library into the TREX component to offer advanced
analytics for SAP applications.

IMSL C math

IMSL C Statistics



IMSL
Functional areas included in the IMSL Numerical Libraries:
Mathematics
Matrix Operations
Linear Algebra
Eigensystems
Interpolation & Approximation
Numerical Quadrature
Differential Equations
Nonlinear Equations
Optimization
Special Functions
Finance & Bond Calculations
Genetic Algorithm

Statistics
Basic Statistics
Time Series & Forecasting
Nonparametric Tests
Correlation & Covariance
Data Mining
Regression
Analysis of Variance
Transforms
Goodness of Fit
Distribution Functions
Random Number Generation
Neural Networks
IMSL
The IMSL sample code to access HANA







Note: Currently, IMSL functions are only available in the DEV branch of
NewDB

IMSL
Benefits of Embedding the IMSL

Accelerate Development
Develop Better Software Applications
Develop Flexible Software Applications
Improve Quality and Reduce Uncertainty
Reduce Costs (?)
Fair or better results than other packages


Limitations of IMSL:
OpenMP-based parallelism is not compatible with NewDB
Does not work with partitioned tables
Governance issue: NewDB cannot monitor its memory usage and threading

What is R?
Aims at building an open-source version of S (under the GNU GPL)
Project home: http://www.r-project.org/
Available on Windows, Linux, and Mac OS
Latest version: 2.12.1 (dated 16/12/2010)
Now has a core team of about 19 people
Support for multiple languages
CRAN
a network of FTP and web servers around the world that store
identical, up-to-date versions of code and documentation for R
The R Journal
a refereed journal of the R project for statistical computing
Some well-known weaknesses
Not particularly efficient in handling large data sets
Rather slow in executing a large number of for loops
Learning curve is somewhat steep compared to point-and-click software
R Integration in NewDB (available with HANA 1.0 GA)
[Figure: R integration architecture. NewDB space: Calc. Engine with Join OP, ROP, OLAP OP, and other plug-ins (forecasting, parallelism, statistics, etc.); SHM channel plug-in (1: SHM solution, single server); TCP/IP channel plug-in with RClient (2: TCP/IP solution, different servers); REngine plug-in with parser, runtime, and operators (3). Open-source R space: R runtime.]
NOTE:
1. SHM (shared memory) solution: we use LGPL to address the potential IP issue.
NewDB and R need to be on the same server.
2. TCP/IP solution: NewDB and R can be deployed on different machines.
3. REngine: in discussion.
Milestones
1. May 2010: The NewDB team had JDBC and CSV versions of the R integration, but they were very slow.
The D&NA team joined to develop better solutions.
2. Oct 2010: Checked the SHM solution into the NewDB standard build. Got at least a 50x
performance improvement vs. the old solution.
3. March 2011: Checked in parallelism handling for data transfer between NewDB and R.
Gained at least another 3x improvement.
4. (Planned) HANA 1.5 release: release the SHM LGPL version to reduce possible IP issues.
5. (Planned) HANA 1.5 release: release the TCP/IP solution to support the multi-server requirement.
Internal Customers
1. Oct 2010: DNA for SalesForecasting
2. Jan 2011: EPM SBC for spend analysis
3. Mar 2011: PIO for personal financial analysis
4. April 2011: IDDC PA for predictive analysis



Language and tools
Packages
Languages
SQLScript + R: determine the Poisson Regression Model


CREATE FUNCTION LR( IN input1 SUCC_PREC_TYPE, OUT output0 R_COEF_TYPE)
LANGUAGE RLANG AS'''
CHANGE_FREQ<-input1$CHANGE_FREQ;
SUCC_PREC<-input1$SUCC_PREC;
coefs<-coef(glm(SUCC_PREC ~ CHANGE_FREQ, family = poisson ));
INTERCEPT<-coefs["(Intercept)"];
CHANGEFREQ<-coefs["CHANGE_FREQ"];
names(INTERCEPT)<-NULL;
names(CHANGEFREQ)<-NULL;
result<-as.data.frame(cbind(INTERCEPT,CHANGEFREQ))
''';

TRUNCATE TABLE r_coef_tab;
CALLS LR(SUCC_PREC_tab,r_coef_tab );
SELECT * FROM r_coef_tab;
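
For readers who want to see what the glm(..., family = poisson) call above estimates, here is a minimal sketch in plain Python. It is an illustration, not the CarShop code: the function name fit_poisson and the sample numbers are made up, and it uses a textbook Newton-Raphson iteration rather than R's own implementation. The model being fitted is log(E[SUCC_PREC]) = INTERCEPT + CHANGEFREQ * CHANGE_FREQ.

```python
import math


def fit_poisson(x, y, iterations=25):
    """Fit log(E[y]) = b0 + b1 * x by Newton-Raphson.

    Hypothetical helper for illustration; assumes x has at least two
    distinct values (otherwise the 2x2 system below is singular).
    """
    # Start from the intercept-only solution: b0 = log(mean(y)), b1 = 0.
    b0 = math.log(sum(y) / len(y))
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix (2x2, symmetric).
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve H * delta = g with the explicit 2x2 inverse.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-12:
            break
    return b0, b1


if __name__ == "__main__":
    # Made-up sample data standing in for SUCC_PREC_tab.
    change_freq = [1, 2, 3, 4, 5]
    succ_prec = [2, 3, 6, 7, 12]
    intercept, changefreq = fit_poisson(change_freq, succ_prec)
    print("INTERCEPT =", intercept, "CHANGEFREQ =", changefreq)
```

The two returned numbers play the role of the INTERCEPT and CHANGEFREQ columns that the R function writes to r_coef_tab.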

Business Function Library
The Business Function Library (BFL) is now the calculation library for applications built on top of NewDB. It resides
in the NewDB CalcEngine, consists of many business functions that execute at the NewDB layer, and is written in C++.

Design Goals

Significant performance improvements for SAP apps
1. Utilizing new hardware (i.e. multi-core, built-in vector engine)
2. Massively parallel main-memory processing
3. Changing the boundaries between application server and data management layer

Simplification of the application programming model
1. Usage of extended SQL (SQLScript)
2. Rich functionality in the Calculation Engine
3. Quick app delivery

BFL Wiki
BFL Governance
Adam Their
Ralf Ehret
Wen-Syan Li
Volkmar Soehner (LiveCache, Planning Eng)

Kai Stammerjohann
Nico Bohnsack
Volkmar Soehner
Peter Goertz
Thorsten Glebe
Franz Faerber
Daniel Boo
Andrei Suvernev
Volkmar Soehner (LiveCache, Planning eng, )
Wen-Syan Li (BFL)
BFL Framework
BFL Framework:
Core service plus runtime environment, which will reside in NewDB and can be used to
configure, plug in, and invoke BFL functions. With the core service, application teams can build BFL
functions without the whole NewDB code base.

Future Release
As one proposal, we plan to develop the BDK (BFL Development Kit) as the BFL
development environment and the BRE (BFL Runtime Environment) as the BFL
runtime, including memory allocation, error handling, and so on.
BDK plus BRE is the future BFL framework. With the new framework, clients do not
need to interact directly with the NewDB development environment.
The new framework will also support stateful execution of each function.
L Language
L is tailored to NewDB by SAP.
The programming language L is targeted as a robust, low-level, high-performance
programming language inside NewDB.
L can be described as a safe subset of C++ with NewDB data types and
additional support for processing table-like data.
L provides direct access to the table and column objects which are used in the
Calculation Engine.
L Language
Llang: The L Programming Language
L Language
Type Mappings
SQL Type  Column Store Type  L Null Type          L Non-Null Type  L Raw Type  Notes on L Type
-         -                  NullBool             Bool
-         -                  -                    Size
TINYINT   INT                NullInt32            Int32
SMALLINT  INT                NullInt32            Int32
INTEGER   INT                NullInt32            Int32
BIGINT    FIXED8             NullFixed8<0>        Fixed8<0>        -           default 8 bytes length
REAL      FLOAT              NullFloat            Float            RawFloat
DOUBLE    DOUBLE             NullDouble           Double           RawDouble
DATE      DAYDATE            NullDate             Date
CHAR(a)   FIXEDSTRING(a)     NullFixedString<a>   FixedString<a>
...
L Language
Embed L code in the SQLScript
SQLScript
SQL is the main interface to applications. NewDB supports standard SQL with a set of NewDB
specific extensions

SQLScript
A new language for processing application-specific code in the database layer. The
main goal of SQLScript is to allow the execution of data-intensive calculations inside
NewDB.
The main concept in SQLScript is the function. SQLScript functions can have
multiple input and output parameters. They are composed of calls to other functions
and of SQL queries.
Intermediate results can be assigned to variables that are local to the function. Basic
control flow is possible via if/else clauses, and error handling is supported via try/catch
blocks.
Recursion (direct or indirect) is not allowed.
A SQLScript function is free of side effects: it computes the values of the
output parameters but modifies no other data. DELETE, UPDATE, and INSERT statements are
not allowed inside SQLScript functions. These restrictions ensure that two function
calls that are not connected via data flows can be executed in parallel.
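
The parallelism argument above can be illustrated outside NewDB. The following Python sketch is not SQLScript and says nothing about how NewDB actually schedules work; the function names and the sample rows are hypothetical. It only shows why two side-effect-free functions with no data flow between them can safely run at the same time.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical input "table": (city, units_sold) rows.
SALES = [("Berlin", 12), ("Paris", 7), ("Berlin", 5), ("Rome", 9)]


def total_units(rows):
    """Pure function: reads its input, returns a value, modifies nothing."""
    return sum(units for _, units in rows)


def units_by_city(rows):
    """Pure function: builds a new dict instead of updating shared state."""
    result = {}
    for city, units in rows:
        result[city] = result.get(city, 0) + units
    return result


def run_independent(rows):
    # Neither call consumes the other's output, so there is no data flow
    # between them and they may run in either order or at the same time.
    with ThreadPoolExecutor(max_workers=2) as pool:
        total = pool.submit(total_units, rows)
        by_city = pool.submit(units_by_city, rows)
        return total.result(), by_city.result()


if __name__ == "__main__":
    print(run_independent(SALES))
```

If either function inserted, updated, or deleted rows in a shared table, the result could depend on which call ran first, which is the situation the SQLScript restrictions rule out.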


SQLScript
Datatype Extension
SQLScript's datatype extension also allows the definition of table types. These table
types are used to define parameters for functions.
A table type is created using the CREATE TABLE TYPE statement.

Functional Extension
The functional extension allows its users to describe complex data-flow logic using
side-effect free table functions
Functions can be created using CREATE FUNCTION and dropped using DROP
FUNCTION

SQLScript
Functional Extension




Built-in Functions
There are different categories of built-in functions
Tracing and debugging
Data source access
Relational Operators
SQLScript
SQLScript version 2
Coming up soon (in a week)
Supports loop flow-control statements




Comparisons
                    IMSL             R                  BFL               L   SQLScript
Open source?        No               Yes                No                No  No
Directly called     Via L            Via SQLScript      Via SQLScript/L   No  Yes
by clients          Via SQLScript    Via R console      Excel
                    Excel (soon)     Excel (soon)
Known limitations   Does not comply  Not particularly   Limited           -   No flow control
                    with IM-DB       efficient in       availability;
                    governance       handling large     predefined input
                                     data sets          and output
Parallelism         Limited via      Limited via        Yes               No  Partially yes?
                    OpenMP           OpenMP etc.
Our suggestions
1. Use SQLscript as much as possible because
Reasonably safer than C
You control the development process independently of NewDB
Good for reporting / simple aggregation
2. Use R if
You need to develop algorithms and need to interact with the data
Quick prototyping / PoC / small data set for analysis
Have flow control / GUI /Debugging tool

3. Use IMSL if computation is complex.

4. Use BFL/PAL if computation is complex, the data set is large, and algorithms need
customization. Also use it if product-level quality is needed, or if support for
partitioned tables and clusters is needed.

Demo
CarShop
V1

CarShop, V2
In the next version of CarShop, the following features will be considered to
enrich its functionalities to make it more useful for SAP internal users:
SQLScript V2 (control flow)
Support for transactions
Planning capability via the planning engine
Reduced memory footprint during execution
Support for map/reduce on a cluster (HANA 1.5)
Best practices in terms of selecting the right languages to implement applications
Testing-related features? Cancel flag, profiling, ...
Q / A ?