
Copyright 2014 Oracle and/or its affiliates. All rights reserved.

Oracle Advanced Analytics
Oracle R Enterprise
Fekete Zoltán
Principal sales consultant
CEE, Oracle Database and Database Options

https://blogs.oracle.com/zfekete


Safe Harbor Statement
The following is intended to outline our general product direction. It is
intended for information purposes only, and may not be incorporated
into any contract. It is not a commitment to deliver any material, code,
or functionality, and should not be relied upon in making purchasing
decisions.
The development, release, and timing of any features or functionality
described for Oracle's products remains at the sole discretion of Oracle.
Oracle In-Database Analytics
14 years of "stem celling" analytics into Oracle
Designed into database kernel to leverage relational database strengths
Naïve Bayes and Association Rules: 1st algorithms added in 9i Release
Leverages counting, conditional probabilities, and much more
Now, an analytical database platform
13+ in-database mining algorithm implementations and 50+ (free) SQL statistical functions
Data mining model is a DB object, built via PL/SQL API; scored via SQL functions
When building models, leverage existing scalable technology (for example, parallel execution, bitmap indexes, aggregation techniques) and add new core database technology (e.g., recursion within the parallel infrastructure, IEEE float, etc.)
True power of embedding within the database is evident when scoring models using built-in SQL functions (incl. Exadata)



select cust_id
from customers
where region = 'US'
and prediction_probability(churnmod, 'Y' using *) > 0.8;
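The query above scores rows inside the SQL statement itself. As a rough illustration of that idea (not Oracle's implementation), the sketch below registers a toy scoring function with Python's built-in sqlite3 module and calls it from a WHERE clause. The logistic coefficients, the calls_to_support and months_inactive columns, and the churn_probability function name are all made up for the example; SQLite only stands in for Oracle Database.

```python
import math
import sqlite3

# Hypothetical logistic-model coefficients standing in for a stored churn model.
CHURN_MODEL = {"intercept": -3.0, "calls_to_support": 0.9, "months_inactive": 0.5}


def churn_probability(calls_to_support, months_inactive):
    """Return P(churn = 'Y') for one row under the toy logistic model."""
    z = (CHURN_MODEL["intercept"]
         + CHURN_MODEL["calls_to_support"] * calls_to_support
         + CHURN_MODEL["months_inactive"] * months_inactive)
    return 1.0 / (1.0 + math.exp(-z))


def build_customers_db():
    """Create an in-memory database with a few invented customers."""
    conn = sqlite3.connect(":memory:")
    # Make the scoring function callable from SQL, taking two arguments.
    conn.create_function("churn_probability", 2, churn_probability)
    conn.execute(
        "create table customers ("
        "cust_id integer, region text, "
        "calls_to_support integer, months_inactive integer)"
    )
    conn.executemany(
        "insert into customers values (?, ?, ?, ?)",
        [
            (1, "US", 0, 0),
            (2, "US", 4, 3),
            (3, "EU", 5, 5),
            (4, "US", 1, 1),
            (5, "US", 3, 6),
        ],
    )
    return conn


def high_risk_customers(conn, threshold=0.8):
    """Score every US customer inside the query and keep the risky ones."""
    rows = conn.execute(
        "select cust_id from customers "
        "where region = 'US' "
        "and churn_probability(calls_to_support, months_inactive) > ? "
        "order by cust_id",
        (threshold,),
    ).fetchall()
    return [cust_id for (cust_id,) in rows]
```

Calling high_risk_customers(build_customers_db()) filters on the model score without first copying the table into the client program, which is the point the slide makes about scoring in the database.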
Key Features
Oracle Advanced Analytics
Fastest Way to Deliver Scalable Enterprise-wide Predictive Analytics
In-database data mining algorithms and open source R algorithms
SQL, PL/SQL, R languages
Scalable, parallel in-database execution
Workflow GUI and IDEs
Integrated component of Database
Enables enterprise analytical applications
Oracle Advanced Analytics
SQL Data Mining Algorithms (12c)
Problem: Classification
  Algorithms: Logistic Regression (GLM), Decision Trees, Naïve Bayes, Support Vector Machine (SVM)
  Applicability: Classical statistical technique; Popular / Rules / transparency; Embedded app; Wide / narrow data / text
Problem: Regression
  Algorithms: Multiple Regression (GLM), Support Vector Machine
  Applicability: Classical statistical technique; Wide / narrow data / text
Problem: Anomaly Detection
  Algorithms: One Class SVM
  Applicability: Lack examples of target field
Problem: Attribute Importance
  Algorithms: Minimum Description Length (MDL)
  Applicability: Attribute reduction; Identify useful data; Reduce data noise
Problem: Association Rules
  Algorithms: Apriori
  Applicability: Market basket analysis; Link analysis
Problem: Clustering
  Algorithms: Hierarchical K-Means, Hierarchical O-Cluster, Expectation Maximization
  Applicability: Product grouping; Text mining; Gene and protein analysis
Problem: Feature Extraction
  Algorithms: Nonnegative Matrix Factorization, Principal Components Analysis (PCA), Singular Value Decomposition (SVD)
  Applicability: Feature reduction; Text analysis
Oracle Advanced Analytics
SQL Statistics and SQL Analytics (free)
Descriptive Statistics: DBMS_STAT_FUNCS summarizes numerical columns of a table and returns count, min, max, range, mean, median, stats_mode, variance, standard deviation, quantile values, +/- n sigma values, top/bottom 5 values
Correlations: Pearson's correlation coefficients, Spearman's and Kendall's (both nonparametric)
Cross Tabs: enhanced with % statistics: chi squared, phi coefficient, Cramer's V, contingency coefficient, Cohen's kappa
Hypothesis Testing: Student t-test, F-test, Binomial test, Wilcoxon Signed Ranks test, Chi-square, Mann Whitney test, Kolmogorov-Smirnov test, One-way ANOVA
Distribution Fitting: Kolmogorov-Smirnov Test, Anderson-Darling Test, Chi-Squared Test, Normal, Uniform, Weibull, Exponential
Ranking functions: rank, dense_rank, cume_dist, percent_rank, ntile
Window Aggregate functions (moving & cumulative): avg, sum, min, max, count, variance, stddev, first_value, last_value
LAG/LEAD functions: direct inter-row reference using offsets
Reporting Aggregate functions: sum, avg, min, max, variance, stddev, count, ratio_to_report
Statistical Aggregates: correlation, linear regression family, covariance
Linear regression: fitting of an ordinary-least-squares regression line to a set of number pairs; frequently combined with the COVAR_POP, COVAR_SAMP, and CORR functions
Note: Statistics and SQL Analytics are included in Oracle Database Standard Edition and Enterprise Edition
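To make the linear regression and covariance entries concrete, here is a small Python sketch of the arithmetic behind an ordinary-least-squares line fitted to number pairs, computed from running sums in the way an aggregate function can accumulate them. The result keys borrow the names of the SQL aggregates for readability only; the function itself is illustrative and is not Oracle code.

```python
import math


def regression_aggregates(pairs):
    """Compute covariance, correlation, and the OLS line from running sums."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two (x, y) pairs")

    # Single pass over the data: these five sums are all that is required.
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_xx = sum(x * x for x, _ in pairs)
    sum_yy = sum(y * y for _, y in pairs)
    sum_xy = sum(x * y for x, y in pairs)

    mean_x = sum_x / n
    mean_y = sum_y / n
    covar_pop = sum_xy / n - mean_x * mean_y      # population covariance
    covar_samp = covar_pop * n / (n - 1)          # sample covariance
    var_x = sum_xx / n - mean_x ** 2              # population variance of x
    var_y = sum_yy / n - mean_y ** 2              # population variance of y
    if var_x == 0 or var_y == 0:
        raise ValueError("x and y must each take at least two distinct values")

    slope = covar_pop / var_x
    return {
        "COVAR_POP": covar_pop,
        "COVAR_SAMP": covar_samp,
        "CORR": covar_pop / math.sqrt(var_x * var_y),
        "REGR_SLOPE": slope,
        "REGR_INTERCEPT": mean_y - slope * mean_x,
    }
```

For points that lie exactly on y = 2x + 1, the slope comes out as 2, the intercept as 1, and the correlation as 1, up to floating-point rounding.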
Oracle Advanced Analytics Architecture
Client tools: OBIEE, SQL Developer, Applications, R Client
Oracle Database Enterprise Edition
Oracle Advanced Analytics: native SQL-PL/SQL analytic libraries plus high-performance R interface; scalable, distributed, parallel execution
Oracle R Distribution
What does it take?
Efficient in-database advanced analytics
Workhorse algorithms: scalable, parallel, distributed
Augmented by curated, supported CRAN packages
Oracle distributes and supports open source R on many OS platforms
Interaction language for quants: R
Graphical user interface for business users
Deployment language: SQL
Compute framework to roll your own macros
Big data
Database In-Memory, Database 12c
Oracle Strategy for R
Provide a high-performance, scalable R environment tightly integrated with Oracle RDBMS and Hadoop
For R users:
  Full access to Database and HDFS objects
  High performance and scalability for all R operations
  Scalable, natively integrated machine learning algorithms
  Deploy R scripts and store R calculation results in Database or Hadoop
For Database & Big Data developers:
  Execute embedded R scripts containing any R algorithm or calculation
  Access stored R results in Database or Hadoop HDFS
  Retrieve R computation results in formats such as XML or PNG
  Integrate R results into BI Applications
Embedded R Execution
Ability to execute R code on the database server
Execution controlled and managed by Oracle Database
Eliminates loading data into the user's R engine and writing results back to Oracle Database
Enables data- and task-parallel execution of R functions
Enables SQL access to R: invocation and results
Supports use of open source CRAN packages at the database server
R scripts can be stored and managed in the database
Schedule R scripts for automatic execution
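The data-parallel idea in the list above (partition the data, run the same function on every partition, collect the results) can be sketched in a few lines of Python. This is only an analogy: the thread pool stands in for database-managed R engines, and group_apply, mean_delay, and the dest/delay fields are hypothetical names chosen for the example rather than part of Oracle R Enterprise.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_apply(rows, key, func, max_workers=4):
    """Partition rows by key and run func on each partition in parallel."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)

    keys = sorted(partitions)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per partition; results come back in the order of keys.
        results = pool.map(func, (partitions[k] for k in keys))
        return dict(zip(keys, results))


def mean_delay(partition):
    """Example per-partition function: average delay of the rows it receives."""
    return sum(row["delay"] for row in partition) / len(partition)
```

A call such as group_apply(rows, "dest", mean_delay) returns one result per distinct dest value. The per-partition function never sees rows from another partition, which is what allows the partitions to be processed independently.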
Oracle's R Technologies
Oracle R Distribution
ROracle
Oracle R Enterprise (Component of Oracle Advanced Analytics option)
Oracle R Advanced Analytics for Hadoop (Component of Big Data Connectors option)
To learn more:
http://www.oracle.com/goto/R
http://www.amazon.com/Using-Unlock-Value-Big-Data/dp/0071824383


Oracle R Distribution and ROracle are available to the R community for free
ROracle
In benchmark comparison tests, ROracle performed up to 79 times faster than RJDBC and 2.5 times faster than RODBC for reading data across a range of 1000 rows to 1 million rows and 10 to 1000 columns. ROracle shows scalability across NUMBER, VARCHAR2, TIMESTAMP, and BINARY_DOUBLE data types.
Similarly, for writing data to Oracle Database, ROracle was 61 times faster than RODBC for 10 columns at 10 thousand rows, and 630 times faster than RJDBC for the same data.

How Oracle R Enterprise Works
Oracle R Enterprise tightly integrates R with the database and fully manages the data operated upon by R code.
The database is always involved in serving up data to the R code.
Oracle R Enterprise runs in the Oracle Database.
Oracle R Enterprise eliminates data movement and duplication, maintains security, and minimizes latency from raw data to new information.
Three ORE Computation Engines
Oracle R Enterprise provides three different interfaces between the open-source R engine and the
Oracle database:
1. Oracle R Enterprise (ORE) Transparency Layer
2. Oracle Statistics Engine
3. Embedded R
How Oracle R Enterprise Works
Transparency Layer: a set of packages that map R data types to Oracle Database objects.
Automatically generates SQL for R expressions on mapped data types.
Enables direct interaction with Oracle Database while using R language constructs.
Provides access to database tables from R as a type of data.frame, the base R data representation with rows and columns. ORE calls this an ore.frame.
When you invoke an R function on an ore.frame, the R operation is sent to the database for execution as SQL.
How Oracle R Enterprise Works
Statistics Engine: a database library that supports a variety of statistical computations. This engine includes existing in-database advanced analytics and new features added specifically in ORE.
SQL extensions (Embedded R): enable in-database embedded R execution, which is particularly valuable for third-party R packages or custom functions that do not have equivalent in-database functionality.
Oracle Advanced Analytics
R Enterprise Compute Engines
User R Engine on desktop (open source R engine, Oracle R Enterprise packages, other R packages)
  R-SQL Transparency Framework intercepts R functions for scalable in-database execution
  Function intercept for data transforms, statistical functions and advanced analytics
  Interactive display of graphical results and flow control as in standard R
  Submit entire R scripts for execution by database
Database Compute Engine (Oracle Database: user tables, SQL, results)
  Scale to large datasets
  Access tables, views, and external tables, as well as data through DB links
  Leverage database SQL parallelism
  Leverage new and existing in-database statistical and data mining capabilities
R Engine(s) spawned by Oracle DB (R engine, Oracle R Enterprise packages, other R packages)
  Database can spawn multiple R engines for database-managed parallelism
  Efficient data transfer to spawned R engines
  Emulate map-reduce style algorithms and applications
  Enables lights-out execution of R scripts
R: Transparency through function overloading
Invoke in-database aggregation function
> aggdata <- aggregate(ONTIME_S$DEST,
+ by = list(ONTIME_S$DEST),
+ FUN = length)
The Transparency Layer in the Oracle R Enterprise client packages translates this call into Oracle SQL that runs in the database against the ONTIME_S table (in-database statistics):
select DEST, count(*)
from ONTIME_S
group by DEST
> class(aggdata)
[1] "ore.frame"
attr(,"package")
[1] "OREbase"
> head(aggdata)
Group.1 x
1 ABE 237
2 ABI 34
3 ABQ 1357
4 ABY 10
5 ACK 3
6 ACT 33
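The benefit of pushing the aggregation into the database is that only one row per destination travels back to the client. The Python sketch below contrasts the two approaches using the built-in sqlite3 module as a stand-in for Oracle Database; the handful of ONTIME_S rows is invented for the example and the function names are hypothetical.

```python
import sqlite3
from collections import Counter


def build_flights_db():
    """Create an in-memory ONTIME_S table with a few invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table ONTIME_S (DEST text)")
    conn.executemany(
        "insert into ONTIME_S values (?)",
        [("ABE",), ("ABQ",), ("ABE",), ("ABI",), ("ABQ",), ("ABE",)],
    )
    return conn


def count_by_dest_in_db(conn):
    """Let the database aggregate; one row per destination comes back."""
    return conn.execute(
        "select DEST, count(*) from ONTIME_S group by DEST order by DEST"
    ).fetchall()


def count_by_dest_on_client(conn):
    """Pull every row to the client first, then aggregate there."""
    rows = conn.execute("select DEST from ONTIME_S").fetchall()
    counts = Counter(dest for (dest,) in rows)
    return sorted(counts.items())
```

Both functions return the same list of (DEST, count) pairs. The first one transfers three rows for this toy table, the second transfers all six, and the gap grows with the size of the table.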
Oracle R Advanced Analytics for Hadoop
(part of the Oracle Big Data Connectors)


Direct Execution of R Analytics on the Hadoop Cluster, and Integration with the ORE in
the Database
Oracle R Advanced Analytics for Hadoop
Oracle Advanced Analytics: Hadoop Integration
Using the Hadoop-HDFS Integration, Custom and Open-Source R Packages
Translation of R requests to Hadoop:
  HDFS utilities: data movement and statistics, pushing data to R, data sampling
  ORCH utilities: connect/disconnect R sessions
  HIVE interfaces: load table metadata and interface
  ORCH custom R algorithms: Neural, GLM, LM, kMeans, NMF, LMF
Custom R analytics are written once for a Mapper & Reducer framework and are reused as is. I/O is then built for both the Database and Hadoop client interfaces.
Diagram components:
  R Client Interface: Oracle R Advanced Analytics for Hadoop packages (Hadoop MapReduce, HIVE Transparency Layer, HDFS engine) and Oracle R Enterprise packages (Transparency, Embedded R)
  Hadoop Cluster: parallel MapReduce calls (R, Java)
  Oracle Database: Advanced Analytics Option, Oracle R Distribution (SQL, PL/SQL, R)
  Big Data Connectors
Oracle R Advanced Analytics for Hadoop
Invoke Linear Model from data in HDFS, and get Graphical output
Hadoop job controller (Oracle RAAH client packages, Map/Reduce layer on the Hadoop cluster):

# Copy the iris data set into HDFS, keyed by species
dfs <- hdfs.put(iris, key='Species')
res <- NULL
dfs.res <- hadoop.run(
  dfs,
  # Mapper: pass each (key, values) pair through unchanged
  mapper = function(key, vals) {
    keyval(key, vals)
  },
  # Reducer: fit one linear model per key and plot its diagnostics
  reducer = function(key, vals) {
    dat <- do.call(rbind.data.frame, vals)
    orch.dlogv(colnames(dat))
    mod <- lm(Petal.Length ~ Sepal.Length + Petal.Width, data = dat)
    # Write the four diagnostic plots to a local PNG file
    fname <- paste("fit-", key, ".png", sep = "")
    png(fname)
    par(mfrow = c(2, 2), cex = 0.6, mar = c(6, 6, 6, 4), mex = 0.8)
    plot(mod, id.n = 1, cex.caption = 0.8, which = 1:4)
    dev.off()
    # Copy the PNG file into HDFS
    hdfs.fdir <- "/user/pngfiles"
    hdfs.fname <- paste(hdfs.fdir, "/", fname, sep = "")
    system(paste("hadoop fs -copyFromLocal", fname, hdfs.fdir))
    # Return the predictions together with the HDFS path of the plot
    pred <- predict(mod, dat)
    keyval(NULL, orch.pack(pred, hdfs.fname))
  }
)

R result objects: linear model plots
# Fetch the job output from HDFS and unpack each result row
res <- hdfs.get(dfs.res)
finalres <- list()
for (i in seq_len(nrow(res))) {
  finalres[[i]] <- orch.unpack(res[i,])
}
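The shape of the job above (map each record to a key, group by key, fit one model per group in the reducer) can be tried out without a Hadoop cluster. The Python sketch below runs the same three phases in memory and fits a one-predictor least-squares line per key. It is a simplification under stated assumptions: the model has a single predictor rather than the two used above, the record field names are hypothetical, and run_job is an invented helper, not part of ORAAH.

```python
from collections import defaultdict


def mapper(record):
    """Emit (key, value): the species and one (x, y) observation."""
    return record["species"], (record["sepal_length"], record["petal_length"])


def reducer(key, values):
    """Fit petal_length ~ sepal_length by least squares for one species."""
    n = len(values)
    mean_x = sum(x for x, _ in values) / n
    mean_y = sum(y for _, y in values) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in values)
    # Raises ZeroDivisionError if every x in the group is identical.
    slope = sxy / sxx
    return key, {"slope": slope, "intercept": mean_y - slope * mean_x, "n": n}


def run_job(records):
    """Run the map, shuffle, and reduce phases in memory."""
    groups = defaultdict(list)
    for record in records:          # map phase
        key, value = mapper(record)
        groups[key].append(value)   # shuffle: collect values by key
    # reduce phase: one call per key, independent of the other keys
    return dict(reducer(key, values) for key, values in groups.items())
```

Because each reducer call depends only on the values for its own key, the calls can run on different nodes, which is what the Hadoop framework arranges for the R reducer in the slide.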
Exadata with Analytics and Business Intelligence
Better Together
In-database data mining builds predictive models that predict customer behavior
OBIEE's integrated spatial mapping shows where customers are most likely to be HIGH and VERY HIGH value customers in the future



Visualizing R Results from OBIEE Dashboards
Enabling Predictive Applications
Example Oracle Applications Using Oracle Advanced Analytics
HCM Fusion
Predictive Workforce: employee turnover and performance prediction and "What if?" analysis
CRM Fusion
Sales Prediction Engine: prediction of sales opportunities, what to sell, amount, timing, etc.
Supply Chain Management
Spend Classification: real-time flagging of noncompliance and anomalies in expense submissions
Identity Management
Oracle Adaptive Access Manager: real-time security and fraud analytics
Industry Data Models
Communications Data Model implements churn prediction, segmentation, profiling, etc.
Retail Data Model implements loyalty and market basket analysis
Airline Data Model implements analysis of frequent flyers, loyalty, etc.
Oracle Financial Services Analytic Applications
Customer Insight, Enterprise Risk Management, Enterprise Performance, Financial Crime and Compliance
OFSAA CI Retail Customer Analytics
Attrition Analysis: Mortgage Prepay, Savings Account Attrition, Term Deposit, Cards
Survival analysis
Customer Lifetime value
Propensity Models: Credit Cards <-> Auto loans, Savings <-> Cards
Retail Analytics
Oracle Retail Customer Analytics: shopping cart analysis and next best offers
Customer Support
Predictive Incident Monitoring (PIM): Customer Service offering for Database customers
Oracle Advanced Analytics
Summary: New Features
Oracle Advanced Analytics 12c
New SQL data mining algorithms (Expectation-Maximization, PCA, Singular Value Decomposition, Text Mining and other algorithm improvements)
Predictive SQL Queries: automatic build and apply within a SQL query
Oracle Data Miner/SQL Developer 4.0 (for Oracle Database 11g and 12c)
New Graph node (box, scatter, bar, histograms)
SQL Query node + integration of R scripts
Automatic SQL script generation for deployment
Oracle R Enterprise 1.4 (for Oracle Database 11g and 12c)
Distributed and scalable parallel models: MLP Neural Networks (ore.neural), Linear Models (ore.lm), Generalized and Logistic Regression (ore.glm), Principal Component Analysis (princomp), Time Series Analysis (Exponential Smoothing)
Scoring Database tables with open-source R Models; in-Database Sampling
Persist and Manage R Objects in-Database; Improved integration with OBIEE
SQL Developer/Oracle Data Miner 4.0
New nodes in the Oracle Data Miner GUI
New SQL Query node
Allows any form of query/transformation/statistics
Insert anywhere within flow
Allows integration of R scripts
New Predictive Query nodes (requires 12c)
SQL Query Node to Integrate R Scripts

Oracle Advanced Analytics Example

Use of All 3 ORE Engines Within 1 R Script
Oracle Big Data Solution Architecture
Stages: Stream, Acquire, Organize, Analyze, Decide
Products shown: Oracle Event Processing, Apache Flume, Oracle GoldenGate, Oracle NoSQL Database, Cloudera Hadoop, Oracle Big Data Connectors, Oracle Data Integrator, Oracle R Distribution, Oracle Advanced Analytics, Oracle Database, Oracle Spatial & Graph, Oracle BI Foundation Suite, Oracle Real-Time Decisions, Endeca Information Discovery, Crystal Ball
Try it out: Oracle Big Data Lite Virtual Machine
http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html
Oracle Enterprise Linux 6.4
Oracle Database 12c Release 1 Enterprise Edition (12.1.0.2) - including Oracle Big Data SQL-enabled external tables
Cloudera Distribution including Apache Hadoop (CDH5.1.2)
Cloudera Manager (5.1.2)
Oracle Big Data Connectors 4.0
Oracle SQL Connector for HDFS 3.1.0
Oracle Loader for Hadoop 3.2.0
Oracle Data Integrator 12c
Oracle R Advanced Analytics for Hadoop 2.4.1
Oracle XQuery for Hadoop 4.0.1
Oracle NoSQL Database Enterprise Edition 12cR1 (3.0.14)
Oracle JDeveloper 12c (12.1.3)
Oracle SQL Developer and Data Modeler 4.0.3
Oracle Data Integrator 12cR1 (12.1.3)
Oracle GoldenGate 12c
Oracle R Distribution 3.1.1
Oracle Perfect Balance 2.2

Try it out: Oracle Big Data Lite Virtual Machine
http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html
Using SQL Pattern Matching
This series features both OBEs and recorded webcasts. Learn how to use SQL for Pattern Matching. Row
pattern matching in native SQL improves application and development productivity and query efficiency
for row-sequence analysis.
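To give a feel for what row-sequence analysis means, the Python sketch below looks for V-shaped runs (a fall followed by a rise) in an ordered list of prices. It labels each row by how it compares with the previous one and then applies a regular expression to the label string. This only emulates the idea in Python; it is not the SQL pattern-matching syntax covered by the tutorial series, and find_v_shapes is a hypothetical helper.

```python
import re


def find_v_shapes(prices):
    """Return (start, bottom, end) row indexes of each fall-then-rise run."""
    # labels[i] describes row i + 1 relative to row i:
    # D = down, U = up, F = flat.
    labels = "".join(
        "D" if cur < prev else "U" if cur > prev else "F"
        for prev, cur in zip(prices, prices[1:])
    )

    matches = []
    # One or more falling rows followed by one or more rising rows.
    for m in re.finditer(r"(D+)(U+)", labels):
        start = m.start()    # row just before the first falling row
        bottom = m.end(1)    # last falling row, the low point
        end = m.end(2)       # last rising row
        matches.append((start, bottom, end))
    return matches
```

For the sequence 10, 9, 8, 9, 11, 11, 7, 8 the function reports two V shapes: rows 0 to 4 with the low point at row 2, and rows 5 to 7 with the low point at row 6.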
Oracle Data Mining 12c Tutorial Series
The OBEs in this series provide instructions on how to perform data mining with Oracle Database 12c by using Oracle Data Miner 4.0, which is included as an extension of Oracle SQL Developer, version 4.0.
Oracle R Enterprise v 1.4 - Tutorial Series
Oracle R Enterprise (ORE), a component of the Oracle Advanced Analytics Option, makes the open source
R statistical programming language and environment ready for the enterprise and big data. This series
teaches you how to use Oracle R Enterprise, version 1.4.

OAA Links and Resources
Oracle Advanced Analytics Overview:
OAA data sheet on OTN
Oracle Internal OAA Product Management Wiki and Workspace
YouTube recorded OAA Presentations and Demos:
Oracle Advanced Analytics and Data Mining on YouTube (6+ OAA demos on Retail, Fraud, Loyalty, Overview, etc.)
Getting Started:
Link to Getting Started w/ ODM blog entry
Link to New OAA/Oracle Data Mining 2-Day Instructor Led Oracle University course.
Link to OAA/Oracle Data Mining 4.0 Oracle by Examples (free) Tutorials on OTN
Take a Free Test Drive of Oracle Advanced Analytics (Oracle Data Miner GUI) on the Amazon Cloud (Vlamis Partner)
Link to SQL Developer Days Virtual Event w/ downloadable Virtual Machine (VM) images of Oracle Database + ODM/ODMr and e-training for Hands on Labs
Link to OAA/Oracle R Enterprise (free) Tutorial Series on OTN
Additional Resources:
Oracle Advanced Analytics Option on OTN page
OAA/Oracle Data Mining on OTN page, ODM Documentation & ODM Blog
OAA/Oracle R Enterprise page on OTN page, ORE Documentation & ORE Blog
Oracle SQL based Basic Statistical functions on OTN
Business Intelligence, Warehousing & Analytics: BIWA Summit 2014, Jan 14-16 at Oracle HQ Conference Center

Resources
Book: Using R to Unlock the Value of Big Data

Blog: https://blogs.oracle.com/R/

Forum: https://forums.oracle.com/forums/forum.jspa?forumID=1397

Oracle R Distribution
ROracle
Oracle R Enterprise
Oracle R Advanced Analytics for Hadoop
http://oracle.com/goto/R
Other Resources
Oracle R Enterprise:
http://www.oracle.com/technetwork/database/database-technologies/r/r-enterprise/overview/index.html
Oracle Data Mining:
http://www.oracle.com/technetwork/database/options/advanced-analytics/odm/index.html
Oracle R Advanced Analytics for Hadoop:
http://www.oracle.com/technetwork/database/database-technologies/bdc/r-advanalytics-for-hadoop/overview/index.html
