OBIEE, Hadoop and Big Data Analysis

Mark Rittman, CTO, Rittman Mead


ODTUG KScope14, Seattle, June 2014
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)

E : info@rittmanmead.com
W : www.rittmanmead.com

About the Speaker


Mark Rittman, Co-Founder of Rittman Mead
Oracle ACE Director, specialising in Oracle BI&DW
14 Years Experience with Oracle Technology
Regular columnist for Oracle Magazine
Author of two Oracle Press Oracle BI books
Oracle Business Intelligence Developers Guide
Oracle Exalytics Revealed
Writer for Rittman Mead Blog :
http://www.rittmanmead.com/blog
Email : mark.rittman@rittmanmead.com
Twitter : @markrittman


About Rittman Mead


Oracle BI and DW Gold partner
Winner of five UKOUG Partner of the Year awards in 2013 - including BI
World leading specialist partner for technical excellence,
solutions delivery and innovation in Oracle BI
Approximately 80 consultants worldwide
All expert in Oracle BI and DW
Offices in US (Atlanta), Europe, Australia and India
Skills in broad range of supporting Oracle tools:
OBIEE, OBIA
ODIEE
Essbase, Oracle OLAP
GoldenGate
Endeca

New in OBIEE 11.1.1.7 : Hadoop Connectivity through Hive


OBIEE 11.1.1.7 can now access Hadoop environments, through Hive
Hive is a query environment over Hadoop/MapReduce to support SQL-like queries
Hive server accepts HiveQL queries via HiveODBC or HiveJDBC, and automatically creates MapReduce jobs against data previously loaded into the Hive HDFS tables
Approach used by ODI and OBIEE to gain access to Hadoop data
Allows Hadoop data to be accessed just like any other data source


Hadoop, and the Big Data Ecosystem


Apache Hadoop is one of the best-known Big Data technologies
Family of open-source products used to store and analyze distributed datasets
Hadoop is the enabling framework, which automatically parallelises and co-ordinates jobs
MapReduce is the programming framework for filtering, sorting and aggregating data
Map : filter data and pass on to reducers
Reduce : sort, group and return results
MapReduce jobs can be written in any language (Java etc), but doing so is complicated
Can be used as an extension of the DW staging layer - cheap processing & storage
And there may be data stored in Hadoop that our BI users might benefit from


Oracle's Big Data Products


Oracle Big Data Appliance - Engineered System for Big Data Acquisition and Processing
Cloudera Distribution of Hadoop
Cloudera Manager
Open-source R
Oracle NoSQL Database Community Edition
Oracle Enterprise Linux + Oracle JVM
Oracle Big Data Connectors
Oracle Loader for Hadoop (Hadoop > Oracle RDBMS)
Oracle Direct Connector for HDFS (HDFS > Oracle RDBMS)
Oracle Data Integration Adapter for Hadoop
Oracle R Connector for Hadoop
Oracle NoSQL Database (column/key-store DB based on BerkeleyDB)


BigDataLite Demonstration VM
Demo / Training VM downloadable from OTN
Contains Cloudera Hadoop 4.5 + Oracle Big Data Connectors
Similar to setup on Oracle BDA
Contains OBIEE enabling technologies:
Apache Hive (SQL access over Hadoop)
Apache HDFS (file storage)
Oracle Direct Connector for HDFS
Oracle R Advanced Analytics for Hadoop
Great way to get started with Hadoop
Requires 8GB RAM, modern laptop etc


HDFS: Low-Cost, Clustered, Fault-Tolerant Storage


The filesystem behind Hadoop, used to store data for Hadoop analysis
Unix-like, uses commands such as ls, mkdir, chown, chmod
Fault-tolerant, with rapid fault detection and recovery
High-throughput, with streaming data access and large block sizes
Designed for data-locality, placing data close to where it is processed
Accessed from the command-line, via internet (hdfs://), GUI tools etc
[oracle@bigdatalite mapreduce]$ hadoop fs -mkdir /user/oracle/my_stuff
[oracle@bigdatalite mapreduce]$ hadoop fs -ls /user/oracle
Found 5 items
drwx------   - oracle hadoop          0 2013-04-27 16:48 /user/oracle/.staging
drwxrwxrwx   - oracle hadoop          0 2012-09-18 17:02 /user/oracle/moviedemo
drwxrwxrwx   - oracle hadoop          0 2012-10-17 15:58 /user/oracle/moviework
drwxr-xr-x   - oracle hadoop          0 2013-05-03 17:49 /user/oracle/my_stuff
drwxr-xr-x   - oracle hadoop          0 2012-08-10 16:08 /user/oracle/stage


Hive as the Hadoop Data Warehouse


MapReduce jobs are typically written in Java, but Hive can make this simpler
Hive is a query environment over Hadoop/MapReduce to support SQL-like queries
Hive server accepts HiveQL queries via HiveODBC or HiveJDBC, and automatically creates MapReduce jobs against data previously loaded into the Hive HDFS tables
Approach used by ODI and OBIEE to gain access to Hadoop data
Allows Hadoop data to be accessed just like any other data source (sort of...)


How Hive Provides SQL Access over Hadoop


Hive uses an RDBMS metastore to hold table and column definitions in schemas
Hive tables then map onto HDFS-stored files
Managed tables: HDFS or local files loaded into the Hive HDFS area (/user/hive/warehouse/), using the HiveQL CREATE TABLE command
External tables: files loaded into HDFS (/user/oracle/, /user/movies/data/) using an external process, then mapped into Hive using the CREATE EXTERNAL TABLE command
Oracle-like query optimizer, compiler, executor
JDBC and ODBC drivers, plus CLI etc

[Diagram: the Hive Driver (Compile, Optimize, Execute) and the Metastore sit over HDFS, with Managed Tables under /user/hive/warehouse/ and External Tables under /user/oracle/ and /user/movies/data/]
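
The difference between the two table types shows up mainly in the DDL. The following is a minimal sketch, assuming comma-delimited text files; the helper hive_ddl and the table and column names are hypothetical and not from the presentation, and only the /user/movies/data/ path comes from the slide.

```python
def hive_ddl(table, columns, location=None):
    """Return HiveQL DDL; an external table is generated when a location is given."""
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    kind = "EXTERNAL TABLE" if location else "TABLE"
    ddl = (f"CREATE {kind} {table} ({cols}) "
           "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','")
    if location:
        # External tables point at files that already sit in HDFS
        ddl += f" LOCATION '{location}'"
    return ddl

# Hypothetical table and column names, for illustration only
managed = hive_ddl("movie_ratings", [("movie_id", "INT"), ("rating", "INT")])
external = hive_ddl("movie_log", [("movie_id", "INT"), ("activity", "STRING")],
                    location="/user/movies/data/")
print(managed)
print(external)
```

Dropping a managed table removes its files from the Hive warehouse area, whereas dropping an external table leaves the underlying HDFS files in place.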

Transforming HiveQL Queries into MapReduce Jobs


HiveQL queries are automatically translated into Java MapReduce jobs
Selection and filtering part becomes Map tasks
Aggregation part becomes the Reduce tasks

SELECT a, sum(b)
FROM myTable
WHERE a<100
GROUP BY a

[Diagram: the query is split across several Map tasks, whose output feeds the Reduce tasks that produce the Result]
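
To make the split concrete, here is a rough in-process simulation of how the query above could be divided into map and reduce steps. It is a sketch of the idea only, not the Java code Hive actually generates; the sample rows and the two-way split of myTable are assumptions.

```python
from collections import defaultdict

def map_task(rows):
    """Map: apply the WHERE a < 100 filter and emit (a, b) pairs."""
    return [(a, b) for a, b in rows if a < 100]

def shuffle(mapped_outputs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for output in mapped_outputs:
        for key, value in output:
            grouped[key].append(value)
    return grouped

def reduce_task(key, values):
    """Reduce: compute sum(b) for one value of a."""
    return key, sum(values)

# Hypothetical contents of myTable, split across two input blocks
splits = [[(1, 10), (150, 5), (2, 7)], [(1, 3), (2, 1), (300, 9)]]
mapped = [map_task(split) for split in splits]   # one map task per split
result = dict(reduce_task(k, v) for k, v in sorted(shuffle(mapped).items()))
print(result)  # {1: 13, 2: 8}
```

Each map task only sees its own split, which is what lets Hadoop run them in parallel close to the data.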


Leveraging Hadoop with OBIEE 11g


Demonstration of Hive


DW 2013: The Mixed Architecture with Federated Queries


Where many organisations are going:
Traditional DW at core of strategy
Making increasing use of low-cost, cloud/big data tech for storage / pre-processing
Access to non-traditional data sources, usually via ETL into the DW
Federated data access through OBIEE connectivity & metadata layer


Oracle Business Analytics and Big Data Sources


OBIEE 11g can also make use of big data sources
OBIEE 11.1.1.7+ supports Hive/Hadoop as a data source
Oracle R Enterprise can expose R models through DB functions, columns
Oracle Exalytics has InfiniBand connectivity to Oracle BDA
Endeca Information Discovery can analyze unstructured and semi-structured sources
Increasingly tight integration between OBIEE and Endeca


OBIEE 11g and Hadoop/Big Data Access


Two main scenarios for OBIEE 11g accessing big data sources
1. Through the data warehouse - no different to any other data provided through the DW
2. Directly - through OBIEE 11.1.1.7+ Hadoop/Hive connectivity

Importing Hadoop/Hive Metadata into RPD


HiveODBC driver has to be installed in the Windows environment, so that the BI Administration tool can connect to Hive and return table metadata
Import as ODBC datasource, then change physical DB type to Apache Hadoop afterwards
Note that OBIEE queries cannot span >1 Hive schema (no table prefixes)

Set up ODBC Connection at the OBIEE Server


OBIEE 11.1.1.7+ ships with HiveODBC drivers, but the 7.x versions need to be used (only Linux is supported)
Configure the ODBC connection in odbc.ini; the DSN name needs to match the RPD ODBC name
BI Server should then be able to connect to the Hive server, and through it to Hadoop/MapReduce
[ODBC Data Sources]
AnalyticsWeb=Oracle BI Server
Cluster=Oracle BI Server
SSL_Sample=Oracle BI Server
bigdatalite=Oracle 7.1 Apache Hive Wire Protocol
[bigdatalite]
Driver=/u01/app/Middleware/Oracle_BI1/common/ODBC/Merant/7.0.1/lib/ARhive27.so
Description=Oracle 7.1 Apache Hive Wire Protocol
ArraySize=16384
Database=default
DefaultLongDataBuffLen=1024
EnableLongDataBuffLen=1024
EnableDescribeParam=0
Hostname=bigdatalite
LoginTimeout=30
MaxVarcharSize=2000
PortNumber=10000
RemoveColumnQualifiers=0
StringDescribeType=12
TransactionMode=0
UseCurrentSchema=0
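
Because the DSN name has to match the name used in the RPD, a quick sanity check on odbc.ini can save a failed connection later. This is a small sketch using Python's standard configparser; the function read_dsn is hypothetical, and the sample text is a trimmed copy of the entries above rather than a complete file.

```python
import configparser

def read_dsn(ini_text, dsn_name):
    """Return the settings for dsn_name, or raise KeyError if it is not declared."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case as written in odbc.ini
    parser.read_string(ini_text)
    if dsn_name not in parser["ODBC Data Sources"]:
        raise KeyError(f"DSN {dsn_name!r} not listed in [ODBC Data Sources]")
    if not parser.has_section(dsn_name):
        raise KeyError(f"no [{dsn_name}] section found")
    return dict(parser[dsn_name])

# Trimmed sample, not a complete odbc.ini
SAMPLE = """
[ODBC Data Sources]
bigdatalite=Oracle 7.1 Apache Hive Wire Protocol

[bigdatalite]
Database=default
Hostname=bigdatalite
PortNumber=10000
"""

settings = read_dsn(SAMPLE, "bigdatalite")
print(settings["Hostname"], settings["PortNumber"])
```

Passing the DSN name exactly as it appears in the RPD connection pool makes a mismatch show up as a KeyError.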


Leveraging Hadoop with OBIEE 11g


Demonstration of OBIEE 11.1.1.7 accessing Hadoop
through Hive Connectivity


Dealing with Hadoop / Hive Latency Option 1 : Exalytics


Hadoop access through Hive can be slow - due to inherent latency in Hive
Hive queries use MapReduce in the background to query Hadoop
Spins-up Java VM on each query
Generates MapReduce job
Runs and collates the answer
Great for large, distributed queries ...
... but not so good for speed-of-thought dashboards
So what if we could use Exalytics to speed-up Hadoop queries?


Oracle Exalytics In-Memory Machine


Engineered system, complements Oracle Exadata Database Machine (but can work standalone)
Combination of high-end hardware (Sun x86_64 architecture, 3RU rack-mountable, 1-2TB RAM) and optimized versions of Oracle's BI, In-Memory Database and OLAP software
Delivers in-memory analytics focusing on analysis, aggregation and UI
Rich, interactive dashboards with split-second response times
1-2TB (and now 4TB) of RAM, to run your analysis in-memory
InfiniBand connection to Exadata and Oracle BDA
40 CPU cores (and now 128) to support high user numbers
Lower TCO through known configuration, combined patch sets
Contains software features only licensable through the Exalytics package


Exalytics as the Query Performance Enhancer

In conjunction with a well-tuned data warehouse, Exalytics adds an in-memory analysis layer
Based around Oracle TimesTen for Exalytics, Oracle's In-Memory Database
Aggregates are recommended based on query patterns, automatically created in TimesTen
Summary Advisor makes recommendations, which adapt as queries change
Meant to be plug-and-play - no need for expensive data warehouse tuning
So can we use this for speeding-up Hadoop/Hive queries?

[Diagram: Exalytics (BI Server plus TimesTen holding the aggregates) in front of the Data Warehouse holding the detail-level data]

Summary Advisor for Aggregate Recommendation & Creation


Utility within the Oracle BI Administration tool that recommends aggregates
Bases recommendations on usage tracking and summary statistics data
Captured based on past activity
Runs an iterative algorithm that searches, each iteration, for the best aggregate


Running Some Sample Hadoop / Hive Queries


A simple Hadoop / Hive BMM was created, based on a single Hive table
Queries requesting aggregates were run against that BMM
Query details, and requested aggregates, go into the usage tracking & summary statistics tables
Avg. query response time = 30 secs+

select avg(T44678.age) as c1,
       T44678.sales_pers as c2,
       sum(T44678.age) as c3,
       count(T44678.age) as c4
from dwh_customer T44678
group by T44678.sales_pers


Generate Aggregate Recommendations using Summary Advisor


Ensure BMM has one or more logical dimensions + 2 or more logical levels
Ensure S_NQ_SUMMARY_ADVISOR table has aggregate recordings + level details
Generate summary recommendations using Summary Advisor, output as nqcmd script


Implement Recommendations, Review Updated RPD


Run generated logical SQL (Aggregate Persistence) script to create & populate TT tables
Automatically updates RPD to plug-in new TimesTen aggregate tables


Re-run Reports, now with TimesTen for Exalytics Acceleration


Reports can now be re-run to test improvements from in-memory aggregation
Response time is now instantaneous
Aggregates will need to be refreshed once new data is loaded into Hadoop
Can also be used to improve speed of federated RDBMS - Hadoop - OLAP queries too
But - relies on query caching - doesn't make Hadoop faster
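
The trade-off can be illustrated with a toy model of an aggregate table: once sum and count are stored per sales person, the avg/sum/count query is answered without going back to Hive, but the summary goes stale as soon as new detail rows arrive. This is a conceptual sketch only, not how TimesTen or Aggregate Persistence are implemented; the rows and function names are hypothetical.

```python
from collections import defaultdict

def build_aggregate(detail_rows):
    """Summarise (sales_pers, age) detail rows into (sum, count) per sales_pers."""
    agg = defaultdict(lambda: [0, 0])
    for sales_pers, age in detail_rows:
        agg[sales_pers][0] += age
        agg[sales_pers][1] += 1
    return {k: tuple(v) for k, v in agg.items()}

def query_from_aggregate(agg):
    """Answer avg/sum/count by sales_pers from the summary alone."""
    return {k: {"avg": s / n, "sum": s, "count": n} for k, (s, n) in agg.items()}

# Hypothetical detail rows standing in for the dwh_customer Hive table
detail = [("Adams", 30), ("Adams", 40), ("Baker", 50)]
aggregate = build_aggregate(detail)        # built once, after a data load
print(query_from_aggregate(aggregate))     # served without touching Hive

detail.append(("Baker", 30))               # new data arrives in Hadoop ...
stale = query_from_aggregate(aggregate)    # ... and the old aggregate misses it
fresh = query_from_aggregate(build_aggregate(detail))
print(stale["Baker"], fresh["Baker"])
```

The stale and fresh results differ for Baker, which is why the aggregates have to be rebuilt after each load into Hadoop.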


Dealing with Hadoop / Hive Latency Option 2 : Use Impala


Hive is slow - because it's meant to be used for batch-mode queries
Many companies / projects are trying to improve on Hive - one of which is Cloudera
Cloudera Impala is an open-source but commercially-sponsored in-memory MPP platform
Replaces Hive and MapReduce in the Hadoop stack
Can we use this, instead of Hive, to access Hadoop?
It will need to work with OBIEE
Warning - it won't be a supported data source (yet)


How Impala Works


A replacement for Hive, but uses Hive concepts and data dictionary (metastore)
MPP (Massively Parallel Processing) query engine that runs within Hadoop
Uses same file formats, security, resource management as Hadoop
Processes queries in-memory
Accesses standard HDFS file data
Option to use Apache Avro, RCFile, LZO or Parquet (column-store)
Designed for interactive, real-time SQL-like access to Hadoop

[Diagram: the OBIEE BI Server and Presentation Server connect through the Cloudera Impala ODBC Driver to Impala, which runs alongside Hadoop and HDFS on each node of a multi-node Hadoop cluster]

Connecting OBIEE 11.1.1.7 to Cloudera Impala


Warning - unsupported source - limited testing and no support from MOS
Requires Cloudera Impala ODBC drivers - Windows or Linux (RHEL etc/SLES) - 32/64 bit
ODBC Driver / DSN connection steps similar to Hive


Importing Impala Metadata


Import Impala tables (via the Hive metastore) into RPD
Set database type to Apache Hadoop
Warning - don't set ODBC type to Hadoop - leave at ODBC 2.0
Create physical layer keys, joins etc as normal


Importing RPD using Impala Metadata


Create BMM layer, Presentation layer as normal
Use View Rows feature to check connectivity back to Impala / Hadoop


Impala / OBIEE Issue with ORDER BY Clause


Although checking rows in the BI Administration tool worked, any query that aggregates data in the dashboard will fail
Issue is that Impala requires LIMIT with all ORDER BY clauses
OBIEE could use LIMIT, but doesn't for Impala at the moment (because not supported)
Workaround - disable ORDER BY in Database Features, have the BI Server do sorting
Not ideal - but it works, until Impala is supported
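
The effect of the workaround can be pictured as follows: with the ORDER BY feature switched off, the generated source query carries no ORDER BY, and the rows are sorted after they come back. This is a simplified sketch of that behaviour, not the BI Server's actual logic; the function names, the flight_delays table and the sample rows are all hypothetical.

```python
def build_physical_sql(columns, table, order_by, order_by_supported):
    """Generate the source query, adding ORDER BY only if the source allows it."""
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if order_by and order_by_supported:
        sql += f" ORDER BY {order_by}"
    return sql

def run_query(fetch, columns, table, order_by, order_by_supported):
    """Run the query, sorting in the mid-tier when the source cannot."""
    rows = fetch(build_physical_sql(columns, table, order_by, order_by_supported))
    if order_by and not order_by_supported:
        idx = columns.index(order_by)
        rows = sorted(rows, key=lambda row: row[idx])
    return rows

def fake_fetch(sql):
    """Stand-in for the Impala connection; rejects ORDER BY like the failing case."""
    assert "ORDER BY" not in sql
    return [("UA", 12), ("AA", 7), ("DL", 9)]

rows = run_query(fake_fetch, ["carrier", "delays"], "flight_delays", "carrier", False)
print(rows)  # [('AA', 7), ('DL', 9), ('UA', 12)]
```

The cost is that sorting moves to the mid-tier, so every row has to be returned before it can be ordered.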


Leveraging Hadoop with OBIEE 11g


Demonstration of OBIEE 11.1.1.7 accessing Hadoop
through Impala Connectivity - Flight Delays Dataset


So Does Impala Work, as a Hive Substitute?


With ORDER BY disabled in DB features, it appears to
But not extensively tested by me, or Oracle
But it's certainly interesting
Reduces 30s, 180s queries down to 1s, 10s etc
Impala, or one of the competitor projects (Drill, Dremel etc) assumed to be the real-time query replacement for Hive, in time
Oracle announced planned support for Impala at OOW2013 - watch this space


Thank You for Attending!


Thank you for attending this presentation; more information can be found at http://www.rittmanmead.com
Contact us at info@rittmanmead.com or mark.rittman@rittmanmead.com
Look out for our book, Oracle Business Intelligence Developers Guide, out now!
Follow us on Twitter (@rittmanmead) or Facebook (facebook.com/rittmanmead)
