OBIEE, Hadoop and Big Data Analysis
Mark Rittman, CTO, Rittman Mead

ODTUG KScope14, Seattle, June 2014
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
About the Speaker

Mark Rittman, Co-Founder of Rittman Mead
Oracle ACE Director, specialising in Oracle BI&DW
14 Years Experience with Oracle Technology
Regular columnist for Oracle Magazine
Author of two Oracle Press Oracle BI books
Oracle Business Intelligence Developers Guide
Oracle Exalytics Revealed
Writer for Rittman Mead Blog : http://www.rittmanmead.com/blog
Email : mark.rittman@rittmanmead.com
Twitter : @markrittman

About Rittman Mead

Oracle BI and DW Gold partner
Winner of five UKOUG Partner of the Year awards in 2013 - including BI
World-leading specialist partner for technical excellence, solutions delivery and innovation in Oracle BI
Approximately 80 consultants worldwide
All expert in Oracle BI and DW
Offices in US (Atlanta), Europe, Australia and India
Skills in broad range of supporting Oracle tools:
OBIEE, OBIA
ODI EE
Essbase, Oracle OLAP
GoldenGate
Endeca
New in OBIEE 11.1.1.7 : Hadoop Connectivity through Hive

OBIEE 11.1.1.7 can now access Hadoop environments, through Hive
Hive is a query environment over Hadoop/MapReduce to support SQL-like queries
Hive server accepts HiveQL queries via HiveODBC or HiveJDBC, automatically
creates MapReduce jobs against data previously loaded into the Hive HDFS tables
Approach used by ODI and OBIEE to gain access to Hadoop data
Allows Hadoop data to be accessed just like any other data source

Hadoop, and the Big Data Ecosystem

Apache Hadoop is one of the best-known Big Data technologies
Family of open-source products used to store and analyze distributed datasets
Hadoop is the enabling framework, automatically parallelises and co-ordinates jobs
MapReduce is the programming framework for filtering, sorting and aggregating data
Map : filter data and pass on to reducers
Reduce : sort, group and return results
MapReduce jobs can be written in any language (Java etc), but it is complicated
Can be used as an extension of the DW staging layer - cheap processing & storage
And there may be data stored in Hadoop that our BI users might benefit from

Oracle's Big Data Products

Oracle Big Data Appliance - Engineered System for Big Data Acquisition and Processing
Cloudera Distribution of Hadoop
Cloudera Manager
Open-source R
Oracle NoSQL Database Community Edition
Oracle Enterprise Linux + Oracle JVM
Oracle Big Data Connectors
Oracle Loader for Hadoop (Hadoop > Oracle RDBMS)
Oracle Direct Connector for HDFS (HDFS > Oracle RDBMS)
Oracle Data Integration Adapter for Hadoop
Oracle R Connector for Hadoop
Oracle NoSQL Database (key-value store DB based on Berkeley DB)

BigDataLite Demonstration VM
Demo / Training VM downloadable from OTN
Contains Cloudera Hadoop 4.5 + Oracle Big Data Connectors
Similar to setup on Oracle BDA
Contains OBIEE enabling technologies:
Apache Hive (SQL access over Hadoop)
Apache HDFS (file storage)
Oracle Direct Connector for HDFS
Oracle R Advanced Analytics for Hadoop
Great way to get started with Hadoop
Requires 8GB RAM, modern laptop etc

HDFS: Low-Cost, Clustered, Fault-Tolerant Storage

The filesystem behind Hadoop, used to store data for Hadoop analysis
Unix-like, uses commands such as ls, mkdir, chown, chmod
Fault-tolerant, with rapid fault detection and recovery
High-throughput, with streaming data access and large block sizes
Designed for data-locality, placing data close to where it is processed
Accessed from the command-line, via internet (hdfs://), GUI tools etc
[oracle@bigdatalite mapreduce]$ hadoop fs -mkdir /user/oracle/my_stuff
[oracle@bigdatalite mapreduce]$ hadoop fs -ls /user/oracle
Found 5 items
drwx------   - oracle hadoop          0 2013-04-27 16:48 /user/oracle/.staging
drwxrwxrwx   - oracle hadoop          0 2012-09-18 17:02 /user/oracle/moviedemo
drwxrwxrwx   - oracle hadoop          0 2012-10-17 15:58 /user/oracle/moviework
drwxr-xr-x   - oracle hadoop          0 2013-05-03 17:49 /user/oracle/my_stuff
drwxr-xr-x   - oracle hadoop          0 2012-08-10 16:08 /user/oracle/stage
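The listing above follows a fixed column layout (permissions, replication factor, owner, group, size, date, time, path), so it is easy to post-process in a script. A minimal Python sketch, assuming the eight-column layout shown on the slide; the names HdfsEntry, parse_ls and SAMPLE_LISTING are illustrative and not part of any Hadoop API:

```python
from typing import List, NamedTuple


class HdfsEntry(NamedTuple):
    permissions: str
    replication: str   # "-" for directories
    owner: str
    group: str
    size: int
    modified: str
    path: str


# Two lines copied from the listing on the slide, plus the header line.
SAMPLE_LISTING = """\
Found 5 items
drwx------   - oracle hadoop          0 2013-04-27 16:48 /user/oracle/.staging
drwxr-xr-x   - oracle hadoop          0 2013-05-03 17:49 /user/oracle/my_stuff
"""


def parse_ls(output: str) -> List[HdfsEntry]:
    """Turn `hadoop fs -ls` text into structured records."""
    entries = []
    for line in output.splitlines():
        # Split into at most 8 fields so a path containing spaces stays whole.
        fields = line.split(None, 7)
        if len(fields) != 8:
            continue  # skips the "Found N items" header and blank lines
        perms, repl, owner, group, size, date, time, path = fields
        entries.append(
            HdfsEntry(perms, repl, owner, group, int(size), f"{date} {time}", path)
        )
    return entries


if __name__ == "__main__":
    for entry in parse_ls(SAMPLE_LISTING):
        kind = "dir " if entry.permissions.startswith("d") else "file"
        print(kind, entry.owner, entry.path)
```

In practice the text would come from running the hadoop fs -ls command itself; the embedded sample keeps the sketch runnable without a cluster.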

Hive as the Hadoop Data Warehouse

MapReduce jobs are typically written in Java, but Hive can make this simpler
Hive is a query environment over Hadoop/MapReduce to support SQL-like queries
Hive server accepts HiveQL queries via HiveODBC or HiveJDBC, automatically
creates MapReduce jobs against data previously loaded into the Hive HDFS tables
Approach used by ODI and OBIEE to gain access to Hadoop data
Allows Hadoop data to be accessed just like any other data source (sort of...)

How Hive Provides SQL Access over Hadoop

Hive uses an RDBMS metastore to hold table and column definitions in schemas
Hive tables then map onto HDFS-stored files
Managed tables
External tables
Oracle-like query optimizer, compiler, executor
JDBC and ODBC drivers, plus CLI etc
[Diagram: the Hive Driver (Compile, Optimize, Execute) and the Metastore sit above HDFS, with managed tables stored under /user/hive/warehouse/ and external tables pointing at directories such as /user/oracle/ and /user/movies/data/]
Managed tables: HDFS or local files loaded into Hive HDFS area, using HiveQL CREATE TABLE command
External tables: HDFS files loaded into HDFS using external process, then mapped into Hive using CREATE EXTERNAL TABLE command

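The difference between the two table types shows up mainly in the DDL. The following Python sketch builds both kinds of statement; it assumes comma-delimited text files, and the function name, table names and columns are hypothetical. The only rule it encodes is that a table given a LOCATION is declared EXTERNAL, which is a simplification made for illustration:

```python
def create_table_ddl(name, columns, location=None, delimiter=","):
    """Build a HiveQL CREATE TABLE statement for delimited text data.

    columns  : list of (column_name, hive_type) pairs
    location : HDFS directory for an external table; None means a managed
               table, which Hive stores under its own warehouse directory
    """
    cols = ", ".join(f"{col} {hive_type}" for col, hive_type in columns)
    external = "EXTERNAL " if location else ""
    ddl = (
        f"CREATE {external}TABLE {name} ({cols}) "
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'"
    )
    if location:
        ddl += f" LOCATION '{location}'"
    return ddl


if __name__ == "__main__":
    movie_columns = [("id", "INT"), ("title", "STRING")]  # hypothetical columns
    print(create_table_ddl("movies", movie_columns))
    print(create_table_ddl("movies_ext", movie_columns, location="/user/movies/data/"))
```

Dropping a managed table removes its data files as well, whereas dropping an external table removes only the metastore definition and leaves the files in HDFS.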
Transforming HiveQL Queries into MapReduce Jobs

HiveQL queries are automatically translated into Java MapReduce jobs
Selection and filtering part becomes Map tasks
Aggregation part becomes the Reduce tasks
SELECT a, sum(b)
FROM myTable
WHERE a<100
GROUP BY a
[Diagram: the query is split across three Map tasks, whose output feeds two Reduce tasks that produce the result]
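To make the mapping concrete, here is a small pure-Python simulation of how the example query could be split into Map and Reduce work. It is a sketch of the general idea only, not the Java code Hive generates; the function names and the sample rows are invented for illustration:

```python
from collections import defaultdict


def map_task(rows):
    """WHERE a < 100 and projection of (a, b); runs once per input split."""
    return [(row["a"], row["b"]) for row in rows if row["a"] < 100]


def reduce_task(key, values):
    """sum(b) for a single GROUP BY key."""
    return key, sum(values)


def run_query(splits):
    """SELECT a, sum(b) FROM myTable WHERE a < 100 GROUP BY a."""
    groups = defaultdict(list)
    for split in splits:                  # one Map task per input split
        for a, b in map_task(split):
            groups[a].append(b)           # shuffle: route each value to its key
    return sorted(reduce_task(a, bs) for a, bs in groups.items())


if __name__ == "__main__":
    # Hypothetical data, already divided into two input splits.
    splits = [
        [{"a": 1, "b": 10}, {"a": 200, "b": 5}],
        [{"a": 1, "b": 7}, {"a": 2, "b": 3}],
    ]
    print(run_query(splits))
```

The row with a = 200 is discarded in the Map phase, so it never reaches a reducer; only the grouped sums travel across the cluster.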

Leveraging Hadoop with OBIEE 11g

Demonstration of Hive

DW 2013: The Mixed Architecture with Federated Queries

Where many organisations are going:
Traditional DW at core of strategy
Making increasing use of low-cost, cloud/big data tech for storage / pre-processing
Access to non-traditional data sources, usually via ETL into the DW
Federated data access through OBIEE connectivity & metadata layer

Oracle Business Analytics and Big Data Sources

OBIEE 11g can also make use of big data sources
OBIEE 11.1.1.7+ supports Hive/Hadoop as a data source
Oracle R Enterprise can expose R models through DB functions, columns
Oracle Exalytics has InfiniBand connectivity to Oracle BDA
Endeca Information Discovery can analyze unstructured and semi-structured sources
Increasingly tight integration between OBIEE and Endeca

OBIEE 11g and Hadoop/Big Data Access

Two main scenarios for OBIEE 11g accessing big data sources
1. Through the data warehouse - no different to any other data provided through the DW
2. Directly - through OBIEE 11.1.1.7+ Hadoop/Hive connectivity

Importing Hadoop/Hive Metadata into RPD

HiveODBC driver has to be installed into Windows environment, so that
BI Administration tool can connect to Hive and return table metadata
Import as ODBC datasource, change physical DB type to Apache Hadoop afterwards
Note that OBIEE queries cannot span >1 Hive schema (no table prefixes)

Set up ODBC Connection at the OBIEE Server

OBIEE 11.1.1.7+ ships with HiveODBC drivers, but the 7.x versions need to be used (only Linux supported)
Configure the ODBC connection in odbc.ini, name needs to match RPD ODBC name
BI Server should then be able to connect to the Hive server, and Hadoop/MapReduce
[ODBC Data Sources]
AnalyticsWeb=Oracle BI Server
Cluster=Oracle BI Server
SSL_Sample=Oracle BI Server
bigdatalite=Oracle 7.1 Apache Hive Wire Protocol
[bigdatalite]
Driver=/u01/app/Middleware/Oracle_BI1/common/ODBC/Merant/7.0.1/lib/ARhive27.so
Description=Oracle 7.1 Apache Hive Wire Protocol
ArraySize=16384
Database=default
DefaultLongDataBuffLen=1024
EnableLongDataBuffLen=1024
EnableDescribeParam=0
Hostname=bigdatalite
LoginTimeout=30
MaxVarcharSize=2000
PortNumber=10000
RemoveColumnQualifiers=0
StringDescribeType=12
TransactionMode=0
UseCurrentSchema=0
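Because the DSN name in odbc.ini has to match the ODBC name used in the RPD, a quick sanity check can save some debugging. The sketch below uses Python's standard configparser module to read an odbc.ini fragment; the check_dsn helper and the choice of required keys are assumptions made for illustration, based only on the entries shown above:

```python
import configparser

# Fragment of the odbc.ini shown on the slide.
SAMPLE_ODBC_INI = """\
[ODBC Data Sources]
AnalyticsWeb=Oracle BI Server
bigdatalite=Oracle 7.1 Apache Hive Wire Protocol

[bigdatalite]
Driver=/u01/app/Middleware/Oracle_BI1/common/ODBC/Merant/7.0.1/lib/ARhive27.so
Database=default
Hostname=bigdatalite
PortNumber=10000
"""

# Assumed minimum set of keys for a usable Hive DSN (illustrative only).
REQUIRED_KEYS = ("Driver", "Hostname", "PortNumber")


def check_dsn(odbc_ini_text, rpd_dsn_name):
    """Return a list of problems found for the DSN name used in the RPD."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names case-sensitive
    parser.read_string(odbc_ini_text)

    problems = []
    if not parser.has_option("ODBC Data Sources", rpd_dsn_name):
        problems.append(f"{rpd_dsn_name} is not listed under [ODBC Data Sources]")
    if not parser.has_section(rpd_dsn_name):
        problems.append(f"no [{rpd_dsn_name}] section found")
        return problems
    for key in REQUIRED_KEYS:
        if not parser.has_option(rpd_dsn_name, key):
            problems.append(f"[{rpd_dsn_name}] is missing {key}")
    return problems


if __name__ == "__main__":
    print(check_dsn(SAMPLE_ODBC_INI, "bigdatalite"))   # no problems expected
    print(check_dsn(SAMPLE_ODBC_INI, "BigDataLite"))   # name differs by case
```

The second call illustrates a common mistake: the comparison here is case-sensitive, so a DSN typed with different capitalisation in the RPD is reported as missing.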


Demonstration of OBIEE 11.1.1.7 accessing Hadoop through Hive Connectivity

Dealing with Hadoop / Hive Latency Option 1 : Exalytics

Hadoop access through Hive can be slow - due to inherent latency in Hive
Hive queries use MapReduce in the background to query Hadoop
Spins-up Java VM on each query
Generates MapReduce job
Runs and collates the answer
Great for large, distributed queries ...
... but not so good for speed-of-thought dashboards
So what if we could use Exalytics to speed-up Hadoop queries?

Oracle Exalytics In-Memory Machine

Engineered system, complements Oracle Exadata Database Machine (but can work standalone)
Combination of high-end hardware (Sun x86_64 architecture, 3RU rack-mountable, 1-2TB RAM) and optimized versions of Oracle's BI, In-Memory Database and OLAP software
Delivers in-memory analytics focusing on analysis, aggregation and UI
Rich, interactive dashboards with split-second response times
1-2TB (and now 4TB) of RAM, to run your analysis in-memory
InfiniBand connection to Exadata and Oracle BDA
40 CPU cores (and now 128) to support high user numbers
Lower TCO through known configuration, combined patch sets
Contains software features only licensable through Exalytics package

Exalytics as the Query Performance Enhancer
In conjunction with a well-tuned data warehouse, Exalytics adds an in-memory analysis layer
Based around Oracle TimesTen for Exalytics, Oracle's In-Memory Database
Aggregates are recommended based on query patterns, automatically created in TimesTen
Summary Advisor makes recommendations, which adapt as queries change
Meant to be plug-and-play - no need for expensive data warehouse tuning
So can we use this for speeding-up Hadoop/Hive queries?
[Diagram: on Exalytics, the BI Server reads aggregates from TimesTen and detail-level data from the data warehouse]

Summary Advisor for Aggregate Recommendation & Creation

Utility within Oracle BI Administrator tool that recommends aggregates
Bases recommendations on usage tracking and summary statistics data
Captured based on past activity
Runs an iterative algorithm that searches, each iteration, for the best aggregate
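The slide does not describe the algorithm beyond "iterative", so the following is only a generic greedy sketch of what such a search might look like, not Oracle's actual Summary Advisor logic. All names, the row budget and the sample figures are hypothetical:

```python
def recommend_aggregates(candidates, query_seconds, max_rows):
    """Greedy selection: each iteration picks the candidate aggregate that
    would save the most query time not already covered by earlier picks.

    candidates    : {name: {"rows": int, "queries": set of query ids}}
    query_seconds : {query id: total observed run time in seconds}
    max_rows      : overall size budget for the chosen aggregates
    """
    chosen, covered, used_rows = [], set(), 0
    remaining = dict(candidates)

    def gain(name):
        uncovered = remaining[name]["queries"] - covered
        return sum(query_seconds[q] for q in uncovered)

    while remaining:
        best = max(remaining, key=gain)
        if gain(best) == 0:
            break                      # nothing left worth aggregating
        aggregate = remaining.pop(best)
        if used_rows + aggregate["rows"] > max_rows:
            continue                   # too big for the budget, try the next best
        chosen.append(best)
        covered |= aggregate["queries"]
        used_rows += aggregate["rows"]
    return chosen


if __name__ == "__main__":
    candidates = {
        "by_sales_person": {"rows": 50, "queries": {"q1", "q2"}},
        "by_region": {"rows": 10, "queries": {"q2"}},
        "by_month": {"rows": 500, "queries": {"q3"}},
    }
    query_seconds = {"q1": 30, "q2": 45, "q3": 20}
    print(recommend_aggregates(candidates, query_seconds, max_rows=100))
```

With a budget of 100 rows only the sales-person aggregate is picked: the month aggregate is too large, and the region aggregate adds nothing once q2 is already covered.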

Running Some Sample Hadoop / Hive Queries

A simple Hadoop / Hive BMM was created, based on a single Hive table
Queries requesting aggregates were run against that BMM
Query details, and requested aggregates, go in usage tracking & summary statistics tables
Avg. query response time = 30 secs+
select avg(T44678.age) as c1,
       T44678.sales_pers as c2,
       sum(T44678.age) as c3,
       count(T44678.age) as c4
from dwh_customer T44678
group by T44678.sales_pers
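To see what this generated query returns without a Hadoop cluster, the same statement can be run against a small in-memory SQLite table from Python. This is purely illustrative: the sample rows and the cust_id column are invented, and an ORDER BY is added only to make the output order deterministic:

```python
import sqlite3

# The physical query from the slide, plus an ORDER BY for stable output.
QUERY = """
select avg(T44678.age) as c1,
       T44678.sales_pers as c2,
       sum(T44678.age) as c3,
       count(T44678.age) as c4
from dwh_customer T44678
group by T44678.sales_pers
order by c2
"""


def run_sample():
    """Run the query against three hypothetical customer rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table dwh_customer (cust_id integer, age integer, sales_pers text)"
    )
    conn.executemany(
        "insert into dwh_customer values (?, ?, ?)",
        [(1, 30, "Adams"), (2, 50, "Adams"), (3, 41, "Baker")],
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for avg_age, sales_pers, sum_age, count_age in run_sample():
        print(sales_pers, avg_age, sum_age, count_age)
```

Each output row is one sales person with the average, sum and count of customer ages, which is the level of detail an aggregate table at sales-person grain would need to hold.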

Generate Aggregate Recommendations using Summary Advisor

Ensure BMM has one or more logical dimensions + 2 or more logical levels
Ensure S_NQ_SUMMARY_ADVISOR table has aggregate recordings + level details
Generate summary recommendations using Summary Advisor, output as nqcmd script

Implement Recommendations, Review Updated RPD

Run generated logical SQL (Aggregate Persistence) script to create & populate TT tables
Automatically updates RPD to plug-in new TimesTen aggregate tables
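Once the aggregate tables are mapped into the RPD, the BI Server can answer a query from whichever source is cheapest while still covering the requested level of detail. The sketch below models that routing decision in a very simplified way; it is not the BI Server's implementation, and the source names, level names and row counts are hypothetical:

```python
def pick_source(requested_levels, sources):
    """Return the name of the smallest source that can answer the query.

    requested_levels : set of dimension levels the report groups by
    sources          : list of {"name", "levels", "rows"} dicts, where
                       "levels" is the set of levels the source can serve
    """
    usable = [s for s in sources if requested_levels <= s["levels"]]
    if not usable:
        raise ValueError(f"no source covers levels {sorted(requested_levels)}")
    return min(usable, key=lambda s: s["rows"])["name"]


if __name__ == "__main__":
    sources = [
        # Detail-level Hive table: can serve any level, but is large and slow.
        {"name": "hive.dwh_customer", "levels": {"customer", "sales_person"}, "rows": 1_000_000},
        # TimesTen aggregate created from the Summary Advisor script.
        {"name": "timesten.ag_sales_person", "levels": {"sales_person"}, "rows": 50},
    ]
    print(pick_source({"sales_person"}, sources))   # served from the aggregate
    print(pick_source({"customer"}, sources))       # falls back to Hive detail
```

Queries at sales-person level go to the in-memory aggregate, while anything needing customer-level detail still has to go back to Hive.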

Re-run Reports, now with TimesTen for Exalytics Acceleration

Reports can now be re-run to test improvements from in-memory aggregation
Response time is now instantaneous
Aggregates will need to be refreshed once new data is loaded into Hadoop
Can also be used to improve speed of federated RDBMS - Hadoop - OLAP queries too
But - relies on query caching - doesn't make Hadoop faster

Dealing with Hadoop / Hive Latency Option 2 : Use Impala

Hive is slow - because it's meant to be used for batch-mode queries
Many companies / projects are trying to improve Hive - one of which is Cloudera
Cloudera Impala is an open-source but commercially-sponsored in-memory MPP platform
Replaces Hive and MapReduce in the Hadoop stack
Can we use this, instead of Hive, to access Hadoop?
It will need to work with OBIEE
Warning - it won't be a supported data source (yet)

How Impala Works

A replacement for Hive, but uses Hive concepts and data dictionary (metastore)
MPP (Massively Parallel Processing) query engine that runs within Hadoop
Uses same file formats, security, resource management as Hadoop
Processes queries in-memory
Accesses standard HDFS file data
Option to use Apache Avro, RCFile, LZO or Parquet (column-store)
Designed for interactive, real-time SQL-like access to Hadoop
[Diagram: Presentation Server and BI Server connect through the Cloudera Impala ODBC driver to a multi-node Hadoop cluster, with Impala running alongside Hadoop/HDFS on each node]

Connecting OBIEE 11.1.1.7 to Cloudera Impala

Warning - unsupported source - limited testing and no support from MOS
Requires Cloudera Impala ODBC drivers - Windows or Linux (RHEL etc/SLES) - 32/64 bit
ODBC Driver / DSN connection steps similar to Hive

Importing Impala Metadata

Import Impala tables (via the Hive metastore) into RPD
Set database type to Apache Hadoop
Warning - don't set ODBC type to Hadoop - leave at ODBC 2.0
Create physical layer keys, joins etc as normal

Importing RPD using Impala Metadata

Create BMM layer, Presentation layer as normal
Use View Rows feature to check connectivity back to Impala / Hadoop

Impala / OBIEE Issue with ORDER BY Clause

Although checking rows in the BI Administration tool worked, any query that aggregates data in the dashboard will fail
Issue is that Impala requires LIMIT with all ORDER BY clauses
OBIEE could use LIMIT, but doesn't for Impala at the moment (because not supported)
Workaround - disable ORDER BY in Database Features, have the BI Server do sorting
Not ideal - but it works, until Impala supported
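The effect of the workaround can be pictured as follows: with the ORDER BY feature switched off, the query sent to Impala carries no ORDER BY clause, and the sort is applied to the returned rows afterwards. This Python sketch only mimics that division of labour; the function names, table and column names are hypothetical and it is not how the BI Server is implemented:

```python
def build_physical_sql(select_sql, order_by, order_by_supported):
    """Append ORDER BY only when the target database feature allows it."""
    if order_by and order_by_supported:
        return f"{select_sql} ORDER BY {', '.join(order_by)}"
    return select_sql


def sort_locally(rows, order_by):
    """Stand-in for the BI Server sorting the result set itself."""
    return sorted(rows, key=lambda row: tuple(row[col] for col in order_by))


if __name__ == "__main__":
    select_sql = "SELECT carrier, avg(delay) AS avg_delay FROM flights GROUP BY carrier"
    order_by = ["carrier"]

    # Feature disabled: the pushed-down SQL has no ORDER BY, so Impala accepts it.
    print(build_physical_sql(select_sql, order_by, order_by_supported=False))

    # Rows as they might come back from Impala, in arbitrary order.
    unsorted_rows = [
        {"carrier": "UA", "avg_delay": 12.5},
        {"carrier": "AA", "avg_delay": 9.0},
    ]
    print(sort_locally(unsorted_rows, order_by))
```

The trade-off is that the full result set has to be returned before sorting can happen, which is why the slide calls the workaround "not ideal".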


Demonstration of OBIEE 11.1.1.7 accessing Hadoop through Impala Connectivity - Flight Delays Dataset

So Does Impala Work, as a Hive Substitute?

With ORDER BY disabled in DB features, it appears to
But not extensively tested by me, or Oracle
But it's certainly interesting
Reduces 30s, 180s queries down to 1s, 10s etc
Impala, or one of the competitor projects (Drill, Dremel etc) assumed to be the real-time query replacement for Hive, in time
Oracle announced planned support for Impala at OOW2013 - watch this space

Thank You for Attending!

Thank you for attending this presentation, and more information can be found at http://www.rittmanmead.com
Contact us at info@rittmanmead.com or mark.rittman@rittmanmead.com
Look out for our book, Oracle Business Intelligence Developers Guide - out now!
Follow us on Twitter (@rittmanmead) or Facebook (facebook.com/rittmanmead)
