Full Range of Analytics
Integrated Analytic Apps
On Premise, On Cloud, On Mobile
Your Data: Decisions based on your data
Big Data: Decisions based on all data relevant to you
Transactions
Documents & Social Data
Machine-Generated Data
Oracle Big Data Appliance - Engineered System for Big Data Acquisition and Processing
Cloudera Distribution of Hadoop
Cloudera Manager
Open-source R
Oracle NoSQL Database Community Edition
Oracle Enterprise Linux + Oracle JVM
Oracle Big Data Connectors
Oracle Loader for Hadoop (Hadoop > Oracle RDBMS)
Oracle Direct Connector for HDFS (HDFS > Oracle RDBMS)
Oracle Data Integration Adapter for Hadoop
Oracle R Connector for Hadoop
Oracle NoSQL Database (key/value store DB based on BerkeleyDB)
ODI is the data integration tool for extracting data from Hadoop/MapReduce, and loading
into Oracle Big Data Appliance, Oracle Exadata and Oracle Exalytics
Oracle Application Adapter for Hadoop provides the required data adapters
Load data into Hadoop from the local filesystem or HDFS (Hadoop clustered FS)
Read data from Hadoop/MapReduce using Apache Hive (JDBC) and HiveQL, load into Oracle RDBMS using Oracle Loader for Hadoop
Supported by Oracle's Engineered Systems
Exadata
Exalytics
Big Data Appliance (with Cloudera Hadoop Distribution)
Hadoop Cluster
MapReduce
Hive Server
HiveQL
Oracle RDBMS
ODI 11g
OBIEE 11g, and other Oracle Business Analytics tools, can also make use of big data sources
Oracle Exalytics, through in-memory aggregates and InfiniBand connection to Exadata, can analyze vast (structured)
datasets held in relational and OLAP databases
Endeca Information Discovery can analyze unstructured and semi-structured sources
InfiniBand connector to Big Data Appliance + Hadoop connector in OBIEE supports analysis via MapReduce
Oracle R distribution + Oracle R Enterprise supports SAS-style statistical analysis of large data sets, as part of the Oracle Advanced Analytics Option
OBIEE can access Hadoop data sources through another Apache technology called Hive
Opportunities for OBIEE and ODI with Big Data Sources and Tools
What is Hadoop?
What is HDFS?
The filesystem behind Hadoop, used to store data for Hadoop analysis
Unix-like, uses commands such as ls, mkdir, chown, chmod
Fault-tolerant, with rapid fault detection and recovery
High-throughput, with streaming data access and large block sizes
Designed for data-locality, placing data close to where it is processed
Accessed from the command line, via hdfs:// URIs, GUI tools etc.
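The Unix-like commands above map onto the `hadoop fs` client; a minimal sketch, assuming a configured Hadoop client (the paths and file names are hypothetical):

```shell
# Create a directory in HDFS and set its ownership and permissions
hadoop fs -mkdir /user/oracle/movies
hadoop fs -chown oracle:oracle /user/oracle/movies
hadoop fs -chmod 755 /user/oracle/movies

# Copy a local file into HDFS, then list the directory
hadoop fs -put movie_ratings.csv /user/oracle/movies/
hadoop fs -ls /user/oracle/movies
```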
MapReduce jobs are typically written in Java, but Hive can make this simpler
Hive is a query environment over Hadoop/MapReduce to support SQL-like queries
Hive server accepts HiveQL queries via HiveODBC or HiveJDBC, automatically
creates MapReduce jobs against data previously loaded into the Hive HDFS tables
Approach used by ODI and OBIEE
to gain access to Hadoop data
Allows Hadoop data to be accessed just like
any other data source (sort of...)
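As a sketch of that access path, a HiveQL query submitted through the Hive CLI (the table and columns are hypothetical); Hive compiles the GROUP BY into a MapReduce job behind the scenes:

```shell
# Hypothetical movies table; no Java coding needed for the aggregation
hive -e "SELECT genre, COUNT(*) AS num_movies
         FROM   movies
         GROUP  BY genre;"
```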
Hive Driver (compile, optimize, execute)
Metastore
HDFS
Managed Tables (/user/hive/warehouse/)
External Tables (e.g. /user/oracle/, /user/movies/data/)
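The managed vs external distinction can be sketched in HiveQL (table definitions are illustrative, reusing the directories above):

```shell
hive -e "
-- Managed table: Hive owns the files under /user/hive/warehouse/
CREATE TABLE movies (movie_id INT, title STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- External table: data stays in place; DROP TABLE leaves the files intact
CREATE EXTERNAL TABLE movie_ratings (movie_id INT, rating INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/movies/data/';"
```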
[Diagram: Map tasks feed a "GROUP BY a" shuffle into Reduce tasks, which produce the result]
Hive job progress output:
map = 0%, reduce = 0%
map = 100%, reduce = 0%
map = 100%, reduce = 33%
map = 100%, reduce = 100%
OBIEE and ODI Access to Hive, Leveraging MapReduce with no Java Coding
You can download your own Hive binaries, libraries etc. from the Apache Hive website
Or use pre-built VMs and distributions from the likes of Cloudera
Cloudera CDH3/4 is used on Oracle Big Data Appliance
Open-source + proprietary tools (Cloudera Manager)
Other tools for managing Hive, HDFS etc. include:
Hue (HDFS file browser + management)
Beeswax (Hive administration + querying)
Other complementary/required Hadoop tools
Sqoop
HDFS
Thrift
Demonstration
Simple Data Selection and Querying using Hive on Cloudera CDH3
ODI + Big Data Examples : Providing the Bridge Between Hadoop + OBIEE
for example...
/usr/lib/hive/lib/*.jar
/usr/lib/hadoop-0.20/hadoop-*-core*.jar,
/usr/lib/hadoop-0.20/hadoop-*-tools*.jar
Copy JAR files into userlib directory and (standalone) agent lib directory
c:\Users\Administrator\AppData\Roaming\odi\oracledi\userlib
Registering HDFS and Hive Sources and Targets in the ODI Topology
Demonstration
ODI 11.1.1.6 Configured for Hadoop Access, with Hive/HDFS sources and targets registered
Oracle technology for accessing Hadoop data, and loading it into an Oracle database
Pushes data transformation, heavy lifting to the Hadoop cluster, using MapReduce
Direct-path loads into Oracle Database, partitioned and non-partitioned
Online and offline loads
Key technology for fast load of
Hadoop results into Oracle DB
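An Oracle Loader for Hadoop job is submitted as a regular Hadoop job; a hedged sketch following the pattern in the OLH documentation (the configuration file is hypothetical, and paths vary by install):

```shell
# loader_conf.xml names the input HDFS/Hive data, the target Oracle
# table, the load mode (online/offline) and the JDBC connection details
hadoop jar $OLH_HOME/jlib/oraloader.jar oracle.hadoop.loader.OraLoader \
  -conf loader_conf.xml
```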
IKM File to Hive (Load Data): Loading of Hive Tables from Local File or HDFS
OK
Time taken: 0.341 seconds
IKM File to Hive (Load Data): Loading of Hive Tables from Local File or HDFS
IKM Hive Control Append: Loading, Joining & Filtering Between Hive Tables
IKM Hive Transform: Use Custom Shell Scripts to Integrate into Hive Table
Transforms data programmatically using Python, Perl etc. scripts
Options to map script output to columns in the Hive table
Useful for more programmatic and complex data transformations
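Under the covers this pattern uses Hive's TRANSFORM clause; a minimal sketch (script name, table and columns are hypothetical), with the transform script reading tab-separated rows on stdin and writing rows on stdout, just as a Python or Perl script would:

```shell
# transform_titles.sh: upper-cases the second column of each
# tab-separated input row
cat > transform_titles.sh <<'EOF'
#!/bin/sh
awk -F'\t' 'BEGIN { OFS = "\t" } { $2 = toupper($2); print }'
EOF

# Run it against a (hypothetical) Hive table via the TRANSFORM clause
hive -e "
ADD FILE transform_titles.sh;
SELECT TRANSFORM (movie_id, title)
       USING 'transform_titles.sh'
       AS (movie_id, title_uc)
FROM   movies;"
```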
Demonstration
Data Integration Tasks using ODIAAH Hadoop KMs
No specific technology or driver for NoSQL databases, but can use Hive external tables
Requires a specific Hive Storage Handler for key/value store sources
Hive feature for accessing data from other DB systems, for example MongoDB, Cassandra
For example, https://github.com/vilcek/HiveKVStorageHandler
Additionally needs the Hive collect_set aggregation function to aggregate results
Has to be defined in Languages panel in Topology
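The storage-handler pattern looks roughly like this in HiveQL; the STORED BY class name, table properties, table and store names below are all illustrative (check the HiveKVStorageHandler project for the real class and its required TBLPROPERTIES):

```shell
hive -e "
-- External table over an Oracle NoSQL Database key/value store;
-- handler class and properties are hypothetical placeholders
CREATE EXTERNAL TABLE nosql_movies (movie_id STRING, title STRING)
STORED BY 'org.vilcek.hive.kv.KVHiveStorageHandler'
TBLPROPERTIES ('kv.host.port' = 'localhost:5000', 'kv.name' = 'kvstore');

-- collect_set() gathers the multiple values per key into a single row
SELECT movie_id, collect_set(title)
FROM   nosql_movies
GROUP  BY movie_id;"
```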