
Learn

How to Query Across Oracle, Data Lakes and Kafka with Oracle SQL Solutions
Geeky show me session
Marty Gubar
Big Data SQL PM
Oracle Corporation
October 2018

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Confidential – Oracle Internal/Restricted/Highly Restricted
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, timing and pricing of any
features or functionality described for Oracle’s products may change and remains at the
sole discretion of Oracle Corporation.



Big Data SQL Goals
• Use Oracle SQL from Any Application
  – Make Access to All Data Transparent
• Deliver Fast Query Performance
  – Leverage Scale-Out Processing
• Safeguard Sensitive Data
  – Use the Most Advanced Security Policies for All Data

Big Data SQL Architecture
• Any application that queries (REST, Python, node.js, SQL, Java, R, Graph)
• Oracle Database enhanced
  – Seamlessly query external stores
  – Oracle Database is Big Data SQL-enabled
• Scale-out, data local processing
  – Big Data SQL Cells deployed to Hadoop
  – Fan-out data local processing
• Uses shared Hadoop metadata
  – Hive metastore captures data structure and location
[Diagram: applications query a Big Data SQL-enabled Oracle Database, which fans out to Big Data SQL Cells in the Hadoop data lake (Hive metadata, Kafka streaming, HDFS).]
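To make the fan-out idea concrete, here is a minimal Python sketch. It is not Big Data SQL's implementation. A toy dictionary stands in for the Hive metastore (table structure and location), and a thread pool stands in for the cells, which filter data where it lives and return only matching rows. All names (`METASTORE`, `LOCAL_DATA`, `scan_on_cell`, the node paths) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class TableMetadata:
    """What this sketch assumes the metastore records for a table."""
    columns: list
    locations: list  # hypothetical data-node file paths, one per data block


# Toy stand-in for the Hive metastore: table name -> structure and location.
METASTORE = {
    "trips": TableMetadata(
        columns=["trip_id", "start_station", "duration_sec"],
        locations=["node1:/data/trips/part-0", "node2:/data/trips/part-1"],
    )
}

# Toy rows stored locally on each data node, keyed by location.
LOCAL_DATA = {
    "node1:/data/trips/part-0": [(1, "A", 300), (2, "B", 1200)],
    "node2:/data/trips/part-1": [(3, "A", 900), (4, "C", 60)],
}


def scan_on_cell(location, predicate):
    """Runs where the data lives: filter locally and return only matching rows."""
    return [row for row in LOCAL_DATA[location] if predicate(row)]


def query(table, predicate):
    meta = METASTORE[table]  # 1. look up structure and location
    with ThreadPoolExecutor() as pool:  # 2. fan out one task per location
        parts = list(pool.map(lambda loc: scan_on_cell(loc, predicate), meta.locations))
    return [row for part in parts for row in part]  # 3. combine the small results


long_trips = query("trips", lambda row: row[2] >= 600)
print(long_trips)  # [(2, 'B', 1200), (3, 'A', 900)]
```

The key design point is that only filtered rows cross the network. The database side never reads the raw blocks itself.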



Query Server: SQL on Hadoop
Big Data SQL 4.0
• An Oracle query engine deployed to a Hadoop cluster
• Simple, zero maintenance
  – Uses Hive metadata and Hadoop authorization
  – Oracle data not saved to Query Server
• Included with Big Data SQL license
  – Limited use Oracle Database license
• Use in addition to Big Data SQL-enabled Oracle Database
[Diagram: applications (REST, Python, node.js, SQL, R, Graph, Java) query Query Server in the Hadoop data lake, alongside the Big Data SQL-enabled Oracle Database and Big Data SQL Cells over Hive metadata, Kafka streaming, and HDFS.]

New: Object Store Support
Big Data SQL 4.0
• Support data captured in object stores
  – Oracle Object Storage, Amazon S3, Azure Blob Storage
• Use new ORACLE_BIGDATA driver
  – Optimized C-mode driver support
  – Support text, Parquet, Avro, JSON
[Diagram: Oracle Database with Oracle Big Data SQL reading object stores: limitless, highly available, economical storage.]



Big Data SQL Performance Features
IO Reduction Features Deliver Compound Results

  Stage                      Data read
  User query                 100 TB
  1. Partition pruning        10 TB
  2. Storage indexing          1 TB
  3. Predicate pushdown      100 GB
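The results compound because each feature works on whatever the previous one left behind, so three 10x cuts multiply to 1,000x. Below is a minimal sketch of that arithmetic. It assumes the slide's illustrative 10x ratio at every stage, whereas real ratios depend on the data and the query. The names `STAGES` and `compound_reduction` are hypothetical.

```python
# Sizes in GB, decimal units (1 TB = 1,000 GB), as in the slide's example.
USER_QUERY_GB = 100_000  # 100 TB touched by the query as written

# Assumed: each stage keeps one tenth of what reaches it (the slide's illustrative ratio).
STAGES = [
    ("Partition pruning", 10),
    ("Storage indexing", 10),
    ("Predicate pushdown", 10),
]


def compound_reduction(start_gb, stages):
    """Return (stage, GB remaining) after each stage; each cut applies to the previous output."""
    remaining = start_gb
    trace = [("User query", remaining)]
    for name, factor in stages:
        remaining //= factor
        trace.append((name, remaining))
    return trace


trace = compound_reduction(USER_QUERY_GB, STAGES)
for name, gb in trace:
    print(f"{name:<20}{gb:>10,} GB")
```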



Performance Breakthroughs
Big Data SQL 4.0
• Significant performance enhancements with distributed aggregation
  – Use the processing power of Hadoop compute nodes to massively accelerate aggregate queries (SUM, MIN, MAX, AVG, COUNT)
  – Works for single-table queries and multi-table joins
• Optimized C drivers for Text, Parquet, Enterprise Parquet and Avro
• Cell-based JSON processing for CLOBs
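Distributed aggregation works because these functions can be computed in two phases. Each node reduces its rows to a small partial result, and the database merges the partials. AVG is the one that needs care, since it has to be rebuilt from SUM and COUNT. Below is a minimal Python sketch of that two-phase scheme. It illustrates the general technique, not Big Data SQL internals, and the data and names (`Partial`, `aggregate_on_node`, `merge`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    """Partial aggregate a compute node could return instead of raw rows."""
    count: int
    total: float
    lo: float
    hi: float


def aggregate_on_node(values):
    # Runs next to the data: many rows become one small record.
    # Assumes each node holds at least one row (min/max of an empty list fail).
    return Partial(len(values), sum(values), min(values), max(values))


def merge(partials):
    # Runs in the database: combine the per-node partials.
    count = sum(p.count for p in partials)
    total = sum(p.total for p in partials)
    return {
        "count": count,
        "sum": total,
        "min": min(p.lo for p in partials),
        "max": max(p.hi for p in partials),
        # AVG is rebuilt from SUM and COUNT, not from per-node averages.
        "avg": total / count,
    }


node_data = [[10.0, 20.0, 30.0], [40.0], [5.0, 15.0]]
partials = [aggregate_on_node(v) for v in node_data]
result = merge(partials)
print(result)  # {'count': 6, 'sum': 120.0, 'min': 5.0, 'max': 40.0, 'avg': 20.0}

# Averaging the per-node averages gives the wrong answer when nodes hold
# different numbers of rows: (20 + 40 + 10) / 3 = 23.33..., not 20.0.
naive_avg = sum(p.total / p.count for p in partials) / len(partials)
```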

Aggregation Offload: Major Performance Breakthrough
Elapsed time in seconds, aggregation offload OFF vs. ON

Single table: COUNT(*)
  SELECT COUNT(*)
  FROM store_sales;

  OFF   34.85
  ON     5.38

Single table: add columns + GROUP BY
  SELECT ss_store_sk,
         SUM(ss_wholesale_cost),
         SUM(ss_list_price)
  FROM store_sales
  GROUP BY ss_store_sk;

  # of SUM columns       1       2       4
  OFF                124.2   159.0   206.0
  ON                   8.9    10.5    14.3

Multi-table: join fact to dimension table
  SELECT d_dom,
         SUM(ss_wholesale_cost),
         SUM(ss_list_price)
  FROM store_sales, date_dim
  WHERE ss_sold_date_sk = d_date_sk
  GROUP BY d_dom;

  # of SUM columns       1       2       4
  OFF                151.5   181.1   256.8
  ON                  17.1    11.4    15.2
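For the multi-table case, one general way to offload work has three steps:

1. Pre-aggregate the fact table by its join key on each node.
2. Join the much smaller partial results to the dimension table.
3. Finish the GROUP BY.

The sketch below uses tiny made-up data in SQLite to check that this gives the same answer as the slide's query. The `node` column and the `partial_agg` table are hypothetical. This is not a description of how Big Data SQL plans the query.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE date_dim (d_date_sk INTEGER PRIMARY KEY, d_dom INTEGER);
    -- node is a hypothetical column recording which data node holds each row
    CREATE TABLE store_sales (ss_sold_date_sk INTEGER, ss_wholesale_cost REAL,
                              ss_list_price REAL, node INTEGER);
    INSERT INTO date_dim VALUES (1, 1), (2, 2), (3, 1);
    INSERT INTO store_sales VALUES
        (1, 10.0, 15.0, 0), (2, 20.0, 30.0, 0),
        (3, 5.0, 8.0, 1), (1, 7.0, 9.0, 1), (2, 1.0, 2.0, 1);
""")

# Reference: the multi-table query from the slide, run in one place.
direct = con.execute("""
    SELECT d_dom, SUM(ss_wholesale_cost), SUM(ss_list_price)
    FROM store_sales, date_dim
    WHERE ss_sold_date_sk = d_date_sk
    GROUP BY d_dom ORDER BY d_dom
""").fetchall()

# Offloaded: each node pre-aggregates its own rows by the join key...
partials = []
for node in (0, 1):
    partials += con.execute("""
        SELECT ss_sold_date_sk, SUM(ss_wholesale_cost), SUM(ss_list_price)
        FROM store_sales WHERE node = ?
        GROUP BY ss_sold_date_sk
    """, (node,)).fetchall()

# ...and only those few partial rows are joined to the dimension and merged.
con.execute("CREATE TEMP TABLE partial_agg (date_sk INTEGER, wc REAL, lp REAL)")
con.executemany("INSERT INTO partial_agg VALUES (?, ?, ?)", partials)
offloaded = con.execute("""
    SELECT d_dom, SUM(wc), SUM(lp)
    FROM partial_agg, date_dim
    WHERE date_sk = d_date_sk
    GROUP BY d_dom ORDER BY d_dom
""").fetchall()

print(direct)               # [(1, 22.0, 32.0), (2, 21.0, 32.0)]
print(direct == offloaded)  # True
```

This works for SUM, COUNT, MIN and MAX because they can be merged across groups. AVG needs the SUM/COUNT split from the previous sketch.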
Securing Access to Data
• Support source security rules
  – Use access privileges defined on sources with multiuser authorization
• Extend protection with advanced Oracle security policies
  – Redaction
  – VPD
  – Database Vault
  – Database Security Assessment Tool
[Diagram: application users, LDAP/DB users, and single-user applications (Employee Dir, My HR, direct access) connect with varied authentication methods through Oracle Big Data SQL to Salary and Emp data on the Hadoop cluster. Big Data SQL automatically uses ACLs on protected HDFS files and adds Oracle Advanced Security options.]
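The value of applying VPD and redaction inside the query engine is that every application gets the same filtered, masked view. This holds whether it is a single-user tool or a multiuser application. Below is a minimal Python sketch of the two rule types applied together. It only mimics the behavior: Oracle defines these policies declaratively in the database. The `User` fields, the department rule, and the `None` mask are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    dept: str
    can_see_salary: bool


# Hypothetical rows standing in for the Emp/Salary data in the diagram.
EMPLOYEES = [
    {"emp": "Ana", "dept": "HR", "salary": 85000},
    {"emp": "Ben", "dept": "Sales", "salary": 72000},
    {"emp": "Cy", "dept": "HR", "salary": 91000},
]


def row_policy(user, row):
    """VPD-style rule (hypothetical): users see only rows for their own department."""
    return row["dept"] == user.dept


def redact(user, row):
    """Redaction-style rule (hypothetical): mask salary unless the user is authorized."""
    return dict(row) if user.can_see_salary else {**row, "salary": None}


def secure_query(user, rows):
    # The engine applies both rules itself, so every application that
    # queries through it gets the same protection.
    return [redact(user, row) for row in rows if row_policy(user, row)]


clerk = User("dana", "HR", can_see_salary=False)
print(secure_query(clerk, EMPLOYEES))
# [{'emp': 'Ana', 'dept': 'HR', 'salary': None}, {'emp': 'Cy', 'dept': 'HR', 'salary': None}]
```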

Save Money: Archive Data to “Cold” Partitions
Run Smart Scan over Tablespaces on HDFS
[Diagram: in Oracle Database 12c, table ORDERS is partitioned by month (JAN 2014, FEB 2014, MAR 2014, OCT 2016, NOV 2016, DEC 2016) in TABLESPACE: HOT. The oldest partitions (JAN 2014, FEB 2014, MAR 2014) are placed in TABLESPACE: COLD, whose files sit on HDFS data nodes, each running a BDS (Big Data SQL) Server cell.]
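The saving rests on partition pruning. A query filtered on recent months reads only HOT partitions. A query over history also reads the COLD partitions, which the cells on HDFS scan. Below is a minimal sketch of that pruning decision using the month partitions shown on the slide. The dictionary layout and the function name `partitions_to_scan` are hypothetical.

```python
# Hypothetical partition map for ORDERS: (year, month) -> tablespace,
# using the months shown on the slide. COLD is the tablespace stored on HDFS.
PARTITIONS = {
    (2014, 1): "COLD", (2014, 2): "COLD", (2014, 3): "COLD",
    (2016, 10): "HOT", (2016, 11): "HOT", (2016, 12): "HOT",
}


def partitions_to_scan(first_month, last_month):
    """Keep only partitions whose (year, month) falls in the inclusive query range."""
    return {m: ts for m, ts in PARTITIONS.items() if first_month <= m <= last_month}


recent = partitions_to_scan((2016, 11), (2016, 12))
history = partitions_to_scan((2014, 1), (2016, 12))
print(recent)                         # {(2016, 11): 'HOT', (2016, 12): 'HOT'}
print(sorted(set(history.values())))  # ['COLD', 'HOT']
```

Python compares the `(year, month)` tuples lexicographically, which matches calendar order.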

Show Me!
Analyze NYC Citi Bikes usage using an end-to-end workflow

• NYC Citi Bikes is a bike-sharing program where customers pick up and drop off bicycles at fixed stations; the pickup and drop-off stations do not need to be the same (a potential logistics challenge)



Show Me! Agilely Extend the Warehouse
• Combining Citi Bikes activity data with weather data, you will answer questions like:
  – Who is using bikes?
  – Where are they going?
  – How much time do they spend riding? And under what conditions?
  – Are bikes optimally distributed across stations?
  – How do we ensure that the right bicycle inventory is deployed to various stations?
[Diagram: Oracle Database (Weather, Customer, Subscriptions, Inventory, …) alongside a Hadoop data lake holding Hive metadata for a partitioned trip table, Stations and Trip History in HDFS (JSON), and Trip Activity streaming through Kafka.]
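The distribution question is, at its core, a comparison of arrivals against departures at each station. A station that consistently loses bikes needs restocking. Below is a minimal Python sketch, where a small in-memory list of hypothetical `(start_station, end_station)` trips stands in for the trip history table.

```python
from collections import Counter

# Hypothetical trip records: (start_station, end_station).
trips = [("A", "B"), ("A", "C"), ("B", "A"), ("A", "B"), ("C", "B")]


def net_flow(trips):
    """Arrivals minus departures per station; negative means the station drains."""
    departures = Counter(start for start, _ in trips)
    arrivals = Counter(end for _, end in trips)
    stations = departures.keys() | arrivals.keys()
    return {s: arrivals[s] - departures[s] for s in sorted(stations)}


print(net_flow(trips))  # {'A': -2, 'B': 2, 'C': 0}
```

Because every trip leaves one station and arrives at another, the net flows always sum to zero. The imbalance only moves bikes around.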



Show Me! Agilely Extend the Warehouse
1. Add Big Data SQL Tables
   – Over complex data in Hadoop
   – Over Kafka Streams
2. Query across sources
3. Improve Performance
4. Secure
   – Row-level security
   – Redaction
[Diagram: Oracle Database tables Trip Stream, Weather, Trips, and Station map to the Trip Activity stream in Kafka, the Hive metadata for the partitioned trip table, and Stations and Trip History in HDFS (JSON) in the Hadoop data lake.]
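Step 1 also covers tables over Kafka streams. Below is a minimal Python sketch of the query idea. Each message is treated as a row with a partition, offset, timestamp, and JSON payload, and "current bike activity" comes from replaying those rows. The message layout and payload fields (`bike_id`, `station`, `event`) are assumptions. This is not the Big Data SQL Kafka table definition.

```python
import json
from datetime import datetime

# Hypothetical Kafka-style messages: (partition, offset, timestamp, JSON value).
MESSAGES = [
    (0, 100, datetime(2018, 10, 1, 8, 0), '{"bike_id": 1, "station": "A", "event": "pickup"}'),
    (0, 101, datetime(2018, 10, 1, 8, 5), '{"bike_id": 2, "station": "B", "event": "pickup"}'),
    (1, 57, datetime(2018, 10, 1, 8, 20), '{"bike_id": 1, "station": "C", "event": "dropoff"}'),
    (1, 58, datetime(2018, 10, 1, 8, 25), '{"bike_id": 3, "station": "A", "event": "pickup"}'),
]


def bikes_in_use(messages, as_of):
    """Replay events up to as_of; a bike is in use if its latest event is a pickup."""
    latest = {}
    # Offsets are ordered only within a partition, so sort by timestamp to
    # get one timeline across partitions.
    for _partition, _offset, ts, value in sorted(messages, key=lambda m: m[2]):
        if ts > as_of:
            continue
        event = json.loads(value)
        latest[event["bike_id"]] = event["event"]
    return sorted(bike for bike, ev in latest.items() if ev == "pickup")


print(bikes_in_use(MESSAGES, datetime(2018, 10, 1, 8, 30)))  # [2, 3]
```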



Show Me! Agilely Extend the Warehouse
1. Create/query table over JSON data (Stations)
2. Mini-ETL: load database table
3. Create/query Hive table (Bike Trips)
4. Create materialized views (MVs)
5. Create/query Kafka stream of current bike activity
6. Apply row-level security
7. Redact data
[Diagram: Oracle Database (Weather, Customer, Subscriptions, Inventory, …) alongside a Hadoop data lake holding Hive metadata for a partitioned trip table, Stations and Trip History in HDFS (JSON), and Trip Activity streaming through Kafka.]
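Steps 1 and 2 follow a parse-then-load pattern: make the JSON queryable, then copy what you need into an ordinary table. In the demo this happens inside Oracle Database with Big Data SQL. The minimal sketch below only mimics the pattern in Python with `json` and `sqlite3`. The station documents and their field names are made up.

```python
import json
import sqlite3

# Hypothetical station documents, one JSON object per line (JSON Lines).
# Field names and values are made up for illustration.
RAW_STATIONS = """\
{"station_id": 1, "name": "Station A", "capacity": 39}
{"station_id": 2, "name": "Station B", "capacity": 33}
{"station_id": 3, "name": "Station C", "capacity": 27}
"""

# Step 1 analogue: parse the JSON into records that can be queried.
records = [json.loads(line) for line in RAW_STATIONS.splitlines() if line.strip()]

# Step 2 analogue (mini-ETL): copy the parsed records into a relational table.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE stations (station_id INTEGER PRIMARY KEY, name TEXT, capacity INTEGER)"
)
con.executemany("INSERT INTO stations VALUES (:station_id, :name, :capacity)", records)

large = con.execute(
    "SELECT name FROM stations WHERE capacity >= 30 ORDER BY station_id"
).fetchall()
print(large)  # [('Station A',), ('Station B',)]
```

The named placeholders (`:station_id`, and so on) let `executemany` take the parsed dictionaries directly, so no manual field mapping is needed.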
Now show me!

http://cloudcustomerconnect.oracle.com



For More Information
• Big Data Lite on OTN
http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html
• Blog Posts: Data Warehouse Insider
https://blogs.oracle.com/datawarehousing/the-data-warehouse-insider
• Blog Posts: Big Data SQL
https://blogs.oracle.com/datawarehousing/big-data-sql-2

Query Server: Manage Using Hadoop Cluster Mgmt Tools

