
Oracle Autonomous Database

What every DBA should know

Sandesh Rao
VP - Autonomous Database Health & Machine Learning
Copyright © 2017, Oracle and/or its affiliates. All rights reserved. | 1
Confidential – Oracle Restricted

Safe Harbor Statement


The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.



Theme

1. Tools or features which provide some function
2. Automation around some of these tools or features
3. Components or products which use machine learning to solve some use-cases
4. Additional ML tools which can be used on 1, 2 or the results of 3 to develop different outcomes
   1. People who know Data Science
   2. People who want to use it – prebuilt models

Agenda

1 Journey to Autonomous Database


2 Machine learning basics & use cases



Oracle’s Vision for Autonomous Database
• Self-Driving
–User defines service levels, database makes them happen

• Self-Securing
–Protection from both external attacks and malicious internal users

• Self-Repairing
–Automated protection from all downtime



Journey to Autonomous Database
• Oracle has been developing sophisticated database automation for decades

Oracle Database 9i, 10g
• Automatic Storage Management (ASM)
• Automatic Memory Management
• Automatic DB Diagnostic Monitor (ADDM)
• Automatic Workload Repository (AWR)
• Automatic Undo tablespaces
• Automatic Segment Space Management
• Automatic Statistics Gathering
• Automatic Standby Management (Broker)
• Automatic Query Rewrite

Oracle Database 11g, 12c
• Automatic SQL Tuning
• Automatic Workload Replay
• Automatic Capture of SQL Monitor
• Automatic Data Optimization
• Automatic Storage Indexes
• Automatic Columnar Cache
• Automatic Diagnostic Framework
• Automatic Refresh of Database Cloning
• Autonomous Health Framework



Database Operations Runtime Management
Prevention and Recovery Pillars
• Solving these challenges requires a holistic approach
– Prevent problems and optimize solutions in real-time
– Recover from failures and identify root cause quickly with minimal intervention
• Human reactions come too late and do not scale
• Manual triage and floods of notifications do not scale
• Applied machine learning techniques respond effectively in real time, without a large impact on operations

Journey to Autonomous Database
• Cloud enables Oracle to deliver a Fully Autonomous Database
– Expanded Database Automation
– Integrated with complete infrastructure automation
– With additional automation for operations, HA, security, etc.



One Autonomous Database – Optimized by Use Case

• 2017: Data Warehousing
• 2018: Enterprise OLTP, Mixed Workloads
• Now: Departments, Developers



Autonomous Database Cloud For Data Warehouse
• Easy
– Automatically optimizes Analytic workloads
– Simply “load and go”
– Database tunes itself - No need to define indexes, partitions, materialized views, etc.
– Works with any BI analytics tool

• Fast
– Based on Exadata technology
– Performance matches or exceeds most hand-tuned Data Warehouses

• Elastic
– Instant scaling of compute or storage with no downtime
– Pay for compute when in use only
Expected CY 2017



Autonomous Database Cloud For OLTP or Mixed Workloads
• Easy
– Configured for Mission Critical workloads
• Full Maximum Availability Architecture with scale-out clustering and disaster recovery
– Or Configured for Low Cost
• Single server for non-critical workloads or test/dev
• Fast
– Based on Exadata technology
• Elastic
– Instant scaling of compute or storage with no downtime

Expected CY 2018



Full End-to-End Automation
• Must automate a large number of tasks
– Setup and provision software using Gold images
– Provisioning scale-out clusters and disaster recovery automatically
– Switchovers and failovers with defined parameters
– Patching, upgrading, and backing up online using RAC, ASM and Clusterware
– Monitoring, scaling, diagnosing performance
– Tuning, optimizing and using new ATO features
– Testing and change management of complex applications and workloads
– Automatically handling failures and errors – log file lifecycle management
– Isolation and multitenant setup using Container Databases
– Infrastructure advantages like Containers (Docker, Kubernetes) for app deployment



Trace File Analyzer
Smart Collection
• Always on – Enabled by default
• Has improved, comprehensive first-failure diagnostics collection
• Efficiently collects, packages and transfers diagnostic data to Oracle Support
• Reduces round trips between customers and Oracle
• Transfers data to centralized storage for detailed analysis with TFA Service
• Supports Database 10.2 and above
• Included since 11.2.0.4 and 12.1.0.2 and updated in patchsets & PSUs



Autonomous Usage
1. TFA detects a fault
2. Diagnostics are collected
3. Distributed diagnostics are consolidated and packaged
4. Notification of fault is sent
5. Diagnostic collection is uploaded to Oracle Support for root cause analysis & resolution

Components shown: TFA running on Oracle Grid Infrastructure & Databases; Oracle Support



Faster & Easier SR Data Collection
tfactl diagcollect -srdc <srdc_type> -sr <SR#>
Type of Problem: SRDC
• ORA Errors: ORA-00020, ORA-00060, ORA-00600, ORA-00700, ORA-01555, ORA-01628, ORA-04030, ORA-04031, ORA-07445, ORA-27300, ORA-27301, ORA-27302, ORA-30036
• Other internal database errors: internalerror
• Database performance: dbperf
• Database patching: dbpatchinstall, dbpatchconflict
• Database resource: dbunixresources
• XDB installation or invalid object: dbxdb
• Database install / upgrade: dbinstall, dbupgrade, dbpreupgrade
• Database storage: asm
• Excessive SYSAUX space used by the Automatic Workload Repository (AWR): dbawrspace
• Database startup / shutdown: dbshutdown, dbstartup
• Data Guard: dbdataguard
• Enterprise Manager tablespace usage metric: emtbsmetrics
• Enterprise Manager general metrics page or threshold problems (run all three SRDCs): emdebugon, emdebugoff, emmetricalert
• Enterprise Manager target discovery / add: emcliadd, emclusdisc, emdbsys, emgendisc, emprocdisc
• Enterprise Manager OMS restart: emrestartoms



Oracle RAC 18c
TFA Service
• TFA Service is set up as part of DSC setup
– Runs on first node of a cluster
• Web admin account is locked at start
– To unlock: tfactl receiver reset webadmin
• For general info: tfactl receiver info

[Figure: Oracle Domain Services Cluster with Management Service, TFA Service, RHP Service, ACFS Services, ASM Service, IO Service and Shared ASM]

$ tfactl receiver info


TFA Service URL : http://mys66:7070/tfa/index.html
TFA Service URL (https) : https://mys66:7071/tfa/index.html
TFA Service Admin User : admin
TFA Service Admin Status : active
TFA Service Repository : /scratch/app/oragrid/tfa/repository
TFA Service Port : 7001
TFA Service Members :



Domain Services Cluster Installation steps



Oracle RAC 18c
TFA Service – Cluster / Host Health View
1. View cluster heat map to see potential issues and drill into host level
2. View frequency of events
3. View recent TFA Collections
4. View timeline of important events
5. Choose between component health or utilization



Oracle RAC 18c
TFA Service – Cluster / Host Utilization View
1. View heat map for utilization hotspots
2. View utilization graphs
3. Hover on a section to see more information



Why Oracle ORAchk & EXAchk
• Automatic proactive warning of problems before they impact you
• Health checks for most impactful reoccurring problems
• Runs in your environment with no need to send anything to Oracle
• Get scheduled health reports sent to you in email
• Findings can be integrated into other tools of choice
• Common Framework: EXAchk for Engineered Systems, ORAchk for Non-Engineered Systems
Upgrade to Database 12.2 with confidence
• New checks to help when upgrading the database to 12.2
• Both pre and post upgrade verification to prevent problems related to:
– OS configuration
– Grid Infrastructure & Database patch prerequisites
– Database configuration
– Cluster configuration

Pre upgrade: orachk -u -o pre
Post upgrade: orachk -u -o post



Journey to Autonomous Database Cloud
Autonomous Health via Machine Learning
Capabilities along the 2014, 2015, 2016, 2017, 2018+ timeline:
• Integration of database support tools
• Automated & targeted diagnostic collections (50+ top areas & growing)
• Automated Health Checks
• Automated analysis & diagnostic collections
• Log masking, reduction & anomaly detection
• Automated log lifecycle management
• Real-time Health Monitoring of compliance, performance, availability & capacity
• Automated workload forecasting
• Preemptive fault prediction & correction
• Automated environment correlation for fault prioritization & flood control
• Automated repair

Machine Learning Use Cases

1 Machine learning basics


2 Log reduction & Anomaly timeline
3 Maintenance slot identification
4 Detect Performance Problems
5 Problem Signatures from Event Paths



3 Key Areas of Machine Learning

• Analytics: Knowledge discovery
• Machine Learning: Learn & get better from experience
• Artificial Intelligence: Simulate human intelligence



Examples of Machine Learning Problem Types

• Classifiers: Predict a label (classification)
– Example: Classify if a particular log entry is normal or not
• Regression: Predict a value
– Example: Predict when a system will run out of memory
• Clustering: Form groups by discovering reoccurring patterns
– Example: Group incidents into collections of similar ones that share some common attributes
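
The regression example (predicting when a system will run out of memory) can be illustrated with a least-squares line fit. This is only a minimal sketch: the sample values, the 64 GB capacity and the function names are hypothetical, and a straight-line model is an assumption that real memory growth may not satisfy.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hour_of_exhaustion(hours, used_gb, capacity_gb):
    """Extrapolate the fitted line to the hour at which usage reaches capacity."""
    slope, intercept = fit_line(hours, used_gb)
    if slope <= 0:
        return None  # usage is flat or shrinking, so no exhaustion is predicted
    return (capacity_gb - intercept) / slope


# Hypothetical samples: memory used (GB) at hours 0..4, growing 2 GB per hour.
hours = [0, 1, 2, 3, 4]
used_gb = [10.0, 12.0, 14.0, 16.0, 18.0]
exhaust_hour = hour_of_exhaustion(hours, used_gb, capacity_gb=64.0)
print(exhaust_hour)
```

With these made-up samples the fitted line is 10 + 2 × hour, so the 64 GB limit is reached at hour 27.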



Machine Learning Categories
• Supervised Learning: Predict future outcomes with the help of training data provided by human experts
• Semi-Supervised Learning: Discover patterns within raw data and make predictions, which are then reviewed by human experts, who provide feedback which is used to improve the model accuracy
• Unsupervised Learning: Find patterns without any external input other than the raw data
• Reinforcement Learning: Take decisions based on past rewards for this type of action



Autonomous Health Platform ML Technologies
Real-time Prevention
• Data Ingestion
– Kernel Smoothing and Moving Average
– Interpolation and Imputation
• Prediction and Pattern Recognition
– Multivariate and Auto-Associative Regression
– Clustering, Similarity Operators and Bayes Networks
• Fault and Anomaly Detection
– Sequential Probability Ratio Tests
– Conditional Probability Filters & Hidden Markov Models
• Prognosis and Diagnosis
– Bayesian Belief Networks and Probabilistic Inference
– Remaining Useful Life Regression and GPM Models

Rapid Recovery
• Data Ingestion
– ELK
– Lucene
• Prediction and Pattern Recognition
– TF-IDF and Bag-of-Words modelling
– Sequence Matcher
– K-nearest Neighbour
• Fault and Anomaly Detection
– Decision Trees and Random Forest
– Sequential Pattern Mining
• Prognosis and Diagnosis
– Recurrent Neural Network
– Long Short-Term Memory Predictive Analysis

Log reduction & Anomaly timeline

Remove the noise from thousands of log events and metrics to identify key events revealing what happened, in what order and why



Anomaly Detection – High Level
• A log file collection is split by file type (File Type 1, File Type 2, ... File Type n)
• Known normal log entries are discarded
• Probable anomalous lines are collected
• Probable anomalies are placed on the Anomaly Timeline
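
The discard/collect flow above can be sketched in a few lines. This is a simplified illustration, not how TFA implements it: the masking rules, the log lines and the function names are all hypothetical, and "known normal" is modelled as a plain set of masked line templates learned from a baseline.

```python
import re


def to_template(line):
    """Mask variable parts (timestamps, hex ids, numbers) so similar entries share one template."""
    line = re.sub(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}", "<TS>", line)
    line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)
    line = re.sub(r"\d+", "<N>", line)
    return line.strip()


def anomaly_timeline(entries, known_normal):
    """entries: iterable of (timestamp, file_type, line).

    Discards entries whose template is known normal and returns the
    remaining probable anomalies in chronological order.
    """
    probable = [
        (ts, file_type, line)
        for ts, file_type, line in entries
        if to_template(line) not in known_normal
    ]
    return sorted(probable)


# Hypothetical baseline of normal entries, reduced to templates.
baseline = [
    "Thread 1 advanced to log sequence 4498",
    "Resource check passed in 12 ms",
]
known_normal = {to_template(line) for line in baseline}

# Hypothetical entries from two file types, deliberately out of order.
entries = [
    ("2018-04-11 15:00:07", "crs", "Node 2 missed network heartbeat"),
    ("2018-04-11 15:00:02", "alert", "Thread 1 advanced to log sequence 4512"),
    ("2018-04-11 15:00:05", "alert", "ORA-00600: internal error code, arguments: [kdsgrp1]"),
    ("2018-04-11 15:00:06", "crs", "Resource check passed in 9 ms"),
]

timeline = anomaly_timeline(entries, known_normal)
for ts, file_type, line in timeline:
    print(ts, file_type, line)
```

Masking numbers before comparison is what lets one baseline line stand in for thousands of near-identical entries; only the two lines with unseen templates reach the timeline.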



Autonomous Health Analysis - Ex: Trace File Analyzer

Auto Recommendation

Autonomous Health – TFA Anomaly Timeline

Maintenance slot identification

Find the next best window of time in which maintenance can be performed with minimal service impact



Maintenance slot identification
• Use case
– Identify appropriate maintenance window for performing maintenance activity based
on historical workload patterns.
• Inputs (Training Data)
– The Average Active Sessions (AAS) metric in sliding window format. AAS matters because it is the best representation of database system load. Preferably use the last 30 days of data points before making the prediction.
• AAS = (DB Time / Elapsed Time)
• In other words, AAS is a time-normalized DB Time
• From DB Tables :
– V$ACTIVE_SESSION_HISTORY => COUNT(*) = DB Time in seconds {Cyclic buffer ~4 Hours}
– DBA_HIST_ACTIVE_SESS_HISTORY => 10 * (COUNT(*)) = DB Time in seconds {Since one in 10 samples}
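
As a worked illustration of the AAS formula, the sketch below turns a sample count into DB Time and then into AAS. The function name and the sample counts are hypothetical; the only facts taken from the slide are AAS = DB Time / Elapsed Time and the factor of 10 for DBA_HIST_ACTIVE_SESS_HISTORY.

```python
def average_active_sessions(sample_count, elapsed_seconds, seconds_per_sample=1):
    """AAS = DB Time / Elapsed Time, with DB Time estimated from ASH sample counts.

    seconds_per_sample is 1 for V$ACTIVE_SESSION_HISTORY and 10 for
    DBA_HIST_ACTIVE_SESS_HISTORY, which keeps one in 10 samples.
    """
    db_time_seconds = sample_count * seconds_per_sample
    return db_time_seconds / elapsed_seconds


# Hypothetical counts for the same one-hour window seen through both views.
aas_memory = average_active_sessions(7200, 3600)
aas_history = average_active_sessions(720, 3600, seconds_per_sample=10)
print(aas_memory, aas_history)
```

Both calls describe the same load: 7200 seconds of DB Time in a 3600 second window is an average of 2 active sessions.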



Maintenance slot identification
• Seasonal Decomposition
– From an observed time series, extract a number of component series, each of which has a certain characteristic or type of behavior.
– Time Series Decomposition
• Trend
– The trend component at time t, which reflects the long-term progression of the series
– A trend exists when there is a persistent increasing or decreasing direction in the data
• Seasonality
– The seasonal component at time t, reflecting seasonality
– Seasonality occurs over a fixed and known period (e.g., the quarter of the year, the month, or day of the week)
• Residual
– The irregular component (or "noise") at time t, which describes random, irregular influences
– It represents the residuals or remainder of the time series after the other components have been removed.
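
The three components can be sketched with a classical additive decomposition: a centered moving average (a convolution filter) estimates the trend, per-phase averages of the detrended series give the seasonality, and whatever is left is the residual. This is an assumed, simplified variant written in plain Python; the slide does not say which decomposition method or library is used, and the toy series with period 4 is made up.

```python
def centered_moving_average(series, period):
    """Trend estimate via a centered convolution filter; None where the window does not fit."""
    if period % 2 == 0:
        weights = [0.5] + [1.0] * (period - 1) + [0.5]  # 2 x period moving average
    else:
        weights = [1.0] * period
    weights = [w / period for w in weights]
    half = len(weights) // 2
    trend = [None] * len(series)
    for i in range(half, len(series) - half):
        window = series[i - half:i + half + 1]
        trend[i] = sum(w * v for w, v in zip(weights, window))
    return trend


def decompose(series, period):
    """Additive decomposition: series = trend + seasonal + residual (needs at least two cycles)."""
    trend = centered_moving_average(series, period)
    detrended = [None if t is None else v - t for v, t in zip(series, trend)]
    # Average the detrended values observed at each position in the cycle.
    phase_means = []
    for phase in range(period):
        values = [d for d in detrended[phase::period] if d is not None]
        phase_means.append(sum(values) / len(values))
    # Center the seasonal component so it sums to zero over one cycle.
    offset = sum(phase_means) / period
    phase_means = [m - offset for m in phase_means]
    seasonal = [phase_means[i % period] for i in range(len(series))]
    residual = [
        None if t is None else v - t - s
        for v, t, s in zip(series, trend, seasonal)
    ]
    return trend, seasonal, residual


# Toy series: linear trend 10 + 0.5 * i plus a repeating pattern of period 4.
pattern = [1.0, -1.0, 2.0, -2.0]
series = [10 + 0.5 * i + pattern[i % 4] for i in range(12)]
trend, seasonal, residual = decompose(series, period=4)
print(seasonal[:4])
```

On this noise-free toy series the decomposition recovers the pattern exactly and the residual is zero wherever the trend is defined; hourly workload data would use a period such as 24.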



Maintenance Slot Identification
Step 1: Original observation data (CNT)
Step 2: Apply convolution filter & average
Step 3: Calculate seasonality

START_TIME             CNT (1)    Step 2       Step 3
2018-04-11 15:00:00        290     5.669881    -0.226098
2018-04-11 16:00:00      31120    10.345606    -0.069821
2018-04-11 17:00:00      21530     9.977203    -0.350088
2018-04-11 18:00:00      26240    10.175040    -0.187483
2018-04-11 19:00:00      40520    10.609551    -0.513240
2018-04-11 20:00:00      54270    10.901727     0.019737
2018-04-11 21:00:00      51460    10.848560     0.059213
2018-04-11 22:00:00      44310    10.698966    -0.011312
2018-04-11 23:00:00      25690    10.153857    -0.179156

Step 4: Use seasonality to predict best maintenance window
Current Date : 2018-05-12 15:00:00
Current Position in Seasonality : -0.22609829742533585
Best Maintenance Period in next Cycle : 2018-05-12 19:00:00
Worst Maintenance Period in next Cycle : 2018-05-13 08:00:00
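
Step 4 can be sketched as picking the hour with the lowest seasonal value and projecting it forward from the current date. The seasonal values below are the nine rows shown on the slide; treating the minimum as the best window and the helper names are assumptions, and the slide's worst period (08:00) cannot be reproduced here because that hour is not among the rows shown.

```python
from datetime import datetime, timedelta

# Seasonality by hour of day, copied from the rows shown on the slide.
seasonal_by_hour = {
    15: -0.226098,
    16: -0.069821,
    17: -0.350088,
    18: -0.187483,
    19: -0.513240,
    20: 0.019737,
    21: 0.059213,
    22: -0.011312,
    23: -0.179156,
}


def next_occurrence(current, hour):
    """First timestamp at or after `current` that falls on the given hour."""
    candidate = current.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate < current:
        candidate += timedelta(days=1)
    return candidate


current = datetime(2018, 5, 12, 15, 0, 0)
# Assumption: the lowest seasonal load marks the best maintenance window.
best_hour = min(seasonal_by_hour, key=seasonal_by_hour.get)
best_start = next_occurrence(current, best_hour)
print(best_start)
```

With the slide's current date of 2018-05-12 15:00:00 this yields 2018-05-12 19:00:00, which agrees with the best maintenance period reported on the slide.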



Anomaly Detection with OS and ASH Data

Detect performance problems



Cluster Health Advisor – Applied Machine Learning
Discovers Potential Cluster & DB Problems

• Fault data driven model development
• Applies purpose-built ML for knowledge extraction
• Expert Dev team scrubs data
• Generates Bayesian Network-based diagnostic root-cause models
• Uses BN-based run-time models to perform real-time prognostics

[Figure: CHA logs, ASH and metrics feed the CHA Dev Team, which scrubs the data under expert supervision; applied ML for knowledge extraction produces BN models for the CHA runtime model, with feedback flowing back into development]



Cluster Health Advisor
Data Flow Overview

Models Capture all Normal Operating Modes
Models Capture the Dynamic Behavior of all Normal Operation

[Chart: IOPS, user commits (/sec), log file parallel write (usec) and log file sync (usec) plotted over time (10:00, 2:00, 6:00) on a 0 to 40000 axis]

In-Memory Reference Matrix (Part of “Normality” Model)
IOPS                              ####    2500    4900     800   ####
User Commits (/sec)               ####   10000   21000    4400   ####
Log File Parallel Write (usec)    ####    2350    4100   22050   ####
Log File Sync (usec)              ####    5100    9025    4024   ####
…                                    …       …       …       …      …

• Release ships with conservative models to minimize false warnings


• A model captures the normal load phases and their statistics over time, and thus the characteristics for all load intensities and profiles. During monitoring, any data point similar to one of the vectors is NORMAL.
• One could say that the model REMEMBERS the normal operational dynamics over time
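
The idea that "any data point similar to one of the vectors is NORMAL" can be illustrated with a nearest-vector check. This is a rough sketch only: the three vectors are the columns of the reference matrix shown on the slide, but the relative Euclidean distance, the 0.25 tolerance and the ABNORMAL label are assumptions, not Cluster Health Advisor's actual similarity operator.

```python
import math

# Columns of the reference matrix on the slide, ordered as
# (IOPS, user commits/sec, log file parallel write usec, log file sync usec).
REFERENCE_VECTORS = [
    (2500, 10000, 2350, 5100),
    (4900, 21000, 4100, 9025),
    (800, 4400, 22050, 4024),
]


def relative_distance(point, reference):
    """Euclidean distance after scaling each metric by its reference value."""
    return math.sqrt(sum(((p - r) / r) ** 2 for p, r in zip(point, reference)))


def classify(point, references=REFERENCE_VECTORS, tolerance=0.25):
    """Label a data point NORMAL if it lies close to any remembered normal vector."""
    nearest = min(relative_distance(point, ref) for ref in references)
    label = "NORMAL" if nearest <= tolerance else "ABNORMAL"
    return label, nearest


# Hypothetical observations: one near the first vector, one with a log file sync spike.
label_ok, dist_ok = classify((2600, 10300, 2400, 5000))
label_bad, dist_bad = classify((2500, 10000, 2350, 60000))
print(label_ok, label_bad)
```

Scaling by the reference value keeps metrics with large magnitudes, such as log file parallel write in microseconds, from dominating the comparison.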



Problem Signatures from Event Paths

Identify a series of events as connected and representing the signature of a problem



Longest Common Subsequence of Anomalous Entries
1. Start by classifying a problem such as an important ORA or CRS error
2. Find occurrences of the problem across many different log files
3. Identify anomalous entries and lifecycle events in chronological order
4. Compare the repeating anomalous entries to identify the true anomalous entries
– These represent the problem signature
– Sequences of events are correlated by component, log file, host & thread
Find the Finite State Automata (FSA)
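
Step 4 can be sketched with the textbook dynamic-programming algorithm for the longest common subsequence. The event names are hypothetical, and folding the pairwise LCS across occurrences is a simplification: it yields a common subsequence of all occurrences but is not guaranteed to be the longest one.

```python
def longest_common_subsequence(a, b):
    """Return one longest common subsequence of sequences a and b."""
    rows, cols = len(a), len(b)
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            if a[i] == b[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    # Walk back through the table to recover the subsequence itself.
    result = []
    i, j = rows, cols
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            result.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return result[::-1]


def problem_signature(occurrences):
    """Fold pairwise LCS over every occurrence's chronological anomalous entries."""
    signature = occurrences[0]
    for events in occurrences[1:]:
        signature = longest_common_subsequence(signature, events)
    return signature


# Hypothetical anomalous entries around three occurrences of the same error.
occurrences = [
    ["net_heartbeat_miss", "disk_io_timeout", "css_reconfig", "node_evicted"],
    ["net_heartbeat_miss", "listener_restart", "css_reconfig", "node_evicted"],
    ["archiver_warning", "net_heartbeat_miss", "css_reconfig", "node_evicted"],
]
signature = problem_signature(occurrences)
print(signature)
```

Entries that appear in only one occurrence drop out, leaving the ordered events common to all of them as the candidate signature.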



Generalizing event signatures over the scope of bug
Event signature timelines, in order of occurrence:

New Signature    Node Eviction bug 243645    Node Eviction bug 2747747
35               35                          3434
3435             3435                        3435
3048             494                         4344
3948             3948                        3048
292              292                         202
434933           434933                      434983

• The new signature is checked for a weighted probabilistic match against the Bug Signature Repository
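
A match against the repository can be approximated with the standard library's `difflib.SequenceMatcher`. This is a stand-in, not the weighted probabilistic match named on the slide: the ratio below weights every event signature equally, and the dictionary layout is an assumption. The sequences are the three timelines shown above.

```python
from difflib import SequenceMatcher

new_signature = [35, 3435, 3048, 3948, 292, 434933]

# Bug signature repository, keyed by the bug numbers shown on the slide.
repository = {
    "243645": [35, 3435, 494, 3948, 292, 434933],
    "2747747": [3434, 3435, 4344, 3048, 202, 434983],
}


def match_scores(signature, repo):
    """Similarity in [0, 1] between the new signature and every stored timeline."""
    return {
        bug: SequenceMatcher(None, signature, events, autojunk=False).ratio()
        for bug, events in repo.items()
    }


scores = match_scores(new_signature, repository)
best_bug = max(scores, key=scores.get)
print(best_bug, scores)
```

The new signature shares five of six events in order with bug 243645 and only two with bug 2747747, so the first scores higher. A weighted scheme could, for example, give rare event signatures more influence than common ones.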



Questions?
• Thank you for your feedback!
• Please continue to reach out to us via social media

Twitter: @sandeshr
LinkedIn: https://www.linkedin.com/in/raosandesh/
