
Week 1 Unit 1:

Introduction to Data Science


The next 6 weeks

What to expect in the next 6 weeks?

2016 SAP SE or an SAP affiliate company. All rights reserved. Public 2


Curriculum flow (weeks 1-3)

Week 1: Business & Data Understanding
Introduction to Data Science
Introduction to Project Methodologies
Business Understanding Phase Overview
Defining Project Success Criteria
Data Understanding Phase Overview
Initial Data Analysis & Exploratory Data Analysis
Weekly Assignment

Week 2: Data Preparation
Data Preparation Phase Overview
Predictive Modeling Methodology Overview
Data Manipulation
Selecting Data Variable and Feature Selection
Data Encoding
Weekly Assignment

Week 3: Modeling (1)
Modeling Phase Overview
Detecting Anomalies
Association Analysis
Cluster Analysis
Classification Analysis with Regression
Weekly Assignment

Curriculum flow (weeks 4-6)

Week 4: Modeling (2)
Classification Analysis with Decision Trees
Classification Analysis with KNN, NN, and SVM
Time Series Analysis
Ensemble Methods
Simulation & Optimization
Automated Modeling
Weekly Assignment

Week 5: Evaluation
Evaluation Phase Overview
Model Performance Metrics
Model Testing
Improving Model Performance
Weekly Assignment

Week 6: Deployment & Maintenance
Deployment Phase Overview
Deployment Options
Monitoring & Maintenance
Automating Deployment & Maintenance
Myths & Challenges
Data Science Applications and References
Weekly Assignment

Final Exam



Cumulative points lead to record of achievement
Watch the deadlines!

Participate in Weekly Assignment (Weeks 1-6): 6 assignments, 6 x 30 = 180 points
Final Exam (Week 7): 180 points
Record of Achievement: when results above 180 points

What is data science?

Data science is an interdisciplinary field about processes and systems that enable the extraction of knowledge or insights from data.
Data science employs techniques and theories drawn from a wide range of disciplines.

Data science personas

Personas, with analytics skills from low to high:
Business Users: Embedded Analytics
Data Analysts / Citizen Data Scientists: Business User / Data Analyst Driven Analytics
Data Scientists: Custom Analytics
SAP HANA Application Developers: Embedded Analytics

Data Science Solutions from SAP:
SAP Suite / Application Innovation / Industry / LoB / CDP: SAP Hybris Marketing, IoT Predictive Maintenance, Fraud
SAP Predictive Analytics
Application Function Modeler (AFM)
Predictive in SAP HANA: PAL, APL, R, AFLs e.g. UDF, OFL

Data science solutions from SAP

Applications and tools:
SAP Predictive Analytics
SAP Lumira
SAP HANA Studio / AFM
SAP RDS Analytics Solutions
SAP Industry & LoB Solutions
Partner Analytical BI & Tools

SAP HANA:
Predictive Analysis Library (PAL)
Business Function Library
Automated Predictive Library
Simulation
Optimization
R
Text Analysis and Mining
Text Search
Spatial Analysis
Graph Engine
Rules Engine

Connect to SAP HANA directly or via Sybase IQ / Hadoop / ESP / Data Services:
SAP IQ
Hadoop
SAP ESP
3rd Party Data Source
SAP Data Services
Data Connectors

Data types:
Transaction Data
Unstructured Data
Real-Time Data
Location Data
Machine Data
Others

SAP HANA Predictive Analysis Library (PAL)

Build High-Performance Predictive Apps

The SAP HANA Predictive Analysis Library (PAL) is a built-in C++ library for performing in-memory data mining and statistical calculations.
PAL is designed to provide high performance on large datasets for real-time analytics.

Diagram labels: SAP HANA; Main Memory; Virtual Tables; SQLScript; Optimized Query Plan; PAL; Text Analysis; R Scripts; R Engine; Hadoop / Sybase IQ, Sybase ASE, Teradata; Spatial, Machine, Real-Time Data; Spatial Data; Unstructured Data; SAP HANA Studio/AFM, Apps & Tools
Example algorithms shown: KNN classification; Regression; C4.5 decision tree; K-means; ABC classification; Weighted score tables; Association analysis: market basket
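
To make one of the algorithms named on this slide concrete, here is a minimal sketch of K-means clustering in plain Python. It is not PAL code and does not use any PAL interface; the function name k_means, the seeding rule (the first k points become the initial centroids), and the toy data are all assumptions made for illustration.

```python
from math import dist


def k_means(points, k, iterations=20):
    """Cluster points into k groups with the basic Lloyd iteration."""
    # Assumption for reproducibility: the first k points seed the centroids.
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # Converged: no centroid moved.
        centroids = new_centroids
    return centroids, assignments


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids, assignments = k_means(points, k=2)
print(centroids)
print(assignments)
```

The sketch processes a small in-memory list; a library such as PAL would run the same kind of iteration inside the database, next to the data, which is where the performance benefit described above comes from.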

SAP HANA Predictive Analysis Library (PAL) algorithms

SAP HANA Predictive Analysis Library (PAL) contains a wide range of algorithms that can be deployed for in-HANA and standalone data science applications.
Algorithms are available for the following types of analysis:

Association Analysis
Classification Analysis
Regression
Cluster Analysis
Time Series Analysis
Probability Distribution
Outlier Detection
Link Prediction
Data Preparation
Statistic Functions (Univariate)
Statistic Functions (Multivariate)
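
As an illustration of the first category in this list, the following sketch computes the two basic measures behind association (market basket) analysis: support and confidence. It is plain Python and not a PAL call; the function names and the basket data are hypothetical and invented for this example.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of transactions with the antecedent that also hold the consequent."""
    both = support(set(antecedent) | set(consequent), transactions)
    # Assumes the antecedent occurs at least once, otherwise this divides by zero.
    return both / support(antecedent, transactions)


# Hypothetical baskets, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support({"bread", "milk"}, transactions))       # 2 of 4 baskets
print(confidence({"bread"}, {"milk"}, transactions))  # 2 of 3 bread baskets
```

A rule such as "bread implies milk" is usually kept only when both measures exceed thresholds chosen by the analyst.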

SAP HANA Automated Predictive Library (APL) algorithms

SAP HANA APL is an application function library (AFL) that lets you use the data mining capabilities of the SAP Predictive Analytics automated analytics engine on your customer datasets stored in SAP HANA.
You can create a wide range of models to answer your business questions.

APL model types:
Classification Models
Clustering Models
Regression Models
Time Series Analysis
Social Network Analysis
Recommendation

R integration for SAP HANA and standalone

Diagram labels: Application; R Client; SAP HANA Database; SQL Interface; Calculation Engine; R Operator; Tables; Rserve; R Runtime
Labeled steps: Trigger R; Write Results

SAP Predictive Analytics

SAP Predictive Analytics is built for both data scientists and business / data analysts, making predictive analytics accessible to a broad spectrum of users.

Automated and expert modes
Used to automate data preparation, predictive modeling, and deployment tasks
Rich pre-built modelling functionality
PAL, APL, and R language support
Advanced visualization
Native integration with SAP HANA

Application function modeler (AFM)

Graphical tool to build advanced applications in SAP HANA
Web-based flow-graph editor
Support for AFL, R, SDI, & SDQ
Used to create procedures or task runtime operations
Interoperability with SAP HANA studio AFM

SAP HANA studio-based AFM
PAL function support including time series, clustering, classification, and statistics
General usability enhancements for an easier, simpler, and more functional experience



Thank you

Contact information:

open@sap.com
2016 SAP SE or an SAP affiliate company. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company.

SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate
company) in Germany and other countries. Please see http://global12.sap.com/corporate-en/legal/copyright/index.epx for additional trademark information and notices.

Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors.

National product specifications may vary.

These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP SE or its
affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP SE or SAP affiliate company products and
services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as
constituting an additional warranty.

In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business outlined in this document or any related presentation, or to develop
or release any functionality mentioned therein. This document, or any related presentation, and SAP SE's or its affiliated companies' strategy and possible future
developments, products, and/or platform directions and functionality are all subject to change and may be changed by SAP SE or its affiliated companies at any time
for any reason without notice. The information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. All forward-
looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place
undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions.
