Data Science and Big Data Analytics


EMC PROVEN PROFESSIONAL


Copyright 2012 EMC Corporation. All Rights Reserved.

Introduction and Course Agenda

Agenda

Research Group
Administrative Issues
Content and Aims


Research Group: Software Engineering for Distributed Systems

The logo indicates a workbench for systematic software development and continuous quality assessment and improvement


Current Research Topics

Social Network Analysis
Spread of information in social communities

Quality Assurance (QA)
Managed Software Evolution
Usage-based testing
Usability Engineering
Test Languages
TTCN-3, UML Testing Profile
QA for test specifications

Interoperability of Grid and Cloud systems
Reliability of Cloud systems

Interested in these topics? Contact us for student projects or B.Sc., M.Sc., or Ph.D. theses.


Research Group
Scientific staff:
Prof. Dr. Jens Grabowski
Dr. Steffen Herbold
M.Sc. Xiao-Wei Wang
Dipl.-Math. Verena Herbold
Dipl.-Inf. Daniel Honsel
M.Sc. Ella Albrecht

Dr. Patrick Harms
M.Sc. Fabian Glaser
M.Sc. Michael Göttsche
M.Sc. Fabian Trautsch

Student projects, Bachelor's and Master's theses (usually 4-8 students)

Supported by
Annette Kadziora
Dipl.-Ing. (FH) Gunnar Krull

Web:
http://www.swe.informatik.uni-goettingen.de

Duties of the Research Group

Prof. Jens Grabowski
Dean of Students
Vice Director of the Institute of Computer Science
Speaker of the Ph.D. program for Computer Science

Organization of the student library
Fabian Glaser, Hanna Holderied


Teaching offered in WiSe 2015/2016

Lecture: Data Science and Big Data Analytics
Lecture: Software Testing
Lecture: Softwaretechnik I (B.Sc. only)
Practical course: Software Testing (block course)
Seminar: Advanced Topics in Software Engineering
Seminar: Technologies and Design of Graphical User Interfaces


(Planned) Teaching in SoSe 2016

Lecture: Software Evolution (Prof. Dr. Jens Grabowski)
Lecture: Requirements Engineering (Dr. Steffen Herbold)
Seminar: Advanced Topics in Software Engineering (whole group)


Administration
Time and Place
Lecture
Tuesday, 14:15-15:45 (s.t.)
Room: Ifi 0.101

Exercise
Thursday, 13:15-14:45 (s.t.)
Room: Ifi -1.101

Course Type and ECTS
Lecture in the M.Sc. program in Applied Computer Science
5 ECTS
M.Inf.1151.Mp: Data Science und Big Data Analytics

Examination
Written exam at the end of the semester
Prerequisite for taking the exam: passing the exercise

Administration

Announcements and course materials are distributed via Stud.IP


Course material
Material is provided by Dell EMC through the Dell EMC Academic Alliance
Electronic versions (PDF) of the slides
Slides are protected by copyright and cannot be distributed freely
We recommend considering the theoretical side of data science as well

Lectures are available as a web stream:
https://webconf.vc.dfn.de/datascience-ugoe/

Exercise

Practical application of concepts from the lecture


NOT weekly!
First exercise to be announced

Divided into two parts


Five programming exercises
Solutions must be presented to a lecturer during the exercise sessions
At least 50% of the points on each exercise sheet are required to pass

Final project
Small data analysis project as group work
Presentation of the results required for passing

Overall Course Goal

The goal of the Data Science and Big Data Analytics course is for you to be able to immediately participate as a data science team member on big data and other analytics projects
Data scientist point of view
Open
Practical


Expected Background

Strong mathematical and quantitative capability
Experience with statistical methods and basic proficiency with a statistical software package, such as R or RStudio, Minitab, MATLAB, SAS, or SPSS
Experience with the conditioning and management of business data, including databases
Basic programming skills, preferably including SQL

Course Objectives
Upon completion of this course, you should be able to:
Immediately participate and contribute as a data science team member on big data and other analytics projects by:
Deploying a structured lifecycle approach to data science and big data analytics projects
Reframing a business challenge as an analytics challenge
Applying analytic techniques and tools to analyze big data, create statistical models, and identify insights that can lead to actionable results
Selecting optimal visualization techniques to clearly communicate analytic insights to business sponsors and others
Using tools such as R and RStudio, MapReduce/Hadoop, in-database analytics, and window and MADlib functions

Explain how advanced analytics can be leveraged to create competitive advantage and how the data scientist role and skills differ from those of a traditional business intelligence analyst

Course Modules and Navigation Icons


Data Science and Big Data Analytics
1. Introduction to Big Data Analytics
2. Data Analytics Lifecycle + Lab
3. Review of Basic Data Analytics Methods Using R + Labs
4. Advanced Analytics - Theory & Methods + Labs
5. Advanced Analytics - Technology & Tools + Labs
6. The Endgame, or Putting it All Together + Final Lab


Topics: Data Science and Big Data Analytics

1. Introduction to Big Data Analytics
Big Data Overview
State of the Practice in Analytics
The Data Scientist
Big Data Analytics in Industry Verticals

2. Data Analytics Lifecycle
Data Analytics Lifecycle

3. Review of Basic Data Analytic Methods Using R
Using R to Look at Data - Introduction to R
Analyzing and Exploring the Data
Statistics for Model Building and Evaluation

4. Advanced Analytics - Theory and Methods
K-means Clustering
Association Rules
Linear Regression
Logistic Regression
Naive Bayesian Classifier
Decision Trees
Time Series Analysis
Text Analysis

5. Advanced Analytics - Technology and Tools
Analytics for Unstructured Data (MapReduce and Hadoop)
The Hadoop Ecosystem
In-database Analytics - SQL Essentials
Advanced SQL and MADlib for In-database Analytics

6. The Endgame, or Putting it All Together + Final Lab on Big Data Analytics
Operationalizing an Analytics Project
Creating the Final Deliverables
Data Visualization Techniques
Final Lab: Application of the Data Analytics Lifecycle to a Big Data Analytics Challenge
