
DATA SCIENCE

Introduction to Data Science
• Introduction to Data Science
• Data
• Big Data
• Data Science Deep Dive

Intro to R Programming & Advanced R Programming
• Introduction to R
• R Programming Concepts
• Data Manipulation in R
• Data Import Techniques in R
• Exploratory Data Analysis (EDA) using R
• Data Visualization in R

Python Programming & Machine Learning
• Big Data and Hadoop Introduction
• MapReduce Concepts
• Pig
• Hive
• Sqoop

Statistics
• Statistics basics

Machine Learning using R & Python
• Introduction
• ML Fundamentals
• Understanding Supervised and Unsupervised Learning Techniques
• Clustering
• Implementing Association rule mining
• Understanding Process flow of Supervised Learning Techniques
• Decision Tree Classifier
• Random Forest Classifier
• What is a Random Forest
• Naive Bayes Classifier
• Project Discussion
• Problem Statement and Analysis
• Linear Regression
• Logistic Regression
• Text Mining
• Sentiment Analysis
• Support Vector Machines
• Deep Learning
• Time Series Analysis

Apache Spark using Python
• Apache Spark
• Spark Core Architecture
• Spark Internals Detailed
• Intro to Spark Streaming
• Intro to Spark GraphX Programming
• Intro to Spark MLlib

DATA SCIENCE

Introduction to Data Science
• Need for Data Scientists
• Foundation of Data Science
• What is Business Intelligence
• What is Data Analysis
• What is Data Mining
• What is Machine Learning
• Analytics vs Data Science
• Value Chain
• Types of Analytics
• Lifecycle Probability
• Analytics Project Lifecycle

Data
• Basis of Data Categorization
• Types of Data
• Data Collection Types
• Forms of Data & Sources
• Data Quality & Changes
• Data Quality Issues
• Data Quality Story
• What is Data Architecture
• Components of Data Architecture
• OLTP vs OLAP
• How is Data Stored?

Big Data
• What is Big Data?
• 5 Vs of Big Data
• Big Data Architecture
• Big Data Technologies
• Big Data Challenge
• Big Data Requirements
• Big Data Distributed Computing & Complexity
• Hadoop
• MapReduce Framework
• Hadoop Ecosystem

Data Science Deep Dive
• What Data Science is
• Why Data Scientists are in demand
• What is a Data Product
• The growing need for Data Science
• Large Scale Analysis Cost vs Storage
• Data Science Skills
• Data Science Use Cases
• Data Science Project Life Cycle & Stages
• MapReduce Framework
• Hadoop Ecosystem
• Data Acquisition
• Where to source data
• Techniques
• Evaluating input data
• Data formats
• Data Quantity
• Data Quality
• Resolution Techniques
• Data Transformation
• File format Conversions
• Anonymization

Introduction to R Programming
• Introduction to R
• Business Analytics
• Analytics concepts
• The importance of R in analytics
• R Language community and ecosystem
• Usage of R in industry
• Installing R and other packages
• Perform basic R operations using command line
• Usage of IDE RStudio and various GUIs

R Programming Concepts
• Data types in R and their uses
• Built-in functions in R
• Subsetting methods
• Summarize data using functions
• Use of functions like head() and tail() for inspecting data
• Use-cases for problem solving using R

Data Manipulation in R
• Various phases of Data Cleaning
• Functions used in Inspection
• Data Cleaning Techniques
• Uses of functions involved
• Use-cases for Data Cleaning using R

Data Import Techniques in R
• Import data from spreadsheets and text files into R
• Importing data from statistical formats
• Packages installation for database import
• Connecting to RDBMS from R using ODBC and basic SQL queries in R
• Web Scraping
• Other concepts on Data Import Techniques

Exploratory Data Analysis (EDA) using R
• What is EDA?
• Why do we need EDA?
• Goals of EDA
• Types of EDA
• Implementing EDA
• Boxplots, cor() in R
• EDA functions
• Multiple packages in R for data analysis
• Some fancy plots
• Use-cases for EDA using R

Data Visualization in R
• Storytelling with Data
• Principal tenets
• Elements of Data Visualization
• Infographics vs Data Visualization
• Data Visualization & Graphical functions in R
• Plotting Graphs
• Customizing Graphical Parameters to improve the plots
• Various GUIs
• Spatial Analysis
• Other Visualization concepts


==========================================================================================
Python Programming

Getting Started with Python
• Python Overview
• About Interpreted Languages
• Advantages/Disadvantages of Python, pydoc
• Starting Python
• Interpreter PATH
• Using the Interpreter
• Running a Python Script
• Using Variables
• Keywords
• Built-in Functions
• Strings, Different Literals
• Math Operators and Expressions
• Writing to the Screen
• String Formatting
• Command Line Parameters and Flow Control

Sequences and File Operations
• Lists
• Tuples
• Indexing and Slicing
• Iterating through a Sequence
• Functions for all Sequences
• Using Enumerate()
• Operators and Keywords for Sequences
• The xrange() function
• List Comprehensions
• Generator Expressions
• Dictionaries and Sets

Deep Dive - Functions, Sorting, Errors and Exception Handling
• Functions
• Function Parameters
• Global Variables
• Variable Scope and Returning Values

Sorting
• Alternate Keys
• Lambda Functions
• Sorting Collections of Collections
• Sorting Dictionaries
• Sorting Lists in Place
• Errors and Exception Handling
• Handling Multiple Exceptions
• The Standard Exception Hierarchy
• Using Modules
• The Import Statement
• Module Search Path
• Package Installation Ways

Regular Expressions, Packages and Object-Oriented Programming in Python
• The Sys Module
• Interpreter Information
• STDIO
• Launching External Programs
• Paths, Directories and Filenames
• Walking Directory Trees
• Math Functions
• Random Numbers
• Dates and Times
• Zipped Archives
• Introduction to Python Classes
• Defining Classes
• Initializers
• Instance Methods
• Properties
• Class Methods and Data, Static Methods
• Private Methods and Inheritance
• Module Aliases and Regular Expressions

Debugging, Databases and Project Skeletons
• Debugging
• Dealing with Errors
• Creating a Database with SQLite 3
• CRUD Operations
• Creating a Database Object

Learning NumPy
Plotting using Matplotlib and Seaborn
Machine Learning application
Introduction to Pandas
Creating Data Frames
Grouping, Sorting
Plotting Data
Creating Functions
Converting Different Formats
Combining Data from Various Formats
Slicing/Dicing Operations

Machine Learning Python

First Machine Learning Algorithm Python
• Various machine learning algorithms in Python
• Apply machine learning algorithms in Python

Feature Selection and Preprocessing
• How to select the right data
• Which are the best features to use
• Additional feature selection techniques
• A feature selection case study
• Preprocessing Introduction
• Preprocessing Scaling Techniques
• How to preprocess your data
• How to scale your data
• Feature Scaling Final Project

Which Algorithms Perform Best
• Highly efficient machine learning algorithms
• Bagging Decision Trees
• The power of ensembles
• Random Forest Ensemble technique
• Boosting - Adaboost
• Boosting ensemble stochastic gradient boosting
• A final ensemble technique

Model Selection Cross Validation Score
• Introduction Model Tuning
• Parameter Tuning GridSearchCV
• A second method to tune your algorithm
• How to automate machine learning
• Which ML algo should you choose
• How to compare machine learning algorithms in practice

Spark

Module 1: Apache Spark
Module 2: Introduction to Scala
Module 3: Spark Core Architecture
Module 4: Spark Internals
Module 5: Spark Streaming
Module 6: Spark GraphX Programming
Module 7: Introducing MLlib

Module 1: Apache Spark
• Introduction to Apache Spark
• Why Spark
• Batch Vs. Real Time Big Data Analytics
• Batch Analytics - Hadoop Ecosystem Overview
• Real Time Analytics Options
• Streaming Data - Storm
• In Memory Data - Spark, What is Spark?
• Spark benefits to Professionals
• Limitations of MR in Hadoop
• Components of Spark
• Spark Execution Architecture
• Benefits of Apache Spark
• Hadoop vs Spark

Module 2: Introduction to Scala
• Features of Scala
• Basic Data Types of Scala
• Val vs Var
• Type Inference
• REPL
• Objects & Classes in Scala
• Functions as Objects in Scala
• Anonymous Functions in Scala
• Higher Order Functions
• Lists in Scala
• Maps
• Pattern Matching
• Traits in Scala
• Collections in Scala

Module 3: Spark Core Architecture
• Spark & Distributed Systems
• Spark for Scalable Systems
• Spark Execution Context
• What is RDD
• RDD Deep Dive
• RDD Dependencies
• RDD Lineage
• Spark Application In Depth
• Spark Deployment
• Parallelism in Spark
• Caching in Spark

Module 4: Spark Internals
• Spark Transformations
• Spark Actions
• Spark Cluster
• Spark SQL Introduction
• Spark Data Frames
• Spark SQL with CSV
• Spark SQL with JSON
• Spark SQL with Database

Module 5: Spark Streaming
• Features of Spark Streaming
• Micro Batch
• DStreams
• Transformations on DStreams
• Spark Streaming Use Case 1
• Spark Streaming Use Case 2
• Spark Streaming Use Case 3

Module 6: Spark GraphX Programming
• Introduction to Graph Parallel Systems
• Introduction to GraphX
• Features of GraphX
• GraphX Deep Dive
• Graph Builder

Module 7: Introducing MLlib
• Using MLlib for Movie Recommendations
• Analyzing Recommendation Results using Spark

Statistics + Machine Learning

Statistics
• What is Statistics
• Descriptive Statistics
• Central Tendency Measures
• The Story of Average
• Dispersion Measures
• Data Distributions
• Central Limit Theorem
• What is Sampling
• Why Sampling
• Sampling Methods
• Inferential Statistics
• What is Hypothesis testing
• Confidence Level
• Degrees of freedom
• What is p-value
• Chi-Square test
• What is ANOVA
• Correlation vs Regression
• Uses of Correlation & Regression

Machine Learning

Introduction
• ML Fundamentals
• ML Common Use Cases
• Understanding Supervised and Unsupervised Learning Techniques

Clustering
• Similarity Metrics
• Distance Measure Types: Euclidean, Cosine Measures
• Creating predictive models
• Understanding K-Means Clustering
• Understanding TF-IDF, Cosine Similarity and their application to Vector Space Model
• Case study

Implementing Association rule mining
• Similarity Metrics
• What are Association Rules & their use cases?
• What is a Recommendation Engine & its working?
• Recommendation Use-case
• Case study

Understanding Process flow of Supervised Learning Techniques

Decision Tree Classifier
• How to build Decision trees
• What is Classification and its use cases?
• What is Decision Tree?
• Algorithm for Decision Tree Induction
• Creating a Decision Tree
• Confusion Matrix
• Case study

Random Forest Classifier
• What is a Random Forest
• Features of Random Forest
• Out-of-Bag Error Estimate and Variable Importance
• Case study

Naive Bayes Classifier
• Case study

Project Discussion

Problem Statement and Analysis
• Various approaches to solve a Data Science Problem
• Pros and Cons of different approaches and algorithms

Linear Regression
• Case study
• Introduction to Predictive Modeling
• Linear Regression Overview
• Simple Linear Regression
• Multiple Linear Regression

Logistic Regression
• Case study
• Logistic Regression Overview
• Data Partitioning
• Univariate Analysis
• Bivariate Analysis
• Multicollinearity Analysis
• Model Building
• Model Validation
• Model Performance Assessment
• Scorecard

Text Mining
• Case study

Sentiment Analysis
• Case study

Support Vector Machines
• Case Study
• Introduction to SVMs
• SVM History
• Vectors Overview
• Decision Surfaces
• Linear SVMs
• The Kernel Trick
• Non-Linear SVMs
• The Kernel SVM

Time Series Analysis
• Describe Time Series data
• Format your Time Series data
• List the different components of Time Series data
• Discuss different kinds of Time Series scenarios
• Choose the model according to the Time Series scenario
• Implement the model for forecasting
• Explain working and implementation of ARIMA model
• Illustrate the working and implementation of different ETS models
• Forecast the data using the respective model
• What is Time Series data?
• Time Series variables
• Different components of Time Series data
• Visualize the data to identify Time Series Components
• Implement ARIMA model for forecasting
• Exponential smoothing models
• Identifying the time series scenarios to which different Exponential Smoothing models can be applied
• Implement respective model for forecasting
• Visualizing and formatting Time Series data
• Plotting decomposed Time Series data
• Applying ARIMA and ETS model for Time Series forecasting
• Forecasting for given Time period
• Case Study

TABLEAU

Introduction to Tableau
• Installing Tableau Desktop and Tableau Public (FREE)
• Challenge description + view data in file
• Connecting Tableau to a Data file - CSV file
• Navigating Tableau - Measures and Dimensions
• Creating a calculated field
• Adding colors
• Adding labels and formatting
• Exporting your worksheet

Tableau Basics
• How to visualize an ad-hoc A-B test in Tableau
• Working with Aliases
• Adding a Reference Line
• Looking for anomalies
• Creating bins & Visualizing distributions
• Creating a classification test for a numeric variable
• Combining two charts and working with them in Tableau
• Visualizing Balance and Estimated Salary distribution
• Working with Time Series
• Understanding Aggregation, Granularity, and Level of Detail
• Creating an Area Chart & Learning About Highlighting
• Adding a Filter and Quick Filter
• Time series, Aggregation, and Filters

Maps, Scatterplots, and Your First Dashboard
• Joining Data in Tableau
• Creating a Map, Working with Hierarchies
• Creating a Scatter Plot, Applying Filters to Multiple Worksheets
• Let's Create our First Dashboard!
• Adding an Interactive Action - Filter
• Adding an Interactive Action - Highlighting
• Maps, Scatterplots, and Your First Dashboard

Creating Animations in Tableau
• Project Brief
• Editing Blending Relationships
• Building the Visualization
• Adding Animation
• Manually Sorting Blended Data
• Finalizing the Dashboard
• Animations in Tableau
