Académique Documents
Professionnel Documents
Culture Documents
Contents [hide]
1 Data Science Training
1.1 Data Science
1.1.1 Introduction to Data Science
1.1.2 Data
1.1.3 Big Data
1.1.4 Data Science Deep Dive
1.1.5 Intro to R Programming
1.1.6 R Programming Concepts
1.1.7 Data Manipulation in R
1.1.8 Data Import Techniques in R
1.1.9 Exploratory Data Analysis (EDA) using R
1.1.10 Data Visualization in R
1.2 HADOOP
1.2.1 Big Data and Hadoop Introduction
1.2.2 Understand Hadoop Cluster Architecture
1.2.3 Map Reduce Concepts
1.2.4 Advanced Map Reduce Concepts
1.2.5 Hadoop 2.0 & YARN
1.2.6 PIG
1.2.7 Module 7
1.2.8 HIVE
1.2.9 Module-9
1.2.10 HBASE
1.2.11 Module-11
1.2.12 SQOOP
1.2.13 Flume & Oozie
1.2.14 Projects
1.2.15 Project in Healthcare Domain
1.2.16 Project in Finance/Banking Domain
1.3 Spark
1.3.1 Apache Spark
1.3.2 Introduction to Scala
1.3.3 Spark Core Architecture
1.3.4 Spark Internals
1.3.5 Spark Streaming
1.4 Statistics + Machine Learning
1.4.1 Statistics
1.4.2 Whats is Statistics
1.4.3 Machine Learning
1.4.4 Machine Learning Introduction
1.5 Python
1.5.1 Getting Started with Python
1.5.2 Sequences and File Operations
1.5.3 Deep Dive Functions Sorting Errors and Exception Handling
1.5.4 Regular Expressionsits Packages and Object Oriented Programming in Python
1.5.5 Debugging, Databases and Project Skeletons
1.5.6 Machine Learning Using Python
1.5.7 Supervised and Unsupervised learning
1.5.8 Algorithm
1.5.9 Application Example.
1.5.10 Scikit and Introduction to Hadoop
1.5.11 Hadoop and Python
1.5.12 Python Project Work
1.5.12.1 share training and course content with friends and students:
Value Chain
Types of Analytics
Lifecycle Probability
Data
Types of Data
OLTP vs OLAP
Big Data
5 Vs of Big Data
Hadoop
Hadoop Ecosystem
Hadoop Ecosystem
Data Acuqisition
Techniques
Data formats
Data Quantity
Data Quality
Resolution Techniques
Data Transformation
Annonymization
Intro to R Programming
Introduction to R
Business Analytics
Analytics concepts
Usage of R in industry
R Programming Concepts
Built-in functions in R
Subsetting methods
Data Manipulation in R
Web Scraping
What is EDA?
Goals of EDA
Types of EDA
Implementing of EDA
Boxplots, cor() in R
EDA functions
Data Visualization in R
Principle tenets
Plotting Graphs
Various GUIs
Spatial Analysis
HADOOP
Big Data and Hadoop Introduction
Hadoop Architecture
Distributed Model
Replication
Fault Tolerance
Why Hadoop?
Hadoop Eco-System
5 Daemons
Hands-On Exercise
Typical Workflow
Hands-On Exercise
Hands-On Exercise
Hands-On Exercise
Rack Awareness
What is Mapper?
What is Reducer?
What is Shuffling?
Hands-On Exercise
Hands-On Exercise
What is Combiner?
Hands-On Exercise
What is Partitioner?
Hands-On Exercise
What is Counter?
Hands-On Exercise
InputFormats/Output Formats
Hands-On Exercise
Hands-On Exercise
Hands-On Exercise
MR Distributed Cache
Hands-On Exercise
Hands-On Exercise
Configuration of Hadoop
NN Scalability
NN SPOF & HA
Hadoop 2.0 HA
Introduction to Pig
What Is Pig?
Hands-On Exercise
Loading Data
Hands-On Exercise
Field Definitions
Data Output
Hands-On Exercise
Hands-On Exercise
Commonly-Used Functions
Hands-On Exercise
Storage Formats
Hands-On Exercise
Grouping
Hands-On Exercise
Hands-On Exercise
Hands-On Exercises
Hands-On Exercise
PIG
Module 7
Hands-On Exercise
Hands-On Exercise
HIVE
Hands-On Exercise
Hands-On Exercise
Hands-On Exercise
Hands-On Exercise
Hands-On Exercise
Hands-On Exercise
Module-9
Hands-On Exercise
Hands-On Exercise
Hands-On Exercise
Hands-On Exercise
Hands-On Exercise
HBASE
CAP Theorem
Introduction to HBase
Module-11
SQOOP
Introduction to Sqoop
Incremental append
What is Flume?
Oozie
Oozie properties
Hands on exercises
Projects
Hadoop Project
Objective
Problem Definition
Solution
Objective
Problem Definition
Solution
Objective
Problem Definition
Solution
Spark
Apache Spark
Why Spark
Limitations of MR in Hadoop
Components of Spark
Hadoop vs Spark
Introduction to Scala
Features of Scala
Val vs Var
Type Inference
REPL
Lists in Scala
Maps
Pattern Matching
Traits in Scala
Collections in Scala
What is RDD
RDD Dependencies
RDD Lineage
Spark Deployment
Parallelism in Spark
Caching in Spark
Spark Internals
Spark Transformations
Spark Actions
Spark Cluster
Spark Streaming
Micro Batch
Dstreams
Transformations on Dstreams
Statistics
Whats is Statistics
Descriptive Statistics
Dispersion Measures
Data Distributions
What is Sampling
Why Sampling
Sampling Methods
Inferential Statistics
Confidence Level
Degrees of freedom
what is pValue
Chi-Square test
What is ANOVA
Correlation vs Regression
Machine Learning
Machine Learning Introduction
ML Fundamentals
Clustering
Similarity Metrics
Understanding TF-IDF, Cosine Similarity and their application to Vector Space Model
Case study
Case study
Case study
Case study
Case study
Project Discussion
Linear Regression
Case study
Logistic Regression
Case study
Text Mining
Case study
Sentimental Analysis
Case study
Python
Getting Started with Python
Python Overview
Starting Python
Interpreter PATH
Using Variables
Keywords
Built-in Functions
StringsDifferent Literals
String Formatting
Lists
Tuples
Using Enumerate()
List Comprehensions
Generator Expressions
Functions
Function Parameters
Global Variables
Alternate Keys
Lambda Functions
Sorting Dictionaries
Using Modules
Interpreter Information
STDIO
Math Function
Random Numbers
Zipped Archives
Defining Classes
Initializers
Instance Methods
Properties
Debugging
Project Skeleton
Required Packages
Project Directory
CRUD Operations
Why Python
Learning NumPy
Learning Scipy
Classification Problem
Algorithm
Clustering Problem
Application Example.
Introduction to Pandas
GroupingSorting
Plotting Data
Creating Functions
Slicing/Dicing Operations.
Introduction to Scikit-Learn