
Data Mining Courses

ABBOTT ANALYTICS
P.O. BOX 22536
SAN DIEGO, CA 92192
PHONE (858) 922-9621
HTTP://WWW.ABBOTTANALYTICS.COM

Experiencing Data Mining Using STATISTICA: Day 1


1. Data Mining Overview
1.1. What is Data Mining?
1.2. Data Mining Process
2. Business Understanding
2.1. Business and Data Mining Questions
2.2. Identifying Data for Data Mining
3. STATISTICA Concepts
3.1. Project Navigator, Diagram Workspace, Project Panel, Nodes, Process Flow
3.2. Creating Projects, Loading Data: flat files, STATISTICA files, databases
3.2.1. Exercise #1: Data Import, Variable Specs
4. Data Understanding
4.1. Summarizing Data: Continuous and Categorical Variables
4.1.1. Exercise #2: Basic Statistics / Tables
4.2. Data Visualization
4.2.1. Exercise #3: 2-D Graphs (Histograms, Scatterplots), Matrix Plots
5. Data Preparation
5.1. Data Cleaning
5.1.1. When Must One Clean Data?
5.1.2. Techniques For Outliers And Missing Data
5.1.2.1. Exercise #4: Data Filtering/Recoding, Recode values, Batch Transformations
5.2. Feature Creation and Selection
5.2.1. Why Create Features?
5.2.2. Correcting Problems with Continuous and Categorical Variables: Transforms
5.2.2.1. Exercise #5: New Variables/Formulas, Batch Transformations
5.3. Variable Selection
5.3.1. Removing Poor Predictors
5.3.1.1. Exercise #6: Feature Selection and Variable Screening, PCA
5.4. Sampling
5.4.1. Why Sample Data; Rules of Thumb in Sampling
5.4.1.1. Exercise #7: Random Sampling, Bootstrap Sampling

Experiencing Data Mining Using STATISTICA: Day 2


6. Modeling: Supervised Learning
6.1. Decision Trees: Principles and Options
6.1.1. Exercise #8: Classification/Regression Trees, CHAID, Interactive Trees
6.2. Linear and Logistic Regression: Principles and Options
6.2.1. Exercise #9: Multiple Regression
6.3. Neural Networks: Principles and Options
6.3.1. Exercise #10: Automated Neural Networks
7. Model Performance Evaluation: Supervised Learning
7.1. How Algorithms Score Models: Traditional Approaches
7.1.1. Exercise #10: Model Comparisons, Lift, ROC
7.2. Customized Performance Evaluations
7.2.1. Exercise #11: Model Comparison, VB for custom scoring
8. Modeling: Unsupervised Learning
8.1. Clustering: Principles and Options
8.1.1. Exercise #12: E-M and K-Means
8.2. Association Rules: Principles and Options
8.2.1. Exercise #13: Association Rules
9. Model Evaluation: Unsupervised Learning
9.1. How Algorithms Describe Models
9.2. Using Trees to Describe Clustering Models
9.2.1. Exercise #15: Evaluating K-Means or Neural Networks with Trees
9.3. Reporting Results
9.3.1. Exercise #16: Workbooks
10. Model Deployment
10.1. Deployment Considerations
10.1.1. Exercise #17: Rapid Deployment on New Data (PMML), STATISTICA Scoring
