LDA, KNN and LR
STEP-1:
Collinearity Analysis:
First, we performed a collinearity analysis to find out whether any of the independent (predictor)
variables are highly collinear. The cutoff value for high collinearity was set at 0.7.
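The filtering idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the report does not say which collinearity measure was used or how the variable to drop from each pair was chosen, so the sketch assumes the absolute Pearson correlation and a simple greedy rule that keeps the first variable encountered. The helper names `pearson` and `drop_collinear` and the toy data are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_collinear(columns, cutoff=0.7):
    """Keep a column only if |r| <= cutoff against every column already kept.

    `columns` maps a predictor name to its list of values. Which member of a
    correlated pair survives depends on column order (an assumption here,
    not necessarily the rule used in the report).
    """
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= cutoff for k in kept):
            kept.append(name)
    return kept

# Toy data (made up): "b" is an exact multiple of "a", "c" is only weakly related.
toy = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [3.0, 1.0, 4.0, 1.0, 5.0],
}
kept = drop_collinear(toy)  # "b" is dropped because |r(a, b)| = 1 > 0.7
```

With real data, the same rule would be applied to the predictor columns only, leaving the Purchase response out of the comparison.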
Based on the collinearity analysis, we dropped the following 6 variables from our dataset:
PriceCH
SalePriceCH
PriceDiff
DiscMM
SalePriceMM
PctDiscCH
Collinearity output:
After variable reduction by the collinearity method, the dataset retains the following non-collinear
variables:
STEP-2:
For further analysis, the dataset was partitioned into training and validation sets in a 70:30 ratio:
70% of the original dataset was assigned to the training set and the remaining 30% to the
validation set.
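A 70:30 partition of this kind can be sketched as follows. The report does not state whether the split was random or which seed was used, so the shuffling, the seed value, and the helper name `split_70_30` are assumptions made for illustration.

```python
import random

def split_70_30(rows, train_fraction=0.7, seed=42):
    """Shuffle the rows and cut them into training and validation parts."""
    shuffled = list(rows)
    # A fixed seed (hypothetical value) makes the partition reproducible.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Toy example: 100 row indices stand in for the records of the dataset.
train, valid = split_70_30(range(100))
```

Every record lands in exactly one of the two sets, so the validation set can be used to check a model on data it was not trained on.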
LOGISTIC REGRESSION ANALYSIS:
The cutoff probability for prediction under this regression analysis was set to 0.5.
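To illustrate how a 0.5 cutoff turns predicted probabilities into class labels, and how those labels are tabulated into a confusion matrix, here is a minimal sketch. The probabilities and labels are made up, and treating MM as the class whose probability is modelled is an assumption; the function names `classify` and `confusion_matrix` are hypothetical helpers, not part of the original analysis.

```python
def classify(probabilities, cutoff=0.5, positive="MM", negative="CH"):
    """Assign the positive class when the predicted probability exceeds the cutoff."""
    return [positive if p > cutoff else negative for p in probabilities]

def confusion_matrix(actual, predicted, labels=("CH", "MM")):
    """Count (actual, predicted) label pairs."""
    counts = {(a, p): 0 for a in labels for p in labels}
    for a, p in zip(actual, predicted):
        counts[(a, p)] += 1
    return counts

# Made-up predicted probabilities of MM and the true Purchase labels.
probs = [0.10, 0.45, 0.55, 0.80, 0.30, 0.65]
actual = ["CH", "CH", "CH", "MM", "MM", "MM"]

predicted = classify(probs)
matrix = confusion_matrix(actual, predicted)
# Accuracy is the share of records on the diagonal of the matrix.
accuracy = sum(matrix[(c, c)] for c in ("CH", "MM")) / len(actual)
```

Raising or lowering the cutoff trades one type of misclassification for the other, which is why the cutoff is reported alongside the confusion matrix.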
Training Confusion Matrix:
After training the logistic model and predicting on the training dataset, the following confusion matrix
was obtained:
Prediction Histogram:
The above histogram for the CH and MM purchase groups shows significant overlap in the central area.
From these observations we can conclude that the prediction accuracy of the LDA model is lower than
that of the Logistic model. The significant overlap between the two Purchase categories in the LDA
prediction histogram also shows that this model is not a good fit, in terms of predictive power, for
the given dataset.
STEP-4:
KNN ANALYSIS:
Confusion Matrix:
CONCLUSION: Of the three models, the Logistic Regression model is the most powerful and accurate, and
the best fit for this type of dataset. It has good accuracy and fewer misclassifications, and the ROC
curve strongly supports this model.