
Answer ALL questions. Answer to the point within the space provided.

 
No extra sheets will be provided. Keep in mind good time and space 
management. 

Wherever relevant and not given with the question, you may take the 
level of significance as 5 percent. 

Set A ‐ Main 
1. a. Differentiate exploratory research and descriptive research. 
b. Differentiate observation method and focus group discussion method in the context of data 
collection. 
(5+5=10) 
ANSWER:     
a. P.21 & 22 in Text Book. 
b. P.188‐189 & 84 in Text Book 
 
2. a. Differentiate completely randomized design and randomized block design. 
b. Differentiate one‐way ANOVA and two‐way ANOVA. 
(5+5=10) 
ANSWER:     
a. P.497 & 514 in ASW Text Book. 
b. P.502 & 517 in ASW Text Book 
3. a. Differentiate Likert scale and Semantic differential scale used in attitude measurement. 
b. What is survey research? Differentiate cross‐sectional study and longitudinal study. 
(5+5=10) 
ANSWER:     
a. P.270‐271 in Text Book. 
b. P. 142 in Text Book 
 
4. a.    Specify the four key assumptions of the Linear Regression Model.            (4 marks) 
 

ANSWER:
i. The mean of the error term (random variable) is zero. 
ii. The variance of the error term is constant (homoscedasticity). 
iii. The error terms are uncorrelated with one another. 
iv. The error term is normally distributed. 
 
c. Should we use R-square or adjusted R-square when judging the predictive ability of the 
regression model?                                           (1 mark) 
ANSWER: Use adjusted R-square, since it penalizes the inclusion of explanatory variables that do 
not improve the model, whereas R-square never decreases when a variable is added. 
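For reference, with n observations and p explanatory variables the two measures are related by 
Adjusted R-square = 1 - (1 - R-square) × (n - 1)/(n - p - 1). With the SAS output shown under the 
later parts (R-square = 0.8647, n = 14, p = 3), this gives 1 - 0.1353 × 13/10 ≈ 0.824, matching the 
reported Adj R-Sq of 0.8241. 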
d. Suppose you wish to include a categorical variable with three levels as explanatory variable 
in a regression model. How would you include the variable in the regression model?   
(2 marks) 
ANSWER:  
We should split the categorical variable X into two dummy variables, X1 and X2, defined as 
follows: 
X1 = 1 whenever level 1 occurs  
     = 0 otherwise 
and 
X2 = 1 whenever level 2 occurs  
     = 0 otherwise 
Note that including a third dummy X3 for level 3 along the same lines would lead to perfect 
multicollinearity and hence should be avoided.  
Also, level 3 occurrence is captured when X1 = 0 & X2 = 0. 
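A minimal sketch of this two-dummy coding (illustrative only: the variable name, level names and 
use of pandas are assumptions, not part of the question): 

```python
import pandas as pd

# Hypothetical three-level categorical variable
df = pd.DataFrame({"region": ["north", "south", "west", "north", "west"]})

# drop_first=True keeps only two dummies, so the dropped level becomes the baseline
# captured by both dummies being 0, avoiding perfect multicollinearity.
dummies = pd.get_dummies(df["region"], prefix="region", drop_first=True)
print(dummies)
```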

Answer (e) to (h), based on the SAS output below:  

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3   2.29579          0.76526       21.31     0.0001
Error             10   0.35916          0.03592
Corrected Total   13   2.65495

Root MSE         0.18952    R-Square   0.8647
Dependent Mean   8.63500    Adj R-Sq   0.8241
Coeff Var        2.19475

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Standardized Estimate   Tolerance   Variance Inflation
Intercept    1   -0.05535             1.12839          -0.05     0.9618     0                       .           0
X1           1   0.27602              0.12129          2.28      0.0461     0.33165                 0.63692     1.57005
X2           1   0.44687              0.11107          4.02      0.0024     0.49401                 0.89732     1.11443
X3           1   0.27028              0.09527          2.84      0.0176     0.40332                 0.66935     1.49398

Test of First and Second Moment Specification
DF   Chi-Square   Pr > ChiSq
 9   5.25         0.8116

e.  Based on the output above, comment on the model significance (specify the hypothesis and 
the reason).                                                                                                         (2 marks) 
ANSWER: The model is significant, since the p-value (0.0001) associated with the F statistic is 
less than alpha (0.05). The null hypothesis for this test is that all the slope coefficients are 
simultaneously equal to zero. 
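As a quick numerical check (a sketch assuming SciPy is available), the reported p-value can be 
recovered from the F statistic and its degrees of freedom in the ANOVA table: 

```python
from scipy import stats

f_value, df_model, df_error = 21.31, 3, 10   # values taken from the ANOVA table above
p_value = stats.f.sf(f_value, df_model, df_error)
print(f"p-value = {p_value:.4f}")             # roughly 0.0001, so reject H0: all slopes are zero
```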
 
f. Based on the output above, comment on the significance of the individual independent 
variables (specify the hypothesis and the reason).                                          (2 marks) 
ANSWER: We can answer this by looking at the p-values associated with the t-statistic for 
each independent variable. All three variables X1, X2 and X3 have a significant influence on the 
dependent variable (p-values of 0.0461, 0.0024 and 0.0176 respectively are all less than alpha of 0.05). The 
null hypothesis for each test is that the coefficient of the particular variable (parameter) 
is equal to zero. 
g.  Indicate the criteria used and compare the relative influence of the independent variables. 
(2 marks) 
ANSWER:  Looking at the standardized estimates, one can infer that the variable X2 has 
the greatest influence on the dependent variable, followed by X3 and X1. 
h. What is multicollinearity? Does the output above indicate any presence of multicollinearity 
(specify the reason). (2 marks) 
 
 
ANSWER:  The presence of high correlation between independent variables is termed 
multicollinearity. All VIF values in the output are less than 10, which indicates that 
multicollinearity is not a problem in this data/output. 
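A sketch of how VIF values such as those in the output can be computed (statsmodels is assumed 
to be installed; the data frame below is hypothetical and only stands in for the actual X1-X3 
columns): 

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools import add_constant

# Hypothetical predictor data; replace with the real X1, X2, X3 columns.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(14, 3)), columns=["X1", "X2", "X3"])

Xc = add_constant(X)                          # VIFs are computed on the full model matrix
for i, col in enumerate(Xc.columns):
    if col == "const":
        continue
    vif = variance_inflation_factor(Xc.values, i)
    print(f"{col}: VIF = {vif:.2f}")          # values above about 10 would suggest multicollinearity
```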
 
 
5. A retail company has segmented its customers into two categories according to their
loyalty card membership status, viz. (i) loyalty card holders, (ii) loyalty card
non-holders. The retailer wants to better understand the difference between these two
groups. A random sample of 200 of the retailer’s customers has recently participated in a
survey. The questionnaire included several questions about the customers’ perception of
the loyalty card such as the ease of usage, customer service, monetary benefits, emotional
attachment with the retailer and new product offerings. The responses were elicited on a
10 point Likert scale ranging from “Strongly disagree” to “Strongly agree”. The retail
company is interested in gaining deeper knowledge of the attributes which
differentiate a loyalty card holder from a loyalty card non-holder. Data
collected was analyzed using SAS Enterprise Guide 4.2 which revealed the following
tables:

Table 1: Univariate Test Statistics (F Statistics, Num DF = 1, Den DF = 198)

Variable                             Total Std. Dev.   Pooled Std. Dev.   Between Std. Dev.   R-Square   R-Square/(1-RSq)   F Value   Pr > F
Ease of Usage                        0.7689            0.7548             0.2199              0.0411     0.0429             8.49      0.0040
Customer Service                     1.6552            1.6594             0.008398            0.0000     0.0000             0.00      0.9597
Monetary Benefits                    1.3174            1.1331             0.9548              0.2640     0.3586             71.01     <.0001
Emotional Attachment with Retailer   0.8753            0.8774             0.0212              0.0003     0.0003             0.06      0.8094
New Products Offerings               1.4960            1.4977             0.1102              0.0027     0.0027             0.54      0.4628

Table 2: Multivariate Statistics and Exact F Statistics


S=1 M=1.5 N=96
Statistic Value F Value Num DF Den DF Pr > F
Wilks' Lambda 0.66815374 19.27 5 194 <.0001

Table 3: Number of Observations and Percent Classified into Loyalty Card Membership

From Loyalty Card Membership   Classified as Loyalty Card Holders   Classified as Loyalty Card Non Holders   Total
Loyalty Card Holders           x                                    26                                       119
Loyalty Card Non Holders       12                                   y                                        81
Total                          105                                  95                                       200

Table 4: Linear Discriminant Function for Loyalty Card Membership

Variable                             Loyalty Card Holders   Loyalty Card Non Holders
Constant                             -48.61049               -50.91007
Ease of Usage                        5.09524                 4.15141
Customer Service                     -4.44927                -4.5397
Monetary Benefits                    1.54891                 2.82456
Emotional Attachment with Retailer   13.79229                13.62917
New Products Offerings               1.81431                 1.75714
a) Comment on the overall fit of the discriminant function at 5 % level of significance? Also
state the null hypothesis for the same.

Null hypothesis: in the population, the group means of the discriminant function (the group
centroids) are equal for the two groups. It is rejected, as indicated by Table 2, where the
p-value for Wilks' Lambda (<.0001) is below 0.05, so the overall fit of the discriminant
function is significant.
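For reference, with two groups the exact F statistic follows directly from Wilks' Lambda:
F = ((1 - Lambda)/Lambda) × (n - p - 1)/p = (0.3318/0.6682) × 194/5 ≈ 19.27, with (5, 194) degrees
of freedom, which matches the F Value and the Num DF/Den DF reported in Table 2.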

b) Name the variables significantly differentiating between the two groups.

Based on Table 1, it can be concluded that 'Ease of Usage' and 'Monetary Benefits'
differentiate between loyalty card holders and non-holders (p-values of 0.0040 and <.0001
respectively, both below 0.05; the remaining variables have p-values well above 0.05).

c) Provide the values of x and y in Table 3. Comment on the validity of the discriminant
function.

x = 93; y = 69. So the hit ratio = (93 + 69)/200 = 162/200 = 0.81, which gives the predictive
accuracy of the discriminant function and indicates that 81% of the cases were
correctly classified. Since this considerably improves on the a priori (maximum chance)
classification of 119/200 = 59.5%, the validity of the discriminant function can be deemed
satisfactory.
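A small sketch of the hit-ratio calculation from Table 3 (plain Python, no extra libraries):

```python
# Rows: actual group, columns: predicted group, values taken from Table 3.
classification = {
    "holders":     {"holders": 93, "non_holders": 26},   # row total 119
    "non_holders": {"holders": 12, "non_holders": 69},   # row total 81
}

correct = sum(classification[g][g] for g in classification)
total = sum(sum(row.values()) for row in classification.values())
print(f"hit ratio = {correct}/{total} = {correct / total:.2f}")   # 162/200 = 0.81
```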

d) Mr. Vishal Jain, a new customer, has the following perceptions on the response variables:

Ease of Usage   Customer Service   Monetary Benefits   Emotional Attachment with Retailer   New Products Offerings
9               1                  5                   5                                    7

Based on the above responses, can we consider him as a potential loyalty card holder?
Justify.

(2+2+2+2= 8 marks)
Variable                             Response   Holders Coefficient   Non Holders Coefficient   Score (Holders)   Score (Non Holders)
Constant                             -          -48.61049             -50.91007                 -48.61049         -50.91007
Ease of Usage                        9          5.09524               4.15141                   45.85716          37.36269
Customer Service                     1          -4.44927              -4.5397                   -4.44927          -4.5397
Monetary Benefits                    5          1.54891               2.82456                   7.74455           14.1228
Emotional Attachment with Retailer   5          13.79229              13.62917                  68.96145          68.14585
New Products Offerings               7          1.81431               1.75714                   12.70017          12.29998
Total                                                                                           82.20357          76.48155
Yes, Mr. Vishal Jain is a potential loyalty card holder, since his score on the holders function
(82.20) exceeds his score on the non-holders function (76.48).
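The same scoring, as a sketch in plain Python (coefficients taken from Table 4, responses from
the question):

```python
coeffs = {
    "holders":     {"const": -48.61049, "ease": 5.09524, "service": -4.44927,
                    "monetary": 1.54891, "emotional": 13.79229, "new_products": 1.81431},
    "non_holders": {"const": -50.91007, "ease": 4.15141, "service": -4.53970,
                    "monetary": 2.82456, "emotional": 13.62917, "new_products": 1.75714},
}
responses = {"ease": 9, "service": 1, "monetary": 5, "emotional": 5, "new_products": 7}

for group, c in coeffs.items():
    score = c["const"] + sum(c[k] * v for k, v in responses.items())
    print(f"{group}: {score:.2f}")   # holders ~82.20, non-holders ~76.48 -> classify as a holder
```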

6. Organiz Retail Stores is part of the Delhi-based diversified "Vedkrishna group". It operates
a chain of grocery stores which cater to the everyday needs of customers, under the
brand name "Door-to-door". Recently, the company decided to expand its operations into
Hyderabad. Before launching its grocery store in Hyderabad, the company decided to
analyze consumers' attitude towards shopping. Based on past research, six variables
were identified. A survey was conducted among consumers. During this survey
consumers were asked to express their degree of agreement with the following six
statements measured on a 7-point Likert type scale (1 = totally disagree, 7 = totally
agree).

Item No. 1: Shopping is fun

Item No. 2: Shopping is bad for your budget

Item No. 3: I combine shopping with eating out

Item No. 4: I try to get the best buys when shopping

Item No. 5: I don’t care about shopping

Item No. 6: You can save a lot of money by comparing prices

Data was obtained from a sample of 20 respondents. The company conducted a series of
analyses using SAS EG, such as cluster analysis, frequency analysis, and discriminant analysis. The
following is the selected SAS EG cluster analysis output.

Table 1: Cluster History

NCL   Clusters Joined   FREQ   SPRSQ    RSQ
19    OB9    OB14       2      0.0081   0.992
18    OB15   OB16       2      0.0084   0.983
17    OB8    OB13       2      0.0114   0.972
16    OB7    OB11       2      0.0141   0.958
15    OB2    OB18       2      0.0182   0.94
14    CL17   OB19       3      0.02     0.92
13    CL19   OB12       3      0.0204   0.899
12    OB10   OB20       2      0.0234   0.876
11    CL15   OB5        3      0.0251   0.851
10    OB4    CL16       3      0.0285   0.822
9     OB3    CL18       3      0.0374   0.785
8     OB1    CL13       4      0.0454   0.74
7     CL9    OB17       4      0.0501   0.689
6     CL8    CL10       7      0.0508   0.639
5     CL11   OB6        4      0.0612   0.577
4     CL14   CL12       5      0.0949   0.483
3     CL5    CL7        8      0.1104   0.372
2     CL6    CL4        12     0.1231   0.249
1     CL2    CL3        20     0.249    0

 
 

a. What is the difference between cluster analysis and discriminant analysis?


In cluster analysis the groups are not pre-defined; cluster analysis is used to identify the
groups from the selected cases/objects. In discriminant analysis the groups are pre-defined.
Discriminant analysis is generally used to understand how well the pre-specified groups are
discriminated by the selected predictors.
b. How is semi-partial R-square (SPRSQ) used in identifying the number of clusters to be
formed?
A large difference between two consecutive SPRSQ values can indicate the number of
clusters to form. Since a large SPRSQ indicates a greater loss of homogeneity at that merge,
we prefer to stop just before the jump, i.e., at the lower SPRSQ, which corresponds to the
larger number of clusters (see the sketch after part (c)).

c. Looking at the output, how many clusters will you recommend to the company? Justify
your answer.
Based on SPRSQ, one can take a call at 5 clusters. But keeping in mind that there are
only 20 observations/objects, one may take a call at 4 or 3 clusters.

Alternatively, drawing an imaginary (here horizontal) cut line on the dendrogram helps the
researcher identify the number of clusters; here one could go with a three-cluster solution.
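One way to make the "large jump in SPRSQ" rule concrete (a sketch in plain Python, using the
SPRSQ column of the Cluster History table above):

```python
# SPRSQ at each step, keyed by the number of clusters remaining after the merge (NCL).
sprsq = {19: 0.0081, 18: 0.0084, 17: 0.0114, 16: 0.0141, 15: 0.0182, 14: 0.0200,
         13: 0.0204, 12: 0.0234, 11: 0.0251, 10: 0.0285, 9: 0.0374, 8: 0.0454,
         7: 0.0501, 6: 0.0508, 5: 0.0612, 4: 0.0949, 3: 0.1104, 2: 0.1231, 1: 0.2490}

# Increase in SPRSQ when going from NCL clusters down to NCL - 1 clusters.
for ncl in range(6, 1, -1):
    jump = sprsq[ncl - 1] - sprsq[ncl]
    print(f"{ncl} -> {ncl - 1} clusters: SPRSQ rises by {jump:.4f}")
# Apart from the final merge into one cluster, the largest jump occurs when going from
# 5 to 4 clusters, which is why stopping at around 3 to 5 clusters is a defensible call here.
```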

(2+2+3=7 marks)

 7.

(a) In what ways is factor analysis different from cluster analysis? Explain with an example.

(4)

Factor analysis aims at explaining correlation among a set of variables, i.e., we
try to group "like" variables into a factor. On the other hand, cluster analysis
groups "like" objects/respondents. Both techniques are used heavily in natural,
behavioral and market research contexts. While FA characterizes variables, CA
segments objects. Both techniques are used for data reduction: FA for variable
reduction and CA for separate analysis of each segment, post-segmentation.

Example: the Elan Case or the Corpus Electronics Case can be viewed from this
perspective.

(THE ABOVE IS THE MAIN DIFFERENCE.)


Cluster analysis and factor analysis are two statistical methods of data analysis. These two forms of
analysis are heavily used in the natural and behavioral sciences. Both cluster analysis and factor analysis
allow the user to group parts of the data into "clusters" or onto "factors," depending on the type of
analysis. Some researchers new to the methods of cluster and factor analyses may feel that these two
types of analysis are similar overall. While cluster analysis and factor analysis seem similar on the
surface, they differ in many ways, including in their overall objectives and applications.

Objective: Cluster analysis and factor analysis have different objectives. The usual objective of factor
analysis is to explain correlation in a set of data and relate variables to each other, while the objective of
cluster analysis is to address heterogeneity in each set of data. In spirit, cluster analysis is a form of
categorization, whereas factor analysis is a form of simplification.

Complexity: Complexity is one question on which factor analysis and cluster analysis differ: data size
affects each analysis differently. As the set of data grows, cluster analysis becomes computationally
intractable. This is true because the number of data points in cluster analysis is directly related to the
number of possible cluster solutions. For example, the number of ways to divide twenty objects into 4
clusters of equal size is over 488 million. This makes direct computational methods, including the
category of methods to which factor analysis belongs, impossible.

Solution: Even though the solutions to both factor analysis and cluster analysis problems are subjective
to some degree, factor analysis allows a researcher to yield a "best" solution, in the sense that the
researcher can optimize a certain aspect of the solution (orthogonality, ease of interpretation and so on).
This is not so for cluster analysis, since any algorithm that could guarantee the best cluster
solution is computationally impractical. Hence, researchers employing cluster analysis cannot guarantee
an optimal solution.

Applications: Factor analysis and cluster analysis differ in how they are applied to real data. Because
factor analysis has the ability to reduce an unwieldy set of variables to a much smaller set of factors, it is
suitable for simplifying complex models. Factor analysis also has a confirmatory use, in which the
researcher can develop a set of hypotheses regarding how variables in the data are related. The
researcher can then run factor analysis on the data set to confirm or deny these hypotheses. Cluster
analysis, on the other hand, is suitable for classifying objects according to certain criteria. For example,
a researcher can measure certain aspects of a group of newly-discovered plants and place these plants
into species categories by employing cluster analysis.

Read more : http://www.ehow.com/info_8175078_difference-between-cluster-factor-analysis.html

(b) The “Rotated Factor Pattern” table as obtained from a Factor Analysis conducted using SAS
EG is provided below:

Rotated Factor Pattern

Factor1 Factor2 Factor3 Factor4 Factor5

Availability 0.875 -0.192 0.191 0.107 0.096


Variety 0.803 -0.271 -0.022 -0.007 -0.012

Easy Care 0.784 0.301 0.005 -0.021 -0.050

Price 0.781 0.243 0.025 -0.188 0.054

Ads -0.068 0.834 -0.057 0.025 -0.002

Fit 0.343 0.510 0.096 -0.097 0.325

Celebendorse 0.456 -0.210 0.725 -0.014 0.056

Salespromotion 0.008 0.455 0.665 0.069 -0.081

Style 0.402 0.162 -0.578 0.474 -0.112

Personality -0.170 0.183 0.097 0.753 -0.102

Brand 0.048 -0.231 -0.148 0.694 0.210

Colour 0.034 0.004 0.050 0.021 0.912

Different -0.069 0.467 -0.477 0.150 0.564

i. What does each entry in the table indicate?

Factor Loadings, i.e., the correlation between the particular variable and the
particular Factor.
ii. Between Factor 2 and Factor 4, which factor is more important? Justify with

indicators and computations.

Compute the variance explained by Factor 2 (i.e., the sum of squared factor
loadings in the Factor 2 column, approximately 1.80) and by Factor 4 (sum of squared
loadings in the Factor 4 column, approximately 1.36). Since Factor 2 explains more
variance, Factor 2 is the more important of the two (see the sketch after item v).
iii. How much of variability in “Variety” is captured by Factors 1 to 5?

Sum of squares of Factor Loadings against the “Variety” row for all Factors.
iv. What is the general criterion for identifying variables explaining each factor?

Identify the variables explaining each of the factors from the given table.

Criterion: assign each variable to the factor on which it has the highest loading,
provided the loading is sufficiently large (say above 0.6; loadings below about 0.4
are usually not considered).
Factor 1: Availability, Variety, Easy Care & Price
Factor 2: Ads & Fit
Factor 3: Celebendorse, Salespromotion & Style (-ve influence for style)
Factor 4: Personality & Brand
Factor 5: Colour & Different
v. Name each Factor for this output.

Suitable nomenclatures. They may vary from person to person.
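To make items (ii) to (iv) concrete, here is a sketch in plain Python using the loadings from the
Rotated Factor Pattern table above:

```python
loadings = {
    "Availability":   [0.875, -0.192,  0.191,  0.107,  0.096],
    "Variety":        [0.803, -0.271, -0.022, -0.007, -0.012],
    "Easy Care":      [0.784,  0.301,  0.005, -0.021, -0.050],
    "Price":          [0.781,  0.243,  0.025, -0.188,  0.054],
    "Ads":            [-0.068, 0.834, -0.057,  0.025, -0.002],
    "Fit":            [0.343,  0.510,  0.096, -0.097,  0.325],
    "Celebendorse":   [0.456, -0.210,  0.725, -0.014,  0.056],
    "Salespromotion": [0.008,  0.455,  0.665,  0.069, -0.081],
    "Style":          [0.402,  0.162, -0.578,  0.474, -0.112],
    "Personality":    [-0.170, 0.183,  0.097,  0.753, -0.102],
    "Brand":          [0.048, -0.231, -0.148,  0.694,  0.210],
    "Colour":         [0.034,  0.004,  0.050,  0.021,  0.912],
    "Different":      [-0.069, 0.467, -0.477,  0.150,  0.564],
}

# (ii) Variance explained by each factor = column sum of squared loadings.
for j in range(5):
    explained = sum(row[j] ** 2 for row in loadings.values())
    print(f"Factor {j + 1}: variance explained ~ {explained:.2f}")
# Factor 2 (~1.80) explains more variance than Factor 4 (~1.36), so Factor 2 is more important.

# (iii) Communality of "Variety" = row sum of squared loadings (~0.72).
print(sum(x ** 2 for x in loadings["Variety"]))

# (iv) Assign each variable to the factor on which its absolute loading is highest.
for var, row in loadings.items():
    best = max(range(5), key=lambda j: abs(row[j]))
    print(f"{var}: Factor {best + 1}")
```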

(1+4+2+3+1=11)  

8.  The Gorman Manufacturing Company must decide whether to manufacture a component part 
at its Michigan plant or purchase the part from a supplier. The resulting profit is dependent 
upon the demand for the product. The following payoff table shows the projected profit (in $ 
000’s): 
Decision Alternative \ State of Nature   Low Demand (s1)   Medium Demand (s2)   High Demand (s3) 
Manufacture (d1)                         -20               40                   100 
Purchase (d2)                            10                45                   70 
 

The state of nature probabilities are: P(s1) = 0.35,  P(s2) = 0.35  and  P(s3) = 0.30 

a. What should Gorman Manufacturing Company do – manufacture or purchase? 
 
Decision Alternative \ Probability   Low Demand (s1)   Medium Demand (s2)   High Demand (s3)   EMV 
                                     (0.35)            (0.35)               (0.30) 
Manufacture (d1)                     -20               40                   100                37 
Purchase (d2)                        10                45                   70                 40.25 
Conclusion: PURCHASE 
 
b. Should Gorman attempt to obtain a better estimate of demand? (Hint: Use EVPI.) 
Here,            EVwoPI   =   40.25    &      EVwPI    =    49.25           Hence, EVPI = 9 
Gorman should attempt to obtain a better estimate of demand if obtaining it costs less than 
$9000. 
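A sketch of the EMV and EVPI calculation in plain Python (payoffs in $000s, taken from the payoff 
table above): 

```python
probs = {"s1": 0.35, "s2": 0.35, "s3": 0.30}
payoffs = {
    "manufacture": {"s1": -20, "s2": 40, "s3": 100},
    "purchase":    {"s1": 10,  "s2": 45, "s3": 70},
}

emv = {d: sum(probs[s] * v for s, v in p.items()) for d, p in payoffs.items()}
ev_wo_pi = max(emv.values())                                                   # 40.25 (purchase)
ev_w_pi = sum(probs[s] * max(p[s] for p in payoffs.values()) for s in probs)   # 49.25
print(emv, ev_wo_pi, ev_w_pi, ev_w_pi - ev_wo_pi)                              # EVPI = 9, i.e. $9,000
```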
c. A test market study of the potential demand for the product is expected to report either a 
favorable (F) or unfavorable (U) condition. The relevant conditional probabilities are as 
follows: 
P(F/s1) = 0.10                                 P(U/s1) = 0.90                 
P(F/s2) = 0.40                                 P(U/s2) = 0.60                 
P(F/s3) = 0.60                                 P(U/s3) = 0.40  
What is the probability that the market research report will be favorable? What is the 
probability that the market research report will be unfavorable? 
 
For FAVORABLE Condition (F) 
State of Nature  P(si)  P(F given si)  P(F & si)  P(si given F) 
s1  0.35  0.1  0.035  0.0986 
s2  0.35  0.4  0.14  0.3944 
s3  0.30  0.6  0.18  0.5070 
      P(F) = 0.355  1 
 
For UNFAVORABLE Condition (U) 
State of Nature  P(si)  P(U given si)  P(U & si)  P(si given U) 
s1  0.35  0.9  0.315  0.488 
s2  0.35  0.6  0.21  0.325 
s3  0.30  0.4  0.12  0.186 
      P(U) = 0.645  1 
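The posterior probabilities in the two tables follow from Bayes' theorem; a sketch of the 
calculation in plain Python: 

```python
prior = {"s1": 0.35, "s2": 0.35, "s3": 0.30}
likelihood = {"F": {"s1": 0.10, "s2": 0.40, "s3": 0.60},
              "U": {"s1": 0.90, "s2": 0.60, "s3": 0.40}}

for report, cond in likelihood.items():
    joint = {s: prior[s] * cond[s] for s in prior}          # P(report & s)
    p_report = sum(joint.values())                           # P(F) = 0.355, P(U) = 0.645
    posterior = {s: joint[s] / p_report for s in joint}      # P(s | report)
    print(report, round(p_report, 3), {s: round(v, 3) for s, v in posterior.items()})
```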
 
 
d. What should be Gorman’s optimal decision strategy with market study?  Use decision tree 
to answer.  
A decision tree can be drawn with the above revised posterior probabilities for the different states of 
nature under the favorable and unfavorable market study conditions; whether or not to commission 
the market study can also be brought into the tree. 
 
Rolling the tree back: if the report is favorable, manufacture (expected profit about $64,504); if the 
report is unfavorable, purchase (expected profit about $32,558). The overall expected value with the 
market study is therefore about $43,900, against $40,250 without it. 
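A sketch of rolling the decision tree back with these posterior probabilities (plain Python, 
payoffs in $000s): 

```python
payoffs = {"manufacture": {"s1": -20, "s2": 40, "s3": 100},
           "purchase":    {"s1": 10,  "s2": 45, "s3": 70}}
p_report = {"F": 0.355, "U": 0.645}
posterior = {"F": {"s1": 0.0986, "s2": 0.3944, "s3": 0.5070},
             "U": {"s1": 0.488,  "s2": 0.325,  "s3": 0.186}}

ev_with_study = 0.0
for r in p_report:
    emv = {d: sum(posterior[r][s] * v for s, v in p.items()) for d, p in payoffs.items()}
    best = max(emv, key=emv.get)
    print(r, best, round(emv[best], 2))      # F -> manufacture (~64.50), U -> purchase (~32.5)
    ev_with_study += p_report[r] * emv[best]
print(round(ev_with_study, 1))               # ~43.9, so EVSI ~ 43.9 - 40.25 = 3.65 ($3,650)
```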
                                                                                                                                           (1+1+3+5= 10)                
 

9. Given below is a Linear Programming Problem.

Maximize Z = 5 X1 + 3 X2
Subject to
4 X1 + 2 X2 <= 8
X1 >= 4
X2 >= 6
And X1 , X2 >= 0

Plot the problem on a graph sheet and comment on the feasible region. What is the
solution to the problem?
(5)
ANSWER: The graph can be plotted. There is no common feasible region satisfying all the
constraints; the problem is infeasible.
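As a quick numerical check (a sketch assuming SciPy is available; linprog minimizes, so the
objective is negated and the lower bounds express X1 >= 4 and X2 >= 6):

```python
from scipy.optimize import linprog

# Maximize 5*X1 + 3*X2  ->  minimize -(5*X1 + 3*X2)
res = linprog(c=[-5, -3],
              A_ub=[[4, 2]], b_ub=[8],           # 4*X1 + 2*X2 <= 8
              bounds=[(4, None), (6, None)])     # X1 >= 4, X2 >= 6
print(res.status, res.message)                   # status 2: the problem is infeasible
```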

10. A company manufactures four variants of the same product, and in the final part of the
manufacturing process there are assembly, polishing and packing operations. For each
variant the time required for these operations is shown below (in minutes), as is the profit
per unit sold.

            Assembly   Polish   Pack   Profit (£)
Variant 1   2          3        2      1.50
Variant 2   4          2        3      2.50
Variant 3   3          3        2      3.00
Variant 4   7          4        5      4.50

Given the current state of the labor force, the company estimates that, each year, it has 100000
minutes of assembly time, 50000 minutes of polishing time and 60000 minutes of packing time
available.

a. Given that the company wishes to optimize its profits, formulate the above as a
Linear Programming Problem (LPP).
(3)
Maximize 1.5x1 + 2.5x2 + 3.0x3 + 4.5x4
subject to
2x1 + 4x2 + 3x3 + 7x4 <= 100000 (assembly) 


3x1 + 2x2 + 3x3 + 4x4 <= 50000 (polish) 


2x1 + 3x2 + 2x3 + 5x4 <= 60000 (pack)

x1, x2, x3, x4 >= 0
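A sketch of solving this formulation with SciPy (linprog minimizes, so the profit coefficients are
negated); it should reproduce the Solver result shown under part (c):

```python
from scipy.optimize import linprog

profit = [1.5, 2.5, 3.0, 4.5]
A = [[2, 4, 3, 7],    # assembly minutes per unit
     [3, 2, 3, 4],    # polishing minutes per unit
     [2, 3, 2, 5]]    # packing minutes per unit
b = [100000, 50000, 60000]

res = linprog(c=[-p for p in profit], A_ub=A, b_ub=b, bounds=[(0, None)] * 4)
print(res.x, -res.fun)   # expected: x = (0, 16000, 6000, 0), profit = 58000 pounds
```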

b. Suppose the company is free to decide how much time to devote to each of the three
operations (assembly, polishing and packing) within the total available time of
210000 (= 100000 + 50000 + 60000) minutes. Reformulate the LPP for this changed
scenario. (2)

Maximize 1.5x1 + 2.5x2 + 3.0x3 + 4.5x4

subject to

(2x1 + 4x2 + 3x3 + 7x4) + (3x1 + 2x2 + 3x3 + 4x4) + (2x1 + 3x2 + 2x3 + 5x4 ) <= 210000

i.e., 7x1 + 9x2 + 8x3 + 16x4 <= 210000

x1, x2, x3, x4 >= 0

c. For the LPP formulated under (a), Excel Solver output is given below:

Target Cell (Max) 
Cell    Name           Original Value   Final Value 
$F$4    Profit Total   58000            58000 

Adjustable Cells 
Cell    Name        Original Value   Final Value 
$B$3    Variant 1   0                0 
$C$3    Variant 2   16000            16000 
$D$3    Variant 3   6000             6000 
$E$3    Variant 4   0                0 

Constraints 
Cell    Name              Cell Value   Formula        Status        Slack 
$F$5    Assembly Total    82000        $F$5<=$G$5     Not Binding   18000 
$F$6    Polishing Total   50000        $F$6<=$G$6     Binding       0 
$F$7    Packing Total     60000        $F$7<=$G$7     Binding       0 

Sensitivity Report

Adjustable Cells 
Cell    Name        Final Value   Reduced Cost   Objective Coefficient   Allowable Increase   Allowable Decrease 
$B$3    Variant 1   0             -1.5           1.5                     1.5                  1E+30 
$C$3    Variant 2   16000         0              2.5                     2                    0.142857143 
$D$3    Variant 3   6000          0              3                       0.75                 0.5 
$E$3    Variant 4   0             -0.2           4.5                     0.2                  1E+30 

Constraints 
Cell    Name              Final Value   Shadow Price   Constraint R.H. Side   Allowable Increase   Allowable Decrease 
$F$5    Assembly Total    82000         0              100000                 1E+30                18000 
$F$6    Polishing Total   50000         0.8            50000                  40000                10000 
$F$7    Packing Total     60000         0.3            60000                  15000                26666.66667 

Answer the following questions based on the above output:

i. How many units of each variant should the company produce per year and what
is the associated profit?
V1 = 0, V2 = 16000, V3 = 6000, V4 = 0, Z = 58000 pounds
ii. Which are the processes on which time availability are exhausted?
Polish & Pack.
iii. By how many pounds should the unit profit of Variant 1 increase so that it
becomes profitable to produce Variant 1?
By more than 1.5 pounds (the magnitude of its reduced cost), other things remaining unchanged.
iv. In the constraints table under the Sensitivity Report, the ‘allowable increase’ and
the ‘allowable decrease’ are indicated as 40000 and 10000 respectively against
total polish. What do they indicate?
The shadow price of 0.8 (and the current set of binding constraints) remains valid as long as the
availability of polishing time lies between 40000 (= 50000 - 10000) and 90000 (= 50000 + 40000)
minutes, other things remaining unchanged.
v. In continuation of (iv) above, what does the value of 0.8 as shadow price against
‘total polish’ indicate?
Other things remaining unchanged, an increased availability of one minute of
polishing time will increase total profit (the objective value) by 0.8 pounds.
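A sketch verifying the 0.8 shadow price by re-solving with one extra minute of polishing time
(SciPy again; the setup is repeated so the snippet is self-contained):

```python
from scipy.optimize import linprog

A = [[2, 4, 3, 7], [3, 2, 3, 4], [2, 3, 2, 5]]
c = [-1.5, -2.5, -3.0, -4.5]

base = linprog(c, A_ub=A, b_ub=[100000, 50000, 60000], bounds=[(0, None)] * 4)
plus = linprog(c, A_ub=A, b_ub=[100000, 50001, 60000], bounds=[(0, None)] * 4)
print(round(-plus.fun - (-base.fun), 2))   # ~0.8: one extra polishing minute adds ~0.8 pounds profit
```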

     ( 1 * 5 = 5)

    
