No extra sheets will be provided. Keep in mind good time and space management.
Wherever relevant and not given with the question, you may take the level of significance
as 5 percent.
Set A ‐ Main
1. a. Differentiate exploratory research and descriptive research.
b. Differentiate observation method and focus group discussion method in the context of data
collection.
(5+5=10)
ANSWER:
a. P.21 & 22 in Text Book.
b. P.188‐189 & 84 in Text Book
2. a. Differentiate completely randomized design and randomized block design.
b. Differentiate one‐way ANOVA and two‐way ANOVA.
(5+5=10)
ANSWER:
a. P.497 & 514 in ASW Text Book.
b. P.502 & 517 in ASW Text Book
3. a. Differentiate Likert scale and Semantic differential scale used in attitude measurement.
b. What is survey research? Differentiate cross‐sectional study and longitudinal study.
(5+5=10)
ANSWER:
a. P.270‐271 in Text Book.
b. P. 142 in Text Book
4. a. Specify the four key assumptions of the Linear Regression Model. (4 marks)
ANSWER: i. The mean of the error term (a random variable) is zero.
ii. The variance of the error term is constant.
iii. The error terms are uncorrelated.
iv. The error term is normally distributed.
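c. Should we use R-square or adjusted R-square when judging the predictive ability of the
regression model? (1 mark)
ANSWER: Use adjusted R-square. R-square never decreases when another explanatory variable is
added, even an irrelevant one, whereas adjusted R-square charges for each lost degree of freedom
and rises only when the added variable improves the fit enough to offset that loss.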
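As a sketch, the two measures can be compared using the ANOVA figures from the SAS output given later in this question (n = 14 observations, since the corrected total has 13 DF, and k = 3 explanatory variables). The formula used is the standard one, adjusted R-square = 1 - (1 - R-square)(n - 1)/(n - k - 1).

```python
# R-square vs adjusted R-square from the ANOVA table of the SAS output.
ss_model = 2.29579   # Model sum of squares
ss_total = 2.65495   # Corrected total sum of squares
n = 14               # observations (corrected total DF = n - 1 = 13)
k = 3                # explanatory variables X1, X2, X3

r2 = ss_model / ss_total
# Adjusted R-square charges for each explanatory variable via the degrees of freedom.
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(f"R-square = {r2:.4f}, adjusted R-square = {adj_r2:.4f}")
# R-square = 0.8647, adjusted R-square = 0.8241
```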
d. Suppose you wish to include a categorical variable with three levels as explanatory variable
in a regression model. How would you include the variable in the regression model?
(2 marks)
ANSWER:
We should split the categorical variable X into two dummy variables, X1 and X2, defined as
follows:
X1 = 1 whenever level 1 occurs
   = 0 otherwise
and
X2 = 1 whenever level 2 occurs
   = 0 otherwise
Note that also including an X3 for level 3 occurrences, defined on similar lines, would lead to
perfect multicollinearity (X1 + X2 + X3 = 1 for every observation, duplicating the intercept
column) and hence should be avoided.
Level 3 occurrences are captured when X1 = 0 and X2 = 0, so level 3 acts as the reference level.
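A minimal sketch of this coding scheme, assuming the levels are labeled 1, 2 and 3 and using a short list of hypothetical observations; the function name `dummy_code` is illustrative only.

```python
def dummy_code(level):
    """Return (X1, X2) for a level in {1, 2, 3}; level 3 (reference) maps to (0, 0)."""
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    return (1 if level == 1 else 0, 1 if level == 2 else 0)

observations = [1, 3, 2, 2, 3, 1]  # hypothetical values of the categorical variable X
rows = [dummy_code(x) for x in observations]
print(rows)  # [(1, 0), (0, 0), (0, 1), (0, 1), (0, 0), (1, 0)]

# A third dummy X3 would always equal 1 - X1 - X2, so X1 + X2 + X3 = 1 in every
# row: an exact copy of the intercept column (perfect multicollinearity).
with_x3 = [(x1, x2, 1 - x1 - x2) for x1, x2 in rows]
print(all(sum(r) == 1 for r in with_x3))  # True
```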
Answer (e) to (h), based on the SAS output below:
Analysis of Variance
                           Sum of        Mean
Source             DF     Squares      Square    F Value    Pr > F
Model               3     2.29579     0.76526      21.31    0.0001
Error              10     0.35916     0.03592
Corrected Total    13     2.65495

Parameter Estimates
                Parameter   Standard                       Standardized               Variance
Variable    DF   Estimate      Error   t Value   Pr > |t|      Estimate   Tolerance   Inflation
Intercept    1   -0.05535    1.12839     -0.05     0.9618             0           .           0
X1           1    0.27602    0.12129      2.28     0.0461       0.33165     0.63692     1.57005
X2           1    0.44687    0.11107      4.02     0.0024       0.49401     0.89732     1.11443
X3           1    0.27028    0.09527      2.84     0.0176       0.40332     0.66935     1.49398

Test of First and Second Moment Specification
DF    Chi-Square    Pr > ChiSq
 9          5.25        0.8116
e. Based on the output above, comment on the model significance (specify the hypothesis and
the reason). (2 marks)
ANSWER: H0: β1 = β2 = β3 = 0 (all slope coefficients are simultaneously zero) against
H1: at least one βj ≠ 0. The model is significant, since the p-value (0.0001) associated with
the F statistic (21.31) is less than alpha (0.05), so H0 is rejected.
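The F statistic can be recomputed from the ANOVA table as a check. This sketch compares it with the upper 5% point of F(3, 10), taken as 3.71 from standard F tables, instead of computing an exact p-value (which would need a statistics library).

```python
# Overall F test recomputed from the ANOVA table of the SAS output.
ss_model, df_model = 2.29579, 3
ss_error, df_error = 0.35916, 10

ms_model = ss_model / df_model   # mean square for the model
ms_error = ss_error / df_error   # mean square error
f_stat = ms_model / ms_error

F_CRIT_5PCT_3_10 = 3.71          # upper 5% point of F(3, 10), from F tables

print(f"F = {f_stat:.2f}")       # F = 21.31
print("Reject H0" if f_stat > F_CRIT_5PCT_3_10 else "Do not reject H0")
```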
f. Based on the output above, comment on the significance of the individual independent
variables (specify the hypothesis and the reason). (2 marks)
ANSWER: Look at the p-values associated with the t statistic for each independent variable.
For each test, H0: the coefficient of that variable is zero (βj = 0) against H1: βj ≠ 0.
All three variables X1, X2 and X3 have a significant influence on the dependent variable,
since their p-values (0.0461, 0.0024 and 0.0176 respectively) are all less than alpha (0.05).
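Each t value is simply the estimate divided by its standard error. The sketch below reproduces the t column and compares each |t| with 2.228, the two-tailed 5% critical value of t with 10 error DF from standard t tables.

```python
# Individual t tests: t = estimate / standard error, compared with the
# two-tailed 5% critical value of t with 10 error DF (from t tables).
estimates = {
    "Intercept": (-0.05535, 1.12839),
    "X1": (0.27602, 0.12129),
    "X2": (0.44687, 0.11107),
    "X3": (0.27028, 0.09527),
}
T_CRIT = 2.228

t_values = {name: b / se for name, (b, se) in estimates.items()}
for name, t in t_values.items():
    verdict = "significant" if abs(t) > T_CRIT else "not significant"
    print(f"{name}: t = {t:.2f} -> {verdict}")
```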
g. Indicate the criteria used and compare the relative influence of the independent variables.
(2 marks)
ANSWER: Criterion: the absolute values of the standardized estimates, which are free of the
units of measurement and hence comparable across variables. X2 (0.494) has the greatest
influence on the dependent variable, followed by X3 (0.403) and X1 (0.332).
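The ranking amounts to sorting the variables by the absolute size of their standardized estimates, as in this short sketch using the values from the output.

```python
# Relative influence judged by the absolute size of the standardized estimates.
standardized = {"X1": 0.33165, "X2": 0.49401, "X3": 0.40332}
ranking = sorted(standardized, key=lambda v: abs(standardized[v]), reverse=True)
print(ranking)  # ['X2', 'X3', 'X1']
```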
h. What is multicollinearity? Does the output above indicate any presence of multicollinearity
(specify the reason). (2 marks)
ANSWER: Multicollinearity is the presence of high correlation among the independent variables.
All VIF values in the output (1.57, 1.11 and 1.49) are well below the rule-of-thumb cutoff of
10, which indicates that multicollinearity is not a problem in this data.
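The VIF column is the reciprocal of the Tolerance column (Tolerance = 1 - Rj², where Rj² comes from regressing Xj on the other explanatory variables). A sketch reproducing the VIFs and applying the cutoff of 10 used above:

```python
# VIF = 1 / tolerance; rule of thumb used above: VIF >= 10 signals a problem.
tolerance = {"X1": 0.63692, "X2": 0.89732, "X3": 0.66935}
vif = {name: 1 / tol for name, tol in tolerance.items()}
for name, value in vif.items():
    print(f"{name}: VIF = {value:.3f}")

problem = any(v >= 10 for v in vif.values())
print("Multicollinearity problem" if problem else "No multicollinearity problem")
```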
5. A retail company has segmented its customers into two categories according to their
loyalty card membership status, viz. (i) loyalty card holders and (ii) loyalty card
non holders. The retailer wants to better understand the difference between these two
groups. A random sample of 200 of the retailer’s customers recently participated in a
survey. The questionnaire included several questions about the customers’ perception of
the loyalty card, such as ease of usage, customer service, monetary benefits, emotional
attachment with the retailer and new product offerings. The responses were elicited on a
10-point Likert scale ranging from “Strongly disagree” to “Strongly agree”. The retail
company is interested in gaining a deeper knowledge of the attributes which
differentiate a loyalty card holder from a loyalty card non holder. The data
collected were analyzed using SAS Enterprise Guide 4.2, which produced the following
tables:
The null hypothesis that, in the population, the means of the discriminant function in the
two groups are equal is rejected, as indicated by Table 2, in which the p-value of
Wilks’ Lambda is significant.
Based on Table 1, it can be concluded that ‘ease of use’ and ‘monetary benefits’
differentiate between loyalty card holders and non holders.
c) Provide the values of x and y in Table 3. Comment on the validity of the discriminant
function.
x = 93; y = 69. So the hit ratio = (93 + 69)/200 = 162/200 = 0.81. The hit ratio measures the
predictive accuracy of the discriminant function: 81% of the cases were correctly classified.
Since the discriminant function considerably improves the classification of the customers
into groups compared with a priori (chance) classification, the validity of the discriminant
analysis can be deemed satisfactory.
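The hit ratio is the sum of the correctly classified counts (the diagonal of the classification table) divided by the sample size. A minimal sketch with the values of x and y above:

```python
# Hit ratio: share of the sample correctly classified (diagonal of Table 3).
x = 93            # correctly classified count in one group (x in Table 3)
y = 69            # correctly classified count in the other group (y in Table 3)
sample_size = 200

hit_ratio = (x + y) / sample_size
misclassified = sample_size - (x + y)
print(f"Hit ratio = {hit_ratio:.2f}, misclassified = {misclassified}")
# Hit ratio = 0.81, misclassified = 38
```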
d) Mr. Vishal Jain, a new customer, has the following perceptions on the response variables:
Based on the above responses, can we consider him as a potential loyalty card holder?
Justify.
(2+2+2+2= 8 marks)
Variable                             Responses   Loyalty Card   Loyalty Card     Score 1     Score 2
                                                 Holders        Non Holders
Constant                                         -48.61049      -50.91007      -48.61049   -50.91007
Ease of Usage                            9         5.09524        4.15141       45.85716    37.36269
Customer Service                         1        -4.44927       -4.5397        -4.44927    -4.5397
Monetary Benefits                        5         1.54891        2.82456        7.74455    14.1228
Emotional Attachment with Retailer       5        13.79229       13.62917       68.96145    68.14585
New Products Offerings                   7         1.81431        1.75714       12.70017    12.29998
Total                                                                           82.20357    76.48155
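Yes. Mr. Vishal Jain’s classification score for the loyalty card holder group (Score 1 = 82.20)
exceeds his score for the non holder group (Score 2 = 76.48), so he is classified as a potential
loyalty card holder.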
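The two scores are obtained by treating each group's coefficients (with its constant) as a linear classification function: score = constant + Σ coefficient × response, and the case goes to the group with the larger score. A sketch reproducing Score 1 and Score 2; the dictionary keys are shorthand names chosen here for the table rows.

```python
# Classification-function scores for Mr. Vishal Jain.
responses = {"ease_of_usage": 9, "customer_service": 1, "monetary_benefits": 5,
             "emotional_attachment": 5, "new_products": 7}

holders = {"constant": -48.61049, "ease_of_usage": 5.09524, "customer_service": -4.44927,
           "monetary_benefits": 1.54891, "emotional_attachment": 13.79229,
           "new_products": 1.81431}
non_holders = {"constant": -50.91007, "ease_of_usage": 4.15141, "customer_service": -4.5397,
               "monetary_benefits": 2.82456, "emotional_attachment": 13.62917,
               "new_products": 1.75714}

def score(coefs, resp):
    """Constant plus the sum of coefficient * response over all variables."""
    return coefs["constant"] + sum(coefs[v] * x for v, x in resp.items())

s1 = score(holders, responses)      # Score 1
s2 = score(non_holders, responses)  # Score 2
print(f"Score 1 = {s1:.5f}, Score 2 = {s2:.5f}")  # Score 1 = 82.20357, Score 2 = 76.48155
print("Potential loyalty card holder" if s1 > s2 else "Likely non holder")
```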
6. Organiz Retail Stores is part of the Delhi-based diversified “Vedkrishna group”. It operates
a chain of grocery stores which cater to the everyday needs of customers, under the
brand name “Door-to-door”. Recently, the company decided to expand its operations into
Hyderabad. Before launching its grocery store in Hyderabad, the company decided to
analyze consumers’ attitude towards shopping. Based on past research, six variables
were identified. A survey was conducted among consumers, in which they were asked to
express their degree of agreement with the following six statements, measured on a
7-point Likert type scale (1 = totally disagree, 7 = totally agree).
Data were obtained from a sample of 20 respondents. The company conducted a series of
analyses using SAS EG, such as cluster analysis, frequency analysis and discriminant analysis.
The following is selected SAS EG cluster analysis output.
NCL   Clusters Joined   FREQ   SPRSQ    RSQ
19    OB9     OB14         2   0.0081   0.992
18    OB15    OB16         2   0.0084   0.983
17    OB8     OB13         2   0.0114   0.972
16    OB7     OB11         2   0.0141   0.958
15    OB2     OB18         2   0.0182   0.94
14    CL17    OB19         3   0.02     0.92
13    CL19    OB12         3   0.0204   0.899
12    OB10    OB20         2   0.0234   0.876
11    CL15    OB5          3   0.0251   0.851
10    OB4     CL16         3   0.0285   0.822
 9    OB3     CL18         3   0.0374   0.785
 8    OB1     CL13         4   0.0454   0.74
 7    CL9     OB17         4   0.0501   0.689
 6    CL8     CL10         7   0.0508   0.639
 5    CL11    OB6          4   0.0612   0.577
 4    CL14    CL12         5   0.0949   0.483
 3    CL5     CL7          8   0.1104   0.372
 2    CL6     CL4         12   0.1231   0.249
 1    CL2     CL3         20   0.249    0
b. Looking at the output, how many clusters will you recommend to the company? Justify
your answer.
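Based on SPRSQ, a call can be taken at 5 clusters: SPRSQ rises only gradually up to the merge
that forms 5 clusters (0.0612) and then jumps to 0.0949 for the merge that forms 4 clusters,
indicating that relatively dissimilar clusters start being combined. But keeping in mind that
there are only 20 observations (objects), one may take a call at 4 or 3 clusters.
Alternatively, drawing an imaginary horizontal line across the dendrogram helps the
researcher identify the number of clusters. Here we can go with a three-cluster solution.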
(2+2+3=7 marks)
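One common heuristic for reading the SPRSQ column is to find the merge with the largest increase in SPRSQ and stop just before it. A sketch of that heuristic applied to the cluster history above (the final merge into a single cluster is excluded, since a one-cluster solution is of no practical use):

```python
# SPRSQ keyed by the number of clusters left after each merge (NCL).
sprsq = {19: 0.0081, 18: 0.0084, 17: 0.0114, 16: 0.0141, 15: 0.0182,
         14: 0.0200, 13: 0.0204, 12: 0.0234, 11: 0.0251, 10: 0.0285,
         9: 0.0374, 8: 0.0454, 7: 0.0501, 6: 0.0508, 5: 0.0612,
         4: 0.0949, 3: 0.1104, 2: 0.1231, 1: 0.249}

# Increase in SPRSQ when moving from k + 1 clusters to k clusters (k = 2..18).
jumps = {k: sprsq[k] - sprsq[k + 1] for k in range(2, 19)}
k_big = max(jumps, key=jumps.get)
recommended = k_big + 1  # stop just before the costly merge
print(f"Largest jump when forming {k_big} clusters; recommend {recommended}")
# Largest jump when forming 4 clusters; recommend 5
```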
7.
(a) In what ways is factor analysis different from cluster analysis? Explain with an example.
(4)
Example: the Elan case or the Corpus Electronics case can be looked at from this
perspective.
Objective: Cluster analysis and factor analysis have different objectives. The usual objective of factor
analysis is to explain correlation in a set of data and relate variables to each other, while the objective of
cluster analysis is to address heterogeneity in each set of data. In spirit, cluster analysis is a form of
categorization, whereas factor analysis is a form of simplification.
Complexity: Complexity is one question on which factor analysis and cluster analysis differ: data size
affects each analysis differently. As the set of data grows, cluster analysis becomes computationally
intractable, because the number of possible cluster solutions grows explosively with the number of data
points. For example, the number of ways to divide twenty objects into 4 clusters of equal size is over
488 million. This makes direct (exhaustive) computational methods, including the category of methods to
which factor analysis belongs, infeasible for cluster analysis.
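The 488 million figure can be checked with a counting argument: the multinomial coefficient 20!/(5!)^4 counts labeled groups of five, and dividing by 4! removes the ordering of the four interchangeable clusters. A small sketch:

```python
from math import factorial

def equal_size_partitions(n_objects, n_clusters):
    """Ways to split n_objects into n_clusters unlabeled clusters of equal size."""
    size, rem = divmod(n_objects, n_clusters)
    if rem:
        raise ValueError("n_objects must be divisible by n_clusters")
    return factorial(n_objects) // (factorial(size) ** n_clusters * factorial(n_clusters))

print(equal_size_partitions(20, 4))  # 488864376
```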
Solution: Even though the solutions to both factor analysis and cluster analysis problems are subjective
to some degree, factor analysis allows a researcher to yield a "best" solution, in the sense that the
researcher can optimize a certain aspect of the solution (orthogonality, ease of interpretation and so on).
This is not so for cluster analysis, since all algorithms that could possibly yield a best cluster analysis
solution are computationally inefficient. Hence, researchers employing cluster analysis cannot guarantee
an optimal solution.
Applications: Factor analysis and cluster analysis differ in how they are applied to real data. Because
factor analysis has the ability to reduce an unwieldy set of variables to a much smaller set of factors, it is
suitable for simplifying complex models. Factor analysis also has a confirmatory use, in which the
researcher can develop a set of hypotheses regarding how variables in the data are related. The
researcher can then run factor analysis on the data set to confirm or deny these hypotheses. Cluster
analysis, on the other hand, is suitable for classifying objects according to certain criteria. For example,
a researcher can measure certain aspects of a group of newly-discovered plants and place these plants
into species categories by employing cluster analysis.
(b) The “Rotated Factor Pattern” table as obtained from a Factor Analysis conducted using SAS
EG is provided below:
Factor Loadings, i.e., the correlation between the particular variable and the
particular Factor.
ii. Between Factor 2 and Factor 4, which factor is more important? Justify with
Sum of squares of Factor Loadings against the “Variety” row for all Factors.
iv. What is the general criterion for identifying variables explaining each factor?
Identify the variables explaining each of the factors from the given table.
Criterion: Variables with high absolute Factor Loadings (say, above 0.6) on a factor explain
that factor; a Factor Loading should be at least 0.4 in absolute value for consideration.
Factor 1: Availability, Variety, Easy Care & Price
Factor 2: Ads & Fit
Factor 3: Celebendorse, Salespromotion & Style (negative loading for Style)
Factor 4: Personality & Brand
Factor 5: Colour & Different
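The row and column sums of squared loadings used in this part can be sketched as below. The loadings are hypothetical numbers invented for illustration (the exam's Rotated Factor Pattern table is not reproduced here), and treating a factor's column sum of squared loadings as its "importance" is one common convention, stated here as an assumption.

```python
# Hypothetical rotated loadings (rows = variables, columns = Factor 1..3),
# invented for illustration; they are NOT the exam's Rotated Factor Pattern.
loadings = {
    "VarA": [0.82, 0.10, -0.05],
    "VarB": [0.15, 0.71, 0.20],
    "VarC": [0.05, 0.12, -0.64],
    "VarD": [0.30, 0.35, 0.25],
}

def communality(row):
    """Sum of squared loadings of one variable across all factors (its row)."""
    return sum(l * l for l in row)

def factor_variance(j):
    """Sum of squared loadings down column j: variance explained by that factor."""
    return sum(row[j] ** 2 for row in loadings.values())

def assign(row, cutoff=0.4):
    """1-based factor with the largest |loading|, or None if it is below the cutoff."""
    best = max(range(len(row)), key=lambda j: abs(row[j]))
    return best + 1 if abs(row[best]) >= cutoff else None

for var, row in loadings.items():
    print(var, round(communality(row), 4), assign(row))
print([round(factor_variance(j), 4) for j in range(3)])
```

Note that VarC is assigned to Factor 3 through a negative loading, in the same way Style is listed under Factor 3 above.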
v. Name each Factor for this output.
(1+4+2+3+1=11)
8. The Gorman Manufacturing Company must decide whether to manufacture a component part
at its Michigan plant or purchase the part from a supplier. The resulting profit is dependent
upon the demand for the product. The following payoff table shows the projected profit
(in $000s):
Decision Alternative     State of Nature
                         Low Demand (s1)   Medium Demand (s2)   High Demand (s3)
Manufacture (d1)               -20                 40                  100
Purchase (d2)                   10                 45                   70
The state of nature probabilities are: P(s1) = 0.35, P(s2) = 0.35 and P(s3) = 0.30
a. What should Gorman Manufacturing Company do – manufacture or purchase?
Decision Alternative   Low Demand (s1)   Medium Demand (s2)   High Demand (s3)     EMV
Probability                (0.35)             (0.35)              (0.30)
Manufacture (d1)             -20                 40                  100           37.00
Purchase (d2)                 10                 45                   70           40.25
EMV(d1) = 0.35(-20) + 0.35(40) + 0.30(100) = 37; EMV(d2) = 0.35(10) + 0.35(45) + 0.30(70) = 40.25
Conclusion: PURCHASE (higher EMV)
b. Should Gorman attempt to obtain a better estimate of demand? (Hint: Use EVPI.)
Here, EVwoPI = 40.25 and EVwPI = 0.35(10) + 0.35(45) + 0.30(100) = 49.25. Hence,
EVPI = 49.25 - 40.25 = 9, i.e. $9,000.
Gorman should attempt to obtain a better estimate of demand only if obtaining it costs less
than $9,000.
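A sketch of the EMV and EVPI calculations with the payoff table above (payoffs in $000s). With perfect information, the best payoff in each state of nature is taken before weighting by the state probabilities.

```python
# Payoffs in $000s: rows = decisions, columns = (s1, s2, s3).
probs = [0.35, 0.35, 0.30]
payoffs = {"Manufacture": [-20, 40, 100], "Purchase": [10, 45, 70]}

emv = {d: sum(p * v for p, v in zip(probs, row)) for d, row in payoffs.items()}
best = max(emv, key=emv.get)
ev_without_pi = emv[best]

# With perfect information, take the best payoff in each state of nature.
ev_with_pi = sum(p * max(row[i] for row in payoffs.values()) for i, p in enumerate(probs))
evpi = ev_with_pi - ev_without_pi

print(f"Best decision: {best}, EMV = {ev_without_pi:.2f}")  # Purchase, 40.25
print(f"EVwPI = {ev_with_pi:.2f}, EVPI = {evpi:.2f}")        # 49.25, 9.00
```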
c. A test market study of the potential demand for the product is expected to report either a
favorable (F) or unfavorable (U) condition. The relevant conditional probabilities are as
follows:
P(F/s1) = 0.10 P(U/s1) = 0.90
P(F/s2) = 0.40 P(U/s2) = 0.60
P(F/s3) = 0.60 P(U/s3) = 0.40
What is the probability that the market research report will be favorable? What is the
probability that the market research report will be unfavorable?
For FAVORABLE Condition (F)
State of Nature   P(si)   P(F given si)   P(F & si)   P(si given F)
s1                0.35        0.10          0.035         0.0986
s2                0.35        0.40          0.140         0.3944
s3                0.30        0.60          0.180         0.5070
                              P(F) =        0.355         1.0000
For UNFAVORABLE Condition (U)
State of Nature   P(si)   P(U given si)   P(U & si)   P(si given U)
s1                0.35        0.90          0.315         0.4884
s2                0.35        0.60          0.210         0.3256
s3                0.30        0.40          0.120         0.1860
                              P(U) =        0.645         1.0000
Hence, the probability of a favorable report is P(F) = 0.355 and that of an unfavorable
report is P(U) = 0.645.
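The two tables apply Bayes' theorem: joint = prior × likelihood, the marginal P(report) is the sum of the joints, and each posterior is joint / marginal. A sketch of this revision (P(U | si) is taken as 1 - P(F | si), as in the given data):

```python
priors = {"s1": 0.35, "s2": 0.35, "s3": 0.30}
p_fav_given = {"s1": 0.10, "s2": 0.40, "s3": 0.60}  # P(F | si)

def revise(likelihood):
    """Return P(report) and the posterior P(si | report) via Bayes' theorem."""
    joint = {s: priors[s] * likelihood[s] for s in priors}
    marginal = sum(joint.values())
    posterior = {s: j / marginal for s, j in joint.items()}
    return marginal, posterior

p_f, post_f = revise(p_fav_given)
p_u, post_u = revise({s: 1 - p for s, p in p_fav_given.items()})  # P(U | si)

print(f"P(F) = {p_f:.3f}, P(U) = {p_u:.3f}")  # P(F) = 0.355, P(U) = 0.645
print({s: round(p, 4) for s, p in post_f.items()})
print({s: round(p, 4) for s, p in post_u.items()})
```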
d. What should be Gorman’s optimal decision strategy with market study? Use decision tree
to answer.
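A decision tree can be drawn with the above revised (posterior) probabilities for the different
states of nature under Favorable and Unfavorable market study results. Whether or not to
undertake the market study can also be brought into the tree.
Assuming the market study is undertaken with the given probabilities, the CONCLUSION is a
conditional strategy:
If the report is Favorable: EV(d1) = 0.0986(-20) + 0.3944(40) + 0.5070(100) = 64.504, which
exceeds EV(d2) = 54.224, so MANUFACTURE (projected profit about $64,504).
If the report is Unfavorable: EV(d2) = 0.4884(10) + 0.3256(45) + 0.1860(70) = 32.556, which
exceeds EV(d1) = 21.856, so PURCHASE (projected profit about $32,556).
Expected value with the study = 0.355(64.504) + 0.645(32.556) ≈ 43.90, i.e. about $43,900,
against $40,250 without it, so the market study is worth up to about $3,650 (EVSI).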
(1+1+3+5= 10)
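The decision tree can be rolled back programmatically: at each report branch choose the alternative with the higher posterior expected payoff (decision node), then weight the branch values by P(report) (chance node). This sketch uses exact posteriors, so its figures differ from the rounded hand calculation in the third decimal (for example 64.507 instead of 64.504).

```python
priors = [0.35, 0.35, 0.30]
payoffs = {"Manufacture": [-20, 40, 100], "Purchase": [10, 45, 70]}  # $000s
likelihood = {"F": [0.10, 0.40, 0.60], "U": [0.90, 0.60, 0.40]}

ev_with_study = 0.0
strategy = {}
for report, lik in likelihood.items():
    joint = [p * l for p, l in zip(priors, lik)]
    p_report = sum(joint)
    posterior = [j / p_report for j in joint]
    # Decision node: choose the alternative with the higher expected payoff.
    ev = {d: sum(q * v for q, v in zip(posterior, row)) for d, row in payoffs.items()}
    strategy[report] = max(ev, key=ev.get)
    print(f"{report}: P = {p_report:.3f}, best = {strategy[report]}, "
          f"EV = {ev[strategy[report]]:.3f}")
    ev_with_study += p_report * ev[strategy[report]]  # chance node: weight by P(report)

ev_without_study = 40.25  # best EMV without the study (Purchase), from part (a)
evsi = ev_with_study - ev_without_study
print(f"EV with study = {ev_with_study:.2f}, EVSI = {evsi:.2f}")  # 43.90, 3.65
```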
Maximize Z = 5 X1 + 3 X2
Subject to
4 X1 + 2 X2 <= 8
X1 >= 4
X2 >= 6
And X1 , X2 >= 0
Plot the problem on a graph sheet and comment on the feasible region. What is the
solution to the problem?
(5)
ANSWER: The graph can be plotted. There is no common feasible region satisfying all the
constraints: X1 >= 4 and X2 >= 6 give 4 X1 + 2 X2 >= 4(4) + 2(6) = 28 > 8. Hence the
problem is infeasible and has no solution.
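A quick sketch of why the feasible region is empty. The grid scan (step 0.5 over 0 to 20) is only an illustration; the actual proof is the bound: since both coefficients are positive, 4 X1 + 2 X2 is smallest at the corner (4, 6) allowed by the lower bounds, where it already equals 28.

```python
def feasible(x1, x2):
    """All constraints of the LP, including non-negativity."""
    return 4 * x1 + 2 * x2 <= 8 and x1 >= 4 and x2 >= 6 and x1 >= 0 and x2 >= 0

grid = [i * 0.5 for i in range(41)]  # 0.0, 0.5, ..., 20.0
feasible_points = [(a, b) for a in grid for b in grid if feasible(a, b)]

min_lhs = 4 * 4 + 2 * 6  # smallest 4*X1 + 2*X2 permitted by X1 >= 4, X2 >= 6
print(feasible_points, min_lhs)  # [] 28
```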
Time per unit (minutes):
Variant   Assembly   Polish   Pack   Profit (£)
   1          2         3        2       1.50
   2          4         2        3       2.50
   3          3         3        2       3.00
   4          7         4        5       4.50
Given the current state of the labor force, the company estimates that, each year, they have 100000
minutes of assembly time, 50000 minutes of polishing time and 60000 minutes of packing time
available.
a. Given that the company wishes to optimize its profits, formulate the above as a
Linear Programming Problem (LPP).
(3)
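Let xi = number of units of Variant i produced per year (i = 1, 2, 3, 4).
Maximize 1.5x1 + 2.5x2 + 3.0x3 + 4.5x4
subject to
2x1 + 4x2 + 3x3 + 7x4 <= 100000 (assembly)
3x1 + 2x2 + 3x3 + 4x4 <= 50000 (polishing)
2x1 + 3x2 + 2x3 + 5x4 <= 60000 (packing)
x1, x2, x3, x4 >= 0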
b. Suppose the company is free to decide how much time to devote to each of the three
operations (assembly, polishing and packing) within the total available time of
210000 (= 100000 + 50000 + 60000) minutes. Reformulate the LPP for this changed
scenario. (2)
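Maximize 1.5x1 + 2.5x2 + 3.0x3 + 4.5x4 (objective unchanged)
subject to
(2x1 + 4x2 + 3x3 + 7x4) + (3x1 + 2x2 + 3x3 + 4x4) + (2x1 + 3x2 + 2x3 + 5x4) <= 210000,
i.e. 7x1 + 9x2 + 8x3 + 16x4 <= 210000
x1, x2, x3, x4 >= 0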
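Although the question asks only for the formulation, a quick sketch shows what the pooled constraint implies. With a single "<=" constraint and non-negative variables, an optimal LP solution devotes all the time to the variant with the highest profit per minute of total processing time.

```python
profit = [1.5, 2.5, 3.0, 4.5]                           # £ per unit, Variants 1-4
minutes = [2 + 3 + 2, 4 + 2 + 3, 3 + 3 + 2, 7 + 4 + 5]  # total minutes per unit
capacity = 210000                                       # pooled minutes per year

ratios = [p / m for p, m in zip(profit, minutes)]       # £ per minute
best = max(range(len(profit)), key=lambda i: ratios[i])
units = capacity / minutes[best]
print(f"Variant {best + 1}: {units:.0f} units, profit £{units * profit[best]:.2f}")
# Variant 3: 26250 units, profit £78750.00
```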
c. For the LPP formulated under (a), Excel Solver output is given below:
Target Cell (Max)
Cell    Name            Original Value   Final Value
$F$4    Profit Total             58000         58000

Adjustable Cells
Cell    Name            Original Value   Final Value
$B$3    Variant 1                    0             0
$C$3    Variant 2                16000         16000
$D$3    Variant 3                 6000          6000
$E$3    Variant 4                    0             0

Constraints
Cell    Name              Cell Value   Formula       Status        Slack
$F$5    Assembly Total         82000   $F$5<=$G$5    Not Binding   18000
$F$6    Polishing Total        50000   $F$6<=$G$6    Binding           0
$F$7    Packing Total          60000   $F$7<=$G$7    Binding           0

Sensitivity Report
Adjustable Cells
                       Final   Reduced   Objective     Allowable     Allowable
Cell    Name           Value      Cost   Coefficient    Increase      Decrease
$B$3    Variant 1          0      -1.5          1.5          1.5         1E+30
$C$3    Variant 2      16000         0          2.5            2   0.142857143
$D$3    Variant 3       6000         0            3         0.75           0.5
$E$3    Variant 4          0      -0.2          4.5          0.2         1E+30

Constraints
                             Final   Shadow   Constraint   Allowable     Allowable
Cell    Name                 Value    Price   R.H. Side     Increase      Decrease
$F$5    Assembly Total       82000        0      100000        1E+30         18000
$F$6    Polishing Total      50000      0.8       50000        40000         10000
$F$7    Packing Total        60000      0.3       60000        15000   26666.66667
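The Solver report can be cross-checked against the data: the final quantities reproduce the resource usage, slacks and total profit, and each reduced cost equals the unit profit minus the value of the time the variant would consume, priced at the reported shadow prices. A sketch:

```python
usage = {"Assembly": [2, 4, 3, 7], "Polishing": [3, 2, 3, 4], "Packing": [2, 3, 2, 5]}
available = {"Assembly": 100000, "Polishing": 50000, "Packing": 60000}
shadow = {"Assembly": 0.0, "Polishing": 0.8, "Packing": 0.3}  # £ per minute, from the report
profit = [1.5, 2.5, 3.0, 4.5]
x = [0, 16000, 6000, 0]  # Solver's final values for Variants 1-4

slack = {}
for r, coeffs in usage.items():
    used = sum(a * q for a, q in zip(coeffs, x))
    slack[r] = available[r] - used
    print(f"{r}: used {used}, slack {slack[r]}")

total_profit = sum(p * q for p, q in zip(profit, x))
# Reduced cost of variant j = unit profit - shadow-price value of its time.
reduced = [profit[j] - sum(shadow[r] * usage[r][j] for r in usage) for j in range(4)]
print(f"Profit = {total_profit:.0f}")                 # Profit = 58000
print([round(c, 2) + 0.0 for c in reduced])           # + 0.0 turns -0.0 into 0.0
```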
i. How many units of each variant should the company produce per year and what
is the associated profit?
V1 = 0, V2 = 16000, V3 = 6000, V4 = 0, Z = 58000 pounds
ii. On which processes is the available time exhausted?
Polishing and packing: their constraints are binding, with zero slack.
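iii. By how many pounds should the unit profit of Variant 1 increase so that it
becomes profitable to produce Variant 1?
By at least 1.5 pounds (the magnitude of its reduced cost), i.e. to £3.00 or more per unit,
other things remaining unchanged. At exactly £3.00 the company is indifferent; above it,
Variant 1 enters the optimal mix.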
iv. In the constraints table under the Sensitivity Report, the ‘allowable increase’ and
the ‘allowable decrease’ are indicated as 40000 and 10000 respectively against
total polish. What do they indicate?
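The shadow price of 0.8 for polishing time (and the current set of binding constraints, i.e.
the current basis) remains valid as long as the availability of polishing time lies between
40000 (= 50000 – 10000) and 90000 (= 50000 + 40000) minutes, other things remaining
unchanged. Within this range the optimal production quantities do change, but the same
variants (2 and 3) remain in the optimal mix.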
v. In continuation of (iv) above, what does the value of 0.8 as shadow price against
‘total polish’ indicate?
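Other things remaining unchanged, each additional minute of polishing time would increase
total profit (not revenue) by 0.8 pounds, and each minute less would reduce it by 0.8 pounds,
as long as polishing time stays within the range given in (iv).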
( 1 * 5 = 5)
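Parts (iv) and (v) can be illustrated by solving the two binding constraints for the basic variables. This sketch assumes the current basis read from the Solver output (Variants 2 and 3 produced, polishing and packing binding). Changing polishing time P moves x2 and x3, profit changes by 0.8 per minute, and the basis stays feasible (hence optimal) only for P between 40000 and 90000.

```python
# Binding constraints with Variants 2 and 3 in the basis:
#   polishing: 2*x2 + 3*x3 = P      packing: 3*x2 + 2*x3 = K
def mix(P, K=60000):
    """Solve the two binding constraints for (x2, x3)."""
    x2 = (3 * K - 2 * P) / 5
    x3 = (3 * P - 2 * K) / 5
    return x2, x3

def basis_valid(P):
    """Basis stays feasible while x2, x3 >= 0 and assembly time is not exceeded."""
    x2, x3 = mix(P)
    return x2 >= 0 and x3 >= 0 and 4 * x2 + 3 * x3 <= 100000

def profit(P):
    x2, x3 = mix(P)
    return 2.5 * x2 + 3.0 * x3

for P in (39999, 40000, 50000, 90000, 90001):
    print(P, mix(P), basis_valid(P))

gain = profit(50001) - profit(50000)
print(f"Extra profit from one more polishing minute: {gain:.2f}")  # 0.80
```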