S.Y. B.Sc. (IT) : Sem. IV
Computer Oriented Statistical Techniques
Time : 2½ Hrs.] Prelim Question Paper Solution [Marks : 75

Q.1 Attempt any THREE of the following : [15]


Q.1(a) Compute the median. [5]
Size 5 6 7 8 9 10 11 12 13
Frequency 48 52 56 60 63 57 55 50 52
Ans.:
Size    Frequency    Cumulative frequency
5       48           48
6       52           100
7       56           156
8       60           216
9       63           279
10      57           336
11      55           391
12      50           441
13      52           493
Total   N = 493

Here, the median is the value of the $\left(\frac{N+1}{2}\right)$th item, i.e. the $\frac{493+1}{2} = 247$th item.
From the table, the first cumulative frequency greater than 247 is 279.
The size corresponding to 279 is 9.
Therefore, the median is 9.
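The table lookup above can be checked with a short Python sketch. The function name `discrete_median` is illustrative only, and the sketch assumes the sizes are already sorted in ascending order.

```python
from itertools import accumulate

sizes = [5, 6, 7, 8, 9, 10, 11, 12, 13]
freqs = [48, 52, 56, 60, 63, 57, 55, 50, 52]

def discrete_median(values, frequencies):
    """Return the value that holds the ((N + 1) / 2)-th position of the ordered data."""
    cumulative = list(accumulate(frequencies))  # running totals: 48, 100, 156, ...
    n = cumulative[-1]                          # total number of observations
    position = (n + 1) / 2                      # 247 for N = 493
    for value, cf in zip(values, cumulative):
        if cf >= position:                      # first class whose cumulative frequency reaches it
            return value
    raise ValueError("empty frequency table")

median_size = discrete_median(sizes, freqs)
print(median_size)
```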

Q.1(b) Define Geometric Mean. How to calculate it? [5]


Find the G.M. of 250, 12, 4.5, 119.5, 42, 35.4, 75, 30
Ans.: The geometric mean G of a set of n positive numbers $X_1, X_2, \ldots, X_n$ is the nth root of the product of the numbers:
$$G = \sqrt[n]{X_1 X_2 \cdots X_n}$$
The geometric mean is calculated with a calculator or, sometimes, with logarithms.
Taking logarithms of both sides of the formula, we get
$$\log G = \log (X_1 X_2 \cdots X_n)^{1/n}$$
$$\log G = \frac{1}{n}(\log X_1 + \log X_2 + \cdots + \log X_n)$$
$$G = \operatorname{antilog}\left(\frac{\sum \log X}{n}\right)$$
For the given 8 numbers, the geometric mean is
$$G = \sqrt[8]{250 \times 12 \times 4.5 \times 119.5 \times 42 \times 35.4 \times 75 \times 30} = 39.04$$
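The logarithm route can be sketched in Python as follows. The helper name `geometric_mean` is an illustrative choice; natural logarithms are used, which gives the same result as common logarithms with an antilog.

```python
import math

values = [250, 12, 4.5, 119.5, 42, 35.4, 75, 30]

def geometric_mean(xs):
    """Geometric mean via logarithms: exp of the mean of the logs."""
    if any(x <= 0 for x in xs):
        raise ValueError("geometric mean needs positive numbers")
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

gm = geometric_mean(values)
print(round(gm, 2))
```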

Q.1(c) From the following data, calculate Q1 and Q3. [5]


Marks more than 10 20 30 40 50 60 70
No. of Students 12 30 54 76 91 101 102
Ans.:
Class Interval Frequency Cumulative Frequency
10-20 12 12
20-30 30 42
30-40 54 96
40-50 76 172
50-60 91 263
60-70 101 364
70-80 102 466
Total 466

Vidyalankar : S.Y. B.Sc. (IT)  COST

To calculate $Q_1$, we need the value $\frac{N}{4}$, i.e. $\frac{466}{4} = 116.5$.
Since this value lies between the cumulative frequencies 96 and 172, we select the 40-50 class interval for the computation of $Q_1$.
Formula to compute:
$$Q_1 = L_1 + \left(\frac{\frac{N}{4} - cf}{f}\right) \times c$$
Where, $L_1$ : lower limit of the selected class interval
cf : cumulative frequency of the class preceding the selected class
f : frequency of the selected class interval
c : class size
$$\text{Therefore, } Q_1 = 40 + \left(\frac{\frac{466}{4} - 96}{76}\right) \times 10 = 42.70$$
To calculate $Q_3$, we need the value $\frac{3N}{4}$, i.e. $\frac{3 \times 466}{4} = 349.5$.
Since this value lies between the cumulative frequencies 263 and 364, we select the 60-70 class interval for the computation of $Q_3$.
Formula to compute:
$$Q_3 = L_1 + \left(\frac{\frac{3N}{4} - cf}{f}\right) \times c$$
with $L_1$, cf, f and c defined as above for the selected class.
$$\text{Therefore, } Q_3 = 60 + \left(\frac{349.5 - 263}{101}\right) \times 10 = 68.56$$

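The interpolation formula for grouped data can be sketched in Python. The function `grouped_quantile` and its parameter names are illustrative assumptions; it takes the class frequencies from the table as given and assumes equal class widths.

```python
def grouped_quantile(lower_limits, freqs, width, fraction):
    """Interpolated quantile L1 + ((fraction * N - cf) / f) * c for grouped data."""
    n = sum(freqs)
    target = fraction * n              # N/4 for Q1, 3N/4 for Q3
    cf_before = 0                      # cumulative frequency of the preceding classes
    for lower, f in zip(lower_limits, freqs):
        if cf_before + f >= target:    # this is the selected class
            return lower + (target - cf_before) / f * width
        cf_before += f
    raise ValueError("fraction must lie between 0 and 1")

lowers = [10, 20, 30, 40, 50, 60, 70]
class_freqs = [12, 30, 54, 76, 91, 101, 102]

q1 = grouped_quantile(lowers, class_freqs, 10, 0.25)
q3 = grouped_quantile(lowers, class_freqs, 10, 0.75)
print(round(q1, 2), round(q3, 2))
```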
Q.1(d) Calculate the standard deviation of the heights of 10 students given as, [5]
Height (in cms) 161 162 160 163 160 163 164 164 170 164
Ans.: First we find the mean of the given data,
$$\bar{X} = \frac{161 + 162 + 160 + 163 + 160 + 163 + 164 + 164 + 170 + 164}{10} = 163.1$$
Standard deviation,
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{10}} = \sqrt{\frac{(161 - 163.1)^2 + (162 - 163.1)^2 + \cdots + (164 - 163.1)^2}{10}} = \sqrt{7.49} = 2.7368$$
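As a check, the same figures can be obtained with Python's standard `statistics` module. `pstdev` divides by N (population form), which matches the formula used in this answer.

```python
import statistics

heights = [161, 162, 160, 163, 160, 163, 164, 164, 170, 164]

mean_height = statistics.mean(heights)   # arithmetic mean
sd_height = statistics.pstdev(heights)   # population standard deviation (divisor N)

print(mean_height, round(sd_height, 4))
```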

Q.1(e) Find the quartile deviation for the following data. [5]
Marks, x 5 10 15 20 25 30
No. of Student, f 2 3 8 7 6 4
Ans.:
Marks x No of Cumulative
students frequency
5 2 2
10 3 5
15 8 13
20 7 20
25 6 26
30 4 30


To find $Q_1$ we need the $\left(\frac{N+1}{4}\right)$th observation, i.e. the 7.75th observation. The first cumulative frequency greater than 7.75 is 13, so the marks corresponding to it give $Q_1 = 15$.
To find $Q_3$ we need the $3\left(\frac{N+1}{4}\right)$th observation, i.e. the $3 \times 7.75 = 23.25$th observation. The first cumulative frequency greater than 23.25 is 26, so the marks corresponding to it give $Q_3 = 25$.
$$\text{Therefore, Quartile Deviation} = \frac{Q_3 - Q_1}{2} = \frac{25 - 15}{2} = 5$$

Q.1(f) Define Factors and Data Frames in ‘R’. How to create them in ‘R’? [5]
Ans.: Factors are data objects used to categorize data and store it as levels. They can store both strings and integers. They are useful for columns that have a limited number of unique values, such as "Male", "Female" or TRUE, FALSE. They are useful in data analysis for statistical modelling.
Factors are created using the factor() function, which takes a vector as input.
Example :
    # A character vector; its unique values become the levels of the factor
    data <- c("East", "West", "south", "North")
    factdata <- factor(data)
    print(factdata)

A data frame is a table or a two-dimensional array-like structure in which each column contains values of one variable and each row contains one set of values from each column.

Data frames are created in R with the data.frame() function as follows:
    # Each named argument becomes a column; all columns must have the same length
    emp.data <- data.frame(emp_id = 1:3, emp_name = c("Rick", "Dan", "Michelle"),
                           salary = c(1000, 1500, 2000))
    print(emp.data)

Q.2 Attempt any THREE of the following: [15]


Q.2(a) Explain the Relation between Raw Moments and Central Moments. [5]
Ans.: The rth moment about the mean $\bar{X}$ (central moment) is defined as
$$m_r = \frac{\sum_{j=1}^{N} (X_j - \bar{X})^r}{N} = \frac{\sum (X - \bar{X})^r}{N} = \overline{(X - \bar{X})^r}$$
The rth moment about any origin A (raw moment) is defined as
$$m'_r = \frac{\sum_{j=1}^{N} (X_j - A)^r}{N} = \frac{\sum (X - A)^r}{N} = \overline{(X - A)^r}$$
The relation between raw moments and central moments is given by:
$$m_2 = m'_2 - {m'_1}^2$$
$$m_3 = m'_3 - 3m'_1 m'_2 + 2{m'_1}^3$$
$$m_4 = m'_4 - 4m'_1 m'_3 + 6{m'_1}^2 m'_2 - 3{m'_1}^4$$

Q.2(b) Define Skewness. Compute Coefficient of Skewness for the following observations [5]
2, 3, 5, 7, 4, 8, 1.
Ans.: Skewness is the degree of asymmetry of a distribution. If the frequency curve of a
distribution has a longer tail to the right of the central maximum than to the left, the
distribution is said to be skewed to the right, or to have positive skewness.
If the reverse is true, it is said to be skewed to the left, or to have negative skewness.
$$\text{Pearson's 2nd coefficient of skewness} = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}}$$


For the given data,
$$\text{Mean } \bar{X} = \frac{2 + 3 + 5 + 7 + 4 + 8 + 1}{7} = \frac{30}{7} = 4.2857$$
Arranged in order the data are 1, 2, 3, 4, 5, 7, 8, so the median is 4.
Standard deviation,
$$= \sqrt{\frac{(2 - 4.2857)^2 + (3 - 4.2857)^2 + (5 - 4.2857)^2 + (7 - 4.2857)^2 + (4 - 4.2857)^2 + (8 - 4.2857)^2 + (1 - 4.2857)^2}{7}} = 2.3733$$
$$\text{Coefficient of skewness} = \frac{3(4.2857 - 4)}{2.3733} = 0.3612$$
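A short Python check of Pearson's second coefficient, carrying full precision rather than two-decimal intermediate values, gives about 0.36. The variable names are illustrative.

```python
import statistics

data = [2, 3, 5, 7, 4, 8, 1]

mean = statistics.mean(data)       # 30 / 7
median = statistics.median(data)   # middle value of the sorted data
sd = statistics.pstdev(data)       # population standard deviation (divisor N)

# Pearson's second coefficient of skewness: 3 * (mean - median) / sd
skewness = 3 * (mean - median) / sd
print(round(skewness, 4))
```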

Q.2(c) A random variable X has the following probability distribution values of X [5]
X      0    1    2    3    4    5      6       7
P(X)   0    k    2k   2k   3k   k^2    2k^2    7k^2 + k
Find (i) k (ii) P(X < 6) (iii) P(X ≥ 6) (iv) P(0 < X < 5)
Ans.: We know that $\sum P(x) = 1$.
Adding all the probabilities, we get
$$10k^2 + 9k = 1 \Rightarrow (10k - 1)(k + 1) = 0 \Rightarrow k = \frac{1}{10} \text{ or } k = -1$$
but k = -1 is not possible since a probability cannot be negative.
$$\therefore k = \frac{1}{10}$$

Therefore, the given probability distribution can be rewritten as,

X      0    1      2      3      4      5       6       7
P(X)   0    1/10   2/10   2/10   3/10   1/100   2/100   17/100

(i) $k = \frac{1}{10}$
(ii) $P(X < 6) = P(X = 0) + P(X = 1) + \cdots + P(X = 5)$
$= 0 + \frac{1}{10} + \frac{2}{10} + \frac{2}{10} + \frac{3}{10} + \frac{1}{100} = \frac{81}{100}$
(iii) $P(X \ge 6) = P(X = 6) + P(X = 7) = \frac{2}{100} + \frac{17}{100} = \frac{19}{100}$
(iv) $P(0 < X < 5) = P(X = 1) + \cdots + P(X = 4)$
$= \frac{1}{10} + \frac{2}{10} + \frac{2}{10} + \frac{3}{10} = \frac{8}{10}$

Q.2(d) The data from a survey of 140 students showed that 37 study Music, 103 play a [5]
sport and 25 do neither. Create a Venn diagram to illustrate the data collected
and then determine the probability that if a student is selected at random : (i)
He or she will study music, (ii) He or she will study music given that he or she
play sport.
Ans.: Let M represent the set of students who study music and S represent the set of students
who play sports. First let’s determine the number of students that study music and play a
sport to fill in the overlapping region in the diagram and then we can find the other values.
$n(M) + n(S) - n(M \cap S) = n(M \cup S)$
Since 25 students do neither, $n(M \cup S) = 140 - 25 = 115$, so
$37 + 103 - n(M \cap S) = 115$
$n(M \cap S) = 25$

(i) The probability that a randomly selected student studies music is the number of students who study music divided by the total number of students surveyed.
$$P(M) = \frac{n(M)}{140} = \frac{37}{140}$$
(ii) The probability that a randomly selected student will study music given that he/she plays a sport is
$$P(M \mid S) = \frac{n(M \cap S)}{n(S)} = \frac{25}{103}$$

Q.2(e) It has been found that 2% of the tools produced by a certain machine are [5]
defective. What is the probability that in a shipment of 400 such tools :
(i) 3% or more, (ii) 2% or less, will prove defective?
Ans.: $\mu_P = p = 0.02$ and $\sigma_P = \sqrt{\frac{pq}{N}} = \sqrt{\frac{(0.02)(0.98)}{400}} = 0.007$
(i) Using the correction for discrete variables,
$$\frac{1}{2N} = \frac{1}{800} = 0.00125$$
We have,
$$(0.03 - 0.00125) \text{ in standard units} = \frac{0.03 - 0.00125 - 0.02}{0.007} = 1.25$$
Required probability = (area under normal curve to right of z = 1.25)
= 0.1056
$$\text{(ii) } (0.02 + 0.00125) \text{ in standard units} = \frac{0.02 + 0.00125 - 0.02}{0.007} = 0.18$$
Required probability = (area under normal curve to left of z = 0.18)
= 0.50 + 0.0714 = 0.5714
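The normal approximation with the continuity correction can be reproduced with `statistics.NormalDist` from the Python standard library instead of a printed table. This is a sketch; the small difference in part (ii) arises because the table lookup rounds z to 0.18.

```python
import math
from statistics import NormalDist

p, n = 0.02, 400
sigma_p = math.sqrt(p * (1 - p) / n)   # standard deviation of the sample proportion
correction = 1 / (2 * n)               # continuity correction 1/(2N)
std_normal = NormalDist()              # standard normal, mean 0 and sd 1

z_a = (0.03 - correction - p) / sigma_p
prob_3pct_or_more = 1 - std_normal.cdf(z_a)   # area to the right of z_a

z_b = (0.02 + correction - p) / sigma_p
prob_2pct_or_less = std_normal.cdf(z_b)       # area to the left of z_b

print(round(z_a, 2), round(prob_3pct_or_more, 4))
print(round(z_b, 2), round(prob_2pct_or_less, 4))
```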

Q.2(f) The electric light bulbs of manufacturer A have a mean lifetime of 2800 hours [5]
with a standard deviation of 400 hours. While those of manufacturer B have a
mean lifetime of 2400 hours with standard deviation of 200 hours. If random
samples of 250 bulbs of each brand are tested, what is the probability that the
brand A bulbs will have a mean lifetime that is at least, (i) 320 hours, (ii) 500
hours more than the brand B bulbs?
Ans.: Let $\bar{X}_A$ and $\bar{X}_B$ denote the mean lifetimes of samples A and B respectively, then
$$\mu_{\bar{X}_A - \bar{X}_B} = \mu_{\bar{X}_A} - \mu_{\bar{X}_B} = 2800 - 2400 = 400 \text{ hr}$$
and
$$\sigma_{\bar{X}_A - \bar{X}_B} = \sqrt{\frac{\sigma_A^2}{N_A} + \frac{\sigma_B^2}{N_B}} = \sqrt{\frac{400^2}{250} + \frac{200^2}{250}} = 28.28 \text{ hr}$$
The standardized variable for the difference in means is
$$z = \frac{(\bar{X}_A - \bar{X}_B) - 400}{28.28}$$

(i) The difference 320 hr in standard units is $\frac{320 - 400}{28.28} = -2.83$
Thus, Required probability
= (area under normal curve to right of z = -2.83)
= 0.5 + 0.4977 = 0.9977

(ii) The difference 500 hr in standard units is $\frac{500 - 400}{28.28} = 3.54$
Thus, Required probability
= (area under normal curve to the right of z = 3.54)
= 0.5 - 0.4998 = 0.0002
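A minimal Python sketch of the same calculation models the sampling distribution of the difference in means as a normal distribution with the mean and standard deviation derived above. The variable names are illustrative.

```python
import math
from statistics import NormalDist

mu_diff = 2800 - 2400                                  # mean of the difference, in hours
sigma_diff = math.sqrt(400**2 / 250 + 200**2 / 250)    # standard error of the difference

sampling = NormalDist(mu_diff, sigma_diff)

prob_at_least_320 = 1 - sampling.cdf(320)   # area to the right of 320 hr
prob_at_least_500 = 1 - sampling.cdf(500)   # area to the right of 500 hr

print(round(sigma_diff, 2), round(prob_at_least_320, 4), round(prob_at_least_500, 4))
```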


Q.3 Attempt any THREE of the following: [15]


Q.3(a) In measuring reaction time, a psychologist estimates that the standard deviation [5]
is 0.05 seconds (s). How large a sample of measurements must he take in order
to be (i) 95% and (ii) 99% confident that the error of his estimate will not
exceed 0.01 s?
Ans.: (i) The 95% confidence limits are $\bar{X} \pm 1.96\frac{\sigma}{\sqrt{N}}$, where $1.96\frac{\sigma}{\sqrt{N}}$ is the error of the estimate.
This error will be less than 0.01 if $1.96\frac{\sigma}{\sqrt{N}} \le 0.01$
$$\Rightarrow 1.96 \times \frac{0.05}{\sqrt{N}} \le 0.01 \Rightarrow \sqrt{N} \ge 1.96 \times \frac{0.05}{0.01} = 9.8 \Rightarrow N \ge 96.04$$
Thus we can be 95% confident that the error of the estimate will be less than 0.01 seconds if N is 97 or larger.

(ii) The 99% confidence limits are $\bar{X} \pm 2.58\frac{\sigma}{\sqrt{N}}$. The error will be less than 0.01 if
$$2.58\frac{\sigma}{\sqrt{N}} \le 0.01 \Rightarrow \sqrt{N} \ge 2.58 \times \frac{0.05}{0.01} = 12.9 \Rightarrow N \ge 166.4$$
Thus we can be 99% confident that the error of the estimate will be less than 0.01 seconds if N is 167 or larger.
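The sample-size rule can be written as a small Python helper. The function name `required_sample_size` is an illustrative choice, and the critical values 1.96 and 2.58 are the table values used in this answer.

```python
import math

def required_sample_size(z_critical, sigma, max_error):
    """Smallest N with z * sigma / sqrt(N) <= max_error."""
    return math.ceil((z_critical * sigma / max_error) ** 2)

n_95 = required_sample_size(1.96, 0.05, 0.01)
n_99 = required_sample_size(2.58, 0.05, 0.01)
print(n_95, n_99)
```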

Q.3(b) A measurement was recorded as 216.480 grams (g) with a probable error of [5]
0.272 g. What are the 95% confidence limits for the measurement?
Ans.: The probable error is 0.272.
Now, $0.272 = 0.6745\,\sigma_{\bar{X}}$
$$\therefore \sigma_{\bar{X}} = \frac{0.272}{0.6745} = 0.4033$$
Thus the 95% confidence limits are $\bar{X} \pm 1.96\,\sigma_{\bar{X}}$
$= 216.480 \pm 1.96(0.4033)$
$= 216.480 \pm 0.79$
Therefore the confidence limits are (215.69, 217.27).

Q.3(c) A sample poll of 100 voters chosen at random from all voters in a given district [5]
indicated that 55% of them were in favor of a particular candidate. Find the (a)
95%, (b) 99%, and (c) 99.73% confidence limits for the proportion of all the
voters in favor of this candidate.
Ans.: Here, p = 0.55 and q = 1 - p = 0.45
(a) The 95% confidence limits for the population proportion are
$$p \pm 1.96\,\sigma_p = p \pm 1.96\sqrt{\frac{pq}{N}} = 0.55 \pm 1.96\sqrt{\frac{(0.55)(0.45)}{100}} = 0.55 \pm 0.10$$

(b) The 99% confidence limits for the population proportion are
$$p \pm 2.58\,\sigma_p = p \pm 2.58\sqrt{\frac{pq}{N}} = 0.55 \pm 2.58\sqrt{\frac{(0.55)(0.45)}{100}} = 0.55 \pm 0.13$$

(c) The 99.73% confidence limits for the population proportion are
$$p \pm 3\,\sigma_p = p \pm 3\sqrt{\frac{pq}{N}} = 0.55 \pm 3\sqrt{\frac{(0.55)(0.45)}{100}} = 0.55 \pm 0.15$$


Q.3(d) Explain Type I and Type II errors and Level of Significance. [5]
Ans.: Type I error and Type II error:
If we reject a hypothesis when it should be accepted, we say that a Type I error has been made. If we accept a hypothesis when it should be rejected, we say that a Type II error has been made.

In order for decision rules (or tests of hypotheses) to be good, they must be designed so as
to minimize error of decision. This is not a simple matter, because for any given sample size,
an attempt to decrease one type of error is generally accompanied by an increase in the
other type of error. In practice, one type of error may be more serious than the other and
so a compromise should be reached in favour of limiting the more serious error. The only
way to reduce both types of error is to increase the sample size, which may or may not be
possible.

Level of Significance :
The maximum probability with which we are ready to risk a Type I error is called the level of significance or significance level.
This probability is denoted by $\alpha$ and is specified before a sample is drawn.
A significance level of 0.05 (5%) or 0.01 (1%) is common.
If the significance level is 5%, there are 5 chances in 100 that we would reject a hypothesis when it should actually be accepted. This means we are 95% confident that the decision taken is right.

Q.3(e) The breaking strengths of cables produced by a manufacturer have a mean of [5]
1800 pounds (lb) and a standard deviation of 100 lb. By a new technique in the
manufacturing process, it is claimed that the breaking strengths can be
increased. To test this claim, a sample of 50 cables is tested and it is found
that the mean breaking strength is 1850 lb. Can we support the claim at the
0.01 significance level?
Ans.: Let,
$H_0 : \mu = 1800$ lb, i.e. there is no change in breaking strength
$H_1 : \mu > 1800$ lb, i.e. the breaking strength has increased
This is a one-tailed test. At L.O.S. $\alpha = 0.01$ we reject $H_0$ if Z > 2.33.
$\sigma = 100$, N = 50, $\bar{X} = 1850$
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{N}} = \frac{1850 - 1800}{100/\sqrt{50}} = 3.5355 > 2.33$$
Therefore reject $H_0$.
By the new technique the breaking strength has increased.
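The one-tailed z test can be sketched in Python. The helper name `one_sample_z` is illustrative; the critical value comes from `NormalDist.inv_cdf`, which gives about 2.326 where the table value is rounded to 2.33.

```python
import math
from statistics import NormalDist

def one_sample_z(sample_mean, mu0, sigma, n):
    """z statistic for a sample mean when the population sigma is known."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

z = one_sample_z(1850, 1800, 100, 50)
critical = NormalDist().inv_cdf(1 - 0.01)   # upper 1% point of the standard normal
reject_h0 = z > critical                    # right-tailed test: large z supports H1

print(round(z, 4), round(critical, 3), reject_h0)
```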

Q.3(f) Two groups, A and B, consist of 100 people each who have a disease. A serum [5]
is given to group A but not to group B (which is called the control); otherwise,
the two groups are treated identically. It is found that in groups A and B, 75
and 65 people, respectively, recover from the disease. At significance levels of
(a) 0.01, (b) 0.05, and (c) 0.10, test the hypothesis that the serum helps cure
the disease. Compute the p-value and show that p-value > 0.01, p-value > 0.05,
but p-value < 0.10.
Ans.: Let, $H_0 : p_1 = p_2$
$H_1 : p_1 > p_2$
One-tailed test (L.O.S. $\alpha = 0.01$, table value = 2.33)
$N_1 = 100$, $P_1 = \frac{75}{100} = 0.75$; $N_2 = 100$, $P_2 = \frac{65}{100} = 0.65$
$$p = \frac{N_1 P_1 + N_2 P_2}{N_1 + N_2} = \frac{(100)(0.75) + (100)(0.65)}{100 + 100} = 0.7$$
$\therefore q = 1 - p = 0.3$


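$$\therefore Z = \frac{(P_1 - P_2) - (p_1 - p_2)}{\sqrt{pq\left(\frac{1}{N_1} + \frac{1}{N_2}\right)}} = \frac{(0.75 - 0.65) - 0}{\sqrt{(0.7)(0.3)\left(\frac{1}{100} + \frac{1}{100}\right)}} = 1.5430 < 2.33$$
(a) At the 0.01 level, since 1.5430 < 2.33, accept $H_0$.
(b) At the 0.05 level the table value is 1.645; since 1.5430 < 1.645, accept $H_0$.
(c) At the 0.10 level the table value is 1.28; since 1.5430 > 1.28, reject $H_0$.
The p-value is $P(Z \ge 1.54) = 0.5 - 0.4382 = 0.0618$, so p-value > 0.01 and p-value > 0.05, but p-value < 0.10.

At the 0.01 and 0.05 levels we cannot conclude that the serum helps to cure the disease; at the 0.10 level we can.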
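The question asks for three significance levels and the p-value. A Python sketch of the pooled two-proportion z test covers all of them at once. The function name `two_proportion_z` is an illustrative assumption; the p-value is computed from the unrounded z, so it differs slightly from a table lookup at z = 1.54.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 = p2 against H1: p1 > p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                    # combined proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z = two_proportion_z(75, 100, 65, 100)
p_value = 1 - NormalDist().cdf(z)                      # one-tailed (right) p-value

# H0 is rejected at level alpha exactly when the p-value is below alpha.
decisions = {alpha: p_value < alpha for alpha in (0.01, 0.05, 0.10)}
print(round(z, 4), round(p_value, 4), decisions)
```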

Q.4 Attempt any THREE of the following: [15]


Q.4(a) A random sample of 10 boys had the following I.Qs: [5]
70, 120, 110, 101, 88, 83, 95, 98, 107, 100.
Do these data support the assumption of a population mean I.Q. of 100? Find a
reasonable range in which most of the mean I.Q. values of samples of 10 boys
lie.
Ans.: For the given data let us first compute the mean and standard deviation.
$$\bar{X} = \frac{70 + 120 + 110 + 101 + 88 + 83 + 95 + 98 + 107 + 100}{10} = 97.2$$
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{N}} = \sqrt{\frac{(70 - 97.2)^2 + (120 - 97.2)^2 + \cdots + (100 - 97.2)^2}{10}} = 13.54$$
The 95% confidence limits are
$$\bar{X} \pm t_{0.975}\frac{S}{\sqrt{N - 1}} = 97.2 \pm 2.26 \times \frac{13.54}{\sqrt{10 - 1}} \quad (t_{0.975} = 2.26 \text{ at d.f.} = N - 1 = 9)$$
$= 97.2 \pm 10.2$
Therefore the required confidence limits are (87.0, 107.4).
Since 100 lies within these limits, the data support the assumption of a population mean I.Q. of 100.

Q.4(b) Pumpkins were grown under two experimental conditions. Two random samples of [5]
11 and 9 pumpkins show the sample standard deviations of their weights as 0.8
and 0.5 respectively. Assuming that the weight distributions are normal, test the
hypothesis that the true variances are equal, against the alternative that they
are not, at the 10% level. [Assume that P(F10,8 ≥ 3.35) = 0.05 and P(F8,10 ≥
3.07) = 0.05.]
Ans.: For the two samples 1 and 2
we have, N1 = 11, N2 = 9
S1 = 0.8, S2 = 0.5

Let H0 : No difference in true variances
H1 : Difference in true variances
$$F = \frac{S_1^2}{S_2^2} = \frac{(0.8)^2}{(0.5)^2} = 2.56$$
d.f. of numerator = N1 - 1 = 10
d.f. of denominator = N2 - 1 = 8

This is a two-tailed test, so at the 10% significance level we use the 95th percentile value of the F-distribution with (10, 8) degrees of freedom.
Table value = 3.35
Since 2.56 < 3.35
Accept H0
No difference in true variances


Q.4(c) The standard deviation of the heights of 16 male students chosen at random in a [5]
school of 1000 male students is 2.40 in. Find the (i) 95% and (ii) 99% confidence
limits of the standard deviation for all male students at the school.
Ans.: (i) The 95% confidence limits are given by $\frac{S\sqrt{N}}{\chi_{0.975}}$ and $\frac{S\sqrt{N}}{\chi_{0.025}}$
From the table, $\chi^2_{0.975} = 27.5$ corresponding to $\nu = 15$
$\therefore \chi_{0.975} = 5.24$
Also, $\chi^2_{0.025} = 6.26$ corresponding to $\nu = 15$
$\therefore \chi_{0.025} = 2.50$
The 95% confidence limits are $\frac{2.40\sqrt{16}}{5.24}$ and $\frac{2.40\sqrt{16}}{2.50}$
The 95% confidence limits are 1.83 and 3.84

(ii) The 99% confidence limits are given by $\frac{S\sqrt{N}}{\chi_{0.995}}$ and $\frac{S\sqrt{N}}{\chi_{0.005}}$
From the table, $\chi^2_{0.995} = 32.8$ corresponding to $\nu = 15$
$\therefore \chi_{0.995} = 5.73$
Also, $\chi^2_{0.005} = 4.60$, so $\chi_{0.005} = 2.14$
The 99% confidence limits are $\frac{2.40\sqrt{16}}{5.73}$ and $\frac{2.40\sqrt{16}}{2.14}$
The 99% confidence limits are 1.68 and 4.49

Q.4(d) Calculate the chi-square value for the following data. [5]
Colour Red Green Yellow
Observed Frequency 12 16 20
Expected Frequency 16 8 15
Ans.: Given data :
Colour               Red   Green   Yellow
Observed Frequency   12    16      20
Expected Frequency   16    8       15
$$\chi^2 = \frac{(o_1 - e_1)^2}{e_1} + \frac{(o_2 - e_2)^2}{e_2} + \frac{(o_3 - e_3)^2}{e_3}$$
$$= \frac{(12 - 16)^2}{16} + \frac{(16 - 8)^2}{8} + \frac{(20 - 15)^2}{15} = 10.667$$

Q.4(e) Acme Toy Company prints baseball cards. The company claims that 30% of the [5]
cards are rookies, 60% veterans but not All-Stars, and 10% are veteran All-Stars.
Suppose a random sample of 100 cards has 50 rookies, 45 veterans, and 5 All-
Stars. Is this consistent with Acme’s claim? Use a 0.05 level of significance.
(Use chi-square goodness of fit).
Given P(2 > 19.58) = 0.0001
Ans.: Given data can be tabulated as
                     Rookies   Veterans   All-Stars
Observed Frequency   50        45         5
Expected Frequency   30        60         10
$$\chi^2 = \frac{(o_1 - e_1)^2}{e_1} + \frac{(o_2 - e_2)^2}{e_2} + \frac{(o_3 - e_3)^2}{e_3}$$
$$= \frac{(50 - 30)^2}{30} + \frac{(45 - 60)^2}{60} + \frac{(5 - 10)^2}{10} = 19.58$$
At the 0.05 level of significance, $\chi^2_{0.95} = 5.99$ at d.f. = 3 - 1 = 2
Since 19.58 > 5.99
Acme's claim should be rejected.
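The goodness-of-fit statistic can be checked with a short Python sketch. With three categories there are 3 - 1 = 2 degrees of freedom, and for exactly 2 degrees of freedom the chi-square upper-tail probability has the closed form exp(-x/2), so no table or external library is needed. The variable names are illustrative.

```python
import math

observed = [50, 45, 5]     # rookies, veterans, All-Stars in the sample of 100
expected = [30, 60, 10]    # counts implied by the claimed 30% / 60% / 10% split

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1     # categories minus one

# Valid only for df == 2: P(chi2 > x) = exp(-x / 2).
p_value = math.exp(-chi_square / 2)
critical_5pct = -2 * math.log(0.05)   # value exceeded with probability 0.05 when df == 2

print(round(chi_square, 2), df, round(critical_5pct, 2), chi_square > critical_5pct)
```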


Q.4(f) A survey of 320 families with 5 children each revealed the following distribution: [5]
Boys 5 4 3 2 1 0
Girls 0 1 2 3 4 5
No. of families 14 56 110 88 40 12
Is this result consistent with the hypothesis that male and female births are
equally probable?
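Ans.: Let p : probability of a male birth
q : probability of a female birth
$$\therefore (p + q)^5 = p^5 + 5p^4 q + 10p^3 q^2 + 10p^2 q^3 + 5pq^4 + q^5$$
If $p = q = \frac{1}{2}$,
$$P(\text{5 boys and 0 girls}) = \left(\frac{1}{2}\right)^5 = \frac{1}{32}$$
$$P(\text{4 boys and 1 girl}) = 5\left(\frac{1}{2}\right)^4\left(\frac{1}{2}\right) = \frac{5}{32}$$
$$P(\text{3 boys and 2 girls}) = 10\left(\frac{1}{2}\right)^3\left(\frac{1}{2}\right)^2 = \frac{10}{32}$$
$$P(\text{2 boys and 3 girls}) = 10\left(\frac{1}{2}\right)^2\left(\frac{1}{2}\right)^3 = \frac{10}{32}$$
$$P(\text{1 boy and 4 girls}) = 5\left(\frac{1}{2}\right)\left(\frac{1}{2}\right)^4 = \frac{5}{32}$$
$$P(\text{0 boys and 5 girls}) = \left(\frac{1}{2}\right)^5 = \frac{1}{32}$$
The expected numbers of families with 5, 4, 3, 2, 1 and 0 boys are obtained by multiplying the above probabilities by 320, and the results are 10, 50, 100, 100, 50 and 10.
Hence,
$$\chi^2 = \frac{(18 - 10)^2}{10} + \frac{(56 - 50)^2}{50} + \frac{(110 - 100)^2}{100} + \frac{(88 - 100)^2}{100} + \frac{(40 - 50)^2}{50} + \frac{(8 - 10)^2}{10} = 12.0$$

$\chi^2_{0.95} = 11.1$ at d.f. = 6 - 1 = 5

Since 12.0 > 11.1, we can conclude that the results are probably significant and male and female births are not equally probable.
This computation takes the observed numbers of families with 5 and 0 boys as 18 and 8. With the values 14 and 12 printed in the table of the question, the same formula gives $\chi^2 = 7.16 < 11.1$, and the hypothesis of equally probable births would not be rejected.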

Q.5 Attempt any THREE of the following: [15]


Q.5(a) Find the regression lines of equation for the following. [5]
Advertising Expenditure (‘000 Rs.) 3 5 7 9 11
Quarterly Sales (‘0000 units) 9 12 16 14 15
Ans.:
x y x2 y2 xy
3 9 9 81 27
5 12 25 144 60
7 16 49 256 112
9 14 81 196 126
11 15 121 225 165
Total 35 66 285 902 490

Least squares line where X is the independent variable: $Y = a_0 + a_1 X$


Normal equations are
$\sum Y = a_0 N + a_1 \sum X$ and $\sum XY = a_0 \sum X + a_1 \sum X^2$
$\therefore 66 = 5a_0 + 35a_1$ and $490 = 35a_0 + 285a_1$
$\therefore a_0 = 8.3$ and $a_1 = 0.7$
Therefore the required least squares line is Y = 8.3 + 0.7X

Least squares line where Y is the independent variable: $X = b_0 + b_1 Y$
Normal equations are
$\sum X = b_0 N + b_1 \sum Y$ and $\sum XY = b_0 \sum Y + b_1 \sum Y^2$
$\therefore 35 = 5b_0 + 66b_1$ and $490 = 66b_0 + 902b_1$
$\therefore b_0 = -5$ and $b_1 = 0.9091$
Therefore the required least squares line is X = -5 + 0.9091Y.
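Both pairs of normal equations can be solved with one small Python function. The name `least_squares_line` is illustrative; it uses the closed-form solution of the two normal equations. Solving the second pair this way gives an intercept of -5 (so that the line passes through the means, X = 7 and Y = 13.2).

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line ys = intercept + slope * xs."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    slope = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    intercept = (sy - slope * sx) / n
    return intercept, slope

spend = [3, 5, 7, 9, 11]      # advertising expenditure
sales = [9, 12, 16, 14, 15]   # quarterly sales

a0, a1 = least_squares_line(spend, sales)   # Y on X
b0, b1 = least_squares_line(sales, spend)   # X on Y
print(round(a0, 4), round(a1, 4), round(b0, 4), round(b1, 4))
```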

Q.5(b) Fit a second-degree curve Y = a + bx + cx^2 to the following data : [5]


x 5 10 15 20 25 30
y 11 13 16 20 28 36
Ans.: The least squares parabola approximating the set of points $(X_1, Y_1), (X_2, Y_2), \ldots, (X_N, Y_N)$ has the equation $Y = a_0 + a_1 X + a_2 X^2$
Normal equations,
$\sum Y = a_0 N + a_1 \sum X + a_2 \sum X^2$
$\sum XY = a_0 \sum X + a_1 \sum X^2 + a_2 \sum X^3$
$\sum X^2 Y = a_0 \sum X^2 + a_1 \sum X^3 + a_2 \sum X^4$

x       y     x^2    x^3     x^4       xy     x^2 y
5       11    25     125     625       55     275
10      13    100    1000    10000     130    1300
15      16    225    3375    50625     240    3600
20      20    400    8000    160000    400    8000
25      28    625    15625   390625    700    17500
30      36    900    27000   810000    1080   32400
Total   105   124    2275    55125     1421875   2605   63075

Putting these values in the normal equations we get,
$124 = 6a_0 + 105a_1 + 2275a_2$
$2605 = 105a_0 + 2275a_1 + 55125a_2$
$63075 = 2275a_0 + 55125a_1 + 1421875a_2$
Solving, $a_0 = 11.6$, $a_1 = -0.2557$, $a_2 = 0.0357$
$Y = 11.6 - 0.2557X + 0.0357X^2$ is the required second-degree curve.
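The three normal equations can be solved exactly with a small Gauss-Jordan elimination on fractions. This is a sketch using only the Python standard library; `solve_linear` is an illustrative helper, not a library function. The solution has a negative coefficient on X.

```python
from fractions import Fraction

def solve_linear(a, b):
    """Solve a small square system a * x = b by Gauss-Jordan elimination on exact fractions."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)   # first usable pivot row
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]                 # scale pivot row to 1
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[-1] for row in m]

xs = [5, 10, 15, 20, 25, 30]
ys = [11, 13, 16, 20, 28, 36]

def power_sum(k):
    """Sum of x^k over the data."""
    return sum(x ** k for x in xs)

coefficients = [
    [len(xs), power_sum(1), power_sum(2)],
    [power_sum(1), power_sum(2), power_sum(3)],
    [power_sum(2), power_sum(3), power_sum(4)],
]
right_side = [
    sum(ys),
    sum(x * y for x, y in zip(xs, ys)),
    sum(x * x * y for x, y in zip(xs, ys)),
]

a0, a1, a2 = (float(v) for v in solve_linear(coefficients, right_side))
print(round(a0, 4), round(a1, 4), round(a2, 4))
```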

Q.5(c) Calculate the Coefficient of Correlation between the Age and Blood pressure of [5]
given people from a colony.
Age in Years 60 65 70 40 45 50 55
Blood Pressure 145 160 160 125 140 140 145
Ans.: Correlation coefficient,
$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}}$$
$$= \frac{(7)(56575) - (385)(1015)}{\sqrt{\left[(7)(21875) - (385)^2\right]\left[(7)(148075) - (1015)^2\right]}} = 0.9449$$


x y x2 y2 xy
60 145 3600 21025 8700
65 160 4225 25600 10400
70 160 4900 25600 11200
40 125 1600 15625 5000
45 140 2025 19600 6300
50 140 2500 19600 7000
55 145 3025 21025 7975
Total 385 1015 21875 148075 56575
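The column totals and the value of r can be verified with a short Python sketch of the same product-moment formula. The function name `pearson_r` is illustrative.

```python
import math

ages = [60, 65, 70, 40, 45, 50, 55]
pressures = [145, 160, 160, 125, 140, 140, 145]

def pearson_r(xs, ys):
    """Product-moment correlation from raw sums, as in the formula above."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

r = pearson_r(ages, pressures)
print(round(r, 4))
```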

Q.5(d) Given the following data estimate the linear trend equation. Find trend value and [5]
calculate the trend value of
Year 2010 2011 2012 2013 2014
No. of cars (in Thousand) 11 30 38 50 56
Ans.:
Year    X    Y     x = X - Xbar   y = Y - Ybar   x^2   xy
2010    1    11    -2             -26            4     52
2011    2    30    -1             -7             1     7
2012    3    38    0              1              0     0
2013    4    50    1              13             1     13
2014    5    56    2              19             4     38
Total   15   185   0              0              10    110

To find the trend line we use $y = \left(\frac{\sum xy}{\sum x^2}\right) x$
where $y = Y - \bar{Y}$ and $x = X - \bar{X}$.
Here, $\bar{Y} = 37$ and $\bar{X} = 3$.
Therefore, $y = \left(\frac{110}{10}\right) x$, i.e. $y = 11x$
But $y = Y - \bar{Y}$ and $x = X - \bar{X}$
$\therefore (Y - 37) = 11(X - 3)$
$\therefore Y = 11X + 4$ is the required trend line.
The trend values for 2010 to 2014 (X = 1, ..., 5) are 15, 26, 37, 48 and 59 thousand cars.

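The slope of the trend line is the sum of products of the deviations x and y divided by the sum of squared x deviations. A minimal Python sketch, assuming the years are coded X = 1, ..., 5 as in the table, gives a slope of 11 and an intercept of 4.

```python
years = [2010, 2011, 2012, 2013, 2014]
cars = [11, 30, 38, 50, 56]                 # in thousands

codes = list(range(1, len(years) + 1))      # X = 1, 2, ..., 5
x_bar = sum(codes) / len(codes)
y_bar = sum(cars) / len(cars)

# Deviations from the means; the xy column must use these, not the raw X and Y.
sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(codes, cars))
sum_xx = sum((x - x_bar) ** 2 for x in codes)

slope = sum_xy / sum_xx
intercept = y_bar - slope * x_bar
trend_values = [intercept + slope * x for x in codes]

print(slope, intercept, dict(zip(years, trend_values)))
```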
Q.5(e) Find (i) $\sigma_x$, (ii) $\sigma_y$, (iii) V(x), (iv) V(y) and (v) cov (x, y) for the following data: [5]
x 1 2 3 5 4 3
y 2 4 5 5 3 1
Also verify $r = \sigma_{xy} / (\sigma_x \sigma_y)$
Ans.:
x y (x - xbar) (y - ybar) xy (x - xbar)2 (y - ybar)2
1 2 -2 -1.33 2 4 1.7689
2 4 -1 0.67 8 1 0.4489
3 5 0 1.67 15 0 2.7889
5 5 2 1.67 25 4 2.7889
4 3 1 -0.33 12 1 0.1089
3 1 0 -2.33 3 0 5.4289
Total 18 20 0 0.02 65 10 13.3334

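Here $\bar{x} = \frac{18}{6} = 3$ and $\bar{y} = \frac{20}{6} = 3.3333$.
(i) $\sigma_x = \sqrt{\frac{\sum (x - \bar{x})^2}{N}} = \sqrt{\frac{10}{6}} = 1.2910$
(ii) $\sigma_y = \sqrt{\frac{\sum (y - \bar{y})^2}{N}} = \sqrt{\frac{13.3333}{6}} = 1.4907$
(iii) $V(x) = \sigma_x^2 = 1.6667$
(iv) $V(y) = \sigma_y^2 = 2.2222$
(v) $\text{Cov}(x, y) = \frac{\sum xy}{n} - \bar{x}\bar{y} = \frac{65}{6} - (3)(3.3333) = 0.8333$

Now, $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{N \sigma_x \sigma_y} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = \frac{0.8333}{(1.2910)(1.4907)} = 0.4330$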
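These quantities can be cross-checked in Python with full precision. The sketch uses population (divisor N) variances and covariance, which is the convention of this answer; carrying the mean of y as 20/6 rather than 3.33 gives a covariance of about 0.8333 and r of about 0.4330.

```python
import math

xs = [1, 2, 3, 5, 4, 3]
ys = [2, 4, 5, 5, 3, 1]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

var_x = sum((x - x_bar) ** 2 for x in xs) / n                   # V(x)
var_y = sum((y - y_bar) ** 2 for y in ys) / n                   # V(y)
cov_xy = sum(x * y for x, y in zip(xs, ys)) / n - x_bar * y_bar # mean of xy minus product of means

sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)
r = cov_xy / (sd_x * sd_y)

print(round(sd_x, 4), round(sd_y, 4), round(cov_xy, 4), round(r, 4))
```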

Q.5(f) The two regression lines between x and y are given below. Find $\bar{x}$, $\bar{y}$ and r. [5]
100y - 45x - 1400 = 0, 4y - 5x + 200 = 0
Ans.: The two regression lines intersect at $(\bar{x}, \bar{y})$. Solving both the equations simultaneously,
we get $\bar{x} = 80$, $\bar{y} = 50$

Assume that the 1st equation gives the regression equation of y on x, which can be rewritten as
y = 0.45x + 14, so $b_{yx} = 0.45$

And let the 2nd equation be the regression equation of x on y, which can be rewritten as x = 0.8y + 40,
so $b_{xy} = 0.8$
Now, $r = \sqrt{b_{yx} \times b_{xy}} = \sqrt{0.45 \times 0.8} = 0.6$


