By the end of this course students will be able to apply statistical concepts and methodologies while performing data analysis.

Course Structure
Nature and role of statistics for management. Descriptive Statistics: Measures of Central Tendency, Measures of Dispersion. Introduction to probability theory. Probability Theory: Preliminary concepts in Probability, Basic Theorems and rules for dependent/independent events, Random Variable, Expected value and Variance of random variable. Probability distributions. Sampling distributions. Estimation and hypothesis testing: t-tests, ANOVA, Chi-square tests, Non-parametric statistics. Correlation and regression analysis. Introduction to SPSS and its use for statistical modeling.

Lecture Plan:
Week 1: Nature and role of statistics for management.
Week 2: Descriptive Statistics: Measures of Central Tendency.
Week 3: Measures of Dispersion.
Week 4 & 5: Introduction to probability theory. Probability Theory: Preliminary concepts in Probability.
Week 6, 7: Basic Theorems and rules for dependent/independent events, Random Variable, Expected value and Variance of random variable.
Week 8, 9: Probability distributions. Sampling distributions.
Week 10, 11, 12: Estimation and hypothesis testing: t-tests, ANOVA, Chi-square tests, Non-parametric statistics.
Week 13: Correlation and regression analysis.
Week 14: Introduction to SPSS and its use for statistical modeling.
Major Test
Evaluation
Apart from the major exam at the end of the semester, students would be examined throughout the semester via minor exams, case studies and home assignments. The distribution of the marks would be as follows:

Major Exam: 60 Marks
Two Quizzes: 20 Marks
Assignments & Class Participation: 20 Marks

Suggested Readings
1. Richard I. Levin and David S. Rubin, "Statistics for Management", PHI, New Delhi 1997.
2. Anderson, Sweeney and Williams, "Statistics for Business and Economics", South-Western College Publishing, Ohio 1998.
3. Murray R. Spiegel, "Theory and Problems of STATISTICS", 2/ed, Schaum's Outline Series, McGraw-Hill Book Company, London 1992.
4. Mark L. Berenson and David M. Levine, "Basic Business Statistics: Concepts and Applications", 5/ed, Prentice Hall, Englewood Cliffs, New Jersey 1992.
5. R.P. Hooda, "Statistics for Business and Economics", MacMillan India Ltd., New Delhi 1997.
6. S.C. Gupta, "Fundamentals of Statistics", HPI, New Delhi 1996.
STATISTICS
Statistics refers to the body of techniques used for collecting, organizing, presenting and analyzing data, as well as for drawing valid conclusions and making reasonable decisions on the basis of such analysis. The data may be quantitative, with values expressed numerically, or qualitative, with the characteristics of observations being tabulated.

ORIGIN AND DEVELOPMENT
Among the great statisticians, Sir Francis Galton (1822-1911) pioneered regression analysis, Karl Pearson (1857-1936) pioneered correlation analysis and the chi-square test, and W.S. Gosset pioneered the t-test. Ronald A. Fisher, who is rightly termed the Father of Statistics, extended statistics to a variety of fields such as biometry, genetics, psychology, education and agriculture. He was also a pioneer in Estimation Theory, Sampling Distribution Theory, Analysis of Variance and Design of Experiments. For his contributions to statistics, Fisher is described as the real giant in the development of the THEORY OF STATISTICS.
REAL-WORLD PROBLEM
Managerial formulation of problem
Managerial question relating to problem
Statistical formulation of question
STATISTICAL ANALYSIS
New question
One of the most difficult steps in the decision-making process, and one that requires a cooperative effort among managers and statisticians, is the translation of the managerial question into a statistical question. This statistical question must be formulated so that, when answered, it provides the key to the answer to the managerial question.
FREQUENCY DISTRIBUTIONS
It is useful to distribute the data into classes and to determine the number of individuals belonging to each class, called the class frequency. A tabular arrangement of data by classes together with the corresponding frequencies is called a frequency distribution.
- Individual Series
- Discrete Series
- Continuous Series
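The idea of class frequencies can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_distribution`, the class width and the marks data are assumptions made for the example, not part of the course material.

```python
from collections import Counter

def frequency_distribution(values, class_width, start):
    """Count how many values fall in each class [lower, lower + class_width)."""
    counts = Counter()
    for v in values:
        index = int((v - start) // class_width)  # which class the value falls in
        lower = start + index * class_width
        counts[(lower, lower + class_width)] += 1
    return dict(sorted(counts.items()))

# Hypothetical marks of 10 students (illustrative data only)
marks = [12, 25, 37, 18, 41, 29, 33, 45, 22, 38]
table = frequency_distribution(marks, class_width=10, start=10)
for (lower, upper), freq in table.items():
    print(f"{lower}-{upper}: {freq}")
```

Each printed row is one class with its class frequency, which is exactly the tabular arrangement described above for a continuous series.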
TYPES OF DATA
Statistics is concerned with measurements of one or more variables of a sample of units drawn from a population. These measurements are referred to as data. Data is generally classified into four types:
- Nominal
- Ordinal
- Interval
- Ratio

Nominal
Nominal data (also referred to as categorical data) are labels or names that identify the category to which each unit belongs. Example: the gender of each individual in a sample of seven applicants for a computer programming job. Nominal data are often reported as nonnumerical labels, such as male (or female) in our example.

Ordinal
Ordinal data indicate the relative amount of a property possessed by the units. Examples:
1. The size of the car rented by each individual in a sample of 20 business travelers: compact, midsize or full-size.
2. A taste-tester's ranking of four brands of tomato sauce.
3. A supervisor's annual ranking of the performance of his 30 employees using a scale of 1 (worst performance) to 10 (best performance).
Hence ordinal data simply provide the ordering or ranking of the units in a sample or population.

Interval
Interval data are measurements that enable the determination of how much more or less of the measured characteristic is possessed by one unit than another. Example: the temperature (in degrees Fahrenheit) at which each of a sample of 20 pieces of heat-resistant plastic begins to melt. The key feature of this type of data is that the zero point (origin) does not indicate the absence of the characteristic of interest, e.g. the origin on the temperature scale does not indicate the absence of heat. Temperatures lower than 0° (e.g. -10°C and -10°F) indicate that less heat is present, so 0° does not mean 'no heat'.

Ratio
Ratio data are measurements that enable the determination of how many times as much of the measured characteristic is possessed by one unit than another. Ratio data are always numerical. The zero point or origin indicates the absence of the characteristic measured. Examples:
1. Sales revenue for each firm in a sample of 20 firms.
2. The number of unemployed people in a city, or the unemployment rate.
3. The number of cars sold in a country in a particular year.
Ratio data represent the highest level of measurement. Most numerical business data are measured on scales for which the origin is meaningful. Thus, most numerical measurements encountered in business are ratio data.

The four types of data are often combined into two classes that are sufficient for most statistical applications. Nominal and ordinal data are often referred to as qualitative data, whereas interval and ratio data are called quantitative data.
A Measure of Central Tendency is a single value that represents the entire series of data.

Discussion Area:
1. How to compute different measures of central tendency in case of different frequency distributions?
2. Is there any relationship between the different measures?
Measures of Dispersion
The degree to which individual values tend to scatter around the average value is called the dispersion or variation of the data.

Absolute Measures:
- Range
- Standard Deviation
- Variance

Relative Measure:
- Coefficient of Variation

Absolute measures depend on the unit of measurement of the data, whereas relative measures are independent of it. Therefore a relative measure, unlike an absolute measure, can be used to compare the variability or uniformity of two or more distributions.
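Standard Deviation:

Ungrouped Data
$$\text{S.D.} = \sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$$

Grouped Data
$$\text{S.D.} = \sigma = \sqrt{\frac{\sum f\,(X - \bar{X})^2}{\sum f}} = \sqrt{\frac{\sum f x^2}{\sum f}}, \qquad x = X - \bar{X}$$

Variance $= \sigma^2 = (\text{S.D.})^2$

Coefficient of Variation (C.V.):
$$\text{C.V.} = \frac{\sigma}{\bar{X}} \times 100$$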
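As a rough illustration of the formulas for standard deviation, variance and coefficient of variation, the following Python sketch computes each of them. The data set and the function names are assumptions chosen for the example; the divisor is N (not N - 1), matching the formulas in these notes.

```python
import math

def mean(values):
    return sum(values) / len(values)

def std_dev(values):
    """S.D. for ungrouped data: sqrt(sum((X - mean)^2) / N)."""
    m = mean(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / len(values))

def grouped_std_dev(xs, freqs):
    """S.D. for grouped data: sqrt(sum(f * (X - mean)^2) / sum(f))."""
    total = sum(freqs)
    m = sum(f * x for x, f in zip(xs, freqs)) / total
    return math.sqrt(sum(f * (x - m) ** 2 for x, f in zip(xs, freqs)) / total)

def coefficient_of_variation(values):
    """C.V. = (S.D. / mean) * 100, a unit-free relative measure."""
    return std_dev(values) / mean(values) * 100

# Hypothetical observations (illustrative data only)
data = [2, 4, 4, 4, 5, 5, 7, 9]
sd = std_dev(data)
variance = sd ** 2
cv = coefficient_of_variation(data)

# The same observations written as a discrete series (value, frequency)
grouped_sd = grouped_std_dev([2, 4, 5, 7, 9], [1, 3, 2, 1, 1])

print(sd, variance, cv, grouped_sd)
```

Because the grouped series holds the same observations as the ungrouped list, both standard deviations agree, which is a quick way to check the grouped formula.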
1. Following is a sample of yields for 10 shares traded on the New York Stock Exchange.

Issuer            Yield (%)    Issuer          Yield (%)
Argosy            12.6         Caterpillar     6.3
Chase Manhattan   6.7          Dow             6.8
IBM               7.0          Lucent          6.7
Mobil             7.3          Pacific Bell    6.7
RJR Nabisco       8.1          Service Mdse    8.6

Compute the following descriptive statistics.
i. Mean, median, and mode.
ii. The variance and standard deviation.
iii. Coefficient of variation.

2. A survey was conducted concerning the ability of computer manufacturers to handle problems quickly. The following results were obtained.

Company         Days to Resolve Problems    Company           Days to Resolve Problems
Compaq          13                          Gateway           21
Packard Bell    27                          Digital           27
Quantex         11                          IBM               12
Dell            14                          Hewlett-Packard   14
NEC             14                          AT&T              20
AST             17                          Toshiba           37
Acer            16                          Micron            17

a. What are the mean and median number of days needed to resolve problems?
b. What is the variance and standard deviation?
c. Which manufacturer has the best record?

3. Public transportation and the automobile are two methods an employee can use to get to work each day. Samples of times recorded for each method are shown. Times are in minutes.

Public Transportation: 28 29 32 37 33 25 29 32 41 34
Automobile:            29 31 33 32 34 30 31 32 35 33

a. Compute the sample mean time to get to work for each method.
b. Compute the sample standard deviation for each method.
c. On the basis of your results from (a) and (b), which method of transportation should be preferred? Explain.
PROBABILITY THEORY
Preliminary Concepts

Random Experiment: An experiment that can result in more than one outcome is called a Random Experiment or Statistical Experiment.

Event: Each distinct outcome of an experiment is called a simple event.

Sample Space: The collection of all possible distinct outcomes of an experiment is called the sample space. E.g. tossing an unbiased coin is a random experiment with Sample Space = {H, T}, where H and T are the two events. In the case of two unbiased coins: Sample Space = {HH, HT, TH, TT}.

Mutually Exclusive Events: Two or more events are called mutually exclusive if the occurrence of any one of them excludes the occurrence of the others. E.g. in an experiment of tossing a coin, the occurrence of head and of tail are mutually exclusive. The appearance of a head excludes the appearance of a tail and vice versa.

Equally Likely Events: Events are said to be equally likely if none of them is expected to occur in preference to the others. If we roll a die, any number out of 1, 2, 3, ..., 6 can come up. Therefore, all six numbers are equally likely to come up.

Independent Events: Two events are said to be independent if the happening of one is not affected by, and does not affect, the happening of the other. E.g. suppose we have a bag with 5 red and 5 green balls. Now suppose a green ball is taken out and then replaced. Thereafter, a red ball is taken out. The outcome of the red ball is not affected by the previous draw of the green ball; therefore the two events, i.e. drawing a green ball first and then a red ball, are independent. If the green ball had not been replaced, the chance of drawing a red ball would have been affected, because the total number of balls left in the bag would be 9 instead of 10.
$$P(A) = \frac{m}{n}, \qquad P(A^c) = 1 - P(A)$$

$$P(A) = \lim_{N \to \infty} \frac{t}{N}$$
LAWS OF PROBABILITY
Suppose we define two events A and B on a sample space S. Then the addition and multiplication laws can be stated as:

Law of Addition:
(i) $P(A \cup B) = P(A) + P(B)$, if events A and B are mutually exclusive
(ii) $P(A \cup B) = P(A) + P(B) - P(A \cap B)$, if events A and B are not mutually exclusive

Law of Multiplication:
(i) $P(A \cap B) = P(A)\,P(B)$, if A and B are independent
(ii) $P(A \cap B) = P(A)\,P(B|A) = P(B)\,P(A|B)$, if A and B are dependent events
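The two laws can be checked by direct enumeration on a small sample space. The sketch below uses a single roll of a fair die; the events A ("even number") and B ("number greater than 4") are assumptions chosen purely for illustration.

```python
from fractions import Fraction

# Sample space for one roll of a fair die (all outcomes equally likely)
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}   # hypothetical event: an even number
B = {5, 6}      # hypothetical event: a number greater than 4

# Law of addition for events that are not mutually exclusive
addition_lhs = prob(A | B)
addition_rhs = prob(A) + prob(B) - prob(A & B)

# Law of multiplication using the conditional probability P(B|A)
p_b_given_a = Fraction(len(A & B), len(A))
multiplication_rhs = prob(A) * p_b_given_a

print(addition_lhs, addition_rhs)
print(prob(A & B), multiplication_rhs)
```

Exact fractions are used instead of floats so that both sides of each law can be compared for equality without rounding concerns.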
P(B|A) and P(A|B) are known as conditional probabilities.

Example. A bag contains 4 white, 5 red and 6 green balls. Three balls are drawn at random. What is the chance that a white, a red and a green ball are drawn?

Solution: There are 4 + 5 + 6 = 15 balls in the bag. Three balls can be drawn out of 15 in $^{15}C_3$ ways. One white ball can be drawn out of the 4 white balls in $^{4}C_1$ ways; one red ball can be drawn out of the 5 red balls in $^{5}C_1$ ways; and one green ball can be drawn out of the 6 green balls in $^{6}C_1$ ways.

Hence the required probability is
$$\frac{^{4}C_1 \times {}^{5}C_1 \times {}^{6}C_1}{^{15}C_3} = \frac{120}{455} = \frac{24}{91}$$

Questions for self study

1. An urn contains 8 white and 3 red balls. If two balls are drawn at random, find the probability that (i) both are white, (ii) both are red, (iii) one is of each colour.

2. The following data show the length of life of wholesale grocers in a particular city:

Length of Life (years)    Percentage of Wholesalers
0-5                       65
5-10                      16
10-15                     9
15-25                     5
25 and over               5
Total                     100

(i) During the period studied, what is the probability that an entrant to this profession will fail within five years?
(ii) That he will survive at least 25 years?
(iii) How many years would he have to survive to be among the 10 percent longest survivors?

3. A committee of 4 persons is to be appointed from 3 officers of the production department, 4 officers of the purchase department, two of the sales department and one chartered accountant. Find the probability of forming the committee in the following manner:
(i) There must be one from each category.
(ii) It should have at least one from the purchase department.
(iii) The chartered accountant must be in the committee.

4. A chartered accountant applies for a job in two firms X and Y. He estimates that the probability of his getting selected in firm X is 0.7, that the probability of his being rejected at Y is 0.5, and that the probability of at least one of his applications being rejected is 0.6. What is the probability that he will be selected in one of the firms?

5. There are 3 economists, 4 engineers, 2 statisticians and 1 doctor. A committee of 4 from among them is to be formed. Find the probability that the committee:
(i) Consists of one of each kind.
(ii) Has at least one economist.
(iii) Has the doctor as a member and three others.

6. Two vacancies exist at the junior executive level of a certain company. Twenty people, fourteen men and six women, are eligible and equally qualified. The company has decided to draw two names at random from the list of the eligible people. What is the probability that:
(a) Both positions will be filled by women?
(b) At least one of the positions will be filled by a woman?
(c) Neither of the positions will be filled by women?
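The worked example of the bag with 4 white, 5 red and 6 green balls can be verified with Python's standard `math.comb`, which counts combinations. The variable names below are illustrative only.

```python
from fractions import Fraction
from math import comb

# Ways to pick one white, one red and one green ball
favourable = comb(4, 1) * comb(5, 1) * comb(6, 1)

# Ways to pick any three balls out of the 15 in the bag
total = comb(15, 3)

probability = Fraction(favourable, total)
print(favourable, total, probability)
```

The same counting pattern (favourable selections divided by total selections) applies to the committee questions in the self-study list.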
Random Variable
Definition:
A finite real-valued measurable function defined on a sample space is called a random variable. Its value is determined by the outcome of the experiment. E.g. for a toss of two coins, S = (HH, TH, HT, TT) would be the sample space of the experiment. Now say we toss two coins and X denotes the number of heads. Then X takes the following values for the four outcomes respectively: X = (2, 1, 1, 0).

Types of random variable:
- Discrete Random Variable (DRV)
- Continuous Random Variable (CRV)
A DRV assumes a finite or countable set of values, whereas a CRV can take any value within particular limits.

(i) $p(x_i) \ge 0$
(ii) $\sum p(x) = 1$

In simple words, a tabular presentation of all the different values of a discrete random variable along with their respective probabilities is known as a Discrete Probability Distribution, provided the above conditions are fulfilled. For example, let X be a discrete random variable denoting the number of heads in a toss of two coins. Then the sample space of the experiment would be S = (TT, TH, HT, HH), and X will take the following values along with their respective probabilities:

X       0      1      2
p(x)    1/4    1/2    1/4

In the above table all p(x) are greater than zero and the sum of all probabilities is one. Therefore the above distribution is a discrete probability distribution. The Binomial Distribution and the Poisson Distribution are examples of discrete distributions.
$p(x) \ge 0$
(iv) $\int_{-\infty}^{\infty} p(x)\,dx = 1$
Expectation:
$$E(X) = \sum_i x_i\,p(x_i)$$

Properties of Expectation:
(i) E(C) = C, where C is a constant
(ii) E(aX + b) = a E(X) + b, where a and b are constants
(iii) E(X + Y) = E(X) + E(Y), where X and Y are random variables

Variance:
$$\sigma^2 = E[X - E(X)]^2 = E\left[X^2 + (E(X))^2 - 2X\,E(X)\right] = E(X^2) + \mu^2 - 2\mu^2 = E(X^2) - \mu^2$$
where $\mu = E(X)$.

For example, for a single throw of a fair die:
$$E(X) = \sum x\,P(x) = 1 \times \tfrac{1}{6} + 2 \times \tfrac{1}{6} + 3 \times \tfrac{1}{6} + 4 \times \tfrac{1}{6} + 5 \times \tfrac{1}{6} + 6 \times \tfrac{1}{6} = \tfrac{1}{6}(1 + 2 + 3 + 4 + 5 + 6) = \tfrac{21}{6}$$
Therefore, $E(X) = \tfrac{7}{2}$.
Now, $E(X^2) = \tfrac{1}{6}(1 + 4 + 9 + 16 + 25 + 36) = \tfrac{91}{6}$, so $\sigma^2 = \tfrac{91}{6} - \tfrac{49}{4} = \tfrac{35}{12}$.

Problems for self study
1. A random variable X is defined as the sum of faces when a pair of dice is thrown. Find the expected value of X. Also find its variance.
2. An urn contains 7 white and 3 red balls. Two balls are drawn together, at random, from this urn. What is the expected number of white balls drawn?
3. A die is tossed twice. Getting a number greater than 4 is considered a success. Find the mean and variance of the probability distribution of the number of successes.
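The expectation and variance formulas can be applied mechanically to any discrete distribution stored as a table of values and probabilities. The sketch below is a minimal illustration using exact fractions; the dictionary layout and function names are assumptions, while the two distributions (a fair die, and the number of heads in a toss of two coins) are the ones discussed in these notes.

```python
from fractions import Fraction

def expectation(dist):
    """E(X) = sum of x * p(x) over all values of X."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X) = E(X^2) - [E(X)]^2."""
    mu = expectation(dist)
    return sum(x * x * p for x, p in dist.items()) - mu ** 2

# Number shown on a fair die: each face has probability 1/6
die = {x: Fraction(1, 6) for x in range(1, 7)}

# Number of heads in a toss of two coins
heads = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

print(expectation(die), variance(die))
print(expectation(heads), variance(heads))
```

A useful sanity check before computing anything is that the probabilities in each table sum to one, as required of a discrete probability distribution.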
Normal Distribution
The Normal Distribution is one of the most important continuous theoretical distributions in Statistics.

Definition: If X is a continuous random variable following the Normal Probability Distribution with mean $\mu$ and standard deviation $\sigma$, then its probability density function (p.d.f.) is given by:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}, \qquad -\infty < x < \infty$$
where Mean $= \mu$ and Variance $= \sigma^2$.

[Figure: normal curve f(x), symmetric about X = μ, with area 0.5 on each side of the mean]

2. The curve is symmetrical about the line X = μ.
3. Since the distribution is symmetrical, Mean = Median = Mode (all coincide at a point).
4. The whole area under the curve is divided into two equal parts by the ordinate at X = μ.
5. The maximum probability, occurring at X = μ, is given by:
$$f(\mu) = \frac{1}{\sigma\sqrt{2\pi}}$$
[Figure: areas under the normal curve: 68.27% within μ ± σ, 95.45% within μ ± 2σ, 99.73% within μ ± 3σ]

The following table gives the areas under the normal probability curve for some important values of Z:

Distance from the mean (ordinates in terms of ±Z)    Area under the curve
Z = ±0.6745                                          50% = 0.50
Z = ±1.00                                            68.26% = 0.6826
Z = ±1.96                                            95% = 0.95
Z = ±2.0                                             95.44% = 0.9544
Z = ±2.58                                            99% = 0.99
Z = ±3.0                                             99.73% = 0.9973
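The areas in the table can be reproduced numerically. The sketch below relies on the standard identity that the area under the standard normal curve between -z and +z equals erf(z/√2); the helper name `area_within` is an assumption made for the example.

```python
import math

def area_within(z):
    """Area under the standard normal curve between -z and +z."""
    return math.erf(z / math.sqrt(2))

for z in (0.6745, 1.00, 1.96, 2.0, 2.58, 3.0):
    print(f"Z = +/-{z}: area = {area_within(z):.4f}")
```

The computed areas for Z = ±1.00 and Z = ±2.0 round to 0.6827 and 0.9545, which differ from the tabulated 0.6826 and 0.9544 only in the last digit.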
How to Compute Areas under the Normal Probability Curve?
Mathematically, the area bounded by the curve p(x), the X-axis and the ordinates at X = a and X = b is given by the definite integral:
$$P(a < X < b) = \int_a^b p(x)\,dx$$

Let us now try to compute the areas under the normal probability curve.
$$P(\mu < X < a) = \int_\mu^a p(x)\,dx$$
is the area under the normal curve enclosed by the x-axis and the ordinates at X = μ and X = a.

When X = μ, $Z = \frac{X - \mu}{\sigma} = 0$.
When X = a, $Z = \frac{a - \mu}{\sigma} = z_1$ (say).

$$P(\mu < X < a) = P(0 < Z < z_1) = \int_0^{z_1} \varphi(z)\,dz = \frac{1}{\sqrt{2\pi}} \int_0^{z_1} e^{-z^2/2}\,dz$$
Example
A Sales Tax Officer has reported that the average sales of the 500 businesses that he has to deal with during a year amount to Rs. 36000 with a standard deviation of Rs. 10000. Assuming that the sales in these businesses are normally distributed, find:
(i) The number of businesses, the sales of which are over Rs. 40000.
(ii) The percentage of businesses, the sales of which are likely to range between Rs. 30000 and Rs. 40000.
(iii) The probability that the sales of a business selected at random will be over Rs. 30000.

Proportions of area under the Normal Curve:
z       0.25      0.40      0.50      0.60
Area    0.0987    0.1554    0.1915    0.2257

Solution: Let the variable X denote the sales (in Rs.) of the businesses during a year. Then we are given that $X \sim N(\mu, \sigma^2)$, where $\mu = 36000$ and $\sigma = 10000$.

(i) The probability that the sales of a business are over Rs. 40000 is given by P(X > 40000).
When X = 40000, $Z = \frac{40000 - 36000}{10000} = 0.4$
$$P(X > 40000) = P(Z > 0.4) = 0.5 - P(0 < Z < 0.4) = 0.5 - 0.1554 = 0.3446$$
Hence in a group of 500 businesses, the expected number of businesses with annual sales over Rs. 40000 is 500 × 0.3446 ≈ 172.

Parts (ii) and (iii) would be discussed in the class.

Questions for self study:
1. The sizes of components produced by a machine are normally distributed. It is required that the size should lie between 15.63 cm and 15.84 cm. It is found that 2.87% of the production is rejected for being oversize and 1.072% of the production is rejected for being undersize. Find the mean and the standard deviation of the distribution of the component sizes.

2. Indicate which brand you will choose and why.

                      Brand A       Brand B
Mean                  16,000 km     20,000 km
Standard deviation    2,000 km      4,000 km

... normal distribution, also indicate what percentage of brand B might be expected to run more than 24,000 km.

3. The lifetime of a certain type of battery has a mean life of 400 hours and a standard deviation of 50 hours. Assuming normality for the distribution of lifetime, find:
(i) The percentage of batteries which have a lifetime of more than 350 hours.
(ii) The lifetime value above which the best 25 per cent of the batteries will have their life, and
(iii) The proportion of batteries that have a lifetime between 300 hours and 500 hours.
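Part (i) of the Sales Tax Officer example above can be cross-checked without a printed table by using `statistics.NormalDist` from the Python standard library (available from Python 3.8). The variable names are illustrative; the mean, standard deviation and number of businesses are those given in the example.

```python
from statistics import NormalDist

# Annual sales modelled as N(36000, 10000^2), as stated in the example
sales = NormalDist(mu=36000, sigma=10000)

# P(X > 40000) is one minus the cumulative probability up to 40000
p_over_40000 = 1 - sales.cdf(40000)

# Expected number of businesses out of 500 with sales above Rs. 40000
expected_businesses = 500 * p_over_40000

print(round(p_over_40000, 4), round(expected_businesses))
```

The exact probability agrees with the table-based value 0.3446 to four decimal places, giving about 172 businesses.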
Sampling
The process of sampling involves drawing a sample from a given population and using the sample data to make statistical inferences about the parameters of the population. These inferences may consist of:
- Estimation of a population parameter from the sample information.
- Testing a hypothesis related to a given population parameter in the light of sample data.

Population
Population is the aggregate of items or individuals under study in any statistical investigation.
- Finite Population
- Infinite Population

Sample
A finite subset of the population, selected from it with the objective of studying its characteristics, is known as a sample. The number of units in the sample is known as the sample size.

Parameters
Statistical constants of the population, like mean ($\mu$), variance ($\sigma^2$) and correlation coefficient ($\rho$).

Statistics
Statistical constants of the sample, e.g. sample mean ($\bar{X}$), sample variance ($s^2$) and correlation coefficient (r).

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

Sampling Distributions other than of Mean:
- Student's t-distribution
- Snedecor's F-distribution
- Chi-square ($\chi^2$) distribution
Although we cannot eliminate the possibility of errors in hypothesis testing, we can consider the probability of their occurrence. Using common statistical notation, we denote the probabilities of making the two errors as follows.
$\alpha$ = the probability of making a Type I error
$\beta$ = the probability of making a Type II error

The maximum allowable probability of making a Type I error (known as the level of significance) has to be specified before conducting the hypothesis test. Common choices for the level of significance are 0.05 and 0.01.
$$Z = \frac{\bar{X} - E(\bar{X})}{\text{S.E.}(\bar{X})} \sim N(0, 1)$$
Z can be defined as a Standard Normal Variate (S.N.V.) of any statistic.

Step V: Conclusion:
We compare the computed value of Z in step (iv) with the significant (tabulated) value $Z_\alpha$ at the given level of significance $\alpha$.
If $|Z| < Z_\alpha$, then Z is not significant, i.e. the difference between the statistic and the parameter is just due to sampling fluctuations, and $H_0$ can be accepted; and vice versa.

Different Tests under Hypothesis Testing:
- Z-test
- t-test
- F-test
- $\chi^2$ test
(These tests would be discussed in detail in the class.)
Null hypothesis $H_0$: $\mu = 120$. In other words, there is no significant difference between the population mean $\mu = 120$ and the sample mean $\bar{x} = 116$.
$$Z = \frac{\bar{x} - \mu}{s/\sqrt{n}} = -2.67$$
Since |Z| = 2.67, which is greater than 1.96, the value of Z is significant at the 5% level of significance and hence the null hypothesis is rejected. Hence the stenographer's claim is not true.

Questions for self study
1. A random sample of 100 students gave a mean weight of 50 kgs with a standard deviation of 4 kgs. Test the hypothesis that the mean weight in the population is 60 kgs.
2. It is claimed that a random sample of 100 tyres with a mean life of 15269 kms is drawn from a population of tyres which has a mean life of 15200 kms and a standard deviation of 1248 kms. Test the validity of the claim.
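The Z-test computation can be sketched as a small Python function. The sample mean (116) and hypothesised mean (120) come from the text above; the sample size n = 100 and standard deviation s = 15 are assumptions, since the problem statement is not available here, and were chosen only because they reproduce the reported Z = -2.67.

```python
import math

def z_statistic(sample_mean, mu0, s, n):
    """Z = (x_bar - mu0) / (s / sqrt(n)) for a large-sample test of the mean."""
    return (sample_mean - mu0) / (s / math.sqrt(n))

# n = 100 and s = 15 are assumed values (not stated in the text above)
z = z_statistic(sample_mean=116, mu0=120, s=15, n=100)

# Two-tailed test at the 5% level: reject H0 when |Z| > 1.96
reject_h0 = abs(z) > 1.96

print(round(z, 2), reject_h0)
```

The same function can be reused for the self-study questions by substituting the sample mean, hypothesised mean, standard deviation and sample size given in each question.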
t-test
Example 1: Ten cartons are taken at random from an automatic filling machine. The mean net weight of the 10 cartons is 11.8 oz and the standard deviation is 0.15 oz. Does the sample mean differ significantly from the intended weight of 12 oz? You are given that for degrees of freedom = 9, $t_{0.05} = 2.26$.

Solution: We are given: n = 10, $\bar{x} = 11.8$, s = 0.15
Null hypothesis $H_0$: $\mu = 12$, i.e. the sample mean does not differ significantly from the population mean $\mu = 12$.
Alternate hypothesis $H_1$: $\mu \ne 12$
Test statistic: Under $H_0$,
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n - 1}} \sim t_{n-1} = t_9$$
$$t = \frac{11.8 - 12}{0.15/\sqrt{9}} = \frac{-0.2 \times 3}{0.15} = -4.0$$
The tabulated value of t for 9 d.f. at the 5% level of significance is 2.26. Since the calculated |t| is much greater than the tabulated t, it is highly significant. Hence the null hypothesis is rejected at the 5% level of significance and we conclude that the sample mean differs significantly from the mean $\mu = 12$ oz.

Example 2: A machine is designed to produce insulating washers for electrical devices with an average thickness of 0.025 cm. A random sample of 10 washers was found to have an average thickness of 0.024 cm with a standard deviation of 0.002 cm. Test the significance of the deviation. The value of t for 9 d.f. at the 5% level is 2.262.
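Null hypothesis $H_0$: $\mu = 0.025$, i.e. the deviation between the sample mean $\bar{x} = 0.024$ and the population mean $\mu = 0.025$ is not significant.
Alternate hypothesis $H_1$: $\mu \ne 0.025$
Under $H_0$,
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n - 1}} \sim t_{10-1} = t_9$$
Now
$$t = \frac{0.024 - 0.025}{0.002/\sqrt{9}} = \frac{-0.001 \times 3}{0.002} = -1.5$$
The tabulated value of t for 9 d.f. is 2.262. Since |t| < 2.262, it is not significant at the 5% level of significance. Hence the deviation $(\bar{x} - \mu)$ is not significant. Therefore the null hypothesis is accepted.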
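The test statistic used in Examples 1 and 2 above can be written as one small function. This sketch follows the convention of these notes, where s is the sample standard deviation and the divisor under the square root is n - 1; the function name is an assumption, and the tabulated values 2.26 and 2.262 are taken from the examples rather than computed.

```python
import math

def t_statistic(sample_mean, mu0, s, n):
    """t = (x_bar - mu0) / (s / sqrt(n - 1)), with n - 1 degrees of freedom."""
    return (sample_mean - mu0) / (s / math.sqrt(n - 1))

# Example 1: cartons from the filling machine
t1 = t_statistic(sample_mean=11.8, mu0=12, s=0.15, n=10)
significant_1 = abs(t1) > 2.26     # tabulated t for 9 d.f. given in Example 1

# Example 2: insulating washers
t2 = t_statistic(sample_mean=0.024, mu0=0.025, s=0.002, n=10)
significant_2 = abs(t2) > 2.262    # tabulated t for 9 d.f. given in Example 2

print(round(t1, 2), significant_1)
print(round(t2, 2), significant_2)
```

The first result is significant and the second is not, matching the conclusions of the two worked solutions.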
Example 3: The mean weekly sales of the chocolate bar in candy stores were 146.3 bars per store. After an advertising campaign the mean weekly sales in 22 stores for a typical week increased to 153.7 and showed a standard deviation of 17.2. Was the advertising campaign successful?

Solution: We are given: n = 22, $\bar{x} = 153.7$, s = 17.2
Null hypothesis $H_0$: $\mu = 146.3$, i.e. the difference between $\bar{x}$ and $\mu$ is not significant. In other words the advertising campaign is not successful.
Alternate hypothesis $H_1$: $\mu > 146.3$
Test statistic: Under the null hypothesis the test statistic is:
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n - 1}} \sim t_{22-1} = t_{21}$$
Now
$$t = \frac{153.7 - 146.3}{17.2/\sqrt{21}} = \frac{7.4 \times \sqrt{21}}{17.2} = \frac{7.4 \times 4.5826}{17.2} = \frac{33.9112}{17.2} = 1.9716$$
The tabulated value of t for 21 d.f. at the 5% level of significance for a single-tailed test is 1.721. Since the calculated value of t is greater than the tabulated value, it is significant. Hence the advertisement campaign was successful in promoting sales.

Example 4: A soap-manufacturing unit was distributing a particular brand of soap through a large number of retail shops. Before a heavy advertisement campaign, the mean sale per week per shop was 140 dozens. After the campaign, a sample of 26 shops was taken and the mean sale was found to be 147 dozens with standard deviation 16. Can you consider the advertisement effective?

Solution: We are given: n = 26, $\bar{x} = 147$, s = 16
Null hypothesis $H_0$: $\mu = 140$, i.e. the difference between $\bar{x}$ and $\mu$ is not significant and is just due to fluctuations of sampling. In other words the advertisement is not effective.
Alternate hypothesis $H_1$: $\mu > 140$
Test statistic: Under the null hypothesis the test statistic is:
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n - 1}} \sim t_{n-1} = t_{25}$$
$$t = \frac{147 - 140}{16/\sqrt{25}} = \frac{7 \times 5}{16} = 2.19$$
The tabulated value of t for 25 d.f. at the 5% level of significance for a single right-tail test is 1.708. This is the value of $t_{0.10}$ for a two-tailed test. Since the calculated value of t is greater than the tabulated value, it is significant. Hence the increase in sales cannot be attributed to fluctuations of sampling and we conclude that the advertisement is effective in increasing the sales.
Chi-square Test
Example: The number of automobile accidents per week in a certain community was as follows:
12, 8, 20, 2, 14, 10, 15, 6, 9, 4
Are these frequencies in agreement with the belief that accidents in the community were the same during this 10-week period?

Some more problems for the chi-square test and F-test would be done in the class.
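One way to approach the accident example is a chi-square goodness-of-fit test, in which the expected frequency for every week is the average number of accidents. The sketch below is an illustration under that assumption, not the worked solution from the class; the critical value 16.919 is the standard tabulated chi-square value for 9 degrees of freedom at the 5% level.

```python
# Observed accidents per week over the 10-week period (from the example)
observed = [12, 8, 20, 2, 14, 10, 15, 6, 9, 4]

# Under H0 (same conditions every week) each week has the same expected count
expected = sum(observed) / len(observed)

# Chi-square statistic: sum of (O - E)^2 / E
chi_square = sum((o - expected) ** 2 / expected for o in observed)

degrees_of_freedom = len(observed) - 1
critical_value = 16.919   # tabulated chi-square, 9 d.f., 5% level

reject_h0 = chi_square > critical_value
print(expected, round(chi_square, 1), degrees_of_freedom, reject_h0)
```

Since the computed statistic exceeds the critical value, this sketch would reject the hypothesis that the weekly frequencies are the same at the 5% level.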
Regression Analysis
Study of the functional relationship between variables is known as regression analysis. Correlation analysis brings out the degree of association between the variables, while regression analysis explores the existing cause and effect relationship. The regression equations are useful for predicting the value of the dependent variable for given value(s) of the independent variable(s).
- The linear regression model
- Ordinary least squares estimation

The linear regression model
In the linear regression model, the dependent variable is assumed to be a linear function of one or more independent variables plus an error introduced to account for all other factors:
$$y_i = \beta_1 x_{i1} + \dots + \beta_K x_{iK} + u_i$$
In the above regression equation, $y_i$ is the dependent variable, $x_{i1}, \dots, x_{iK}$ are the independent or explanatory variables, and $u_i$ is the disturbance or error term. The goal of regression analysis is to obtain estimates of the unknown parameters $\beta_1, \dots, \beta_K$, which indicate how a change in one of the independent variables affects the values taken by the dependent variable.

Applications of regression analysis exist in almost every field. In economics, the dependent variable might be a family's consumption expenditure and the independent variables might be the family's income, the number of children in the family, and other factors that would affect the family's consumption patterns. In education, the dependent variable might be a student's score on an achievement test and the independent variables characteristics of the student's family, teachers, or school.

Ordinary least squares estimation
The usual method of estimation for the regression model is ordinary least squares (OLS). Let $b_1, \dots, b_K$ denote the OLS estimates of $\beta_1, \dots, \beta_K$. The predicted value of $y_i$ is:
$$\hat{y}_i = b_1 x_{i1} + \dots + b_K x_{iK}$$
The error in the OLS prediction of $y_i$, called the residual, is:
$$e_i = y_i - \hat{y}_i$$

$$\hat{Y} = a + bX$$
Now for any set of data related to X and Y, it is possible to specify a line that approximates the mean of Y for given values of X by using the least squares technique. By revealing how the mean of Y changes as the values of X change, this line is understood to describe the regression of Y on X. The regression line gives the predicted value of Y for each value of X. It is noteworthy that for the same set of related variables there is always a second regression line that describes the regression of X on Y.

Points of Discussion
1. How to fit the regression line in case of the two-variable and three-variable models?
2. How to interpret the regression coefficients?
3. How to find the coefficient of determination?
4. How to check the significance of regression coefficients?
5. How to test the overall significance of the regression?
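For the two-variable case, fitting the regression line of Y on X by least squares can be sketched as follows. The data and the function names are assumptions made for illustration; the slope and intercept formulas are the standard least squares results b = Sxy / Sxx and a = mean(Y) - b × mean(X).

```python
def fit_line(xs, ys):
    """Least squares estimates (a, b) for the line Y = a + bX."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx              # slope: change in predicted Y per unit of X
    a = y_bar - b * x_bar      # intercept
    return a, b

def r_squared(xs, ys, a, b):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical paired observations (illustrative data only)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
r2 = r_squared(xs, ys, a, b)
print(round(a, 2), round(b, 2), round(r2, 2))
```

Swapping the roles of `xs` and `ys` in `fit_line` gives the second regression line, that of X on Y, mentioned above.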
CASE STUDY-I
NATIONAL HEALTH CARE ASSOCIATION
The National Health Care Association is concerned about the shortage of nurses the health care profession is projecting for the future. To learn the current degree of job satisfaction among nurses, the association has sponsored a study of hospital nurses throughout the country. As part of this study, a sample of 50 nurses was asked to indicate their degree of satisfaction in their work, their pay, and their opportunities for promotion. Each of the three aspects of satisfaction was measured on a scale from 0 to 100, with larger values indicating higher degrees of satisfaction. The data is shown in the following table:

Work    Pay    Promotion    Work    Pay    Promotion
71      49     58           72      76     37
84      53     63           71      25     74
84      74     37           69      47     16

Use methods of descriptive statistics to summarize the data. Present the summaries that will be beneficial in communicating the results to others. Discuss your findings. Specifically, comment on the following questions.
1. On the basis of the entire data set and the three job-satisfaction variables, what aspect of the job is most satisfying for the nurses? What appears to be the least satisfying? In what area(s), if any, do you feel improvements should be made? Discuss.
2. On the basis of descriptive measures of variability, what measure of job satisfaction appears to generate the greatest difference of opinion among the nurses? Explain.
standard deviation for these data was 0.21; hence, the population standard deviation was assumed to be 0.21. Quality Associates then suggested that random samples of size 30 be taken periodically to monitor the process on an ongoing basis. By analyzing the new samples, the client could quickly learn whether the process was operating satisfactorily. When the process was not operating satisfactorily, corrective action could be taken to eliminate the problem. The design specification indicated the mean for the process should be 12. The hypothesis test suggested by Quality Associates follows.
$$H_0: \mu = 12 \qquad H_1: \mu \ne 12$$
Corrective action will be taken any time $H_0$ is rejected.
The following samples were collected at hourly intervals during the first day of operation of the new statistical process control procedure.

Sample 1   Sample 2   Sample 3   Sample 4
11.55      11.62      11.91      12.02
11.62      11.69      11.36      12.02
11.52      11.59      11.75      12.05
11.75      11.82      11.95      12.18
11.90      11.97      12.14      12.11
11.64      11.71      11.72      12.07
11.80      11.87      11.61      12.05
12.03      12.10      11.85      11.64
11.94      12.01      12.16      12.39
11.92      11.99      11.91      11.65
12.13      12.20      12.12      12.11
12.09      12.16      11.61      11.90
11.93      12.00      12.21      12.22
12.21      12.28      11.56      11.88
12.32      12.39      11.95      12.03
11.93      12.00      12.01      12.35
11.85      11.92      12.06      12.09
11.76      11.83      11.76      11.77
12.16      12.23      11.82      12.20
11.77      11.84      12.12      11.79
12.00      12.07      11.60      12.30
12.04      12.11      11.95      12.27
11.98      12.05      11.96      12.29
12.30      12.37      12.22      12.47
12.18      12.25      11.75      12.03
11.97      12.04      11.96      12.17
12.17      12.24      11.95      11.94
11.85      11.92      11.89      11.97
12.30      12.37      11.88      12.23
12.15      12.22      11.93      12.25

Case Questions:
1. Conduct the hypothesis test for each sample at the 0.1 level of significance and determine what action, if any, should be taken. Provide the test statistic for each test.
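The test in question 1 can be sketched in a few lines of Python. This is a minimal sketch, not a prescribed solution: it assumes a two-tailed z test with the known standard deviation of 0.21 from the case, the sample values are transcribed from the table above, and the names `SAMPLES` and `z_test` are illustrative only.

```python
from math import sqrt
from statistics import NormalDist, mean

MU0 = 12.0    # hypothesized process mean from the case
SIGMA = 0.21  # assumed population standard deviation from the case

# Values transcribed from the four hourly samples (30 observations each).
SAMPLES = {
    "Sample 1": [11.55, 11.62, 11.52, 11.75, 11.90, 11.64, 11.80, 12.03, 11.94, 11.92,
                 12.13, 12.09, 11.93, 12.21, 12.32, 11.93, 11.85, 11.76, 12.16, 11.77,
                 12.00, 12.04, 11.98, 12.30, 12.18, 11.97, 12.17, 11.85, 12.30, 12.15],
    "Sample 2": [11.62, 11.69, 11.59, 11.82, 11.97, 11.71, 11.87, 12.10, 12.01, 11.99,
                 12.20, 12.16, 12.00, 12.28, 12.39, 12.00, 11.92, 11.83, 12.23, 11.84,
                 12.07, 12.11, 12.05, 12.37, 12.25, 12.04, 12.24, 11.92, 12.37, 12.22],
    "Sample 3": [11.91, 11.36, 11.75, 11.95, 12.14, 11.72, 11.61, 11.85, 12.16, 11.91,
                 12.12, 11.61, 12.21, 11.56, 11.95, 12.01, 12.06, 11.76, 11.82, 12.12,
                 11.60, 11.95, 11.96, 12.22, 11.75, 11.96, 11.95, 11.89, 11.88, 11.93],
    "Sample 4": [12.02, 12.02, 12.05, 12.18, 12.11, 12.07, 12.05, 11.64, 12.39, 11.65,
                 12.11, 11.90, 12.22, 11.88, 12.03, 12.35, 12.09, 11.77, 12.20, 11.79,
                 12.30, 12.27, 12.29, 12.47, 12.03, 12.17, 11.94, 11.97, 12.23, 12.25],
}


def z_test(sample, mu0=MU0, sigma=SIGMA, alpha=0.1):
    """Two-tailed z test of H0: mu = mu0 when sigma is treated as known.

    Returns (z statistic, p-value, True if H0 is rejected at level alpha).
    """
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha


for name, data in SAMPLES.items():
    z, p, reject = z_test(data)
    action = "take corrective action" if reject else "no action"
    print(f"{name}: mean={mean(data):.4f}  z={z:.2f}  p={p:.4f}  -> {action}")
```

Calling `z_test(data, alpha=0.01)` instead shows how the decision for a borderline sample can change with the chosen level of significance.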
2.
around
exceeds the
corrective action will be taken. These limits are referred to as upper and lower control limits for quality control purposes.
3. Discuss the implications of changing the level of significance to a larger value. What mistake or error could increase if that was done?
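The control limits mentioned here can be illustrated with a short calculation. This is a sketch under stated assumptions: the limits are taken as the hypothesized mean plus or minus the two-tailed critical z value times the standard error, using the mean of 12, standard deviation of 0.21 and sample size of 30 from the case. The function name `control_limits` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def control_limits(mu0=12.0, sigma=0.21, n=30, alpha=0.1):
    """Return (lower, upper) limits for the sample mean around mu0.

    A sample mean outside these limits leads to rejecting H0: mu = mu0
    in a two-tailed z test at significance level alpha.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_crit * sigma / sqrt(n)
    return mu0 - margin, mu0 + margin


# Compare the limits at several significance levels.
for alpha in (0.01, 0.05, 0.1):
    lower, upper = control_limits(alpha=alpha)
    print(f"alpha={alpha}: LCL={lower:.3f}, UCL={upper:.3f}")
```

A larger level of significance gives narrower limits, so a process that is in fact operating correctly is flagged more often. That is the Type I error, and its probability equals the chosen level of significance.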
CASE STUDY-III
Increasing use of chemical fertilisers has played a profound role in increasing the productivity of Indian agriculture. To meet the needs of the ever-growing population, India needs to increase the availability of fertilisers to its farmers. Moreover, the constraint of limited land emphasizes the need for optimal use of the fertilisers in order to maintain the fertility of the land as well as to enhance its productivity. In order to achieve this objective, the Indian Government introduced the Retention Pricing Scheme (RPS) in 1977 to encourage the production as well as consumption of the fertilisers. This two-tier pricing system involved a price to the farmer controlled at a low level on one hand, and a fair price to the producer to fully cover the reasonable cost of production including a reasonable margin of profit on the other hand. We can see from the following table that there has been a tremendous increase in fertiliser consumption in the post-RPS period.
%##., (-.# #'.>&.# #D,'.(D#.> #',>&,# -.,$.# #%#-.. #'$>&$# ,'',.% -%%#.> #''>&'# $>D..-%%#.% #''#&'% $D%..$ 3NGB.N #''%&'$,$$.3>>D.B #''-&'D '(>,.# %'-#., #''D&'( '$%%.$ 3NDE.= #''(&'. #>->#.$ %',..$ #''.&', #>'>(.> -'#,.% #'',&'$ ##-(-.$ D##%.% #''$&'' ##.%>., D$>D.# #'''& %>>> I-eal Consum'tion Ratio
,...(.' ..> (.' '.( '., $.( $.( #>.> ,.' $.( ..' 3
Sources:
At the beginning of the 90s, the macro-economic situation had worsened, with the fiscal deficit reaching as high as about 8 percent of GDP during 1990-91. Foreign exchange reserves were barely sufficient to meet just about 2 weeks' imports and the inflation rate was running into double digits. All this led the economy to a stage whereby India was almost on the verge of defaulting on its external payment obligations. The Government of India was thus forced to approach the International Monetary Fund (IMF) for financial assistance. The IMF in turn had laid down stiff conditions including, amongst others, removal of subsidies meant for providing the necessary support to the farmers. Hence the fertiliser subsidies also came under attack and the P&K sector was decontrolled. This initiative steeply raised the prices of P & K fertilizers, which led to a fall in their consumption. On the other hand, urea continued to remain under RPS and its selling price was reduced by 10%. Though a scheme of concession on the decontrolled fertilisers was announced in June 1993 and the urea price was raised by about 20% w.e.f. June 10, 1994, the NPK ratio could not improve much because of the still prevailing massive difference in prices of nitrogenous, phosphatic and potassic fertilizers.
Estimation of fertiliser consumption in India: The following log-linear model has been taken for estimating the demand for nitrogenous fertiliser in India:

\ln N_t = \beta_0 + \beta_1 \ln I_t + \beta_2 \ln H_t + \beta_3 \ln P_t + \beta_4 \ln N_{t-1} + \beta_5 \ln Y_t

where
N_t = Consumption of the nitrogenous fertiliser in the t-th year.
I_t = Percentage of gross irrigated area to gross cropped area in the t-th year.
H_t = Percentage of area under HYV to gross cropped area in the t-th year.
P_t = Price of the nitrogenous fertiliser in the t-th year.
N_{t-1} = Lagged dependent variable.
Y_t = GDP of Agriculture (it has been taken as a proxy variable for income of the farming community).
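Because every variable in a log-linear model enters in logarithms, each slope coefficient can be read as an elasticity. The short derivation below illustrates this for the price term. The symbols (N for consumption, I for irrigated area, H for HYV area, P for price, Y for agricultural GDP, and the β coefficients) are assumed labels chosen for this sketch, since the original notation did not survive extraction.

```latex
% Model written with assumed labels
\ln N_t = \beta_0 + \beta_1 \ln I_t + \beta_2 \ln H_t + \beta_3 \ln P_t
          + \beta_4 \ln N_{t-1} + \beta_5 \ln Y_t

% Short-run price elasticity of demand
\frac{\partial \ln N_t}{\partial \ln P_t}
  = \frac{\partial N_t / N_t}{\partial P_t / P_t}
  = \beta_3

% Long-run price elasticity, setting N_t = N_{t-1} (requires 0 < \beta_4 < 1)
\frac{\beta_3}{1 - \beta_4}
```

Under this reading, a 1% rise in the fertiliser price changes consumption by roughly β3 percent in the same year, other variables held fixed. The long-run expression relies on the usual partial-adjustment interpretation of the lagged dependent variable, which is an assumption here rather than something the case states.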
Data on all variables except the income variable have been collected from various issues of Fertiliser Statistics, FAI, New Delhi. The data on the GDP of Agriculture, which has been used as a proxy for the income variable, have been obtained from the Indian Economic Survey 2000-2001. The study covers the period from 1973-74 to 1999-2000.
TIME      Consumption of N   Irrigated area (%)   Area under HYV (%)   Price of N   GDP of Agriculture
1982-83   4242.5             30                   27.49264             2150         177300
1983-84   5204.4             30                   29.92871             2150         193508
1984-85   5486.1             30.9                 30.70379             2150         196353
1985-86   5660.8             30.4                 31.05457             2150         198353
1986-87   5716               31.6                 31.84286             2350         198740
1987-88   5716.8             32.8                 31.68736             2350         196735
1988-89   7251               33.5                 32.97399             2350         227095
1989-90   7385.9             33.9                 33.55790             2350         231389
1990-91   7997.2             33.6                 34.95529             2350         242012
1991-92   8046.3             35.7                 35.51502             3060         239253
1992-93   8426.8             36                   35.23580             2760         252205
1993-94   8788.3             36.6                 35.90235             2760         262059
1994-95   9507.1             37.5                 37.69977             3320         276049
1995-96   9822.8             38                   38.57129             3320         275153
1996-97   10301.8            38.7                 40.32224             3320         299461
1997-98   10905              38.7                 40.09644             3660         295050
The model has been estimated by using the above data, applying OLS to the equation using the statistical package SHAZAM. The results are as under:
Results for Fertiliser Consumption Model for Nitrogen (1973-74 to 1999-2000)
Parameter   Estimated Coefficient   T-ratio
0.69438 0.96198
t
-0.2007 0.28184
P
!
B
it
it #
0.37320 -2.4663
Constant
D-W statistic
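The estimation step described above (OLS on the log-transformed variables, followed by a Durbin-Watson check) can be sketched in plain Python. This is a minimal sketch, not the SHAZAM procedure used in the case: the price and disturbance series are hypothetical, only one explanatory variable plus the lagged dependent variable is used, and the helper names `solve_linear`, `ols` and `durbin_watson` are illustrative.

```python
import math


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def ols(y, regressors):
    """OLS with an intercept via the normal equations.

    Returns (coefficients, residuals); the intercept is coefficients[0].
    """
    rows = [[1.0] + [col[i] for col in regressors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    beta = solve_linear(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    return beta, resid


def durbin_watson(resid):
    """Durbin-Watson statistic: sum of squared first differences over sum of squares."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    return num / sum(e * e for e in resid)


# Hypothetical series (NOT the case data): consumption falls with price.
price = [100, 110, 125, 130, 150, 170, 180, 200]
shock = [0.02, -0.01, 0.03, -0.02, 0.01, -0.03, 0.02, -0.02]
consumption = [50000 * p ** -0.3 * math.exp(u) for p, u in zip(price, shock)]

ln_n = [math.log(v) for v in consumption]
ln_p = [math.log(v) for v in price]
# Regress ln N_t on ln P_t and the lagged dependent variable ln N_{t-1}.
beta, resid = ols(ln_n[1:], [ln_p[1:], ln_n[:-1]])
print("intercept, price coefficient, lag coefficient:", beta)
print("Durbin-Watson:", durbin_watson(resid))
```

The normal-equations approach is chosen only to keep the sketch dependency-free. A statistical package would normally use a more numerically stable decomposition and would also report standard errors and t-ratios.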
Moving along the policy of economic liberalization and reforms, the deregulated regime in all the three types of fertilizers is no doubt the long-term goal. But any contemplated move towards a free market economy has to be gradual so as to prevent adverse impact on consumption of fertilisers and ultimately on the productivity of the foodgrains. Therefore, the government should come out with a stable and clear-cut policy without further loss of time. The policy must aim at protecting the domestic industry as well as the interests of the farmers, and it would benefit the community at large.
Questions: (To be discussed in the class)

CASE STUDY-IV
A university is typically required to prepare its operating budget well in advance of actually receiving its revenues and incurring the expenditures. An important source of revenue is student tuition, which is obviously a function of the number of students enrolled. A university was having problems in preparing accurate budgets because past forecasts of enrollment, made each February before the start of the academic year in September, were subject to considerable error. One aspect of the problem was determining the relationship between the number of applications received by February 1 and the number of new students entering the university in the following September. The data tabulated below were collected on September registrations and February 1 applications.

Year   Number of Applications Received by February 1 (Hundreds)   Number of New Students Enrolled in September (Hundreds)
1990 1991 1992 1993
28 24 26 20 28 18 28 22
36 36 42 46 46 50
32 33 34 34 35 38
a. Given the nature of the forecasting problem, which variable would be the dependent variable and which would be the independent variable?
b. Plot the data.
c. Determine the estimated regression line. Give an economic interpretation of the slope coefficient.
d. Test the hypothesis that there is no relationship (that is, the slope equals 0) between the variables.
e. Calculate the coefficient of determination.
f. Perform an analysis of variance on the regression, including an F-test of the overall significance of the results.
g. Suppose 4,200 applications are received by February 1. What is the best estimate, based on the regression model, of the number of new students that will be enrolled in the following September?
Use SPSS for the entire analysis.
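The calculations asked for in parts c to g can be cross-checked outside SPSS with a few lines of Python. This is a minimal sketch of simple linear regression. The figures below are hypothetical and are not the case data, since the pairing of applications and enrolments in the table above is not recoverable with certainty. The function name `simple_regression` is illustrative.

```python
from math import sqrt


def simple_regression(x, y):
    """Least-squares line y = a + b x with R^2, the slope t-ratio and the ANOVA F."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ssr = slope * sxy    # regression sum of squares
    sse = syy - ssr      # error (residual) sum of squares
    mse = sse / (n - 2)  # error mean square, n - 2 degrees of freedom
    return {
        "intercept": intercept,
        "slope": slope,
        "r_squared": ssr / syy,
        "t_slope": slope / sqrt(mse / sxx),  # tests H0: slope = 0
        "f_stat": ssr / mse,                 # equals t_slope squared
    }


# Hypothetical figures in hundreds, NOT the case data.
applications = [20, 30, 40, 50]
new_students = [15, 22, 27, 36]

fit = simple_regression(applications, new_students)
# 4,200 applications correspond to 42 in the table's units (hundreds).
forecast = fit["intercept"] + fit["slope"] * 42
print(fit)
print("forecast of new students (hundreds):", forecast)
```

With the case data substituted for the hypothetical lists, the slope estimates how many additional new students (in hundreds) are associated with each additional hundred applications, and the t-ratio and F statistic would be compared with critical values on n - 2 degrees of freedom.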