
Unit -2&4

Hypothesis Testing
Hypothesis testing begins with an assumption, called a hypothesis, that we make about a population parameter. A hypothesis in statistics is simply a quantitative statement about a population.

Procedure of Testing Hypothesis
The procedure of testing hypotheses is briefly described below:

1. Set up a hypothesis
The first step in hypothesis testing is to set up a hypothesis about a population parameter. We then collect sample data, compute sample statistics, and use this information to decide how likely it is that the hypothesized population parameter is correct. In practice two hypotheses are set up, and they must be so constructed that if one is accepted the other is rejected, and vice versa.

Null hypothesis
The null hypothesis is a very useful tool in testing the significance of a difference. In its simplest form the hypothesis asserts that there is no real difference between the sample and the population in the particular matter under consideration. For example, if we want to find out whether extra coaching has benefited the students or not, we set up the null hypothesis that "extra coaching has not benefited the students". Similarly, if we want to find out whether a particular drug is effective in curing malaria, we take the null hypothesis that "the drug is not effective in curing malaria".

Alternative hypothesis
As against the null hypothesis, the alternative hypothesis specifies those values that the researcher believes to hold true, and of course the researcher hopes that the sample data will lead to acceptance of this hypothesis. The alternative hypothesis may embrace a whole range of values rather than a single point. It is the opposite of the null hypothesis. The null and alternative hypotheses are distinguished by two different symbols: H0 represents the null hypothesis and H1 the alternative hypothesis.

2. Set up a suitable significance level
Having set up the hypotheses, the next step is to test the validity of H0 against that of H1 at a certain level of significance.
The significance level, customarily expressed as a percentage such as 5 per cent, is the probability of rejecting the null hypothesis when it is true. When a test is carried out at the 5 per cent level, the statistician accepts the risk that, in the long run, a true hypothesis will be rejected on about 5 out of every 100 occasions.



3. Setting a test criterion
The third step in the hypothesis testing procedure is to construct a test criterion. This involves selecting an appropriate probability distribution for the particular test, that is, a probability distribution which can properly be applied. Some probability distributions that are commonly used in testing procedures are t, F and χ².

4. Doing computations
Having taken the first three steps, we have completely designed a statistical test. The fourth step is to carry out the necessary computations on the sample data. These calculations include the test statistic and the standard error of the test statistic.

5. Making decisions
Finally, as a fifth step, we draw statistical conclusions and take decisions. A statistical conclusion or statistical decision is a decision either to reject or to accept the null hypothesis. The decision depends on whether the computed value of the test criterion falls in the region of rejection or the region of acceptance.
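The five steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the notes list t, F and χ² as common test distributions, but the sketch swaps in a two-sided normal (z) test with a known population standard deviation, because that needs nothing beyond the standard library. The function name `one_sample_z_test`, the marks, the hypothesized mean of 60 and the standard deviation of 10 are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: mean == mu0 against H1: mean != mu0.

    Assumes the population standard deviation `sigma` is known.
    """
    # Step 3: test criterion (standard normal) and its rejection boundary.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 4: standard error of the test statistic, then the statistic itself.
    standard_error = sigma / sqrt(len(sample))
    z = (mean(sample) - mu0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 5: reject H0 if z falls in the region of rejection.
    reject_h0 = abs(z) > critical
    return z, p_value, reject_h0


# Hypothetical marks of 16 students after extra coaching (illustrative only).
scores = [62, 70, 58, 66, 74, 61, 69, 65, 72, 60, 68, 64, 71, 63, 67, 66]

# Steps 1 and 2: H0 "mean mark is 60", H1 "mean mark differs from 60", alpha = 5%.
z, p_value, reject_h0 = one_sample_z_test(scores, mu0=60, sigma=10, alpha=0.05)
print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

With these made-up figures the sample mean is 66, so z works out to 2.4, which lies beyond the 5 per cent critical value of about 1.96 and falls in the region of rejection.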
Two Types of Errors in Testing of Hypothesis

When a statistical hypothesis is tested there are four possibilities:
1. The hypothesis is true but our test rejects it. (Type I error)
2. The hypothesis is false but our test accepts it. (Type II error)
3. The hypothesis is true and our test accepts it. (Correct decision)
4. The hypothesis is false and our test rejects it. (Correct decision)

Obviously, the first two possibilities lead to errors. In a statistical hypothesis testing experiment, a Type I error is committed by rejecting the null hypothesis when it is true. On the other hand, a Type II error is committed by not rejecting (i.e., accepting) the null hypothesis when it is false.
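A small simulation can make the Type I error concrete. The sketch below is an assumption-laden illustration, not part of the original notes: it repeatedly draws samples from a population for which the null hypothesis really is true (mean 0, standard deviation 1), applies a two-sided z-test at the 5 per cent level, and counts how often the true hypothesis is rejected. The function name, trial count, sample size and seed are hypothetical choices.

```python
import random
from math import sqrt
from statistics import NormalDist


def type_one_error_rate(trials=4000, n=30, alpha=0.05, seed=0):
    """Fraction of trials in which a true H0 (mean == 0) is rejected."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # H0 is true by construction: the population mean is exactly 0.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) / (1.0 / sqrt(n))
        if abs(z) > critical:
            rejections += 1  # rejecting a true hypothesis: a Type I error
    return rejections / trials


rate = type_one_error_rate()
print(f"Observed Type I error rate: {rate:.3f}")
```

The observed rate should settle near the chosen significance level, which is the sense in which the significance level is the long-run probability of a Type I error.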

The standard deviation of the sampling distribution is called the standard error.
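For the sample mean, the standard error is the population standard deviation divided by the square root of the sample size, and in practice it is estimated by s / √n, where s is the sample standard deviation. A minimal sketch, with a hypothetical function name and made-up observations:

```python
from math import sqrt
from statistics import stdev


def standard_error_of_mean(sample):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))


observations = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
se = standard_error_of_mean(observations)
print(f"Standard error of the mean: {se:.4f}")
```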

When data are collected by sampling from a population, the most important objective of statistical analysis is to draw inferences about that population from the information embodied in the sample. Statistical estimation, or briefly estimation, is concerned with the methods by which population characteristics are estimated from sample information. It may be pointed out that the true value of a parameter is an unknown constant that can be correctly ascertained only by an exhaustive study of the population.



Types of estimates
1. Point Estimates
A point estimate is a single number which is used as an estimate of the unknown population parameter. The estimate is a single point on the real number scale, hence the name point estimation.

2. Interval Estimates

An interval estimate of a population parameter is a statement of two values between which it is estimated that the parameter lies. An interval estimate is always specified by two values, i.e. the lower one and the upper one. In more technical terms, interval estimation refers to the estimation of a parameter by a random interval, called the confidence interval. On comparing these two methods of estimation we find that point estimation has an advantage in as much as it provides a single definite value for the parameter under investigation.

Properties of a Good Estimator
The quality of an estimator is evaluated in terms of the following properties:

(i) Unbiasedness
An estimator is said to be unbiased if its expected value is identical with the population parameter being estimated. Many estimators are "asymptotically unbiased" in the sense that their biases reduce to practically insignificant values when n becomes sufficiently large.
(ii) Consistency
An estimator, say $\hat{\theta}$, is said to be a consistent estimator of the parameter $\theta$ if it approaches $\theta$ closer and closer as the sample size n increases. Stated somewhat more rigorously, $\hat{\theta}$ is a consistent estimator of $\theta$ if, as n approaches infinity, the probability approaches 1 that $\hat{\theta}$ will differ from $\theta$ by not more than an arbitrarily small constant $\varepsilon$.

In the case of large samples, consistency is a desirable property for an estimator to possess. However, in small samples consistency is of little importance unless the limit of probability defining consistency is reached even with a relatively small sample size.

(iii) Efficiency
The concept of efficiency refers to the sampling variability of an estimator. If two competing estimators are both unbiased, the one with the smaller variance (for a given sample size) is said to be relatively more efficient. If the population is symmetrically distributed, then both the sample mean and the sample median are consistent and unbiased estimators of the population mean $\mu$. Yet the sample mean is better than the sample median as an estimator of $\mu$. This claim is made in terms of efficiency: for a normal population, for instance, the sample mean has the smaller sampling variance.
(iv) Sufficiency

An estimator is said to be sufficient if it conveys as much information as possible about the parameter which is contained in the sample. The significance of sufficiency lies in the fact that if a sufficient estimator exists, it is absolutely unnecessary to consider any other estimator; a sufficient estimator ensures that all the information a sample can furnish with respect to the estimation of a parameter is being utilized.
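The ideas in this section (a point estimate, an interval estimate, and the relative efficiency of two estimators) can be illustrated with a short sketch. Everything here is an assumption made for illustration: the interval uses a large-sample normal approximation rather than any method prescribed by the notes, the population is taken to be normal with mean 50 and standard deviation 10, and the function names, sample sizes and seeds are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, median, pvariance, stdev


def point_and_interval(sample, confidence=0.95):
    """Point estimate of the mean and a normal-approximation interval."""
    estimate = mean(sample)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * stdev(sample) / sqrt(len(sample))
    return estimate, (estimate - margin, estimate + margin)


def sampling_variances(reps=2000, n=25, seed=1):
    """Variance of the sample mean and of the sample median over many samples."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(50.0, 10.0) for _ in range(n)]
        means.append(mean(sample))
        medians.append(median(sample))
    return pvariance(means), pvariance(medians)


# One hypothetical sample of 40 observations.
rng = random.Random(0)
sample = [rng.gauss(50.0, 10.0) for _ in range(40)]
estimate, (lower, upper) = point_and_interval(sample)
print(f"Point estimate: {estimate:.2f}")
print(f"95% interval estimate: ({lower:.2f}, {upper:.2f})")

# Efficiency: compare how much each estimator varies from sample to sample.
var_mean, var_median = sampling_variances()
print(f"Variance of sample mean:   {var_mean:.2f}")
print(f"Variance of sample median: {var_median:.2f}")
```

Under these assumptions the sample mean shows the smaller sampling variance, which is what it means to call it the more efficient of the two estimators.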

Correlation and regression analysis

Definitions of correlation analysis
Correlation analysis deals with the association between two or more variables. Correlation is an analysis of the covariation between two or more variables.

Positive and Negative Correlation
Whether correlation is positive (direct) or negative (inverse) depends upon the direction of change of the variables. If both variables vary in the same direction, i.e. if as one variable increases the other, on average, also increases, or as one variable decreases the other, on average, also decreases, correlation is said to be positive. Examples: height and weight, income and expenditure. If, on the other hand, the variables vary in opposite directions, i.e. as one variable increases the other decreases or vice versa, correlation is said to be negative. Examples: volume and pressure, price and demand.

Simple, Partial and Multiple Correlations
The distinction between simple, partial and multiple correlation is based upon the number of variables studied. When only two variables are studied it is a problem of simple correlation. When three or more variables are studied it is a problem of either multiple or partial correlation. In multiple correlation three or more variables are studied simultaneously. For example, when we study the relationship between the yield of rice per acre and both the amount of rainfall and the amount of fertilizers used, it is a problem of multiple correlation. In partial correlation, on the other hand, we recognize more than two variables but consider only two of them to be influencing each other, the effect of the other influencing variables being kept constant. For example, in the rice problem above, if we limit our correlation analysis of yield and rainfall to periods when a certain average daily temperature existed, it becomes a problem of partial correlation.
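The notes do not give a formula for measuring correlation. As a hedged illustration, the sketch below assumes Pearson's product-moment coefficient for simple correlation and the standard first-order formula for partial correlation, which removes the linear effect of a third variable instead of restricting the data to periods of constant temperature. All figures and function names are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Simple correlation: hypothetical figures for a positive and a negative case.
height = [150, 155, 160, 165, 170, 175]
weight = [50, 54, 59, 63, 68, 74]
price = [10, 12, 14, 16, 18, 20]
demand = [95, 90, 82, 75, 70, 61]
r_height_weight = pearson_r(height, weight)  # variables move together
r_price_demand = pearson_r(price, demand)    # variables move in opposite directions

# Partial correlation: yield and rainfall, with temperature held constant.
rice_yield = [20, 24, 23, 28, 30, 27]
rainfall = [30, 36, 33, 40, 44, 38]
temperature = [25, 27, 24, 29, 28, 26]
r_yield_rain_given_temp = partial_r(
    pearson_r(rice_yield, rainfall),
    pearson_r(rice_yield, temperature),
    pearson_r(rainfall, temperature),
)

print(f"height vs weight: {r_height_weight:.3f}")
print(f"price vs demand:  {r_price_demand:.3f}")
print(f"yield vs rainfall, temperature constant: {r_yield_rain_given_temp:.3f}")
```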

Linear and Non-Linear (Curvilinear) Correlation
The distinction between linear and non-linear correlation is based upon the constancy of the ratio of change between the variables. If the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, the correlation is said to be linear. Correlation is called non-linear or curvilinear if the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable.

Regression analysis
Regression is the measure of the average relationship between two or more variables in terms of the original units of the data. The term "regression analysis" refers to the methods by which estimates are made of the values of a variable from a knowledge of the values of one or more other variables, and to the measurement of the errors involved in this estimation process.
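Both halves of the definition of regression analysis, estimating one variable from another and measuring the error of that estimate, can be sketched as follows. The notes do not name a fitting method; this sketch assumes an ordinary least-squares straight line and a standard error of estimate with divisor n - 2 (some textbooks divide by n instead). The data and function names are hypothetical.

```python
from math import sqrt


def fit_line(xs, ys):
    """Least-squares line y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return a, b


def standard_error_of_estimate(xs, ys, a, b):
    """Typical size of the errors made when estimating y from the line."""
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    return sqrt(sum(e * e for e in residuals) / (len(xs) - 2))


xs = [1, 2, 3, 4, 5]  # hypothetical independent variable
ys = [2, 4, 5, 4, 5]  # hypothetical dependent variable

a, b = fit_line(xs, ys)
error = standard_error_of_estimate(xs, ys, a, b)
estimate_at_6 = a + b * 6  # estimate of y for a new value x = 6

print(f"y = {a:.2f} + {b:.2f} x")
print(f"standard error of estimate: {error:.3f}")
print(f"estimated y at x = 6: {estimate_at_6:.2f}")
```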


Uses of regression analysis
Regression analysis provides estimates of values of the dependent variable from values of the independent variable. A second goal of regression analysis is to obtain a measure of the error involved in using the regression line as a basis for estimation. With the help of regression coefficients we can also calculate the correlation coefficient.

Coefficient of determination
One very convenient and useful way of interpreting the value of the coefficient of correlation between two variables is to use the square of the coefficient of correlation, which is called the coefficient of determination. The coefficient of determination is defined as the ratio of the explained variance to the total variance.
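The two statements above can be checked numerically. In the sketch below, which assumes least-squares regression lines, the coefficient of determination is computed as explained variation over total variation, and the correlation coefficient is recovered from the two regression coefficients as the square root of their product, carrying their common sign. The data and function name are hypothetical.

```python
from math import copysign, sqrt


def determination_and_correlation(xs, ys):
    """Return (b_yx, b_xy, r_squared, r) for least-squares regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)

    b_yx = sxy / sxx  # regression coefficient of y on x
    b_xy = sxy / syy  # regression coefficient of x on y

    # Explained variation: spread of the fitted values around the mean of y.
    a = my - b_yx * mx
    explained = sum((a + b_yx * x - my) ** 2 for x in xs)
    r_squared = explained / syy  # explained / total

    # r has the same sign as the regression coefficients.
    r = copysign(sqrt(b_yx * b_xy), b_yx)
    return b_yx, b_xy, r_squared, r


xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
b_yx, b_xy, r_squared, r = determination_and_correlation(xs, ys)
print(f"b_yx = {b_yx:.2f}, b_xy = {b_xy:.2f}")
print(f"coefficient of determination = {r_squared:.2f}, r = {r:.4f}")
```

For these figures the explained share of the variation equals the product of the two regression coefficients, which is also the square of r.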

Whereas the coefficient of correlation is a measure of the degree of covariability between X and Y, the objective of regression analysis is to study the nature of the relationship between the variables, so that we may be able to predict the value of one on the basis of the other.

Correlation is merely a tool for ascertaining the degree of relationship between two variables and, therefore, we cannot say that one variable is the cause and the other the effect. In regression analysis one variable is taken as dependent and the other as independent, thus making it possible to study the cause and effect relationship.

There may be nonsense correlation between two variables which is purely due to chance and has no practical relevance, such as an increase in income and an increase in weight of a group of people. However, there is nothing like nonsense regression.

The correlation coefficient is independent of change of scale and origin. Regression coefficients are independent of change of origin but not of scale.
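The closing remark about change of origin and scale can be verified on a small example. The sketch below assumes Pearson's coefficient and a least-squares slope; the data, the shift amounts (+100 and -50) and the scale factors (10 and 2) are arbitrary hypothetical choices.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def slope_y_on_x(xs, ys):
    """Least-squares regression coefficient of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]

# Change of origin: add a constant to each variable.
shifted_x = [x + 100 for x in xs]
shifted_y = [y - 50 for y in ys]

# Change of scale: multiply each variable by a positive constant.
scaled_x = [10 * x for x in xs]
scaled_y = [2 * y for y in ys]

r_original = pearson_r(xs, ys)
r_shifted = pearson_r(shifted_x, shifted_y)
r_scaled = pearson_r(scaled_x, scaled_y)

b_original = slope_y_on_x(xs, ys)
b_shifted = slope_y_on_x(shifted_x, shifted_y)
b_scaled = slope_y_on_x(scaled_x, scaled_y)

print(f"r: original {r_original:.4f}, shifted {r_shifted:.4f}, scaled {r_scaled:.4f}")
print(f"b: original {b_original:.4f}, shifted {b_shifted:.4f}, scaled {b_scaled:.4f}")
```

The correlation coefficient is the same in all three cases, while the regression coefficient survives the change of origin but is multiplied by 2/10 under the change of scale.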